Resources for Lab 3

Practice 1

Download the employees.xlsx.
Create an R Project where to analyse the employees.xlsx dataset and move the xlsx file within the project directory.
Get the path to the data using the here() function.
Import the dataset in R with the read_xlsx() function.

How many observations are collected in the dataset? Use the dim() function.
What’s the average salary in the company? Use either the function mean()or summary()

Use filter() and arrange() to check who are the three top earners in the I.T department.
Use group_by(), summarise() and median() to get the median salary in the I.T. department.

Using geom_point(), construct a scatterplot for salary and seniority, colored by department. Store it in my_plot. Try adding a theme to the plot, for example:

my_plot + theme_minimal()

What happens?

Use labs() to modify the labels of the plot. For example:

my_plot + theme_minimal() + labs(x="Seniority (years)", y="Salary", color="Department:")

Use facet_grid() to create a multi-window plot

my_plot + facet_grid(~department)

Export your plot with the function ggsave()

my_great_plot <- my_plot + theme_minimal() + labs(x="Seniority (years)", y="Salary", color="Department:")+ facet_grid(~department)

ggsave(my_great_plot, filename = here("great_plot.jpeg"))

Use group_by() and summarise() to compute the average and total expenditure for salaries by department

How many employees have just been hired in the company?
How many employees have just been hired in the I.T. department?
Compute the average salary of employees from the Finance department with at least 8 years seniority.

Create a summary table where you report:

the median salary by department (using the function median()),
the number of employees by department (using the function n()),
the number of new hired by department.

Save it as an excel file via the write_xlsx() function.

Given the summary table in the previous exercise, reproduce the following bar plot of the median salary by department (hint: use geom_col()).

You have to create an Excel file where employees from different department are collected in different sheets.

In R, you can create multi-sheets Excel files by feeding write_xlsx() with a list of datasets. Assume you stored the original employees.xlsxdataset in the dataframe dt. Now, try running

my_departments <- split(dt, dt$department)

What does the split()function do? Run ?split() to read the documentation.
What kind of object is stored in my_departments? Run typeof(my_departments) to investigate.
Feed write_xlsx() with my_departmentsto create a multi-sheet Excel file.

Practice 2

The HR department sent you the employees file from a different company.
Download the employees_company2.xlsx and import it in R.
As you can see, the file has a structure that is very similar to the previous database. Nevertheless, it presents two critical issues:
- Column names do not follow a consistent notation. Sometimes they are capitalized, sometimes they are not. Some times they includes spaces, sometimes they do not.
- There are missing values

Use the function clean_names() from the janitor package to tidy up column names automatically

How many missing values are present in the salary column? Use the summary() function.
Replace missing values with the median salary of department the missing value belongs to.