Resources for Lab 3

Practice 1

  • Download the employees.xlsx.
  • Create an R Project where to analyse the employees.xlsx dataset and move the xlsx file within the project directory.
  • Get the path to the data using the here() function.
  • Import the dataset in R with the read_xlsx() function.
  • How many observations are collected in the dataset? Use the dim() function.

  • What’s the average salary in the company? Use either the function mean()or summary()

  • Use filter() and arrange() to check who are the three top earners in the I.T department.

  • Use group_by(), summarise() and median() to get the median salary in the I.T. department.

Using geom_point(), construct a scatterplot for salary and seniority, colored by department. Store it in my_plot. Try adding a theme to the plot, for example:

my_plot + theme_minimal()

What happens?

  • Use labs() to modify the labels of the plot. For example:
my_plot + theme_minimal() + labs(x="Seniority (years)", y="Salary", color="Department:")
  • Use facet_grid() to create a multi-window plot
my_plot + facet_grid(~department)
  • Export your plot with the function ggsave()
my_great_plot <- my_plot + theme_minimal() + labs(x="Seniority (years)", y="Salary", color="Department:")+ facet_grid(~department)

ggsave(my_great_plot, filename = here("great_plot.jpeg"))
  • Use group_by() and summarise() to compute the average and total expenditure for salaries by department
  • How many employees have just been hired in the company?

  • How many employees have just been hired in the I.T. department?

  • Compute the average salary of employees from the Finance department with at least 8 years seniority.

Create a summary table where you report:

  • the median salary by department (using the function median()),

  • the number of employees by department (using the function n()),

  • the number of new hired by department.

Save it as an excel file via the write_xlsx() function.

Given the summary table in the previous exercise, reproduce the following bar plot of the median salary by department (hint: use geom_col()).

You have to create an Excel file where employees from different department are collected in different sheets.

In R, you can create multi-sheets Excel files by feeding write_xlsx() with a list of datasets. Assume you stored the original employees.xlsxdataset in the dataframe dt. Now, try running

my_departments <- split(dt, dt$department)
  • What does the split()function do? Run ?split() to read the documentation.

  • What kind of object is stored in my_departments? Run typeof(my_departments) to investigate.

  • Feed write_xlsx() with my_departmentsto create a multi-sheet Excel file.

Practice 2

  • The HR department sent you the employees file from a different company.
  • Download the employees_company2.xlsx and import it in R.
  • As you can see, the file has a structure that is very similar to the previous database. Nevertheless, it presents two critical issues:
    • Column names do not follow a consistent notation. Sometimes they are capitalized, sometimes they are not. Some times they includes spaces, sometimes they do not.
    • There are missing values

Use the function clean_names() from the janitor package to tidy up column names automatically

  • How many missing values are present in the salary column? Use the summary() function.
  • Replace missing values with the median salary of department the missing value belongs to.