BYU Student Author: @Kylie_Larsen17 @Klayton
Reviewers: @Saul_Esplin, @Kaden_Sandberg @Nate, @MitchFrei, @summers
Estimated Time to Solve: 20 Minutes
We provide the solution to this challenge using:
- Excel
- Python
Need a program? Click here.
Overview
Your boss, the Chief Human Resource Officer, has asked you to provide a list of employees for a client. Before you share the data with the client, however, she has asked you to transform some of the sensitive information to protect the employees. The client is eager for the data, and doing this task manually would take too long for their liking. With the right Python script, you should be able to impress both your boss and make the client happy.
Instructions
Below is a list of requirements from your boss:
-
The client wants a separate column for both first and last names. Use Python to split the names into the First Name and Last Name columns in columns F and G respectively.
-
The client also wants to only show the birth year of each employee. Extract the birth year from the BirthDate column into a new Birth Year column (column H).
-
The client doesn’t think each employee’s SSN should be visible. Replace the first five digits of each SSN with an “x” while keeping the final four digits, and store the result in the SSN Last 4 column (column I).
-
Finally, from the HireDate column, parse the date into three different columns: Hire Day, Hire Month, and Hire Year.
Write a Python script that reads the provided Excel file, performs the transformations, and outputs the cleaned data into the existing Excel file.
Data Files
Suggestions and Hints
-
Pandas provides powerful tools like .apply(), .str.split(), and .astype(str) to efficiently clean and transform data.
-
A lambda function is useful for applying simple transformations, such as extracting parts of a string or modifying values within a column.
-
When working with Excel files, Pandas may automatically change data types. Use dtype=str in read_excel() to preserve formatting and avoid unexpected conversions.
Solution