136|ALTERYX – To Spend or Not to Spend

BYU Student Author: @Boston
Reviewers: @Spencer, @Christian
Estimated Time to Solve: 45 Minutes

We provide the solution to this challenge using:

  • Excel
  • Alteryx

Need a program? Click here.

Overview
You have been newly hired as an intern at Rango Incorporated and your boss has approached you with an important task. Your resume boasts adept skills in Excel and Alteryx, so your boss thought you would be perfect to help go through data to answer important questions. However, there are a few problems with the dataset. Within the excel workbook, there are three sheets related to Employee, Trip, and Purpose data. The person who input the employee data used unprofessional capitalization when each row of employee data should appear like: “John Smith, Sales, 000, Miami, Florida.” In the Trip data, the same employee accidentally input the first 68 rows of data into column “H” and did not split up the data into their proper rows or columns. Your boss would like you to organize, upload, and analyze the data within Alteryx to answer three important questions:

  • How many employees spent more than their estimated costs on trips?
  • Is there a correlation between excess spending and the number of complaints for an employee?
  • Which trip purpose had the greatest excess spending?

Instructions

  1. Parse the Trip data using Alteryx into the correct rows and columns.
  2. Clean the Employee data using Alteryx. Your boss does not want to use any rows that have any null or empty values. For example, you can exclude any row of data that has a null employeename, department, employeeid, city, state, or complaints. There are some duplicate employees in the Employee data that need to be combined and their complaints summed together.
  3. Join these tables so that you can do further calculations.
  4. Use Alteryx tools to find out if employees spent more than their estimated costs by creating a new column “difference” that calculates the difference between the average estimated and actual costs.
  5. For question 3, use all available data (regardless of difference amount) to find the total difference for each Purpose ID.
  6. You should finish with two final tables with the following headers:
    Picture2
    Picture1

Data Files

Suggestions and Hints

The Left, Mid, Right functions or delimiter tool may be helpful to parse the Trip data into the correct columns. It may be helpful to remove the whitespace in the Trip data and unwanted columns with the data cleansing tool.

Solution

Check Figures

The “employeename” table should have 9 employees/rows and the “ExpPurposeDesc” table should have 5 rows.

Solution Image

Challenge136_Solution.xlsx
Challenge136_Solution.yxmd
Solution Video: Challenge 136|ALTERYX – To Spend or Not to Spend

This is my solution to this problem. I actually ended with different answers than the solution proposed above. When cleaning the Trip table I found it necessary to parse the columns into their respective rows and then union the top half of the data with the bottom half of the data. By doing so I was able to use data from all of the trips taken throughout the year. By cleaning both halves of the data my numbers came out significantly different.

Overall, I found this challenge very applicable to situations that would be presented in real-world applications. Between cleaning the data and ensuring that all data was accounted for there were many small mistakes that could be made that would produce inaccurate results.

Great challenges!