128|PYTHON – Paycheck Power

BYU Student Author: @Spencer
Reviewers: @Mike_Paulsin, @MitchFrei
Estimated Time to Solve: 50 Minutes

We provide the solution to this challenge using:

  • Python

Need a program? Click here.

Overview
You have recently graduated from a prestigious university with a bachelor’s degree in accounting and are now looking for a job. Although you have enjoyed growing up in Denver your whole life, you are not sure if it is the best place to settle down. To ensure that you can maximize your earnings potential, you have started researching accountant salaries across the United States. Your goal is to adjust each salary according to Denver’s cost of living. By equalizing each salary’s cost of living, you will be able to determine how much real income you would earn in cities across the US.

Instructions
You will be adjusting the salaries of the following cities to Denver’s cost of living:

  • Atlanta, Boston, Dallas, Los Angeles, Philadelphia, Seattle, Washington DC, Chicago, Denver, San Diego, Houston, New York City, San Francisco

You will need to do the following to calculate the adjusted salary:

  • Copy the contents of the “Start_Here.txt” file and paste it into your python platform. This contains the libraries you will need to import, a pandas numbering format, and a dictionary you will need later.
  • Scrape the most recent CPI data from bls.gov for each city.
    • You have been provided with a dictionary (seriesid_dict) that contains the name of each city and its corresponding “series ID”.
    • The URL you will need to for each city is as follows: https://data.bls.gov/pdq/SurveyOutputServlet?data_tool=dropmap&series_id={series id}
    • For example, Atlanta’s series ID is “CUURS35CSA0,CUUSS35CSA0”. Therefore, its URL is “https://data.bls.gov/pdq/SurveyOutputServlet?data_tool=dropmap&series_id=CUURS35CSA0,CUUSS35CSA0”.
    • Scrape the data table of CPI information for each city.
    • Extract the most recent CPI number from this data table. This will require data manipulation within a pandas dataframe. Do not hardcode months. City CPI data is published every other month, and some cities do not start on the same month. For example, Atlanta publishes CPI data for every even month (Feb, Apr, etc.), while Boston publishes CPI data for every odd month (Jan, Mar, etc.). Thus, your code needs to be dynamic enough to extract the most recent data, regardless of the month it corresponds to.
  • Extract salary information for each city from the data excel sheet (this data was also taken from bls.gov).
    • Load this sheet into a pandas data frame.
    • The column we will be extracting data from is called “A_MEAN” (average annual salary). Extract the A_MEAN for each city in our dictionary.
  • Calculate the cost of living-adjusted salary for each city
    • Once you have extracted each city’s CPI and average salary, you will need to use the following formula to calculate each city’s adjusted salary: (denver’s cpi * selected city salary) / selected city cpi
    • This equation will give us each city’s adjusted salary based on Denver’s cost of living.
  • Display each city’s adjusted salary in a data frame and horizontal bar chart
    • Once you have calculated each city’s adjusted salary, create a simple data frame and horizontal bar chart to display your results.
  • If it interests you, explore bls.gov and try adjusting the salary of a new city!

Data Files

Suggestions and Hints
  • Denver’s adjusted salary will remain unchanged since it is already adjusted for the Denver cost of living.
  • Ensure your “Paycheck Power” Excel sheet is in the same folder as your Python file.

Solution

This felt like it was going to be easier than it actually was for me to dust off the Python skills, but I managed to figure it out. Fun challenge with lots of practicality especially when deciding where to start a career! I like to segment my Python problems using Jupyter Notebook so my thinking process is below.

I first defined a function that could extract a city’s CPI. The hardest challenge here was figuring out how to dynamically drop years without any data since that was throwing me off from wanting to drop all columns with null values:

I had to refresh myself a lot on Dataframe actions for this next step. I decided to loop through the dictionary and add all my data to a dataframe. The key thing I learned was that it makes more sense and saves more time to add all your data to a list within the loop and create the Dataframe after the fact.

From there, the rest wasn’t too bad. I created a calculated column in the dataframe to get each City’s Adjusted Salary and then used basic MatPlotLib to plot the data and display the Dataframe.

Here is what my final results looked like (I flipped the X and Y axis labels by mistake…but the results are still fairly easy to see):