Overview
Congratulations on your internship at the Utah State Department of Commissions! Your boss has been very impressed with you during your first week and has noticed your technological inclination. With your background in accounting and your technological skill, your supervisor has assigned you the task of cleaning data and answering important questions related to budget analysis for Salt Lake City during the upcoming quarter. There is a lot of data available, but you need to clean it up and focus solely on Salt Lake City for the meeting. Don’t worry though! You only have to focus on the City, NAICS, and value of loan columns. A data dictionary is provided next to the Excel file.
Instructions
Clean up the data for Salt Lake City to answer some questions for your boss!
In the ‘City’ column, change any variation of “Salt Lake City” to SALT LAKE CITY, e.g., NORTH SALT LAKE should become SALT LAKE CITY
Remove any data that does not have a NAICS code or has a blank business type
Using a Pivot Table, answer the following questions:
a. Which types of businesses are receiving the most and least loans in Salt Lake City? (Number of businesses with loans, not monetary value of loans)
b. Which NAICS code receives the most loans? (Number of businesses with loans, not monetary value of loans)
I spent way too much time on this, but I think I found a cool way to solve it. I wanted a way to automatically update the spreadsheet as I found more dirty data (i.e., more variations of “Salt Lake City”). So here is what I did:
Deleted the data that had missing values
Created a lookup table in a different tab.
Added a new column “City2” to the original data.
In the “City2” column, I used an XLOOKUP function to check whether the city name appeared in my lookup table (a sketch of this kind of formula follows these steps). Making the lookup range an Excel Table allowed the formula to update easily, so I wouldn’t have to keep changing XLOOKUP cell references whenever I added more data to my lookup table.
Created my PivotTable. Be careful: you have to refresh the PivotTable any time you add more data to your lookup table, or else it won’t pick up the changes.
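Roughly what that City2 formula can look like, with placeholder names for the table and its columns (this assumes the raw city name is in column E and the lookup table is an Excel Table named CityLookup with Variant and CleanName columns; your actual names will differ):
=XLOOKUP(E2, CityLookup[Variant], CityLookup[CleanName], E2)
If E2 matches one of the known variants, the formula returns the standardized name; the fourth argument returns the original value unchanged when there is no match.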
This was a quick and simple way to replace spelling variations in a table. I really liked it. I ended up making another PivotTable that used the NAICS description column to see which industries were receiving loans, not just which codes. Most of the results were not what I expected.
Great challenge!! I made my second pivot table on NAICS description instead, but the result is the same.
To clean the data, I first got rid of the rows with missing data, then wrote a formula to fix all the dirty variations of Salt Lake City in a new column called “Cleaned City.” Here is my formula:
=IF(AND(ISERROR(FIND("SALT",E2)),ISERROR(FIND("LAKE",E2)),ISERROR(FIND("SLC",E2))),E2,"SALT LAKE CITY")
Great challenge! In addition to the tables you created, I wanted to see which businesses got loans in the $5–10 million range. It seems that Limited Service Restaurants were the winners, as shown in the screenshot below.
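If you want to recreate that view, one simple option is a helper column that buckets the loan value, something along these lines (this assumes the loan amount is in column J; the column letter and cutoffs are placeholders):
=IF(AND(J2>=5000000, J2<=10000000), "$5-10M", "Other")
You can then filter or pivot on that helper column, or apply a value filter directly to the loan amount field in the PivotTable.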
This is seriously such a cool challenge. I really like @David_Wood’s idea of having a separate table to add any new name variations to. That’s probably the most robust way to clean the City name; I wish I had thought of it!
The only thing I did differently from everyone else was using wildcard characters with the SEARCH() function (FIND() doesn’t support wildcards). I feel like wildcards aren’t nearly as well known as they should be. They’re super helpful, a bit like a lightweight SQL pattern match, but you sometimes need to be careful that your wildcards aren’t overly inclusive, giving you data you don’t need.
At first I used SEARCH("SAL? LAKE",A2), since I noticed there was a typo in some of the entries ("SALE LAKE"). But then I noticed there was also an "SLC" in the data, and I couldn’t find a way to make the wildcards cover all of the variations of Salt Lake while excluding everything else. Using SEARCH("S*L*",A2) is too inclusive and would change cities such as "Kaysville" or "Salina" (any city where an S precedes an L), and using SEARCH("S*L*C*",A2) is too exclusive, not changing "North Salt Lake", "Salt Lake", etc.
Moral of the story: Excel wildcards are great, but they can only get you so far! They definitely aren’t as robust as RegEx, but they can be useful for simple text parsing. I’ve included the final formula I used to clean the city below. It definitely isn’t as robust as using a separate table and lookups, but it worked for the data presented in this training!
=IF(IFERROR(SEARCH("SAL? LAKE",A2),0)>0,"SALT LAKE CITY",IF(IFERROR(SEARCH("SLC",A2),0)>0,"SALT LAKE CITY",A2))
I was able to make the required changes to the data using a simple table. I only had to look through the possible values for a few seconds to find everything I thought was a variation of Salt Lake City, and from there I was able to change all of the values very easily. I also filtered out blanks in the other two fields. After that it was easy to create a PivotTable and find the answers!
I wish I had thought to use wildcard characters or an XLOOKUP like other people did. I ended up using the Find and Replace feature on the Home tab. I had to go through it a few times, but it worked nicely!
Cleaned up the city data by writing the formula =IF(ISNUMBER(FIND("SALT", E2)), "SALT LAKE CITY", E2), then expanded the formula to catch the other variations of Salt Lake City (one possible expansion is sketched after these steps).
Found all the blank cells using Go To Special > Blanks, then deleted the rows with no NAICS code or business type.
Created PivotTables with the business types as row labels, added the loan amounts, and used a count of the loan amounts as the values. I honestly couldn’t figure out how to get a count of each business type’s loans, but I did my best.
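For reference, one possible version of that expanded city formula, just wrapping the extra variations mentioned in this thread ("SLC" and the "SALE LAKE" typo) in an OR (the column letter is an assumption):
=IF(OR(ISNUMBER(FIND("SALT", E2)), ISNUMBER(FIND("SLC", E2)), ISNUMBER(FIND("SALE LAKE", E2))), "SALT LAKE CITY", E2)
And for the count of loans by business type, a formula cross-check alongside the PivotTable could be something like =COUNTIFS($G:$G, "SALT LAKE CITY", $F:$F, "Limited Service Restaurants"), assuming the cleaned city ends up in column G and the business type in column F (the column letters and the category label are placeholders).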
Simple and great review of PivotTables! I also learned some valuable methods to clean and filter tables within Excel. After completing the tasks listed in the instructions, I went to watch the solution video and found the "Ctrl -" shortcut (for deleting the selected rows) to be really helpful and much simpler than the methods I employed.
I loved this challenge! It was great being reminded of how effectively Excel cleans data. I got 541110 for question 3b! I had a challenge getting the same answer for the first Pivot Table, but overall I enjoyed this project.