232|ALTERYX – Normalization Station

Kyle_Nilsen · June 12, 2024, 6:36pm

BYU Student Author: @Kyle_Nilsen
Reviewers: @Carter_Lee, @Dalling_Gardner
Estimated Time to Solve: 40 Minutes

We provide the solution to this challenge using:

Alteryx


Overview
You are considering applying for a high-achieving accounting PhD program. To ensure your success, you decide to do some preliminary investigation into which research topics and methodologies have the highest concentration of publications in the professional world. You stumble on a flat file that includes much of this information; however, the data is not normalized. Because of this, navigating the data within your Access database will be inefficient. To facilitate your research, you decide to take the time to normalize the data within Alteryx by creating the necessary tables and relationships. Your ultimate goal is to recreate the flat file with updated IDs for each significant component.

Instructions

Download the Authors_Data.xlsx and Challenge232_NormalizationStation.yxmd files below.
Open the Alteryx workflow and point the input tool to the correct file path (this is the Authors_Data file you just downloaded).
Once the input is configured, run the workflow to load the data. Then take a moment to explore and analyze the data. You will notice there are empty, labeled containers in the workflow. These are the “tables” you will create for the normalized relationships. You might find it helpful to draw out the tables by hand before trying to design them in Alteryx.
Create third normal form tables for Method, Topic, Journal, and University. After doing this, update the flat file with the new IDs.
Create second normal form tables for Article and Author. After doing this, update the flat file with the new IDs.
Create second normal form tables for ArticleMethod, ArticleTopic, and AuthorArticle. Each of these tables uses a concatenated (composite) primary key, so you will not give them unique IDs or use them to update the flat file.
Output the normalized, updated dataset to a csv file called “PublishingsNormalized.csv”.
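
If it helps to see the logic outside of Alteryx, here is a minimal Python sketch of what steps 4 through 6 are doing: find the unique values, give each one an ID, and build a linking table whose key is the pair of IDs. This is only an illustration, not the challenge solution. The column names ("ArticleTitle", "Method") and the sample rows are made up and may not match Authors_Data.xlsx.

```python
# Hypothetical flat-file rows; real column names in Authors_Data.xlsx may differ.
flat_rows = [
    {"ArticleTitle": "Paper A", "Method": "Archival"},
    {"ArticleTitle": "Paper A", "Method": "Experimental"},
    {"ArticleTitle": "Paper B", "Method": "Archival"},
]


def build_lookup(rows, column):
    """Map each distinct value in `column` to a 1-based ID, in first-seen order."""
    lookup = {}
    for row in rows:
        value = row[column]
        if value not in lookup:
            lookup[value] = len(lookup) + 1
    return lookup


# Lookup tables: one row per unique value, each with its own ID.
method_ids = build_lookup(flat_rows, "Method")
article_ids = build_lookup(flat_rows, "ArticleTitle")

# Linking table: the (ArticleID, MethodID) pair is the key, so no extra ID is assigned.
article_method = sorted(
    {(article_ids[r["ArticleTitle"]], method_ids[r["Method"]]) for r in flat_rows}
)

print(method_ids)
print(article_ids)
print(article_method)
```

In Alteryx terms, `build_lookup` roughly plays the role of a select tool, then finding unique values, then assigning an ID, and the set of pairs plays the role of the ArticleMethod table.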

Data Files

Suggestions and Hints

To build out the normalization effectively, you need to understand the relationships between the columns in the flat file and assign only what is necessary to each table (hence first, second, and third normal form). If you have not done any normalization before, you can read about it here: Database Normalization – Normal Forms 1nf 2nf 3nf Table Examples
The first (and most useful) step for each table is to determine which elements from the flat file are relevant and use a select tool to filter out the irrelevant columns. After that, each table follows a similar pattern of finding unique values and assigning each a unique ID.
The University table is one of the exceptions to the above. Authors have both a “PhdUnivName” and a “WorkUnivName.” Use one select tool for each, then use a union tool, and then you can find the unique values before assigning each a unique ID.
Use a join tool to merge the IDs back into the flat file’s data. You should use multiple join tools in succession before moving on to the next section (i.e., between steps 4 and 5 in the instructions).
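
For anyone stuck on the University table, this is a rough Python sketch of the same idea as the last two hints: stack the two university columns, assign IDs to the unique names, then merge the IDs back onto each row. "PhdUnivName" and "WorkUnivName" come from the flat file, but "AuthorName", "PhdUnivID", "WorkUnivID", and the sample values are assumptions for illustration only.

```python
# Hypothetical author rows; only PhdUnivName and WorkUnivName are named in the challenge.
author_rows = [
    {"AuthorName": "Author 1", "PhdUnivName": "Univ X", "WorkUnivName": "Univ Y"},
    {"AuthorName": "Author 2", "PhdUnivName": "Univ Y", "WorkUnivName": "Univ Z"},
]

# "Union": stack both university columns into a single list of names.
stacked = [r["PhdUnivName"] for r in author_rows] + [
    r["WorkUnivName"] for r in author_rows
]

# Unique values plus IDs: the first time a name appears, it gets the next ID.
university_ids = {}
for name in stacked:
    if name not in university_ids:
        university_ids[name] = len(university_ids) + 1

# "Join": look up both IDs for every row and attach them to the flat data.
updated_rows = [
    {
        **r,
        "PhdUnivID": university_ids[r["PhdUnivName"]],
        "WorkUnivID": university_ids[r["WorkUnivName"]],
    }
    for r in author_rows
]

print(university_ids)
print(updated_rows)
```

Because both columns draw from the same lookup, a university that appears as a PhD school for one author and a workplace for another ends up with a single ID, which is the reason for the union step.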
This is the relationship diagram I created for the tables:

Solution

Challenge232_Solution.yxmd
Solution Video: Challenge 232|ALTERYX – Normalization Station

Solution Images

Finlay_Lofthouse · March 31, 2025, 4:19pm

Time: 40 minutes
Difficulty: Intermediate
This was a great challenge. The hardest part was getting started, but once I got going it wasn’t too difficult to figure out each step after that.

Henry_Chen · April 1, 2025, 2:27am

Level: Challenging

Jacob_Critchfield · April 2, 2025, 11:17pm

Time: 60
Difficulty: Difficult

Andrew_Spiers · April 3, 2025, 3:07pm

Time to complete: 1 hour
Difficulty: Very Hard

Christian_Townsend · April 3, 2025, 4:48pm

Took me an hour and a half. The hardest one I have done yet. If you are thinking about doing this one, run.

Abigail_Woodbury · April 3, 2025, 5:39pm

Time to complete: 60 minutes
Difficulty: Intermediate
Solution:

Scott_Johnston · April 3, 2025, 5:42pm

Time: 60 min
Difficulty: Difficult
Solution:
