159|ALTERYX – Parsing Patterns with Regex

BYU Student Author: @Sterling_Lane
Reviewers: @Marta_Ellsworth, @Andrew_Wilson
Estimated Time to Solve: 40 Minutes

This is an intro challenge that is part of the Alteryx Learning Path.

We provide the solution to this challenge using:

  • Alteryx

Need a program? Click here.

Overview
Welcome to your data analyst internship at Simple Screenprinting! At Simple Screenprinting, we prioritize having comprehensive and accurate data to uncover insights that keep us ahead in the screenprinting market. Your task is to support our marketing team in achieving its strategic goals.

Jenny, our Director of Marketing, is concerned about Simple Screenprinting’s branding efforts and seeks assistance in optimizing her team’s spending. She has granted you access to the company’s order data and authorized the use of Alteryx to address her inquiries, recognizing the potential of Alteryx’s parsing tools to help in your analysis. Your objective is to create an Alteryx workflow that addresses all her questions, enabling you to discuss the findings with Jenny.

Instructions

  1. One of our partners, Harmons Grocery, has undergone a branding update, transitioning to the new name of Harmony Grocery. To uphold our commitment to precision and brand integrity, it’s imperative that we reflect this change across all touchpoints, including our orders database. In our system, job entries for Harmons Grocery are denoted as “HARMONS #xx,” with xx indicating the store location. Help us ensure our branding alignment by updating the Job Names for each Harmons location.

    • Some Harmons JobNames have some apostrophes at the end of the name (e.g., HARMON’S). Use the Replace Output method on the Regex tool to take out apostrophes in the Harmons name if they exist.
    • Use another Regex tool to replace all instances of “HARMONS #xx” (where x = a number) with “HARMONY #xx”. Assume the store number has at least one number but could have more.
  2. We are starting to sell a lot more prints on items that come in multiple sizes. We want to expand our consumer base outside of Utah to increase our brand presence. Our Item SKU numbers have the following format, which should help you in locating these for us:

    • SKUs beginning with a letter are stored in Utah
    • SKUs beginning with a number are stored outside of Utah
    • SKUs with a dash (‘-’) denote that the item has multiple sizes

    Based on this information, prepare a report showing me which items are sold outside of Utah that have multiple sizes.

    • From the Items table, use the Match Output method on the Regex tool to identify which SKUs are stored in Utah
    • Use a second Regex tool to identify which SKUs represent items with multiple sizes
    • Use a filter tool to include only the items that fit Jenny’s criteria
    • Use a select tool to only exclude the columns not in the Items table.
  3. We want to further expand our consumer base to focus more on college age kids in Utah who enjoy BYU football, since we feel that segment of the market has untapped potential. As such, we want to do A/B testing on BYU students to see which prints are more impactful to this demographic. Provide me a report showing BYU gear we have screen-printed in the past which was sold during the football season (September – December) so we can get an idea of what we have done in the past.

    • Use the Parse Output method on the JobName column to create a new column called “School” which will show the word “BYU” if the word exists in any Job Name.
    • Use a filter tool to only include rows which have BYU items
    • Use a DateTime Parse tool to create a new column called MonthSold which contains the month from the OrderDate.
    • Use another filter tool to only include jobs that were ordered during the months Jenny specified.

Data Files

Suggestions and Hints
  • Alteryx has built in examples of each use of the Regex Tool. Click on the Regex tool in the tool palette and click Open Example to view examples of how each Regex output method works.
  • For the Replace output method, don’t forget about how Alteryx handles Regex grouping. When defining a pattern, you can group characters in parentheses () and call those groups in the Replacement Text as follows: $1 (group 1) $2 (group 2).
  • If a column contained a word followed by a number and I wanted to return the word, followed by a comma and a space, and then the number, I would format my Replacement Text as $1,\s$2. See the image below for a visual representation of this.
    Challenge159_Image
  • For the Match output method, remember that the tool is looking for cells which match the entire pattern, not just a portion of the pattern.
  • For the Parse output method, remember to use the same grouping strategy as the replace tool. Each ‘group’ represents a new column of data that will be generated.
  • When specifying the format in the DateTime parse tool, you will need to use a custom format to just get the month number. Looking at the suggestions given in the DateTime tool may help you see what custom format will give you the month number.
  • Keep in mind the month number will be output as a string by default. This may be important for your filter on the months.

Solution

Great beginner challenge. I was reminded of the usefulness of RegEx and the syntax of that in Alteryx. After reviewing your solution, it looks like I took a little bit longer route and made my RegEx a little more explicit. Overall a great challenge!

This was a good challege to help me practice my RegEx. I learned some new things I could do with it that I hadn’t before.

image