SWAG 1|DUE 19APR23 – Sequence Checks in AIS and Auditing

Challenge Sponsor: This challenge is sponsored by the BYU School of Accountancy
BYU Author: @David_Wood
Reviewers: @summers
Estimated Time to Solve: 20 Minutes

There is no prescribed software for this challenge. Consider using any of the apps listed here. Or choose another app and introduce it to the community.

SWAG Details
We will award 5, $100 Amazon gift cards to challenge winners. To be eligible, you must do all the following:

  • Post a reply to this challenge with a solution. Learn about posting solutions here.
  • Follow TechHub.training on LinkedIn, which will inform you of new challenge postings.
  • Complete both conditions above by midnight, April 19, 2023.

Whoever has the most likes for their post will win one gift card. We will select 4 additional people at random for the gift cards. The winners will be announced on this post and contacted via their LinkedIn email. Please note that we will not be assessing answers for efficiency or accuracy. However, as the posts are connected to your professional profiles, we expect you to exhibit quality in your responses. Please do not copy someone else’s work and submit it as your own, as this will be apparent and reflect poorly on you. That said, we understand that many solutions may be similar. We encourage you to use “likes” to highlight excellent submissions and to help select one of the winners.

Overview
Sequence checks are an important internal control taught in accounting information systems (AIS) courses and a common auditing technique used to verify the accuracy and completeness of data. They involve checking the sequential order of prenumbered transactions, documents, or other records to provide confidence that nothing has been omitted, duplicated, or tampered with. Auditors use sequence checks to detect errors or fraud in a company’s transaction records, as well as to identify potential control weaknesses. Consider the following examples:

  • A company uses prenumbered checks for all payments. An auditor might review the check numbers to see if any checks are missing or if there are duplicate checks.
  • At a restaurant, the server enters a customer’s meal selection into the computer as an order, which is then transmitted to the kitchen. Later, the server deletes the order after delivering the food to the customer (who is a friend). If the system assigns sequential numbers to each order, the auditor can detect any deleted, and therefore unpaid, orders by looking for a missing number in the order sequence.
  • Assuming a company’s sales orders are numbered sequentially, a sequence check can be performed to detect any missing or duplicate sales order numbers. A duplicate sales order number may mean that the system is not configured correctly or there may be an intentional attempt to inflate the number of sales orders.

Instructions
You are to perform a sequence check on each of the three tabs of data included in the data file. Each tab is unique and must be separately evaluated to determine if any numbers in the series are missing, duplicated, or problematic in any other way. You may use any technique or software to perform your sequence check. To be eligible for the drawing, your posted solution must include the following:

  • In your reply, describe what you did to find any problems. Include a description of any software you used. Include all problematic numbers that you discovered and why they were problematic.
  • Upload either a solution file showing how you solved this challenge or the computer code (e.g., VBA or Python) used to solve it. Computer code is best posted using the code tags. You may include an image of your solution if desired.

We realize that visual scanning or similar could be used for this challenge. We hope that you will use something more algorithmic in your solution as algorithms can be easily reproduced on larger datasets and are superior in most auditing settings.

Good luck!

Data Files

  • SWAG1_SequenceChecks.xlsx. There are three tabs of data in this file. The tabs should have one instance of every number between the ranges given below.
    – Tab 1: 1 to 38863
    – Tab 2: 4331 to 45250
    – Tab 3: 65895 to 541002

Solution
The final answer is:

  • Tab 1: Missing: 3048, 4023, 12649, 36386
  • Tab 2: Missing: 10958, 11171, 11172, 11173, 11174, 41344, 41345, 45080 Duplicate: 4489,
    13071, 39789,
  • Tab 3: Missing: 65895, 517920, 517921, 517922; Duplicate: 368255; Triplicate: 66870, 541002;
    Out of Range: 541003, 541004

Congratulations to our winners:

4 Likes

This was a great challenge! I solved it using Excel. For each tab I performed the following steps:

  1. Sort the values in ascending order
  2. View first and last number in list to check if it is within the given ranges.
  3. Enter the following formula into the cells adjacent to the numbers: =IF(A3-1=A2,TRUE,FALSE). If a number is out of sequence or duplicates a prior number, the formula will return FALSE.
  4. Use the Find tool in Excel to review each number that returned False. Rather than just filtering, this method lets me see the number in relation to numbers next to it in the list. This gives me a clear picture of whether the number was out of sequence or is a duplicate.
  5. Filter each tab for FALSE numbers and copy them into a new tab for a clean display.

Here is a screenshot of my solution and the problematic numbers I found:

SWAG 1 Answer

3 Likes

Nice challenge! I decided to use Alteryx to solve it. I thought I had the solution pretty quickly, but while reviewing the results, I realized I was slightly off. Below is a description of my updated solution. You can also download and review my solution with the attached file.

I used the same work flow for all three tabs. First I used a “Select” tool to make sure that the data was the proper type and I renamed the column. Next, I used a “Data Cleansing” tool to remove the NULL rows from the data. I then used the “Sort” tool to put the data in order (this step was not necessary, but helpful for running a visual check at the end). I then used a “Unique” tool to identify any duplicate numbers. Next, I used the “Summarize” tool to pull a max and min value from the unique values. I used “Append Fields” to join these two values into the same row. I used the “Generate Rows” tool to create a new column of numbers starting from the min and going to the max incremented by one. This ensured that I had a complete numbered list encompassing the same range as the list of numbers provided. I then joined the new list of numbers to the unique list of numbers with the “Join” tool. The list of numbers that came out of the left output were the numbers from my complete list that did not find a match with the provided list. These are the missing numbers.

Here is the list of problem numbers that I found:

Tab 1
Missing Numbers: 3048, 4023, 12649, 36386
Duplicate Numbers: None
Tab 2
Missing Numbers: 10958, 11171, 11172, 11173, 11174, 41344, 41345, 45080
Duplicate Numbers: 4489, 13071, 39789
Tab 3
Missing Numbers: 517920, 517921, 517922
Duplicate Numbers: 66870, 66870, 368255, 541002, 541002

SWAG1_TylerBooth.yxzp (5.5 MB)

3 Likes

Thank you for the challenge! I used Alteryx to solve it. Needed to pay attention to details and after figuring those details out, it was a smooth process. Here is my solution:

Solution

Also I realize new users can’t upload attachments so I am going to post the link to the file on my file (you cannot see the preview of it on Google since it’s a .yxmd file, you can download it and run it on Alteryx :slight_smile: )

5 Likes

This was a fun a challenge! I decided to use Alteryx to to complete it.

Tab 1:
I used the unique tool to find any duplicates. Next, I sorted the data into ascending order to make sure my next tool would work correctly. I used a Multi-Tool Formula to make find any cells that were not equal to the the previous cell +1. Afterwards, I filtered out any of the “Errors”. Then I subtracted the leftover numbers by 1 to find the missing numbers.

I also did a double check to make sure I wasn’t missing anything on this one. I did that by making a record number for each number, and then filtered out the numbers that did not match their record numbers. Then I re-used the record number tool and filtered out any numbers that did not match their repsective record number. I did this until the end. It was not the most efficient approach, but it helped me make sure that I didn’t miss anything.

Tab 2:
I started with the unique tool to find any repeats. Then I filtered any non-null cells out.
To find any missing numbers, followed the same steps as I did in Tab 1. However, I then joined the unique tool results and the filtering for “Error” results so that in “Join” I would have the duplicates, and in “Right” whatever was left over was not a duplicate, but had a missing number in the prior cell.

Tab 3: I essentially did the same thing as I did in Tab 2.

I got the following results:

Tab 1:
Missing Numbers: 3048, 4023, 12649, and 36386
Duplicates: None

Tab 2:
Missing Numbers: 10958, 11174, 41345, and 45080
Duplicates: 4489, 39789, 13071

Tab 3:
Missing Numbers: 517922
Duplicates: 66870 (there were 3 of these total), 541002 (there were also 3 of these total), and 368255
@Margaret_Christ

Alteryx YXZP

4 Likes


I decided to just use excel to solve this problem. Here are the steps I took to solve this problem:

  1. I first sorted each data set smallest to largest.
  2. I then applied an IF function to each data set to identify three issues (missing values, duplicate values, and values outside the ranges given). Any values that violated the correct pattern of the data were initially labeled as “False” and the issue was specified in the column beside it.
  3. I then determined the number of duplicates and missing values that corresponded to each value in another column using a COUNTIF function.
  4. To sort through each set of data, I applied a filter to each table of data and selected all of the “False” records.
  5. Duplicates and Out of Range Values were identified for each tab. Missing Values were calculated based on the span of missing numbers. For example, the number of missing values could be calculated as “(A3-A2)-1”. A return of “1” would indicate one missing value which could be calculated as “A3-1”. However, anything greater than 1 would require multiple calculations (a return of “2” would indicate multiple missing values which could be calculated with “A3-2” for the first value and “A3-1” for the second value.

If my VBA coding skills weren’t so rusty I might have attempted that haha. Sorry for my confusing description of steps haha. Let me know if I can clarify anything for you! Thanks for the practice!

3 Likes

Interested to see how others did this. I decided to use Alteryx to do my checking.

For all three tabs, I first used the Unique tool to isolate and remove any duplicates found in the dataset. I then utilized the Multi Row Formula tool, which I’d been doing some practice with. It has some dynamic capabilities to check values against prior rows which I found most helpful. I created a column that would flag all rows where the number value was not sequential. I then filtered each column to isolate all sequence violations and subtracted each number by 1 to identify the missing value. I used a Join Multiple Tool to Summarize my findings. The results are at the end of my post.

Some interesting problems I used Alteryx to overcome:

  • Had to tweak the multi row formula tool to not flag the first row as a violation for tabs 2 and 3 (since the tool assumes the prior row to the first row is 0)
  • When combining my results into a single table, I had to ensure the data type of the numbers was string. This was so I could use the concatenante function in the summary tool for each of my results.

I cannot upload a file as a new user, so here is a screenshot of my workflow instead (including my findings noted in the browse output):

8 Likes

This was a fun little challenge. I have done this in a couple of my graduate classes so I was completely prepared for this!

I decided to use Excel for my solution. For each of the tabs with numbers, I performed the following steps:

[spoiler]

[spoiler]

  1. I sorted the values in ascending order using the Sort tool
  2. I checked to see if the numbers were in order after that
  3. I entered the following formula in the cells adjacent to the numbers: =IF(A3-1=A2,TRUE,FALSE). This formula makes it so if a number is out of sequence or is a duplicate of a previous number, the formula will return FALSE.
  4. I then filtered the column by “FALSE” to isolate all of the FALSE numbers that the formula returned.
  5. I pasted all of the FALSE numbers into a new tab to isolate from the long list of numbers that were in each tab.
  6. I used Ctrl F to find the FALSE numbers in the original tab and labeled in the adjacent cell of if it was a duplicate number or a missing sequence of numbers

Here is a screenshot of the problematic numbers that I found:

(I realized that new users can’t upload attachments so I’ve posted the link to my file so that you can view my solution!)
:slight_smile:

1 Like

Awesome Challenge!

I used Alteryx to get the following answers;

Tab 1:
Missing Numbers: 3048, 4023, 12649, 36386
Numbers Exceeding Upper & Lower Limits: None
Duplicates: None

Tab 2:
Missing Numbers: 10958, 11171,11172, 11173, 11174, 41344, 41345, 45080
Numbers Exceeding Upper & Lower Limits: None
Duplicates: 4489, 13071, 39789

Tab 3:
Missing Numbers: 65895, 517920, 517921, 517922
Numbers Exceeding Upper & Lower Limits: 541003, 541004
Duplicates: 66870, 66870, 368255, 541002, 541002

This process was used for all 3 cases.

Link to My Alteryx Package is below;

@Margaret_Christ

4 Likes

This challenge was great! I used Alteryx to detect the following errors within the three tabs of data:

Tab 1
(4) Missing Values
(0) Duplicates

Missing Numbers: 3048, 4023, 12649 & 36386
Duplicates: N/A

Tab 2
(8) Missing Values
(3) Duplicates

Missing Numbers: 10958, 11171, 11172, 11173, 11174, 41344, 41345, 45080
Duplicate Numbers: 4489, 13071, 39789

Tab 3
(3) Missing Values
(5) Duplicates

Missing Numbers: 517920, 517921, 517922
Duplicate Numbers: 66870 (twice), 368255, 541002 (twice)

I’ve attached a screenshot of my Alteryx workflow below:

@Margaret_Christ

2 Likes

Thank you for this challenge,

The solution I provided was in excel, I sorted in ascending order and simply did, if(A1+1=A2,1,0). This formula allowed me to find duplicates and transactions which were missing by filtering by the 0. I added 1 to the values which were missing in order to get the exact transaction missing. The values that were missing on Tab 1 includes: 3048, 4023, 12649, and 36386. On Tab 2: 4490, 10958, 11171, 13072, 39790, 41344, and 45080. Tab 3 features duplicate values which includes: 66870 and 541002. The missing values include: 368256 and 517920.

The link is posted: Swag Challenege - Google Sheets

@Margaret_Christ

3 Likes

I used Excel to find the following results:

  1. Problem checks: 3049, 4024, 12650, 36387
  2. Problem orders: 4489, 10959, 11175, 13071, 39789, 41346, 45081
  3. Problem sales orders: 66870, 368255, 541002
    I used if statements to find breaks and duplicates and then a pivot table to show the problem numbers.

@margaret_christ

2 Likes

I used Excel to solve this challenge. I attacked this problem using =IF(A3-A2=1,“”,IF(A3-A2=0,“Double”,IF(A3-A2>2,“Missing more than one”,“Missing”))). This is done once the numbers are sorted numerically.

  1. The numbers missing in Tab 1’s sequence are 3048, 4024, 12649, and 36386.
  2. The numbers missing in Tab 2’s sequence are 10958, 11171, 11172, 11173, 11174, 41344, 41345, and 45080. Double numbers in the sequence are 4489, 13071, and 39789.
  3. The numbers missing in Tab 3’s sequence are 517920, 517921, 517921, and 517922. There are two doubles of 66870, one double of 368225, and two doubles of 541002.

@Margaret_Christ

1 Like

I did this in Excel, I used the same formula for each tab. It was an if statement that checked that the current cell is equal to the next cell minus one. The formula was as follows: =IF(A977-1=A976,“”,“SEQUENCE ERROR”). Over the three tabs I found 20 examples of out of sequence numbers.

1 Like

Alex, can you post your answers to be eligible for the drawing?

1 Like

I implemented a solution in python. I used numpy.setdiff1d to find missing numbers, comparing an expected array to the actual values. numpy.unique helped me find duplicates, and I used numpy.isreal to find non-numeric values, which there weren’t any. (This actually breaks if there are non-numeric values, but at least then you’d know that there are some :sweat_smile:)

import numpy as np
import pandas as pd

tabs = [
  { 'name': 'Tab 1', 'expected': np.arange(1, 38863, 1) },
  { 'name': 'Tab 2', 'expected': np.arange(4331, 45250, 1) },
  { 'name': 'Tab 3', 'expected': np.arange(65895, 541002, 1) } 
]

for tab in tabs:
  df = pd.read_excel('data.xlsx', sheet_name=tab['name'])
  print(tab['name'], ':\n')
  print(df.dtypes, '\n')
  actual = df.iloc[:, 0]

  # find missing
  missing = np.setdiff1d(tab['expected'], actual)
  print(f'missing: {missing}')

  # find duplicates
  unique, counts = np.unique(actual, return_counts=True)
  duplicates = unique[counts > 1]
  print(f'duplicates: {duplicates}')

  # find non-numeric
  non_numeric = [a for a in actual if not np.isreal(a)]
  print(f'non-numeric: {non_numeric}', '\n')

Output:

Tab 1 :

Tab 1 List of Numbers    int64
dtype: object 

missing: [ 3048  4023 12649 36386]
duplicates: []
non-numeric: [] 

Tab 2 :

Tab 2 List of Numbers    int64
dtype: object 

missing: [10958 11171 11172 11173 11174 41344 41345 45080]
duplicates: [ 4489 13071 39789]
non-numeric: [] 

Tab 3 :

Tab 3 List of Numbers    int64
dtype: object 

missing: [ 65895 517920 517921 517922]
duplicates: [ 66870 368255 541002]
non-numeric: [] 

5 Likes

My solution is definitely not the prettiest, but it gets the job done. I used Python using the openpyxl library in order to easily work with the Excel doc (assuming that if it were to scale, it would continue to be in Excel docs). The benefit of my program is that with just a little bit of tweaking you could add a lot more tabs of data.

The program outputted:

Tab 1: No duplicates, but skips are: [3048, 4023, 12649, 36386]
Tab 2: Duplicates: [4489, 13071, 39789], and Skips: [4488, 10958, 11174, 13070, 39788, 41345, 45080]
Tab 3: No skips, but duplicated values are: [541002, 66870, 541002, 66870, 368255]

The process was first to import all data from each Excel sheet, then to search for any duplicates. Duplicates where then removed from the list and the list was sorted from smallest to largest. Each number was then checked, and if it was not one greater than the previous, it was flagged and added to a list of skipped numbers.

The code I used is:

from openpyxl import load_workbook

workbook = load_workbook(filename="SWAG1_SequenceChecks.xlsx")

def load_data(tab):
    sheet = workbook[tab]
    data = []
    for x in sheet["A"]:
        if x.value != None:
            data.append(x.value)
    return data
    

def check_dups(info):
    seen = set()
    dups = []
    for i in info:
        if i in seen:
            dups.append(i)
        seen.add(i)
    return dups

correct_tab1 = [x for x in range(1,38864)]
correct_tab2 = [x for x in range(4331,45251)]
correct_tab3 = [x for x in range(65895,541003)]

def return_problems(tab, correct):
    data = load_data(tab)[1:]
    duplicates = check_dups(data)
    for x in data:
        x = int(x)
        if x in duplicates:
            data.remove(x)
    for x in duplicates:
        data.append(x)
    data.sort()
    if len(data) == len(correct):
        return f"No skips, but duplicated values are: {duplicates}"
    else:
        count = 0
        skips = []
        for x in data:
            if count != 0:
                if x != data[count-1]+1:
                    skips.append(x-1)
            count += 1
    if len(duplicates) == 0:
        return f"No duplicates, but skips are: {skips}"
    else:
        return f"Duplicates: {duplicates}, and Skips: {skips}"
    

print("Tab 1: "+ return_problems("Tab 1", correct_tab1))
print("Tab 2: "+ return_problems("Tab 2", correct_tab2))
print("Tab 3: "+ return_problems("Tab 3", correct_tab3))
            
4 Likes

I used Alteryx to solve this challenge.

Tab 1: First, I used a Unique tool to identify any duplicates. I also added a Summarize tool to summarize by minimum and maximum values. I then used a Generate Rows tool to create a loop expression and added the Select tool. Then I joined the Select tool to the right side of a Join tool and the Input Data tool to the left side of the Join tool.

Tab 2: I used the same approach as used for Tab 1.

Tab 3: I used the same approach as used for Tabs 1 and 2, but I also cleansed the data, sorted the numbers in ascending order, and changed the Tab 3 data type from V_String to Double.

Tab 1 Numbers:
3048, 4023, 12649, 36386 - Out of Sequence

Tab 2 Numbers:
4489, 13071, 39789 - Duplicate
10958, 11171, 11172, 11173, 11174, 41344, 41345, 45080 - Out of Sequence

Tab 3 Numbers:
66870, 368255, 541002 - Duplicate
517920, 517921, 517922 - Out of Sequence

SWAG Challenge Workflow

@Margaret_Christ

2 Likes

I used Excel to find the following results:

  1. Problem checks: 3049, 4024, 12650, 36387
  2. Problem orders: 4489, 10959, 11175, 13071, 39789, 41346, 45081
  3. Problem sales orders: 66870, 368255, 541002
    I used if statements to find breaks and duplicates and then a pivot table to show the problem numbers. I also sorted number 3 so that I could use a similar formula to 1 and 2. I did this by determining if the number equaled the previous number plus 1.

@margaret_christ

1 Like

This was a great challenge! I used Excel to complete this task.

-For each tab I went in and made sure the numbers were in ascending order.
-I then added a column that used a simple IF statement that took each number and subtracted it from the number preceding it and ensured that it equaled 1. If it did not equal one then it would return a “Missing” message.
-I then filtered for all of the Missing messages on each tab.

Below I attached a screenshot of the results that show the numbers returned as either duplicates or missing.

@Margaret_Christ

3 Likes