SWAG 1|DUE 19APR23 – Sequence Checks in AIS and Auditing

It was fun applying my technical knowledge to real-life control that we will likely see in the workplace. I used Alteryx to complete this challenge.

General Process
I ended up tweaking small things for all three tabs but the basic format for each one was: Start off by finding the duplicates using a unique tool. Following it was a multi-row formula that subtracted the previous field from the current field. This generated a new field that would be “-1” if no row was skipped. Filtering that out gave me a list of skipped numbers+1. I then created a basic formula that subtracted the tab number to generate a list of the actual numbers that were skipped.

Unique things for each tab
Tab 1- This turned out to be pretty basic and used the format described above.
Tab 2- While finding duplicates and missing values I needed to filter out nulls. Additionally, I used a sample tool to take off the first record as, due to the limitations of my method, the first record was simply the first record of the data.
Tab 3- I needed to change the data type as well as sort the data to be ascending like the other data.

Results
Tab 1- Duplicates- None
Missing values- 3048, 4023, 12649, 36386

Tab 2- Duplicates- 4489, 13071, 39789
Missing Values- 10958, 11174, 41345, 45080

Tab 3- Duplicates- 368255, 541002, 541002, 66870, 66870
Missing Values- 65895, 517922

I’ll share my flow below as well.

2 Likes

For my solution, I used Excel. Adjacent to the given list, I created my own list of the numbers in the desired range, using a IF formula that stopped once the maximum number was reached. Then, I used a VLOOKUP in the next column to determine if the numbers I calculated existed in the given field. You can check which numbers do not exist by looking at the Pivot Table, or by applying the filter and searching for “#N/A” results. I used this process on all three sheets. Below you can find my code for the IF statement, as well as the code for my VLOOKUP. Also, my screenshots are attached, since I am experiencing diffiuclties preventing me from attaching them.

=IF(B2+1>$G$6,B2+1)

=IF(VLOOKUP(B2,$A$2:$A$38860,1,FALSE)=B2,“Match”,“No Match”)

@Margaret_Christ

2 Likes

I did this challenge in Excel. I first started by sorting each tab in ascending order so that my equations would correlate. I then created a four-part “if” statement to output whether a number was a duplicate, missing, out of the range, or correct. I then found which number was the “incident” by doing another “if” statement to output the number. I found results of multiple incidents in each tab. These are my results:

  1. Missing Values 3048
    Missing Values 4023
    Missing Values 12649
    Missing Values 36386

2). Duplicate 4489
Missing Values 10958
Missing Values 11174
Duplicate 13071
Duplicate 39789
Missing Values 41345
Missing Values 45080

3). Duplicate 66870
Duplicate 66870
Duplicate 368255
Missing Values 517922
Duplicate 541002
Duplicate 541002

@Margaret_Christ

1 Like

these were the missing values that i found
Missing Values 3048
Missing Values 4023
Missing Values 12649
Missing Values 36386

1 Like

This challenge was awesome! It helped me to practice sorting data so well!

I used Excel to sort data and find outliers within the list of order #'s.

I began by using an IF( formula using rows (=IF(ROW(A2)=A2+1,“True”,“False”)) but it was a pretty time consuming process as the formula had to be changed every time there was an error in the data (A2+1 → A2 + 2) depending on each error. I switched to a vertical formula afterwards (Thanks @Jonathan_Weston) and it was much faster.

I then filtered all the False responses, which then gave me the faulty data. I then arranged the false order numbers into a table on a separate tab and added another field into the table sorting the false order numbers by type of data fault (missing number, duplicate, etc.) I was then able to create a PivotTable based on the table data.

3 Likes

I used Excel in terms of solving which checks were duplicates/non-sequential. For all tabs, I inserted a new column to determine whether or not checks were sequential using an IF function. Next, I determined the number of checks that were missing using another if function based on the Check column that I created. Next I determined if any checks were by figuring out if the number of checks columns missing gave a negative number. Finally, I input another if function to yield the actual check number that was missing and the check numbers that were duplicates.

@Margaret_Christ

For this challenge, I used Excel to locate out-of-sequence numbers. Once I had the file opened, I sorted each tab so that numbers were in ascending order. Next, in cell B3 I wrote an ‘IF’ statement that checked to see if the number in cell A3 was equal to the cell ‘above’ (i.e. A2) +1. The formula returned “In Sequence” or “Out of Sequence” depending on the result of the test. I then created a PIVOT table for each tab that summarized the numbers that were out of sequence:

@Margaret_Christ

I chose to do my sequence check in Excel. I found the following problem numbers in each workbook.

  1. I added a row of numbers in Column A to give each line a unique number.
  2. I added an “IF Statement” in Column C to decide if the check numbers were missing or a duplicate. I had to subtract the number from the one before it to decide if it was missing or duplicated. If I subtracted the numbers and they were 0, I knew they were duplicates. If I subtracted the numbers and they were anything except 1, I knew they were missing a check (or multiple check) number(s).
  3. I concatenated my Column A with Column C to give each “Missing” or “Duplicated” their own unique identifier.
  4. I ran an “If Statement” to find out if there were more than one “Missing” Checks for each “Missing” result I found in #2.
  5. I ran a “Unique Formula” over my concatenated formula to find all of the unique Duplicates or Missings.
  6. I used an X-Lookup to add either Duplicate or Missing next to my unique formulas.
  7. I used an X-Lookup to add how many missing checks came from each Missing Result.
  8. I Subtracted my Amount Of Missing Checks from my Unique Formula to find out which check number started the line of missing checks.

My file was too large to input onto this comment, so here is a screenshot of Tab 3 to show all of the additional lines I added. If you would like to see my finished sheet with all of the formulas, feel free to reach out to me at kke41499@byu.edu.

2 Likes

I tried 3 software programs: Alteryx, SQL on Access, and then VBA. It took me less time on Alteryx than it did for SQL and for VBA.

SQL: I unfortunately just gave up. I did not know how to create a record sequence that would help me come up with the missing values in the given dataset.

Pros: It was awesome to get a review of the structure and syntax of SQL.

Cons: time-consuming, and there were many constraints (I personally noticed) that required more intermediate to advanced knowledge. I tried to use the internet for help on some of the syntax structures, but it was very confusing. I enjoyed it though.

VBA: I couldn’t finish; however, I did solve some of the missing values from my code. If you have time, I was wondering if any of you would be able to review my VBA code and give me some advice. I was able to do a “for loop” which went through the data set with the Min and Max. However, I’m not an expert on VBA, but I know that this practice really refreshed my memory on VBA.

Pros: It refreshed my memory on VBA code and how to set variables, etc.

Cons: It was super slow and time-consuming, I suggest not using this for large data sets

Alteryx: It took me less time and I was able to solve the puzzle and problem. It was fun and it didn’t feel like a hassle. I was able to use containers to speed up the process and become more efficient.

Pros: Handles large datasets, you can cut down hours on hand and complete rigorous problems using these large data sets

Cons: Data manipulation requires critical thinking, but I’ve noticed once I get pass that, everything else is just peaches.

Answers:

Missing Values (Tab1)

  • 3048
  • 4023
  • 12649
  • 36386

Missing Values (Tab2)

  • 10,958
  • 11,171
  • 11,172
  • 11,173
  • 11,174
  • 41,344
  • 41,345
  • 45,080

Missing Values (Tab3)

  • 517,920
  • 517,921
  • 517,922

Exceeds Limit (Tab3)

  • 541,003
  • 541,004

**Duplicates (Tab2)

  • 4,489
  • 13,071
  • 39,789

Duplicates (Tab3)

  • 66,870
  • 368,255
  • 541,002

Alteryx (.yxmd)

Swag 1.yxmd (97.1 KB)

VBA CODE
SWAG1_VBA.xlsx (7.2 MB)

Sub Tab1()

Dim Column As Range
Dim outrange As Range
Dim i As Long
Dim n As Long
Dim found As Range

Set Column = Application.InputBox(prompt:="Select column", Type:=8)
Set outrange = Application.InputBox(prompt:="Output column", Type:=8)

n = Application.Min(Column)

For i = 1 To Application.Max(Column) - n + 1
    Set found = Column.Find(n, LookIn:=xlValues, LookAt:=xlWhole)
    If found Is Nothing Then
        outrange.Value = n
        Set outrange = outrange.Offset(1)
    End If
    n = n + 1
Next i

End Sub

Thanks guys!

  • Luke
10 Likes

To complete this challenge, I decided to use Excel. I first sorted the data and then checked to see if there was a difference between the current number and the following number. If there was a difference other than 1, I then checked to see if the difference was 0 in which the number was a duplicate, was 2 in which there was a missing number, or >2 in which there were multiple numbers missing. I then also checked to see if there were any extra numbers listed that were below or above the given range of numbers. I put a filter on the data but also summarized my findings to the side.

Tab 1: I had missing numbers of 3048, 4023, 12649, and 36386.

Tab 2: I had duplicates of 4489, 13071, and 39789 and missing numbers of 10958, 45080, 11171, 11172, 11173, 11174, 41344, and 41345.

Tab 3: I had duplicates of 66870, 66870, 368255, 541002, and 541002 and missing numbers of 65895, 517920, 517921, and 517922. I also found extra numbers of 541003 and 541004 that were above the given range.

SWAG_SequenceChecks

@Margaret_Christ

@Margaret_Christ SWAG Weekly Challenge Submission

I used Excel to complete this challenge. I sorted the data on each tab to ensure the values were in the correct order then I applied the following formula to each tabe to identify missing numbers, duplicates, and other possible errors.

=IF(A3-A2=1,“Ok”,IF(A3-A2=0,“Duplicate”,IF(A3-A2>1,“Missing”,“#N/A”)))

The only numbers that were assigned #N/A were the last numbers in each tab, so these did not indicate any issues in the sequence. I created a filter on each ta to more easily identify missing and duplicate values. With missing values, I examined numbers assigned with a “Missing” title to determine if there were any additional bordering values that were also missing. The following image shows the duplicates and missing values I found in each tab

D26D96DC-F008-4A0D-AB00-9899387497D9_4_5005_c

Tab 1- Missing: 3048, 4023, 12649, 36386
Duplicates: N/A

Tab 2- Missing: 10958, 11171, 11172, 11173, 11174, 41344, 41345, 45080
Duplicates: 4489,13071, 39789

Tab 3- Missing: 517920, 517921, 517922
Duplicates: 66870, 66870, 638255, 541002, 541002

@Margaret_Christ

2 Likes

I have attached my solution! I used excel, and uploaded my solution to google drive. AudreySmith SWAG.xlsx - Google Sheets
@Margaret_Christ

I used excel to solve this! For each tab I completed the following steps:

  1. filter the numbers from low to high
  2. created an excel formula starting on row 3 IF(A3=A2+1,“Normal”,“Issue”) and i copied that formula all the way down the list
  3. i created a pivot table to show just the numbers with an issue

For tab one i found the following issues: 3049, 4024, 12650, 36387
For tab two i found the following issues: 4489, 10959, 11175, 13071, 39789, 41346, 45081
For tab three i found the following issues: 66870, 368255, 517923, 541002

The excel file was too big to upload, so here is a link to a google sheets SWAG1_SequenceChecks.xlsx - Google Sheets
@Margaret_Christ

This was a great challenge and I used Excel to complete it!

For each tab I followed the same general steps:

  1. First, I made sure all the checks were in ascending order.
  2. Next, I used an IF function to find if there was a lapse in the sequence. I used the IF statement and told it to take each check number and subtract the preceding check number. If it equaled 1, I said there was no error in the sequence order. If it did not equal 1, I said there was an error in the sequencing order.
  3. I then filtered the results to show where there were missing checks or duplicates

I have attached a screenshot of my results below because my excel file is too large!

@Margaret_Christ

I used Excel to solve this challenge. First, I sorted the sequences in numerical order. I used the following if statement to do the sequence check: =IF(A3-A2=1,“”,IF(A3-A2=0,“Duplicate”,IF(A3-A2>2,“Missing Value(s)”,“Missing”))).

  1. The numbers missing in Tab 1: 3047, 4022, 12648, and 36385.
  2. The numbers missing in Tab 2: 10957, 11170, 41343, 45079. Duplicate numbers in the sequence are 4489, 13071, and 39789.
  3. The numbers missing in Tab 3: 517919. There are two duplicates of 66870, one duplicate of 368225, and two duplicates of 541002.

@Margaret_Christ

Fun challenge to work on! I used R Studio to complete it (attaching code below):

After reading the files into the software, I made sure to arrange the data in ascending order.
I created a new sequence which was to run from the lower bound to the upper bound of the range that was provided in the instructions. This was then used to compare which values from the original list did not exist in the “expected list”, essentially tackling the missing values requirement.
I then used the unique identifier function to highlight any values that were duplicates in the original list.
Finally, I checked to see if any of the values from the original list were outside the range provided in the instructions by checking for values that were less than or greater than the boundaries.

Tab 1:
Missing = 3048, 4023, 12649, 36386
Duplicates = None
Out of Range = None

Tab 2:
Missing = 10958, 11171, 11172, 11173, 11174, 41344, 41345, 45080
Duplicates = 4489, 13071, 39789
Out of Range = None

Tab 3:
Missing = 517920, 517921, 517922
Duplicates = 66870 (2), 368255, 541002 (2)
Out of Range = 541003, 541004

@Margaret_Christ



2 Likes

For this challenge, I decided to use Excel, as that is the program that I am most confident using. For tabs 1 and 2, I created a new column called “Difference” and input a formula that subtracted the current value from the previous value. Numbers that had no deviations should return the number 1 for this formula. I then added a filter on the column and found which numbers deviated from the renumbering sequence. For the third tab, I first had to put the renumbering in sequential order, and then applied the same process as the first two tabs. For each tab, here are the number that I found that violated the sequential renumbering:

Tab 1: 3047, 4022, 12648, 36385
Tab 2: 4489, 10957, 11170, 13071, 39789, 41343, 45079
Tab 3: 66870, 368255, 517919, 541002

These numbers are problematic because they either skip a number or are a duplicate. I have attached a picture of my Excel file below, due to the fact that the actual file size was too big.

Screen Shot 2023-04-18 at 1.41.22 PM

Screen Shot 2023-04-18 at 1.41.40 PM

Screen Shot 2023-04-18 at 1.42.42 PM

@Margaret_Christ

1 Like

This was a challenge to fully automate. I used Excel VBA as I wanted to increase my proficiency with that software.

My code queries the sequence range, input range, and output range from the user. It then checks the input range for any missing, duplicate, or out or range values and outputs those numbers to the output range selected by the user.

I can verify that the code works from running it on a smaller sample set. I was able to get the following results from Tab 1 and Tab 2. Tab 3 contained too many values for my computer to iterate.

[spoiler]
Tab 1
image
Tab 2
image
Tab 3
Crashed after several minutes
VBA code

'
' Sequence_Check Macro
' Applies a sequence check on the range given by the user
'

'
    Dim seqStart, seqEnd, countZ, currentValue, outputColumn, countC, countJ As Double
    Dim seqRange, outputRange, itRange As Range
    Dim uniqueRange As Variant
    
    
    seqStart = Application.InputBox(Prompt:="Input first number in sequence", Title:="Sequence Begin", Type:=1)
    seqEnd = Application.InputBox(Prompt:="Input last number in sequence", Title:="Sequence End", Type:=1)
    Set seqRange = Application.InputBox(Prompt:="Select range to validate", Title:="Sequence Check", Type:=8)
    Set outputRange = Application.InputBox(Prompt:="Select output range", Title:="Select cells", Type:=8)
    

    
    countJ = 1
    outputColumn = 2
    countC = 2
    outputRange.Cells(1, outputColumn).Value = "Duplicate Values"
    uniqueRange = WorksheetFunction.Unique(seqRange)
    
    For countZ = LBound(uniqueRange) To UBound(uniqueRange)
    
        If WorksheetFunction.CountIfs(seqRange, uniqueRange(countZ, 1)) > 1 Then
            outputRange.Cells(countC, outputColumn).Value = uniqueRange(countZ, 1)
            countC = countC + 1
        End If
        
        countJ = countJ + 1
    Next countZ

    outputColumn = 3
    countC = 2
    outputRange.Cells(1, outputColumn).Value = "Out of Range Values"

    For Each itRange In seqRange
        currentValue = itRange.Value
        If currentValue > seqEnd Or currentValue < seqStart Then
            outputRange.Cells(countC, outputColumn).Value = currentValue
            countC = countC + 1
        End If
        
        countJ = countJ + 1
    Next
    
    outputColumn = 1
    countC = 2
    outputRange.Cells(1, outputColumn).Value = "Missing Values"
    
    For countZ = seqStart To seqEnd
        
        Set currentValue = seqRange.Find(countZ, LookIn:=xlValues, LookAt:=xlWhole)
        If currentValue Is Nothing Then
            outputRange.Cells(countC, outputColumn).Value = countZ
            countC = countC + 1
        End If
        
    Next countZ
    

   
End Sub```
/spoiler
3 Likes


@Margaret_Christ

3 Likes