154|PYTHON – Fraudster Finder

Trent_Barlow · March 12, 2024, 12:40am

BYU Student Author: @Trent_Barlow
Reviewers: @Kyle_nilsen, @Millie_K_B
Estimated Time to Solve: 20 Minutes

This is an intro challenge that is part of the Python Learning Path.

We provide the solution to this challenge using:

Python

Need a program? Click here.

Overview
As a beginner Python programmer working for an audit firm, you’ve been tasked with checking some employee transactions for fraud. Because it is more engaging, and the client is a visual learner, your boss has asked you to do so using matplolib. You have already written some code to extract the data from the csv file where it is stored, create a master dataframe, as well as split it into three separate dataframes by type of transaction. Open up the Jupyter Notebook file, and get to work on the visualizations!

Instructions

Create a matplolib boxplot visualization using the Amount column of the df.
Now that we can see the outliers, let’s find out what transactions they are. First calculate the upper bound of the boxplot. That can be calculated as the 3rd quartile times 150% of the interquartile range. We have some code below to calculate q3 and the iqr for you. After you have the upper bound, use it to create a new dataframe called bad_apples by filtering df to only those transactions which have an amount above the upper bound. That will give us a dataframe with only the outliers we see above.
Create a variable called manager_counts, then use that variable to create a bar chart that shows the number of transactions approved by each manager. Set the color to #AD1B02.
Make a a new variable called “average_amounts” using “df.groupby()”. Then use that to create a bar graph below which shows the average amount approved by each Manager. Set the color to #86BC25
Filter the df to only those instances in which Manager D was the approver.
At the beginning, we split our data into three dfs: df_credit_card, df_reimbursement, and df_invoice. Make a histogram for each of these to see how well the transactions form around the bell curve. Use 20 bins, ensure the edgecolor is set to white, and color is set to #051141. The first one is done for you.
Make another histogram, with 24 bins, range=[0,24]. I got you started down below by extracting the hour component of Time in the df. Ensure edgecolor is set to #333333, and color is set to #ffe600.

Data Files

Solution

Step by Step Solution

#Create a box plot to visualize outliers in the “Amount” column
plt.boxplot(df[‘Amount’], vert=False)
plt.title(‘Box Plot of Transaction Amounts’)
plt.xlabel(‘Transaction Amount’)
plt.show()
#Calculate the interquartile range (IQR)
q1 = df[‘Amount’].quantile(0.25)
q3 = df[‘Amount’].quantile(0.75)
iqr = q3 - q1

#Calculate the upper bound of outliers (Upper Fence)
upper_bound = q3 + 1.5 * iqr
bad_apples = df[df[‘Amount’]>upper_bound]
bad_apples
#Count the number of transactions approved by each manager
manager_counts = df[‘Approver’].value_counts()

#Create a bar chart to visualize the results
plt.bar(manager_counts.index,manager_counts.values, color= “#AD1B02”)
plt.title(‘Number of Transactions Approved by Each Manager’)
plt.xlabel(‘Manager’)
plt.ylabel(‘Number of Transactions’)
plt.show()
average_amounts = df.groupby(‘Approver’)[‘Amount’].mean()

#Now, let’s plot this information
plt.bar_label(plt.bar(average_amounts.index, average_amounts.values, color = “#86BC25”)) # Adjust padding as needed

plt.title(‘Average Transaction Amount Approved by Each Manager’)
plt.xlabel(‘Manager’)
plt.ylabel(‘Average Transaction Amount’)
plt.show()
Manager_D = df[df[‘Approver’]==‘Manager D’]
#Reimbursement
plt.figure(figsize=(10, 6))
plt.hist(df_reimbursement[‘Amount’], bins=20, color=‘#051141’, edgecolor=‘white’)

plt.title(‘Distribution of Reimbursement Amounts’)
plt.xlabel(‘Amount’)
plt.ylabel(‘Frequency’)
plt.show()

#CC
plt.figure(figsize=(10, 6))
plt.hist(df_credit_card[‘Amount’], bins=20, color=‘#051141’, edgecolor=‘white’)

plt.title(‘Distribution of Credit Card Amounts’)
plt.xlabel(‘Amount’)
plt.ylabel(‘Frequency’)
plt.show()

#Invoice
plt.figure(figsize=(10, 6))
plt.hist(df_credit_card[‘Amount’], bins=20, color=‘#051141’, edgecolor=‘white’)

plt.title(‘Distribution of Invoice Amounts’)
plt.xlabel(‘Amount’)
plt.ylabel(‘Frequency’)
plt.show()

Challenge154_Solution.ipynb
Solution Video: Challenge 154|PYTHON – Fraudster Finder

Topic		Replies	Views
130\|PYTHON – Quick Calculation Challenges beginner , audit , python , pandas	3	114	June 27, 2024
96\|PYTHON – Investment Visualizations Challenges intermediate , python , visualization	2	119	December 14, 2023
65\|PYTHON – Investment Investigator Challenges intermediate , python , finance , pandas , stocks	0	213	May 27, 2023
134\|PYTHON – Monte Python and Valuation Madness Challenges intermediate , python , pandas , valuation , monte-carlo	2	116	January 14, 2024
64\|PYTHON – Bitcoin vs. S&P 500 with Matplotlib Basics Challenges beginner , python , stocks , matplotlib , data-analysis , bitcoin , charts	4	195	December 12, 2023

154|PYTHON – Fraudster Finder

Related Topics