BYU Student Author: @klayton
Reviewers: @Andrew, @Mike_Paulsin
Estimated Time to Solve: 75 Minutes
We provide the solution to this challenge using:
- Python
This challenge utilizes Pandas, Glob, and PyPDF2 packages.
Need a program? Click here.
Overview
You work in the accounting department of a small business, and you need to perform remittance reconciliation. This is a monthly process, and it can take you upwards of two hours to complete depending on how activity happened during the month.
The goal of remittance reconciliation is to make sure that the payments your business receives match up with the invoices or bills that were paid. This process helps ensure that all transactions are accounted for and that any discrepancies are identified and resolved.
Another step in the process is to ensure that the remittances received were posted to the ledger correctly. This typically involves downloading a report and comparing that information to the remittances received.
Your company’s Accounts Receivable department typically receives payment via ACH and sends you the information in an email so you can match the payments to the correct invoices. Those emails are saved as PDF files as proof of remittance. For the purposes of this task, assume that all the remittances are correct and match their invoices. This reconciliation is comparing the remittances to what was uploaded to the general ledger.
You’ve also already downloaded the report that shows which remittances have been posted to the general ledger. It is the excel file titled Challenge119_Data.xlsx.
The information in the remittances includes the client’s name, amount of the remittance, and account code telling Accounts Receivables where to book the payment. The most important parts are the amounts and the first part of the account code that starts with an R followed by 7 numbers. This is the account that ties to each individual client and matches the naming of the remittance.
You know that this reconciliation is important, yet you want to be able to finish it quickly and accurately. You have a basic understanding of python and know that it might prove useful looking into writing some code that might be able to perform this task for you.
Instructions
If you haven’t already done so, install the PyPDF2 and pandas libraries. To do so use the following code
!pip install PyPDF2
!pip install pandas
Your code should be able to accomplish the following:
- Loop through all the remittances and read them in as text.
- Use the code pdf_file = open(file_name,‘rb’) where file_name is the entire file path.
- Then use pdf_reader = PyPDF2.PdfReader(pdf_file) to create a reader object that can be used to compare the text of the pdf to other string.
- Prompt the user to enter file paths for the folder containing all remittances and the folder containing the Excel sheet.
- Check to make sure that any input file path includes a “\” at the end. This can be done by prompting the user to add it, or checking for it in the code and adding it if it isn’t there.
- Read in the Excel file as a pandas dataframe
- Check whether the amount and account in the PDF text match the values in the corresponding row of the Excel sheet
- The code should automatically find the amount and account from the dataframe and compare it to the remittance. Do not manually input the account and amount for each remittance.
- Some of the amounts on the remittances have commas and some do not. The code should be able to account for this.
- It should also be able to loop through the dataframe and find the correct account to match with the remittance file name.
- There are four possible outcomes from the remittance:
- Both the amount and account number match
- Only the amount matches
- Only the account number matches
- Neither the amount nor the account number match
- Create a new results text file in the same location as the Excel sheet. This txt file should contain the information for each remittance.
So, does everything look good? Are there any mistakes?
Data Files
Suggestions and Hints
- The following code will take the pdf_reader and transform it into text that can be viewed and compared.
page = pdf_reader.pages[0] text = page.extract_text()
- To find a match, the amount and account of a particular remittance only need to be found in the pdf text. i.e.
if (amount in text) and (account in text):
‘Amount’ and ‘Account’ variables taken from the dataframe.
- Use .replace(‘,’,’’) with the text variable to remove all commas from the amounts.
- You should have one remittance with a mistake in the end.
Solution
Solution Code
def remit_recon():
import PyPDF2
from glob import glob
import pandas as pd
# Set up the directories for the remittances and the excel file
remitdirectory = input(f'''Enter the file path to the folder containing all remittances (be sure to add a '\\' to the end): \n''')
while True:
if remitdirectory[-1] == '\\':
break
else:
remitdirectory = input(f'''Oh no! You forgot the '\\'! Try again: \n''')
exceldirectory = input(f'''Enter the file path to the folder containing the excel sheet (be sure to add a '\\' to the end): \n''')
while True:
if exceldirectory[-1] == '\\':
break
else:
exceldirectory = input(f'''Oh no! You forgot the '\\'! Try again: \n''')
# input for the excel file name
excel_file = input('what is the name of the excel file? (Do Include the extension): \n')
# read in the excel file as a pandas dataframe
df = pd.read_excel(exceldirectory+excel_file)
# Create new results text file in same location as excel sheet
results = open(exceldirectory+'results.txt','w')
results.close()
# iterate through all the remittances found in the remittance folder
for file_name in glob(remitdirectory+'*'):
pdf_file = open(file_name,'rb')
name = file_name.split('\\')[-1]
# This is the client number that we will be matching in the dataframe (excel file)
r_number = 'R0'+name.split('#')[0]
# print(r_number)
# Create a PDF reader object
pdf_reader = PyPDF2.PdfReader(pdf_file)
# Extract text from the first page
page = pdf_reader.pages[0]
text = page.extract_text()
text = text.replace(',','')
# Print the text
# print(text)
# Make sure the file is actually read, and if not, make a note of it in the txt file.
if text == "":
results = open(exceldirectory+'results.txt','a')
results.write(f'''Unable to read {name} file\n''')
else:
results = open(exceldirectory+'results.txt','a')
# Check to make sure the r_number is even in the excel file
if r_number in df['Account'].values:
# go through entire df to find the correct line
for i,row in df.iterrows():
if row['Account'] == r_number:
x = i
break
# Take amount and account from df
amount = str(df['Amount'][x])
account = str(df['Account'][x])
# Print whether both amount and r number, one, or neither are found in text
if (amount in text) and (account in text):
results.write(f'''Both {amount} and {account} found in {name}\n''')
elif (amount in text) and (account not in text):
results.write(f'''Only {amount} found in {name}\n''')
elif (amount not in text) and (account in text):
results.write(f'''Only {account} found in {name}\n''')
else:
results.write(f'''Neither {amount} nor {account} found in {name}\n''')
else:
# this is going back up to the if statement checking for r_number in df
results.write(f'''{r_number} not found in the excel file\n''')
# close the txt file
results.close()
# Close the pdf file
pdf_file.close()
# loop until all the pdfs have been read.
# run the created function
remit_recon()
The only issue is a transposition error found with the 202519#7 Payment.
Challenge119_Solution.ipynb
Challenge119_Results.txt
Solution Video: Challenge 119|PYTHON – Remittance Rodeo