BYU Student Author: @Nate
Reviewers: @Mike_Paulsin, @Brett_Lowe
Estimated Time to Solve: 20 - 30 Minutes
We provide the solution to this challenge using:
- Python
Need a program? Click here.
Overview
The Securities and Exchange Commission (SEC) regularly corresponds with companies after reviewing their financial statements and other disclosure documents. These letters may raise questions, concerns, or requests for additional information related to a Company’s implementation of the standards found within the Accounting Standards Codification (ASC).
As accounting professionals, we help clients navigate the complexities of GAAP and ensure compliance with the latest standards, so knowing which ASC topics are under the greatest SEC scrutiny can enable us to better advise our clients on these issues. Fortunately, a background in data analytics can save us time spent searching through comment letters to identify these trends.
Instructions
In this challenge, you’ll write Python script that can parse through multiple SEC inquires to recognize explicit mentions of ASC topics, identifying which are most commonly mentioned.
Included below is a folder containing all 12 comment letters sent by the SEC in the first quarter of 2023 that reference the ASC. All mentions of ASC topics follow the format: ASC ### (e.g., ASC 321). Extract and count these mentions, grouped by topic.
A file containing the numbers and names of all ASC topics is also included. Use this file to add names to the topics you extracted from the SEC comment letters. Export or print a simple report of your mentioned topics count. Are you surprised which standards receive the most scrutiny?
Suggestions and Hints
Many mentions are to specific subsections (e.g., ASC 321-10-35-2). For this challenge you only need to extract the first three digits, which indicate the topic area.
The native Python function open()
paired with methods .readlines()
or .read()
allows you to save .txt
files as string variables.
Python’s native Operating System interfacing package can help you reference directories and iterate through files in a folder using the following methods:
os.getcwd()
-
os.listdir()
Import this package withimport os
Python’s native Regex package contains the following methods that can help identify patterns in string:
-
re.findall()
- creates a list of all substrings matching the specified pattern within a string -
re.sub()
- replaces all instances of the specified pattern within a string with a specified string -
re.match()
- if any instances of the specified pattern are found within a string, returns a match object, otherwise, returnsNone
Import this package withimport re
Data Files
Solution
Solution Code
import os #import packages
import re
import pandas as pd
#x = '\\' #for Windows users
x = '/' #for Mac users
directory = os.getcwd() + f'{x}Uploads' #save folder holding SEC correspondence letters - Windows
pattern = r'ASC\s+\d{3}' #save regex pattern to search for refernces to ASC topics
topics_dict = {} #initialize dictionary to save mentions of each topic
for upload in os.listdir(directory): #iterate through files in 'Uploads' folder
text = open(f'{directory}{x}{upload}').read() #save contents of letter to 'text'
text = re.sub(r'\n\s*',r' ',text) #replace newline and extra whitespace characters with single spaces
asc_topics = re.findall(pattern,text,re.IGNORECASE) #save all matches for 'pattern' to 'asc_topics' list
for topic in asc_topics: #iterate through all matches in 'asc_topics' list
if topic in topics_dict: #check for previously mentioned topics
topics_dict[topic] += 1 #add an additional mention to existing topic in dictionary
else:
topics_dict[topic] = 1 #add new topic to dictionary with 1 mention
df = pd.DataFrame(data=topics_dict.items(),columns=['Topic','Mentions']) #create dataframe with mentioned topics
df.sort_values(by=['Mentions'],ascending=False,inplace=True) #sort dataframe with most-mentioned topics first
topics_list = open(os.getcwd() + f'{x}ASC Topics.txt').readlines() #create list of all topics in the ASC
for topic in topics_list: #iterate through all topics in the ASC
for i, row in df.iterrows(): #iterate through all rows of the mentioned topics dataframe
if re.match(row['Topic'][-3:],topic): #check if topic from list matches mentioned topic
topic = re.sub('\n|\d{3}','',topic) #delete numbers from topic
df.loc[i,'Topic'] = df.loc[i,'Topic'] + topic #concatenate topic name with topic number and saves to 'df'
print(df) #display 'df' for viewing
df.to_excel('SEC Topics of Interest.xlsx',index=False) #export 'df' to Excel report
Challenge30_Solution.txt
Solution Video: Challenge 30|PYTHON – SEC and ASC