# 72|PYTHON – True Blue Investing

BYU Student Author: @Parker_Sherwood
Reviewers: @Nate, @Mike_Paulsin, @Spencer
Estimated Time to Solve: 60 Minutes

We provide the solution to this challenge using:

Overview
As an employee at True Blue Financial Solutions, a small financial advisory firm, you’re responsible for calculating the total return for multiple stocks over a given time period. With large datasets, this process can be complicated and time-consuming. That’s why you’re seeking to simplify your workload by automating the process using Python. In this challenge, you’ll develop a Python program to automate the calculation of total returns, saving yourself time and increasing the efficiency of True Blue Financial Solutions.

Your manager has tasked you with calculating the total dollar return and total percentage return for each stock, as well as identifying the stock with the highest return. However, you want to go above and beyond by also determining what percentage of the stocks in the dataset outperformed the S&P 500 index. To visualize this information, you plan to use a histogram to display the distribution of percentage returns, like the image below:

Instructions

1. Calculate the total dollar return and total percentage return of each stock.
Suggestions and Hints
• Import the necessary packages, including pandas, numpy, and pyplot from matplotlib.
• Read in the CSV file containing the stock data and create a Pandas dataframe.
• Convert the ‘date’ column to a datetime object and sort the dataframe by ticker and date.
• Drop any duplicate rows.
• Create a summary dataframe that contains the starting price (opening price on the earliest day that ticker appears in the original data) and the ending price (closing price on the latest day that ticker appears in the original data) for each ticker.
• Create a list of tickers to iterate through.
• Create dictionaries to store the starting and ending prices.
• Iterate through the list of tickers and find the starting and ending prices.
• Create dataframes with the starting and ending prices.
• Merge the starting and ending dataframes on the ‘Name’ column.
• Create ‘\$ return’ (ending price – starting price) and ‘% return’ (\$ return / starting price) columns, drop any that include divide by 0 errors.
2. Identify the stock with the highest return.
Suggestions and Hints
• Print out the stock with the highest ‘% return’ by first identifying the highest % return by using the .max() method. The solution file then uses the .loc method to find the name associated with that return.
3. Identify what percentage of stocks in this dataset outperformed the S&P 500 (0.7856 or 78.56% for this time period).
Suggestions and Hints
• Store the S&P 500’s return over this same time period as a variable.
• Create a dataframe of stocks that outperformed the S&P 500 and print the percentage of the sample this represents.
4. Create a histogram of ‘% return’ including a title, labels, and a vertical line to show where the S&P 500 return lies on the distribution.
Suggestions and Hints
• Use your summary data, not the outperformers, as the data for the histogram.
• Use plt.hist() to create the histogram.
• Use plt.axvline to add a vertical line for the S&P return.
• You can use “bins=” to change the number of bars in the histogram. This may help you visualize the data better.

Data Files

Solution

Solution Code
``````import pandas as pd
import numpy as np
# Import pyplot to create the histogram later
from matplotlib import pyplot as plt

#Read in the CSV and create dataframe

# Convert the date column to a datetime object
stock_data['date'] = pd.to_datetime(stock_data['date'])

#Sort dataframe by ticker and date
stock_data = stock_data.sort_values(by = ['Name', 'date'], ascending = [True, True], ignore_index=True)

# Drop any duplicate rows
stock_data.drop_duplicates(subset=['date', 'Name'], inplace=True)

#Print out the first five rows of the dataframe

#create a list of tickers to iterate through
tickers = []
for ticker in stock_data.Name:
if ticker not in tickers:
tickers.append(ticker)

# Create dictionaries to store the starting and ending prices
starting_prices = {}
ending_prices = {}

# Iterate through the list of tickers and find the starting and ending prices
for ticker in tickers:
first_date = stock_data.loc[stock_data['Name'] == ticker, 'date'].min()
last_date = stock_data.loc[stock_data['Name'] == ticker, 'date'].max()
starting_price = stock_data.loc[(stock_data['Name'] == ticker) & (stock_data['date'] == first_date), 'open'].values[0]
ending_price = stock_data.loc[(stock_data['Name'] == ticker) & (stock_data['date'] == last_date), 'close'].values[0]
starting_prices[ticker] = starting_price
ending_prices[ticker] = ending_price

# Create dataframes with the starting and ending prices
starting_data = pd.DataFrame({'Name': list(starting_prices.keys()), 'starting_price': list(starting_prices.values())})
ending_data = pd.DataFrame({'Name': list(ending_prices.keys()), 'ending_price': list(ending_prices.values())})

# Merge the starting and ending dataframes on the 'ticker' column
summary_data = pd.merge(starting_data, ending_data, on='Name')

# Print the summary dataframe

# Create the \$ return and % return columns. Drop any that include divide by 0 errors
summary_data['\$ return'] = summary_data['ending_price']-summary_data['starting_price']
summary_data['% return'] = summary_data['\$ return']/summary_data['starting_price']
summary_data['% return'] = summary_data['% return'].replace('nan%', np.nan)
summary_data.dropna(subset=['% return'], inplace=True)

# Find the stock and the % return of the stock with the highest return
max_return = summary_data['% return'].max()
max_return_stock = summary_data.loc[summary_data['% return'] == max_return, 'Name'].values[0]
print(f"The stock with the highest % return is {max_return_stock} with a return of {float(max_return):.2%}")

#Over this same time period, the S&P 500 had a total return of 78.56%.
#How many stocks outperformed the S&P 500? What percentage of stocks in this sample does that represent?
sp_return = 0.7856

# Create a dataframe of stocks that outperformed the S&P 500 and print the percentage of the sample this represents
high_performers = summary_data[summary_data['% return'] > sp_return]
high_performers = high_performers.sort_values(by = ['% return'], ascending = [False], ignore_index = True)

print(f"{float(len(high_performers)/len(summary_data)):.2%} of stocks in the dataset outperformed the S&P 500")

# Create a histogram based on % return including a title and labels
plt.hist(summary_data['% return']*100, bins = 200)
plt.title('% Returns')
plt.xlabel('% return')
plt.ylabel('Count of Stocks')

# Add a vertical line using axvline() method to show where the S&P 500 return lies on the distribution
plt.axvline(sp_return*100, color='black', linestyle='dashed', linewidth=0.5)
``````

Great challenge!

This is how I created the summary data. Great challenge!

Great challenge. it looks like 43.56% of stocks outperformed the S&P 500! and NVDIA had the highest return with 1749.64% (wow haha. I remember back when I got my first computer like that). Here is the historgram I got.

I’m having trouble uploading the whole chart for the returns and am not sure how to upload the whole dataframe but here’s the head of the dataframe
Name dollar_return percentage_return
0 A 22.9900 51.009541
1 AAL 36.3300 241.074983
2 AAP 31.5900 40.324228
3 AAPL 91.8258 135.607893
4 ABBV 77.2500 212.400330

Great Challenge Parker! I was able to practice my pandas, matplotilb skills with this! Thanks!

``````import pandas as pd
import matplotlib.pyplot as plt
import os
pd.options.display.float_format = '{:,.2f}'.format # Display option to include 2 decimal places on all floats
pd.set_option('display.max_columns', None) # Display option to show all columns
pd.set_option('display.max_rows', None) # Display option to show all rows

name_list = []
df_list = []
sp_ret = 0.7856
name_list = list(set(df["Name"].values.tolist()))

for name in name_list:
iso_df = df[(df['Name'] == name)]
max_date = iso_df["date"].max()
min_date = iso_df["date"].min()
max_min_iso_df = iso_df[(iso_df["date"] == max_date)|(iso_df["date"] == min_date)]
max_min_iso_df.sort_values(by=["date"], ascending=[True])
ret = max_min_iso_df.iloc[1,4] - max_min_iso_df.iloc[0,1]
percent_ret = ret / max_min_iso_df.iloc[0,1]
new_df = pd.DataFrame({"Name" : [name], "Return" : [ret], "Percentage Return" : [percent_ret]})
df_list.append(new_df)

ultimate_df = pd.concat(df_list, ignore_index=True)
ultimate_df.dropna(inplace=True)

max_ret = ultimate_df["Return"].values.max()
max_ret_name = ultimate_df[ultimate_df["Return"] == max_ret].iloc[0,0]
max_percent_ret = ultimate_df["Percentage Return"].values.max()
max_percent_ret_name = ultimate_df[ultimate_df["Percentage Return"] == max_percent_ret].iloc[0,0]

print(f"""The company with the highest dollar return is {max_ret_name} with \${max_ret} per share.
The company with the highest percentage return is {max_percent_ret_name} with {round(max_percent_ret * 100, 2)}%.""")

total_comp = len(ultimate_df)
outperformed_comp = len(ultimate_df[ultimate_df["Percentage Return"] > sp_ret])
print(f"{round((outperformed_comp / total_comp) * 100, 2)}% of companies outperformed the S&P 500.")

plt.figure(figsize=(10,6))
plt.hist(ultimate_df["Percentage Return"] * 100, bins=100)
plt.title("% Return of Companies' Stocks")
plt.xlabel("% Return", fontsize = 12)
plt.ylabel("Count of Stocks", fontsize = 12)
plt.axvline(x=sp_ret * 100, color = "red", linestyle = "--", linewidth = 1)
plt.show()
``````