BYU Student Author: @DylanKing, @Benjamin_Lau
Reviewers: @Marco, @Boston, @Carter_Lee
Estimated Time to Solve: 45 Minutes
We provide the solution to this challenge using:
Need a program? Click here.
Overview
You’ve been working in public accounting for years now and you have decided you have had enough. Somehow, someway, you have a connection with ESPN and are taking a onceinalifetime opportunity as Stephen A. Smith’s cohost for the hit sports show, First Take. As a huge basketball fan, you want to prepare for your first day on the show when you will debate Lebron’s legacy. Using your data analytics background, you want to find and create stats to outsmart Stephen A.
Instructions

You want to focus all of your stats on LeBron James’ first stint with Cleveland which was from 20032010. The NBA has five major key stats: points, assists, rebounds, steals, and blocks. You want to create a new metric that displays Lebron’s overall dominance. Add up these five key stats to create a new column called Key Stats. Group by player and find what each player’s average was for this metric through the stated time period. Your output should have two columns: Player and Key Stats Avg. Sort by Key Stats Avg in descending order and round Key Stats Avg to two decimal places.

Next, you want to show Lebron’s consistent dominance at such a young age. Rather than looking at averages you want to look at totals for these key stats. For example, if you were to try and find the total points a player scored in a season, you would multiply the number of games they played by their average points per game. Once you calculate the totals for the key stats for each season and player, then calculate the total for each player during this time period. Create a ranking of players for each key stat. Additionally, calculate the total MVPS for each player during this time period and create a ranking of players based on how many MVPs they won. After compiling all of this information, create an output that shows Lebron’s name with all of his totals and rankings across these six categories: points, assists, rebounds, steals, blocks, and MVPs.

Lastly, you want to emphasize that although Lebron lacked quality help, he was able to have consistent team success. Calculate each team’s average winning percentage from 20032010 (consistent with the time period we have been using). Round the average winning percentage to three decimal places and rank the teams in order from highest winning percentage to lowest winning percentage. Your output should have three columns: Average Winning Percentage Rank, Team, and Average Winning Percentage. Only show Lebron’s team which was the Cavaliers (your output should only have one row).
Data Files
Suggestions and Hints
 Consider using pandas Dataframes to solve this problem.
 When you are solving this, try to use display(df.head()) throughout the process to see if you are getting what you want to get with the codes.
 Google or ChatGPT code suggestions as this is often what coders do. Do not feel guilty about using these valuable resources.
Solution
This was a good challenge for me! I had to use some Chat GPT to help with formating the year column on the Historical NBA Performance Data.
Here is my solution:
import pandas as pd
df = pd.read_csv('data/Challenge118_NBA_Player_Stats.csv')
df.head()
#Question 1
#Create Key Stats Column
df['Key_Stats'] = df['PTS'] + df['AST'] + df['TRB'] + df['STL'] + df['BLK']
#FIlter by year and create new average dataframe
df = df[(df.Season == "200304")(df.Season == "200405")(df.Season == "200506")(df.Season == "200607")(df.Season == "200708")(df.Season == "200809")(df.Season == "200910")]
new_df = df[['Player', 'Key_Stats']]
new_df = new_df.groupby('Player').mean()
new_df = new_df.sort_values(by='Key_Stats', ascending=False)
new_df['Key_Stats'] = new_df['Key_Stats'].round(2)
print(new_df)
#Question 2
#Create new total colums
df['PTS_total'] = df['PTS'] * df['G']
df['AST_total'] = df['AST'] * df['G']
df['TRB_total'] = df['TRB'] * df['G']
df['BLK_total'] = df['BLK'] * df['G']
df['STL_total'] = df['STL'] * df['G']
#Create new total dataframes
PTS_df = df.groupby('Player')[['PTS_total']].sum().reset_index().sort_values(by='PTS_total', ascending=False)
AST_df = df.groupby('Player')[['AST_total']].sum().reset_index().sort_values(by='AST_total', ascending=False)
TRB_df = df.groupby('Player')[['TRB_total']].sum().reset_index().sort_values(by='TRB_total', ascending=False)
BLK_df = df.groupby('Player')[['BLK_total']].sum().reset_index().sort_values(by='BLK_total', ascending=False)
STL_df = df.groupby('Player')[['STL_total']].sum().reset_index().sort_values(by='STL_total', ascending=False)
MVP_df = df.groupby('Player')[['MVP']].sum().reset_index().sort_values(by='MVP', ascending=False)
#Order new total dataframes
PTS_df['PTS_Rank'] = PTS_df['PTS_total'].rank(ascending=False).astype(int)
AST_df['AST_Rank'] = AST_df['AST_total'].rank(ascending=False).astype(int)
TRB_df['TRB_Rank'] = TRB_df['TRB_total'].rank(ascending=False).astype(int)
BLK_df['BLK_Rank'] = BLK_df['BLK_total'].rank(ascending=False).astype(int)
STL_df['STL_Rank'] = STL_df['STL_total'].rank(ascending=False).astype(int)
MVP_df['MVP_Rank'] = MVP_df['MVP'].rank(ascending=False).astype(int)
#Create LeBron filtered dataframes and then merge them
LeBron_PTS_df = PTS_df[PTS_df['Player'] == 'LeBron James']
LeBron_AST_df = AST_df[AST_df['Player'] == 'LeBron James']
LeBron_TRB_df = TRB_df[TRB_df['Player'] == 'LeBron James']
LeBron_BLK_df = BLK_df[BLK_df['Player'] == 'LeBron James']
LeBron_STL_df = STL_df[STL_df['Player'] == 'LeBron James']
LeBron_MVP_df = MVP_df[MVP_df['Player'] == 'LeBron James']
LeBron_df = pd.merge((pd.merge((pd.merge((pd.merge((pd.merge(LeBron_PTS_df, LeBron_AST_df, on='Player')), LeBron_TRB_df, on='Player')), LeBron_BLK_df, on='Player')), LeBron_STL_df, on='Player')), LeBron_MVP_df, on='Player')
print(LeBron_df)
#Question 3
#Read in and filter team dataframes
team_df = pd.read_excel('data/Challenge118_Historical _NBA_Performance.xlsx', usecols=["Year", "Team", "Winning Percentage"], converters={'Year': lambda x: str(x)[:7]})
team_df = team_df[(team_df.Year == "200304")(team_df.Year == "200405")(team_df.Year == "200506")(team_df.Year == "200607")(team_df.Year == "200708")(team_df.Year == "200809")(team_df.Year == "200910")]
#create the average dataframe and rank column then the Cavaliers dataframe
team_avg_df = team_df.groupby('Team').mean('Winning Percentage')
team_avg_df_df = team_avg_df.sort_values(by='Winning Percentage', ascending=False)
team_avg_df['Average Winning Percentage'] = team_avg_df['Winning Percentage'].round(3)
team_avg_df['Average Winning Percentage Rank'] = team_avg_df['Average Winning Percentage'].rank(ascending=False).astype(int)
team_avg_df = team_avg_df[['Average Winning Percentage', 'Average Winning Percentage Rank']]
cavaliers_df = team_avg_df[team_avg_df.index == 'Cavaliers']
print(cavaliers_df)