BYU Student Author: @Benjamin_Lau
Reviewers: @Marta_Ellsworth , @Jimmy_Han
Estimated Time to Solve: 30 Minutes
We provide the solution to this challenge using:
- ChatGPT
- SQL
Need a program? Click here.
Overview
This challenge aims to test your skills in both data generation using ChatGPT and SQL querying with ChatGPT’s assistance. The challenge is designed to enhance your proficiency in data manipulation and SQL query formulation. It can hopefully help you learn SQL through ChatGPT and understand how powerful this tool can be.
Instructions
- Use ChatGPT to generate a fictitious firm expense dataset with information for transactionID, date, category, description, amount, currency and other relevant information you want to include. Remember to ask ChatGPT to output that in a .csv format.
a. Usually, ChatGPT 3.5 will only generate 10 rows of data but if you continue to ask for more, it will spit out more rows. For example, you can prompt “Can you generate more?” Until you generate as much data as you want. In the solution data, I asked 4 more times to generate 40+ rows. Do NOT close your ChatGPT afterward. - After generating the dataset, you can copy the entire csv output by pressing “copy code” on the top right of the output box. Paste the dataset onto an Excel worksheet. Then, parse the dataset into various columns. See Hint #2.
- Open Microsoft Access and import the dataset by first creating a blank database. Then, under “External Data”, press “New Data Source” > “From File” > “Excel” to import the newly created Excel dataset.
- Write an SQL to find the transactions with amounts equal to or larger than $1,500 and sort the output by amount descending. Display the columns Date, Category, Description, and Amount for these transactions.
- Ask ChatGPT to check your answer. Paste your SQL and prompt ChatGPT to check if it achieves the purpose. An example prompt can be found under Hint #3. Or you can try it out yourself and experiment with what is best to say when interacting with ChatGPT in order to have ChatGPT check your codes to achieve your desired purpose.
- Write another SQL to find the sum of the transaction amounts for each category. Sort the output by amount descending. Display Category and Sum_Amount for each category. Use ChatGPT to check your answers again.
- Take a screenshot of the heading and first row your dataset. Upload the screenshot and Access database file (.accdb) to this TechHub challenge post. Also, comment on the power of generative AI on database query and answer the question “do we still need to learn SQL given this power of Gen AI?”
Data Files
- You will create your own data file
Suggestions and Hints
- Select all the data including the headings. Go to Data > Text-to-Columns. Check the box for Delimited and click next. Check the box for comma and uncheck all other boxes. Click Next. Set destination as $A$1 and click Finish.
- You can use prompts like, “Can you generate an expense dataset by an AI tech company and output the data in a .csv format with primary keys? Include columns for date, category, description, amount, currency and other relevant information.”
- An example prompt for checking answers can be “Here is my SQL. Would it achieve the purpose of finding the transactions with amounts equal to or larger than 1500 and sort the output by amount descending? And displaying the date, category, description, and amount for these transactions. Assuming I have parsed this dataset into multiple columns in a sheet named Sheet1.”
Solution