mirror of
https://github.com/soconnor0919/f1-race-prediction.git
synced 2026-02-05 00:06:39 -05:00
39 lines
2.8 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Formula One Project: Data Preparation and EDA\n",
    "\n",
    "DUE: November 22nd, 2024 (Fri) \n",
    "Name(s): Sean O'Connor, Connor Coles \n",
    "Class: CSCI 349 - Intro to Data Mining \n",
    "Semester: Fall 2024 \n",
    "Instructor: Brian King "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Assignment Description\n",
    "Create your first notebook file, DataPrep_EDA.ipynb. Use both markdown and code cells to convey the following:\n",
    "- What problem are you working on? Summarize in a single cell.\n",
    "- What data are you using to understand the problem? Describe the data in a very general sense. Where did it come from? You should understand what every observation in the data represents, and what each variable represents.\n",
    "- Remember that the key to achieving good machine learning outcomes is understanding how each real-world entity in your data will be represented as a fixed-length vector of attributes in your dataset! Preprocessing your data will be a big part of this challenge. If you do not expect to spend quality time cleaning and prepping your data, you will not get good results. Once you have established how each data object is represented in a form ready for a data mining algorithm, and the data are clean, you will have won a substantial part of the battle toward modeling.\n",
    "- Strive to generate good summary statistics, show what the data look like, and include good EDA and visualizations with boxplots, bar charts, density plots for key variables, or any other plots specific to your data and problem that help the reader understand the basic distributions of important variables. Visualizations can help you convey general info about your data and are extremely helpful.\n",
    "- In your final cells, discuss the modeling methods you expect to use. Start by clearly explaining whether this is a classification, regression, clustering, or association rule mining problem, and justify your answer. You already have much of the framework needed to apply most algorithms, even those beyond what we covered in class. Feel free to explore different methods if you have good justification for doing so. If any significant papers have been published using these data, discuss the ones most interesting or relevant to the team.\n",
    "- Finally, what is your overarching aim with this project? What are you hoping to learn? Or, what hypothesis are you using the data to confirm or disprove? What challenges do you foresee on this project? Discuss your concerns. How will you get your work done? Give a reasonable list of milestones leading up to the final project deadline."
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}