cwc023
2024-12-10 19:11:06 -05:00
parent 0e83fa6bff
commit a34dc4befb

@@ -13,42 +13,6 @@
"Instructor: Brian King " "Instructor: Brian King "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Assignment Description\n",
"This is your final report! Consider the scenario of you giving a report to a client or your supervisor on your study. Include good reporting techniques. Use good tables and other visualizations. Structure your notebook using proper headers for major sections. THIS NOTEBOOK SHOULD BE A COMPLETE STANDALONE NOTEBOOK FROM START TO FINISH! But, ONLY INCLUDE THE BEST OUTCOMES FROM YOUR PREVIOUS NOTEBOOKS!\n",
"\n",
"Include the following sections:\n",
"\n",
"1. **Introduction**\n",
" - This should have mostly been done in your first notebook. Just copy over relevant cells from your first part of the project, and add any new information you have learned. Your aim is to motivate the reader with the importance and relevance of your project.\n",
"\n",
"2. **Data**\n",
" - Again, most of this was likely done in your first part of the project. So, feel free to copy the important cells over.\n",
" - After your introduction, you should be introducing the original, raw data. Where was it collected? When? How? Explain the meaning of the variables. What does each observation represent? And be sure to explain the key target variable (assuming you are doing classification/regression).\n",
" - PLEASE DO NOT SHOW PAGES AND PAGES OF YOUR DATA! Display only a few observations so that the reader can see what your raw, original data look like.\n",
"\n",
"3. **Data Preparation**\n",
" - While the previous section shows the raw data, this section is going to carefully explain the steps you followed to clean and preprocess the data in a form suitable for analysis, visualization, and modeling. You should be setting proper variable types, dealing with missing data, etc. Preprocessing steps should be explained with justification. Include any dimensionality reduction techniques you might have done. Summarize what you needed to do to clean it, and show some example observations from your final, cleaned data.\n",
"\n",
"4. **EDA**\n",
" - We expect good EDA to understand your data. Visualizations after preprocessing will do far more to convey your summary statistics than just numbers. Discuss the distributions, correlations with the target variable, etc.\n",
"\n",
"5. **Modeling**\n",
" - What modeling methods did you try? And, which method(s) did you ultimately determine were the best? What hyperparameters were selected? Justify the selection of parameters. (i.e., did you do a grid search? You cannot simply say, \"XGBoost was the only one I evaluated, and I used default parameters.\" Boring, and unlikely to obtain the best results. You are expected to evaluate different models and different hyperparameters. It is very rare that default parameters are the best in machine learning.)\n",
"\n",
"6. **Performance Results**\n",
" - Once you've selected the best model, clearly convey the results of your model. I expect to see ROC curves, precision/recall curves, confusion matrices, tables with prediction performance by class, (or, if regression, use appropriate regression measures), etc.\n",
"\n",
"7. **Discussion**\n",
" - Reflect on your project. For example: discuss any challenges you had with cleaning and preparing the data. Did you find any surprises during your modeling? Compare and contrast the methods and hyperparameters you evaluated. And, it's often useful to discuss the features that you thought were the most predictive, and those that were least useful. (Search for feature importance scikit-learn for more info!) Any info that might be of interest to me related to your project goes here.\n",
"\n",
"8. **Conclusions**\n",
" - Write a short summary of your project goes here."
]
},
{
"cell_type": "code",
"execution_count": 56,