(Hands-On) Debug Prior State from Snapshot
1. Add a print command
Add a print command to this cell:
print(acc_random_forest)
2. Run the active cell
Run the active cell by pressing Shift + Return to retrain the random forest and print the score. It is 100, a perfect score that is suspicious for this kind of prediction task.
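Why would a random forest score a perfect 100? One common cause is label leakage. A minimal sketch of the effect, assuming scikit-learn's `RandomForestClassifier` and assuming the printed score is training accuracy scaled to a percentage (the notebook code is not shown here). The synthetic features, the `leaky_X` array, and the other variable names are hypothetical stand-ins, not the tutorial's Titanic data: when the label is appended as an extra input column, the forest fits the training labels and leans on that column far more than on the genuine features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_samples = 200

# Hypothetical features that are only weakly related to the label
features = rng.normal(size=(n_samples, 2))
labels = (features[:, 0] + rng.normal(scale=2.0, size=n_samples) > 0).astype(int)

# Leaky setup: the label itself is appended as a third input column
leaky_X = np.column_stack([features, labels])

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(leaky_X, labels)

# Training accuracy as a percentage, analogous to acc_random_forest
acc_leaky = round(model.score(leaky_X, labels) * 100, 2)
importances = model.feature_importances_

print(acc_leaky)    # training accuracy in percent
print(importances)  # the last entry belongs to the leaked label column
```

Inspecting `feature_importances_` is a quick way to spot a single column that dominates the model.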
3. Add a new cell
Now it’s time to check whether there is something out of the ordinary in the training data. To explore and fix this issue, add a cell above the Random Forest markdown cell: select the previous cell and click the plus icon (+).
4. Print the training set
Add the following code and run the cell to print the training set:
train_df
Oops! The column with the training labels (“Survived”) has mistakenly been included as an input feature. The model has learned to rely on the “Survived” feature and ignore the rest, which is why it scores a perfect 100. Because this column is exactly what the model is meant to predict, and it is not available at prediction time, it must be removed from the training dataset so that the model learns from the other features.
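A minimal sketch of the fix, assuming `train_df` is a pandas DataFrame like the one printed above. The toy rows and the `Pclass` and `Age` values are made up for illustration, and the names `X_train` and `Y_train` are assumptions about how the notebook names its features and labels. The hypothetical helper `find_label_copies` flags any feature column that is identical to the label; the label is then split into its own Series and dropped from the feature frame before retraining.

```python
import pandas as pd

# Hypothetical stand-in for train_df; the real data has more rows and columns
train_df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Age": [22.0, 38.0, 26.0, 35.0],
})

def find_label_copies(features: pd.DataFrame, labels: pd.Series) -> list:
    """Return names of feature columns that exactly match the labels (values, dtype, index)."""
    return [col for col in features.columns if features[col].equals(labels)]

# Before the fix, the label column is part of the candidate features
print(find_label_copies(train_df, train_df["Survived"]))

# Split the label off and keep only genuine input features
Y_train = train_df["Survived"]
X_train = train_df.drop(columns=["Survived"])

print(X_train.columns.tolist())
```

Running a check like `find_label_copies` before fitting catches this mistake earlier than a suspiciously perfect score does, though it only detects exact copies of the label, not derived or partially leaking columns.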