Solution - Lab: Create
To incorporate the Prep Data section as a step in our Kubeflow pipeline, please modify your copy of our notebook to meet the following requirements:
- Create a new pipeline step.
- Set the step name to prep_data.
- Specify the correct step on which prep_data depends as the Depends on parameter.
- As part of this annotation, include only cells that contain code that is core to this step.
- Exclude cells in the Prep Data section that are not core to the functionality of this step using the Skip Cell annotation.
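To make these requirements concrete, here is a minimal sketch. It assumes Kale's JupyterLab extension stores each annotation as a tag in the cell's metadata: block:prep_data for the step name, prev:clean_data for Depends on, and skip for Skip Cell. Plain dictionaries stand in for real notebook cells. The cell sources are hypothetical placeholders. The rule that an untagged cell inherits the annotation of the cell above it is also an assumption made for illustration.

```python
from typing import Any, Dict, List

Cell = Dict[str, Any]

# Hypothetical Prep Data cells, tagged the way the lab's annotations might mark
# them. Tag strings follow the assumed "block:", "prev:", "skip" convention.
cells: List[Cell] = [
    {"source": "df_auto = df_auto[significant_columns]",
     "tags": ["block:prep_data", "prev:clean_data"]},
    {"source": "df_auto = df_auto.drop(columns=extra_columns)", "tags": []},
    {"source": "df_auto.head()", "tags": ["skip"]},  # diagnostic output only
    {"source": "df_auto = pd.get_dummies(df_auto)", "tags": ["block:prep_data"]},
    {"source": "df_auto.dtypes", "tags": ["skip"]},  # diagnostic output only
]


def cells_for_step(cells: List[Cell], step: str) -> List[str]:
    """Return the source of every cell assigned to `step`, leaving out skip cells."""
    selected: List[str] = []
    current = None  # annotation carried over from the previous cell (assumption)
    for cell in cells:
        tags = cell["tags"]
        if "skip" in tags:
            current = "skip"
        else:
            for tag in tags:
                if tag.startswith("block:"):
                    current = tag[len("block:"):]
        if current == step:
            selected.append(cell["source"])
    return selected


def dependencies(cells: List[Cell], step: str) -> List[str]:
    """Collect the prev:<step> tags declared on cells annotated with `step`."""
    deps: List[str] = []
    for cell in cells:
        tags = cell["tags"]
        if f"block:{step}" in tags:
            deps += [t[len("prev:"):] for t in tags if t.startswith("prev:")]
    return deps


print(cells_for_step(cells, "prep_data"))
print(dependencies(cells, "prep_data"))  # ['clean_data']
```

The two skip-tagged diagnostic cells drop out of prep_data, while the three core cells are kept.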
Requirements 1, 2, 3, and 4:
- Apply the Pipeline Step annotation to the first cell in the Prep Data section to create a new pipeline step.
- Enter prep_data as the value for the Step name parameter.
- Select clean_data as the step on which prep_data depends. We use clean_data here rather than read_data because we want the data in the df_auto data frame after the operations in the clean_data step have cleaned it up.
- These two cells select only the significant columns we identified during data analysis. Because these are the only columns we want to use in training our model, both cells are essential to the prep_data step.
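The column selection these core cells perform can be sketched with pandas. This is only an illustration: the frame contents and the names in significant_columns are hypothetical, since the actual significant columns come from the lab's own data analysis.

```python
import pandas as pd

# Hypothetical stand-in for df_auto as it leaves the clean_data step.
df_auto = pd.DataFrame({
    "make": ["alfa-romero", "audi", "bmw"],
    "horsepower": [111, 102, 101],
    "body_style": ["convertible", "sedan", "sedan"],
    "num_doors": ["two", "four", "two"],
    "price": [13495, 13950, 16430],
})

# Hypothetical list of the significant columns found during data analysis.
significant_columns = ["horsepower", "body_style", "price"]

# Keep only those columns; .copy() gives an independent frame for later edits.
df_auto = df_auto[significant_columns].copy()
print(df_auto.columns.tolist())  # ['horsepower', 'body_style', 'price']
```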
Our pipeline now runs prep_data after clean_data, which in turn runs after read_data.
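The step ordering described in this solution can be sketched with Python's standard graphlib module. This sketch assumes read_data, clean_data, and prep_data are the only steps involved so far, and it models Depends on as a set of predecessor steps. It does not represent how Kale or Kubeflow actually schedule steps. The second graph shows why read_data would be the wrong dependency: once read_data finishes, prep_data becomes runnable alongside clean_data, before the data has been cleaned.

```python
from graphlib import TopologicalSorter

# Each step maps to the steps it depends on (the "Depends on" parameter).
# Assumption: these three steps are the whole pipeline at this point.
pipeline = {
    "read_data": set(),
    "clean_data": {"read_data"},
    "prep_data": {"clean_data"},
}
order = list(TopologicalSorter(pipeline).static_order())
print(order)  # ['read_data', 'clean_data', 'prep_data']

# Counterexample: prep_data depending on read_data instead of clean_data.
wrong = {
    "read_data": set(),
    "clean_data": {"read_data"},
    "prep_data": {"read_data"},
}
sorter = TopologicalSorter(wrong)
sorter.prepare()
first = sorter.get_ready()   # only read_data can start
sorter.done(*first)
second = set(sorter.get_ready())  # clean_data and prep_data are both ready now
print(sorted(second))  # ['clean_data', 'prep_data']
```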
Requirement 5: This cell simply produces some diagnostic output used when we were developing this notebook to ensure that the data frame was being modified to include only the significant columns. We annotate it as a skip cell.
Requirement 4: This cell transforms the categorical variables in our data set so that they can be used in training our model. It is essential to the prep_data step.
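One common way to make categorical variables usable for training is one-hot encoding. The sketch below does this with pandas.get_dummies. The lab's actual cell may use a different encoding, and the column names and values here are hypothetical.

```python
import pandas as pd

# Hypothetical significant columns; body_style and drive_wheels are categorical.
df_auto = pd.DataFrame({
    "horsepower": [111, 154, 102],
    "body_style": ["sedan", "hatchback", "sedan"],
    "drive_wheels": ["rwd", "fwd", "fwd"],
    "price": [13495, 16500, 13950],
})

# Replace each categorical column with one indicator column per category value.
df_auto = pd.get_dummies(df_auto, columns=["body_style", "drive_wheels"])
print(sorted(df_auto.columns))
```

The numeric columns pass through unchanged. body_style, for example, becomes body_style_hatchback and body_style_sedan.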
Requirement 5: This cell produces diagnostic output. We annotate it as a skip cell.