Define a Pipeline Step

You will start with the logistic_regression_pipeline.py file, which contains several code components.

An ML engineer would typically write similar functions, test them separately, and finally group them to launch an ML experiment. To this end, you can define a fourth function that calls each step and weaves everything together. This is the ml_pipeline function, the main entry point. Finally, we add the standard Python boilerplate code to parse user arguments and call the ml_pipeline function with those arguments.

To convert a Python file, or any code, into a Kubeflow pipeline with the Kale SDK, you must tag the code just as you would in the Kale UI in a JupyterLab notebook. These tags tell Kale how to construct and execute the Kubeflow pipeline. Throughout this course you will continue to manipulate the tags and build up your pipeline. If you are not yet familiar with the Kale tags, we recommend first taking Kale 101, where these concepts are explained in depth.
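The structure described above can be sketched in plain Python before any Kale tags are added. The function names other than ml_pipeline and load, and all of the bodies, are hypothetical stand-ins for the real code in logistic_regression_pipeline.py:

```python
import argparse


def load(random_state):
    # Load or generate the dataset (placeholder logic).
    return [[0.0, 1.0], [1.0, 0.0]], [0, 1]


def split(x, y):
    # Split the data into train/test sets (placeholder logic).
    return x[:1], x[1:], y[:1], y[1:]


def train(x_train, y_train):
    # Train a model (placeholder: return the training data as a "model").
    return {"x": x_train, "y": y_train}


def ml_pipeline(random_state=42):
    # The main entry point: call each step and weave everything together.
    x, y = load(random_state)
    x_train, x_test, y_train, y_test = split(x, y)
    return train(x_train, y_train)


if __name__ == "__main__":
    # Standard boilerplate: parse user arguments and run the pipeline.
    parser = argparse.ArgumentParser()
    parser.add_argument("--random-state", type=int, default=42)
    args, _unknown = parser.parse_known_args()  # ignore unrelated CLI args
    ml_pipeline(random_state=args.random_state)
```
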

Tip

Please follow along in your own copy of our notebook as we complete the steps below.

1. Download Python Code

For the first section of this course, you will start with a code sample and build up the code taking advantage of Kale SDK functionality.

Download the code by clicking here and then upload your code to the Notebook Server.

2. Import Step from Kale

Add the following to the imports section of the code to import the step decorator from the Kale SDK:

from kale.sdk import step

3. Add @step decorator to the load function

Adding the @step decorator above a function indicates that the function is to be used as a step in the pipeline; the decorator can also carry additional details, as you will see throughout this course. When defining a step, make sure to provide a name using the name= argument. Add the following above the load(random_state) function to mark this code block as a step and to set the associated step name.

@step(name="data_loading")
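Applied to the load function, the decorator might look like the sketch below. The function body is a hypothetical placeholder, and the import falls back to a no-op stub so the snippet runs outside a Kale environment (on the Notebook Server, the plain import at the top succeeds):

```python
try:
    from kale.sdk import step
except ImportError:
    # Fallback stub for illustration only, so this sketch runs where
    # the Kale SDK is not installed.
    def step(name=None):
        def decorator(func):
            return func
        return decorator


@step(name="data_loading")
def load(random_state):
    # Hypothetical placeholder body; the real implementation lives in
    # logistic_regression_pipeline.py.
    return {"random_state": random_state}
```

With the decorator in place, Kale treats load as the "data_loading" step of the pipeline while the function remains callable as ordinary Python.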