Pipeline Creation with Kale Overview
What is the Kale JupyterLab Extension?
KALE (Kubeflow Automated pipeLines Engine) is a project that aims to simplify how data scientists deploy Kubeflow Pipelines workflows. Kale is built right into Kubeflow as a Service and provides a simple UI for defining Kubeflow Pipelines directly from your JupyterLab notebook. You do not need to change a single line of code, build and push container images, create KFP components, or write KFP DSL code to define the pipeline DAG.
With Kale, you annotate cells (which are logical groupings of code) inside your Jupyter Notebook with tags. These tags tell Kubeflow how to interpret the code contained in the cell, what dependencies exist, and what functionality is required to execute the cell.
To create a Kubeflow Pipeline (KFP) from a Jupyter Notebook using Kale, annotate each cell of your notebook with one of the six Kale cell types:
Imports
An Imports cell is a block of code that imports the packages your project needs. Make it a habit to gather your imports in a single place.
Functions
These cells hold function definitions and global variable definitions, other than pipeline parameters, that are used later in the machine learning pipeline. They can also include code that initializes lists, dictionaries, objects, and other values used throughout your pipeline. Kale prepends the code in every Functions cell to each Kubeflow Pipeline step.
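A minimal sketch of the prepending behavior described above, in plain Python. The helper `build_step_source` and the sample cell contents are hypothetical, and this is not Kale's actual code generator; it only illustrates why definitions placed in a Functions cell are available inside every step.

```python
# Hypothetical sketch: code from Functions cells placed in front of each step.
# This is an illustration, not Kale's real implementation.

functions_cells = [
    "def normalize(values):\n"
    "    top = max(values)\n"
    "    return [v / top for v in values]\n",
    "LABELS = ['cat', 'dog']\n",
]

step_cells = {
    "load_data": "data = [2, 4, 8]\n",
    "preprocess": "scaled = normalize([2, 4, 8])\n",
}


def build_step_source(functions, step_code):
    """Return the step's code with every Functions cell placed in front of it."""
    return "".join(functions) + step_code


step_sources = {
    name: build_step_source(functions_cells, code)
    for name, code in step_cells.items()
}

# Each step runs in its own namespace, yet `normalize` is defined in all of them.
namespace = {}
exec(step_sources["preprocess"], namespace)
print(namespace["scaled"])
```

Because the same definitions are repeated in every step, it may be worth keeping Functions cells small and free of expensive side effects.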
Pipeline Parameters
These cells identify blocks of code that define hyperparameter variables used to fine-tune models.
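As an illustration, a Pipeline Parameters cell might contain nothing but simple assignments, which could then be read as name/value pairs. The sketch below assumes that layout; the variable names and the helper `extract_parameters` are hypothetical and do not reflect how Kale itself parses the cell.

```python
import ast

# Hypothetical contents of a cell annotated as Pipeline Parameters.
parameters_cell = """
LEARNING_RATE = 0.01
EPOCHS = 10
OPTIMIZER = "adam"
"""


def extract_parameters(cell_source):
    """Collect `NAME = literal` assignments into a dict (illustrative helper)."""
    params = {}
    for node in ast.parse(cell_source).body:
        is_simple_assignment = (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        )
        if not is_simple_assignment:
            raise ValueError("expected only simple variable assignments")
        # literal_eval only accepts literals such as numbers and strings.
        params[node.targets[0].id] = ast.literal_eval(node.value)
    return params


pipeline_parameters = extract_parameters(parameters_cell)
print(pipeline_parameters)
```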
Pipeline Metrics
Kubeflow Pipelines supports the export of scalar metrics. You can write a list of metrics to a local file to describe the performance of the model. The pipeline agent uploads the local file as your run-time metrics. You can view the uploaded metrics as a visualization on the Runs page for a particular experiment in the Kubeflow Pipelines UI.
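Writing scalar metrics to a local file can be sketched as follows. The JSON layout (a `metrics` list whose entries carry `name`, `numberValue`, and `format`) and the file name `mlpipeline-metrics.json` follow the Kubeflow Pipelines v1 convention as best understood here; treat both as assumptions to verify against your KFP version. The temporary directory and the helper `write_metrics` are for illustration only.

```python
import json
import os
import tempfile


def write_metrics(metrics, path):
    """Write scalar metrics to `path` in the assumed KFP v1 metrics layout."""
    payload = {
        "metrics": [
            {"name": name, "numberValue": float(value), "format": "RAW"}
            for name, value in metrics.items()
        ]
    }
    with open(path, "w") as handle:
        json.dump(payload, handle)
    return payload


# A temporary directory stands in for the step's local file system.
metrics_path = os.path.join(tempfile.mkdtemp(), "mlpipeline-metrics.json")
written = write_metrics({"accuracy": 0.93, "loss": 0.21}, metrics_path)

# Read the file back to confirm what a pipeline agent would pick up.
with open(metrics_path) as handle:
    loaded = json.load(handle)
print(loaded)
```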
Pipeline Step
A step is an execution of one of the components in the pipeline. The relationship between a step and its component is one of instantiation, much like the relationship between a run and its pipeline. In a complex pipeline, components can execute multiple times in loops, or conditionally after resolving an if/else-like clause in the pipeline code.
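The component/step relationship can be pictured with a plain-Python analogy that uses no Kubeflow SDK: a function plays the role of a component, and each call to it is a step. The function names and the stand-in score are hypothetical.

```python
# Plain-Python analogy: a function is the component, each call is a step.
executed_steps = []


def train_component(learning_rate):
    """Component: a reusable definition of work."""
    executed_steps.append(f"train(lr={learning_rate})")
    return 1.0 - learning_rate  # stand-in score, not a real model


def report_component(score):
    """Component that only runs when the condition below holds."""
    executed_steps.append(f"report(score={score})")


# Loop: one component, several steps.
scores = [train_component(lr) for lr in (0.1, 0.5)]

# Conditional: the report step exists only if the if-clause resolves to true.
best = max(scores)
if best > 0.8:
    report_component(best)

print(executed_steps)
```

Here two components produce three steps: `train_component` is instantiated twice by the loop and `report_component` once by the conditional.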
Skip Cell
Use Skip to annotate notebook cells that you want Kale to ignore as it defines a Kubeflow pipeline.
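The six cell types above can be pictured as tags stored in each cell's metadata. The sketch below builds a small notebook-like dictionary and filters out skipped cells. The tag strings (`imports`, `functions`, `pipeline-parameters`, `pipeline-metrics`, `step:<name>`, `skip`) and the helper functions are assumptions chosen for illustration; the Kale panel writes the real tags for you, so check your own notebook's metadata for the exact values.

```python
# Hypothetical notebook structure; tag strings are illustrative assumptions.
notebook = {
    "nbformat": 4,
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["imports"]},
         "source": "import math"},
        {"cell_type": "code", "metadata": {"tags": ["functions"]},
         "source": "def area(r):\n    return math.pi * r ** 2"},
        {"cell_type": "code", "metadata": {"tags": ["pipeline-parameters"]},
         "source": "RADIUS = 2"},
        {"cell_type": "code", "metadata": {"tags": ["step:compute"]},
         "source": "result = area(RADIUS)"},
        {"cell_type": "code", "metadata": {"tags": ["pipeline-metrics"]},
         "source": "print(result)"},
        {"cell_type": "code", "metadata": {"tags": ["skip"]},
         "source": "help(math)"},
    ],
}


def cell_type_of(cell):
    """Return the cell type encoded in the first tag, without any step name."""
    tags = cell["metadata"].get("tags", [])
    return tags[0].split(":")[0] if tags else "untagged"


def cells_for_pipeline(nb):
    """Drop every cell annotated as skip; keep the rest in notebook order."""
    return [cell for cell in nb["cells"] if cell_type_of(cell) != "skip"]


kept = cells_for_pipeline(notebook)
kinds = [cell_type_of(cell) for cell in kept]
print(kinds)
```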