MLOps and Continuous Integration
While the Model Development Life Cycle is self-contained within a Kubeflow Pipeline and its supporting Continuous Training process, the migration of model creation components across environments is handled by a Continuous Integration process. The purpose of Continuous Integration is to ensure that changes to model creation code are regularly propagated from Data Scientists' personal environments in Kubeflow to staging environments and, ultimately, to production.
Several pieces have to come together to facilitate this Continuous Integration process:
- Artifact Store: stores the component/step output of each model creation pipeline step.
- Model Registry: stores the actual model created by the pipeline.
- Container Registry: stores the container images that make up the pipeline steps.
- Snapshot Registry: stores the snapshots that are taken during model execution.
- Machine Learning Metadata: stores metadata about the artifacts generated through the components/steps of the ML pipeline as well as the execution of the components/steps and the lineage of the pipelines that are executed.
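One way to picture how these five pieces travel together is as a single record that a Continuous Integration job could assemble before rebuilding the solution elsewhere. The sketch below is only an illustration under stated assumptions: `PromotionManifest`, its field names, and the example URIs are hypothetical and are not part of Kubeflow or any registry API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PromotionManifest:
    """Hypothetical record of everything needed to rebuild a pipeline elsewhere."""

    pipeline_name: str
    artifact_uris: List[str]          # Artifact Store: outputs of each step
    model_uri: str                    # Model Registry: the trained model
    container_images: Dict[str, str]  # Container Registry: step name -> image reference
    snapshot_ids: List[str]           # Snapshot Registry: snapshots taken during execution
    metadata_run_id: str              # Machine Learning Metadata: run/lineage identifier

    def missing_pieces(self) -> List[str]:
        """Return the names of the stores for which no reference was recorded."""
        checks = [
            ("artifact_store", bool(self.artifact_uris)),
            ("model_registry", bool(self.model_uri)),
            ("container_registry", bool(self.container_images)),
            ("snapshot_registry", bool(self.snapshot_ids)),
            ("ml_metadata", bool(self.metadata_run_id)),
        ]
        return [name for name, present in checks if not present]


# Example values are placeholders, not real locations.
manifest = PromotionManifest(
    pipeline_name="example-pipeline",
    artifact_uris=["s3://example-bucket/artifacts/step-1/output"],
    model_uri="models:/example-model/1",
    container_images={"train": "registry.example.com/train:1.0"},
    snapshot_ids=[],
    metadata_run_id="run-0001",
)
print(manifest.missing_pieces())
```

A manifest with missing pieces would be a natural reason for a Continuous Integration job to stop before attempting to rebuild the solution in the next environment.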
The Continuous Integration process is responsible for taking the relevant pieces from these stores and using them to rebuild the solution in a new environment that will eventually be deployed via a continuous delivery tool. Rebuilding is not the only aspect of Continuous Integration, however: integration must also be authorized, either manually or programmatically, before it can take place. In addition to the components above, a Continuous Integration process must therefore include a checkpoint at which the model is approved, manually or automatically, before it is delivered to the appropriate environment. Programmatic thresholds may be established that trigger integration when quality metrics are met or block it when performance is poor. This gated approach helps ensure that changes can be rolled back rapidly if necessary, with minimal impact on production. While the Continuous Integration process may appear to be a simple step, it is perhaps the most important, since it serves as the gatekeeper between development and production environments.
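The approval checkpoint described above can be sketched as a small gate function. This is a minimal illustration, not a prescribed implementation: `GatePolicy`, `Decision`, `evaluate_gate`, and the metric names are hypothetical, and the sketch assumes every metric is "higher is better".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Mapping, Optional


class Decision(Enum):
    PROMOTE = "promote"
    REJECT = "reject"
    NEEDS_MANUAL_APPROVAL = "needs_manual_approval"


@dataclass(frozen=True)
class GatePolicy:
    # Hypothetical minimum acceptable value per metric (higher is better).
    min_metrics: Mapping[str, float]
    # When True, a human sign-off is required even if all thresholds pass.
    require_manual_approval: bool = False


def evaluate_gate(
    metrics: Mapping[str, float],
    policy: GatePolicy,
    approved_by: Optional[str] = None,
) -> Decision:
    """Decide whether a model may be integrated into the next environment."""
    # Programmatic check: a missing or below-threshold metric blocks integration.
    for name, minimum in policy.min_metrics.items():
        value = metrics.get(name)
        if value is None or value < minimum:
            return Decision.REJECT
    # Manual check: hold the model until someone has approved it.
    if policy.require_manual_approval and approved_by is None:
        return Decision.NEEDS_MANUAL_APPROVAL
    return Decision.PROMOTE


policy = GatePolicy(min_metrics={"accuracy": 0.90}, require_manual_approval=True)
print(evaluate_gate({"accuracy": 0.93}, policy))
print(evaluate_gate({"accuracy": 0.93}, policy, approved_by="reviewer"))
print(evaluate_gate({"accuracy": 0.85}, policy, approved_by="reviewer"))
```

Keeping the decision in a pure function like this makes the gate easy to test on its own, separately from whatever tool actually performs the delivery.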