Moving Forward with MLOps

MLOps and Continuous Processes

When you develop a model using Kubeflow Pipelines, you create a stand-alone process dedicated to training that model. The model needs to be continually updated in response to changes in the data or feedback from the monitoring service observing the model in production. The Continuous Training process in an MLOps environment is designed to maintain the model that ultimately makes its way to production. The steps you took in this course closely follow the steps of the Continuous Training process used in an MLOps deployment.

Initially, you downloaded the data, performed data exploration, and identified the features necessary for building the model. In a mature environment, the data would have been pulled from an External Data Source, the features would have been pulled from a Feature Store, and the model build would have been based on these standard inputs. These standard inputs would also make cleaned data available for the Model Development Life Cycle. The Kubeflow Pipeline that creates the model is just one stage in a larger Continuous Training process that triggers repeatedly: on a schedule, in response to changes in the data set, or in response to updates to the Feature Store. Since the Kubeflow Pipeline is snapshotted and stored in Rok, re-training the model is a matter of automation: loading the snapshot with the cleaned data and executing the training pipeline.
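As a minimal sketch of the trigger conditions described above, the following Python decides whether a new training run is due. Everything here is an illustrative assumption rather than part of Kubeflow or Rok: the `TrainingState` record, the `retrain_reasons` function, the version strings, and the seven-day default schedule are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TrainingState:
    """Hypothetical record of what the last training run saw."""
    last_run: datetime
    data_version: str
    feature_store_version: str


def retrain_reasons(state, now, data_version, feature_store_version,
                    schedule=timedelta(days=7)):
    """Return the reasons a new training run should start (empty list if none)."""
    reasons = []
    if now - state.last_run >= schedule:
        reasons.append("schedule")
    if data_version != state.data_version:
        reasons.append("data_changed")
    if feature_store_version != state.feature_store_version:
        reasons.append("feature_store_updated")
    return reasons


# Assumed example values: the data set changed two days after the last run.
state = TrainingState(datetime(2024, 1, 1), "data-v1", "features-v1")
print(retrain_reasons(state, datetime(2024, 1, 3), "data-v2", "features-v1"))
# prints ['data_changed']
```

Returning the list of reasons, rather than a bare true or false, lets the automation record why each run was started.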

MLOps and Continuous Training

Continuous Training is not just about repeatedly creating a model; it is about repeatedly creating, testing, and validating the ideal model. In this course, you did the same thing: once a suitable model was identified based on the features selected, you performed Hyperparameter Tuning to create the ideal model. Since Hyperparameter Tuning is done with Kubeflow Pipelines, which push execution output to an Artifact Store, the ideal model can be programmatically identified. Through automation, Continuous Training iterates through many options and identifies the best fit to ship to production.
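To illustrate what "programmatically identified" can mean, here is a small sketch that picks the best run from a set of tuning results. It assumes the metrics have already been read out of the Artifact Store into plain dictionaries; the record layout, the run IDs, the metric values, and the `select_best_trial` function are hypothetical and not a Kubeflow API.

```python
def select_best_trial(trials, metric="accuracy", maximize=True):
    """Pick the trial with the best value of `metric`.

    `trials` is a list of dicts with "run_id", "params" and "metrics" keys
    (an assumed layout for this sketch).
    """
    scored = [t for t in trials if metric in t["metrics"]]
    if not scored:
        raise ValueError(f"no trial reported {metric!r}")
    pick = max if maximize else min
    return pick(scored, key=lambda t: t["metrics"][metric])


# Made-up tuning results for illustration only.
trials = [
    {"run_id": "run-a", "params": {"learning_rate": 0.1}, "metrics": {"accuracy": 0.81}},
    {"run_id": "run-b", "params": {"learning_rate": 0.01}, "metrics": {"accuracy": 0.88}},
    {"run_id": "run-c", "params": {"learning_rate": 0.001}, "metrics": {"accuracy": 0.84}},
]

best = select_best_trial(trials)
print(best["run_id"], best["params"])
# prints run-b {'learning_rate': 0.01}
```

For a metric where lower is better, such as a loss, pass `maximize=False`.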

However, having the ideal model selected based on Hyperparameter Tuning is not enough to close out the Continuous Training process. For a model to be considered trained and ready to move to integration, it must be tested. This is referred to as “shift-left” in DevOps practice, an approach that MLOps borrows: by performing as much testing as early as possible, future maintenance or production issues can be avoided. This early check is often referred to as a “smoke test”.
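A smoke test can be very small. The sketch below assumes only that the model object exposes a `predict` method that takes a list of rows; the `smoke_test` function, the `ThresholdModel` stand-in, and the sample rows are hypothetical and exist purely for illustration.

```python
def smoke_test(model, sample_rows, allowed_labels):
    """Return a list of failure messages; an empty list means the model passed."""
    try:
        predictions = list(model.predict(sample_rows))
    except Exception as exc:  # any crash at predict time fails the smoke test
        return [f"predict raised {type(exc).__name__}: {exc}"]

    failures = []
    if len(predictions) != len(sample_rows):
        failures.append(
            f"expected {len(sample_rows)} predictions, got {len(predictions)}"
        )
    unexpected = sorted({p for p in predictions if p not in allowed_labels})
    if unexpected:
        failures.append(f"unexpected labels: {unexpected}")
    return failures


class ThresholdModel:
    """Stand-in model: predicts 1 when the first feature exceeds 0.5."""

    def predict(self, rows):
        return [1 if row[0] > 0.5 else 0 for row in rows]


sample_rows = [[0.2], [0.9], [0.5]]
failures = smoke_test(ThresholdModel(), sample_rows, allowed_labels={0, 1})
print("smoke test passed" if not failures else failures)
```

The check does not measure model quality. It only confirms that the trained model loads, accepts input, and returns sensible output before any further effort is spent on integration.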

Finally, a Continuous Training process must push the Kubeflow Pipeline to a central repository so that a Continuous Integration process can migrate the model across environments. Rok Registry allows for seamless migration of Kubeflow Pipeline snapshots so that the model creation process can be identically replicated in any destination environment separate from where the Continuous Training is taking place.

The significance of a Continuous Training process cannot be overstated: it is the result of the happy marriage between the Model Development Life Cycle and the DevOps principles that support MLOps.

MLOps and Continuous Integration

While the Model Development Life Cycle is self-contained within a Kubeflow Pipeline and the associated supporting Continuous Training process, the actual migration of model creation components across environments is addressed by a Continuous Integration process. The purpose of Continuous Integration is to make sure model creation code changes are regularly made available and propagated from personal Data Scientist environments in Kubeflow to staging environments and ultimately to production.

Several pieces have to come together to facilitate this Continuous Integration process:

Artifact Store: stores the component/step output of each model creation pipeline step.
Model Registry: stores the actual model created by the pipeline.
Container Registry: stores the container images that make up the pipeline steps.
Snapshot Registry: stores the snapshots that are taken during model execution.
Machine Learning Metadata: stores metadata about the artifacts generated through the components/steps of the ML pipeline as well as the execution of the components/steps and the lineage of the pipelines that are executed.

The Continuous Integration process is responsible for taking the relevant pieces of the puzzle and using them to rebuild the solution in a new environment, which will eventually be deployed via a continuous delivery tool. However, this is not the only aspect of Continuous Integration: there must also be authorization, either manual or programmatic, before any integration of the solution can take place. A Continuous Integration process must therefore also have a checkpoint for manual or automatic approval of the model before it is delivered to the appropriate environment. There may be established programmatic thresholds that trigger integration based on quality metrics or disallow it based on poor performance. This approach ensures that changes can be rapidly rolled back if necessary, with minimal impact on production. While the Continuous Integration process may appear to be a simple step, it is perhaps the most important, since it serves as the gatekeeper between development and production environments.
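The approval checkpoint can be sketched as a single function. The threshold values, the `accuracy` metric, and the rule that a manual decision overrides the automatic checks are all assumptions made for this example; a real gate would use whatever metrics and policy your organization has agreed on.

```python
def approve_for_integration(candidate, production=None, min_accuracy=0.90,
                            max_regression=0.01, manual_approval=None):
    """Return (approved, reason) for a candidate model.

    `candidate` and `production` are dicts holding an "accuracy" value
    (an assumed layout). `manual_approval` is True, False, or None when
    no human decision has been recorded.
    """
    if manual_approval is not None:
        # Assumption for this sketch: a recorded human decision always wins.
        return manual_approval, "manual decision"
    if candidate["accuracy"] < min_accuracy:
        return False, "below minimum accuracy"
    if (production is not None
            and candidate["accuracy"] < production["accuracy"] - max_regression):
        return False, "regression against production"
    return True, "thresholds met"


# Made-up metric values for illustration only.
print(approve_for_integration({"accuracy": 0.93}, {"accuracy": 0.92}))
# prints (True, 'thresholds met')
print(approve_for_integration({"accuracy": 0.85}))
# prints (False, 'below minimum accuracy')
```

Returning the reason alongside the decision gives the pipeline something to log, which helps when a rejected model has to be explained later.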

MLOps and Continuous Deployment

Continuous Training and Continuous Integration are focused on making sure that models are of the highest caliber and that there is a single version of the truth for creating the models. This single version of the truth for the Kubeflow Pipeline that drives model creation is based on the Feature Store, the Model Registry, the Container Registry, the Artifact Store, and the Machine Learning Metadata. In traditional DevOps, the Continuous Deployment process pushes the most recent changes from the developer base to production. In MLOps, the Continuous Deployment process has the same responsibility. The Machine Learning Metadata enables MLOps release engineering to better understand what it took to build a model and how the iterations changed from model to model. This serves as the basis for “model lineage” tracking, at the core of which are immutable artifacts referenced via hermetic build definitions, so that results are reproducible and replicable across environments. The Continuous Deployment process picks up where the Continuous Integration process leaves off and takes the pieces necessary to rebuild the Kubeflow Pipeline in any environment from the centralized repository that the Continuous Integration process writes to. Continuous Deployment rounds out these three continuous processes and makes sure that the model is actually pushed to and used in a production inference service.
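One way to picture immutable artifacts and model lineage is content addressing: every input, the build definition, and the resulting model are identified by a hash of their bytes, so any change produces a different identifier. The sketch below uses only the Python standard library. The record layout, the `lineage_record` function, and the example file names and image tag are hypothetical, and this is not how Machine Learning Metadata or Rok stores lineage internally.

```python
import hashlib
import json


def digest(payload: bytes) -> str:
    """Content address for a blob: identical bytes always give the same ID."""
    return "sha256:" + hashlib.sha256(payload).hexdigest()


def lineage_record(build_definition: dict, inputs: dict, model_bytes: bytes) -> dict:
    """Describe a model by the digests of everything that produced it."""
    # Canonical JSON so the same definition hashes identically regardless of key order.
    canonical = json.dumps(build_definition, sort_keys=True,
                           separators=(",", ":")).encode("utf-8")
    return {
        "build_definition": digest(canonical),
        "inputs": {name: digest(blob) for name, blob in sorted(inputs.items())},
        "model": digest(model_bytes),
    }


# Made-up build definition and data for illustration only.
definition = {"image": "trainer:1", "steps": ["prepare", "train"]}
record = lineage_record(definition, {"train.csv": b"a,b\n1,2\n"}, b"model-bytes")
print(json.dumps(record, indent=2))
```

Comparing two such records shows at a glance whether a new model differs from the previous one because the data changed, because the build definition changed, or both.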

Marching towards MLOps Utopia

Kubeflow Pipelines give you the connective tissue to train models with various frameworks, iterate on them, and eventually expose them for serving. This means the entire Model Development Life Cycle lives within the Kubeflow Pipeline components and definitions. We now have the power to be intentional and declarative about how we want our models to be developed and how we provide feedback loops to our data science teams. This gives us the capacity to improve not only the data science function code but also the pipeline descriptions themselves in order to respond to ever-growing business demands. It also lays the foundation for the Continuous Integration and Continuous Deployment processes, which ultimately push and support models in production. Managing all of this can become quite daunting if you take a manual approach. What works for your organization today may not stand the test of time: technical debt accumulates as you begin to scale and introduce the essential complexity that comes from improved velocity and feature offerings. Adopting an effective and self-sustaining MLOps culture and platform solution brings technical stability, platform longevity, and a higher degree of team collaboration and model quality to an enterprise. The principles and concepts presented in this course are just the foundation for a much larger journey that Kubeflow will facilitate.

We hope you enjoyed this course and are as excited about the future of MLOps as we are.

If you would like to explore further solutions by attending Instructor-Led presentations: https://www.arrikto.com/kubeflow-mlops-events

If you would like to explore further solutions by completing additional Self-Service courses: https://academy.arrikto.com/explore

If you would like to explore MLOps further, please reach out to us for a free workshop: https://academy.arrikto.com/contact