(Hands-On) Reproduce Prior a State from Snapshot

1. Check out the logs

Have a look at the logs for the second-to-last pipeline step “results”. Notice that all the predictors show a score of 100%. An experienced data scientist should immediately find this suspicious. This is a good indication that our models are not generalizing, either we are overfitting on the training dataset or there might be some other mistake in the input features. This is likely caused by an issue with the data consumed by the models.

2. Restore Notebook

Fortunately, Rok takes care of data versioning and reproducing the whole environment as it was the time you clicked the Compile and Run button. This way, you have a time machine for your data and code. Let’s resume the state of the pipeline before training one of the models and see what is going on.

Choose one of the following options, based on your version.

a. Go to the randoforest step

Take a look at the randomforest step, then click on Snapshots.

b. Restore state from pipeline

Click on the first Restore Workbench button to restore the state of pipeline run before the randomforest step.

c. Check the volume information

The volume information should have been filled automatically:

d. Name your notebook

Specify a name for your notebook:

a. Go to the randomforest step

Take a look at the randomforest step, then click on Visualizations.

b. View the snapshot in Rok UI

Follow the steps in the Markdown. View the snapshot in the Rok UI by clicking on the corresponding link.

c. Copy the URL

Copy the Rok URL.

d. Go to Notebooks

Navigate to the Notebooks tab.

e. Add new Notebook

Click on New Notebook:

f. Paste the Rok URL

Paste the Rok URL you copied previously:

All the snapshot details, including notebook image and volumes, will be retrieved automatically, and you will see this message:

g. Use the default Docker image

Make sure you are using the default Docker image. This image will have the following naming scheme:

gcr.io/arrikto/jupyter-kale-py36@sha256:<IMAGE_TAG>

h. Check the volume information

The volume information should have been filled automatically:

i. Name your notebook

Specify a name for your notebook:

2. Create the notebook

Click LAUNCH to create the notebook:

3. Connect to the notebook

When the notebook is available, click CONNECT to connect to it:

Note that the notebook opens at the exact cell of the pipeline step you have initiated:

In the background, Kale has resumed the notebook’s state by importing all the libraries and loading the variables from the previous steps.