MLOps 101 with Kubeflow and Vertex AI

Raju Dawadi
5 min read · Oct 31, 2024


kubeflow+vertexAI

If you’ve ever tried taking an ML model from your local environment to production, you know it’s not a simple copy-paste. MLOps brings some much-needed structure to that process, and Kubeflow makes it all a lot easier. Today, I’ll share how I’ve been building ML pipelines using Kubeflow and some tricks I’ve picked up along the way.

Why Kubeflow?

Kubeflow is like a toolbox for ML workflows on Kubernetes: it gives us tools for everything from building data pipelines and training models to deploying them and monitoring how they perform in production. One of the best parts? Kubeflow Pipelines, which helps you create modular, reusable workflows.

Building the Pipeline

I have a simple example pipeline with the following four components:

  1. Preprocess: The first component simply pulls a few rows of a dataset from Hugging Face and writes them to an output file (a minimal sketch of the script follows this list)
  2. Train: This step reads the file from the path provided by Preprocess and writes its output to another file
  3. Evaluate: This step evaluates the model, taking the model artifact as input and writing metrics as its output
  4. Deploy: This step deploys the model
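
The actual preprocess.py isn't shown in this post, so the dataset name and row count below are placeholder assumptions, but a minimal sketch of it could look like this:

import argparse
from datasets import load_dataset  # 'datasets' is installed in the Docker image below

parser = argparse.ArgumentParser()
parser.add_argument('--output', required=True)
args = parser.parse_args()

# Pull a few rows from a Hugging Face dataset (assumed: imdb) and write them out
dataset = load_dataset('imdb', split='train[:100]')
dataset.to_json(args.output)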

Pipeline File

At a high level, the pipeline file looks like the code below, which calls each component with input/output paths. The pipeline.yaml file, our main pipeline definition for Kubeflow, is generated by simply executing the Python file.

import kfp
from kfp import dsl, local
from kfp.dsl import InputPath, OutputPath

# Define the preprocess component with a Docker image
@dsl.component(
    base_image='dwdraju/mlops:kubeflow-pipeline-v10'
)
def preprocess_component(output_data_path: OutputPath(str)):
    import subprocess
    import os

    # Ensure the output directory exists
    os.makedirs(os.path.dirname(output_data_path), exist_ok=True)

    # Run the preprocess.py script, passing the output path
    # (capture_output is needed so result.stderr is actually populated)
    result = subprocess.run(
        ['python', 'preprocess.py', '--output', output_data_path],
        capture_output=True, text=True,
    )

    # Check if the script executed correctly
    if result.returncode != 0:
        raise RuntimeError(f"Preprocessing failed with error: {result.stderr}")

    print(f"Preprocessing completed. Output saved to {output_data_path}")

# Define the train component with a Docker image
@dsl.component(
    base_image='dwdraju/mlops:kubeflow-pipeline-v10'
)
def train_component(input_data_path: InputPath(str), model_output_path: OutputPath(str)):
    import subprocess
    # check=True makes the step fail if the script errors
    subprocess.run(['python', 'train.py', '--input', input_data_path, '--output', model_output_path], check=True)

# Define the evaluate component with a Docker image
@dsl.component(
    base_image='dwdraju/mlops:kubeflow-pipeline-v10'
)
def evaluate_component(model_path: InputPath(str), metrics_output_path: OutputPath(str)):
    import subprocess
    subprocess.run(['python', 'evaluate.py', '--model', model_path, '--metrics', metrics_output_path], check=True)

# Define the deploy component with a Docker image
@dsl.component(
    base_image='dwdraju/mlops:kubeflow-pipeline-v10'
)
def deploy_component(model_path: InputPath(str)):
    import subprocess
    subprocess.run(['python', 'deploy.py', '--model', model_path], check=True)

# Define the pipeline
@dsl.pipeline(
    name="HF NLP Pipeline",
    description="Pipeline for fine-tuning a Hugging Face model."
)
def pipeline():
    # Create the preprocess step
    preprocess_task = preprocess_component()
    output_data_path = preprocess_task.outputs['output_data_path']

    # Create the train step that depends on the preprocess step
    train_task = train_component(
        input_data_path=output_data_path
    )

    # Create the evaluate step that depends on the train step
    evaluate_task = evaluate_component(
        model_path=train_task.output
    )

    # Create the deploy step; it consumes the trained model, and the
    # explicit ordering makes it run only after evaluation succeeds
    deploy_task = deploy_component(
        model_path=train_task.output
    )
    deploy_task.after(evaluate_task)

# Compile the pipeline
if __name__ == '__main__':
    kfp.compiler.Compiler().compile(pipeline, 'pipeline.yaml')

Here, I used a Docker image to ease shipping code from the code repository to Kubeflow, built with the following Dockerfile:

# Use a lightweight Python base image
FROM python:3.12-slim

# Set the working directory inside the container
WORKDIR /app

# Install any necessary Python packages
RUN pip install --no-cache-dir kfp datasets

# Copy all necessary files into the container
COPY pipeline.py preprocess.py train.py evaluate.py deploy.py ./
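
To make the image available to the cluster, build it and push it to a registry; the tag below is the one the components reference:

docker build -t dwdraju/mlops:kubeflow-pipeline-v10 .
docker push dwdraju/mlops:kubeflow-pipeline-v10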

Kubeflow Setup

I did the setup both on Linux (Ubuntu, 8 cores, 32 GB RAM) and on Docker Desktop on a Mac (M1), with fairly straightforward steps using Helm/Kustomize manifests.

  1. Linux (Ubuntu): I have created a gist with all the installation steps, including increasing the file system event limits, installing Docker, creating a cluster with kind, and accessing the cluster: https://gist.github.com/dwdraju/042950bde69d6c5a9ee67365dfa1f77b
  2. Docker Desktop on Mac (M1): The default Docker app installation was set to use 8 GB of memory, which is not sufficient, so I had to increase it to 16 GB. I also changed the istio-ingress service type from ClusterIP to NodePort (see the command after this list).
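
That service change can be done with a one-liner, assuming the standard Kubeflow manifests where the gateway service is istio-ingressgateway in the istio-system namespace:

kubectl -n istio-system patch svc istio-ingressgateway -p '{"spec": {"type": "NodePort"}}'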

Initiating Pipeline

After the installation, the Kubeflow dashboard can be accessed at http://localhost:31430/, and after entering the default username/password we land on a nice dashboard.

kubeflow dashboard

Upload Pipeline

The first step is to create a pipeline. Head over to Pipelines -> Upload Pipeline, give it a suitable name, and upload the pipeline.yaml file generated in the step above. The pipeline visualization looks like this:

kubeflow pipeline visualization

Create Experiment

Now we execute the pipeline for the first time. Click “Create experiment” and give it a suitable name. In the next step, we have a few custom options:

  1. Service account: which service account to use in our Kubernetes cluster
  2. Run type: once or recurring (periodic or cron syntax)
  3. Parameters: any extra parameters the pipeline expects us to pass

For the first run, we can simply select the run type “once” and leave the other fields at their defaults. As soon as the experiment is submitted, we see a flow in the pipeline visualization, with the log of each step, any artifacts a step produces, and its success/failure status.

pipeline experiment in kubeflow

With this, we have executed our first simple Kubeflow pipeline in an experiment. The final step is to start a run associated with the experiment, so that the run is equivalent to the experiment but with new data if passed.
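
The same upload-and-run flow can also be scripted with the KFP SDK. A minimal sketch; the host and experiment name are assumptions, and a full Kubeflow install sits behind Dex auth, so session credentials would be needed in practice:

import kfp

# Hypothetical endpoint; adjust host and auth for your installation
client = kfp.Client(host='http://localhost:31430/pipeline')

run = client.create_run_from_pipeline_package(
    'pipeline.yaml',
    arguments={},
    experiment_name='hf-nlp-experiment',  # assumed experiment name
)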

Executing Locally

Following all of the above steps through the dashboard each time is time consuming, so Kubeflow also supports local execution. By initializing a local runner, either SubprocessRunner or DockerRunner, we can call the respective component locally and debug it.

local.init(runner=local.SubprocessRunner())
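
After initialization, calling a component executes it immediately on this machine. A minimal sketch, assuming preprocess.py is available on the local path (with DockerRunner, the component's base_image would be used instead):

# With the local runner initialized, invoking the component runs it right away
task = preprocess_component()

# Output parameters are available on the returned task object
print(task.outputs['output_data_path'])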

Making it Simpler with Vertex AI in GCP

There is a serverless approach to all the Kubeflow steps: Vertex AI Pipelines in Google Cloud Platform. By simply uploading the same pipeline.yaml file, we can execute the pipeline with zero setup in a serverless environment.
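
Besides uploading through the console, the compiled file can be submitted with the google-cloud-aiplatform SDK. A minimal sketch, where the project, region, and bucket are placeholders:

from google.cloud import aiplatform

aiplatform.init(project='my-gcp-project', location='us-central1')  # placeholder project/region

job = aiplatform.PipelineJob(
    display_name='hf-nlp-pipeline',
    template_path='pipeline.yaml',
    pipeline_root='gs://my-bucket/pipeline-root',  # GCS path for pipeline artifacts
)
job.run()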

Vertex AI

It took around 5 minutes to complete the simple pipeline, most of which was spent waiting for the underlying resources to be ready. As with other Google Cloud Platform services, the logs can be viewed and filtered in Logs Explorer.

That’s it for this post. If you’d like to stay in touch, feel free to connect on LinkedIn or Twitter.
