Serving Models for Production

Once a model is in the registry, you need to expose it as an API. MLflow makes this easy with built-in serving tools.
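
The examples below reference the model through a registry alias (models:/Iris_Classifier@champion). An alias is attached to a specific model version via the client API; the snippet below is a minimal sketch, assuming a registered model named Iris_Classifier whose version 1 (a hypothetical version number) should become the champion:

from mlflow import MlflowClient

client = MlflowClient()
# Point the "champion" alias at a specific registered version.
client.set_registered_model_alias(name="Iris_Classifier", alias="champion", version=1)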

1. Local HTTP Serving

You can spin up a REST API server with a single command. This is useful for quick testing or lightweight local applications.

mlflow models serve -m "models:/Iris_Classifier@champion" --port 5001 --env-manager local
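
Note that registry aliases require MLflow 2.3 or later, where the old --no-conda flag has been replaced by --env-manager local. Once the server is up, you can verify it is healthy before sending traffic; the scoring server exposes a /ping endpoint that returns HTTP 200 when the model is loaded:

import requests

# A 200 response from /ping means the server is ready to score.
assert requests.get("http://127.0.0.1:5001/ping").status_code == 200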

2. Querying the Model

Use curl or Python’s requests library to get predictions.

import requests

# The /invocations endpoint accepts a pandas DataFrame in "split" orientation.
# Column names must match the model's input signature; typical Iris features
# are shown here.
data = {
    "dataframe_split": {
        "columns": ["sepal_length", "sepal_width", "petal_length", "petal_width"],
        "data": [[5.1, 3.5, 1.4, 0.2]]
    }
}

response = requests.post("http://127.0.0.1:5001/invocations", json=data)
print(f"Prediction: {response.json()}")
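
The same request with curl:

curl -X POST http://127.0.0.1:5001/invocations \
  -H "Content-Type: application/json" \
  -d '{"dataframe_split": {"columns": ["sepal_length", "sepal_width", "petal_length", "petal_width"], "data": [[5.1, 3.5, 1.4, 0.2]]}}'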

3. Docker Deployment

For production-scale deployment (Kubernetes, AWS SageMaker), you should use Docker. MLflow can build a ready-to-serve image for you directly from the registry.

mlflow models build-docker -m "models:/Iris_Classifier@champion" -n "iris-classifier-image"

Then run it:

docker run -p 8080:8080 iris-classifier-image
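
The container exposes the same REST API as the local server, just on port 8080 (the image's default). A minimal sketch of querying it, reusing the payload from earlier:

import requests

payload = {
    "dataframe_split": {
        "columns": ["sepal_length", "sepal_width", "petal_length", "petal_width"],
        "data": [[5.1, 3.5, 1.4, 0.2]]
    }
}

# The containerized scoring server listens on port 8080 by default.
response = requests.post("http://127.0.0.1:8080/invocations", json=payload)
print(response.json())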