# Install dependencies if running in Google Colab
try:
    import google.colab
    !pip install mlflow scikit-learn pandas matplotlib
except ImportError:
    pass

Model Packaging and Signatures

In a production pipeline, a model is more than just a binary file. It needs a Signature (input/output schema) and sometimes custom code to handle preprocessing.

1. Model Signatures and Input Examples

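A signature declares the column names and data types a model expects as input and produces as output. When the model is served through the pyfunc flavor, MLflow checks incoming data against that schema, so a malformed request is rejected up front instead of silently producing a bad prediction. An input example stores a sample payload alongside the model so that consumers can see what a valid request looks like.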

import mlflow
import pandas as pd
from sklearn.linear_model import LinearRegression
from mlflow.models import infer_signature

X = pd.DataFrame({"age": [25, 30, 35], "income": [50000, 60000, 70000]})
y = pd.Series([2000, 2500, 3000])

model = LinearRegression().fit(X, y)

# Infer the signature
signature = infer_signature(X, model.predict(X))

with mlflow.start_run():
    mlflow.sklearn.log_model(
        sk_model=model, 
        artifact_path="housing_model",
        signature=signature,
        input_example=X.iloc[:1] # Log the first row as an example
    )

2. Custom PyFunc Models

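Sometimes your model requires a special preprocessing step (e.g., normalizing text or handling missing values) that isn’t part of the sklearn pipeline. You can bundle the model and that logic together in a subclass of mlflow.pyfunc.PythonModel, which MLflow then packages and serves like any other model.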

class CustomModelWrapper(mlflow.pyfunc.PythonModel):
    def __init__(self, model):
        # Keep a reference to the fitted model; it is pickled together with the wrapper
        self.model = model

    def load_context(self, context):
        # Load additional data like vocabularies if needed
        pass

    def predict(self, context, model_input):
        # Custom logic: Preprocessing -> Prediction -> Postprocessing
        processed_input = model_input.apply(lambda x: x * 1.05)  # dummy preprocessing
        return self.model.predict(processed_input)

# Log the custom wrapper around the fitted model from section 1
# mlflow.pyfunc.log_model(artifact_path="custom_model", python_model=CustomModelWrapper(model))