# Install dependencies if running in Google Colab
try:
    import google.colab
    !pip install mlflow scikit-learn pandas matplotlib
except ImportError:
    pass

Model Packaging and Signatures
In a production pipeline, a model is more than just a binary file. It also needs a signature (a schema for its inputs and outputs) and sometimes custom code to handle preprocessing.
1. Model Signatures and Input Examples
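A signature declares the column names and data types a model accepts and returns, so the deployment server can reject malformed requests (a missing column, a string where a number is expected) before they reach the model. This catches schema-level “garbage in, garbage out” errors early. An input example, logged alongside the signature, gives consumers a concrete sample of a valid request.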
import mlflow
import pandas as pd
from sklearn.linear_model import LinearRegression
from mlflow.models import infer_signature
X = pd.DataFrame({"age": [25, 30, 35], "income": [50000, 60000, 70000]})
y = pd.Series([2000, 2500, 3000])
model = LinearRegression().fit(X, y)
# Infer the signature
signature = infer_signature(X, model.predict(X))
with mlflow.start_run():
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="housing_model",
        signature=signature,
        input_example=X.iloc[:1],  # Log the first row as an example
    )

2. Custom PyFunc Models
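Sometimes your model requires a special preprocessing step (e.g., normalizing text or handling missing values) that isn’t part of the sklearn pipeline. You can wrap the model and that extra logic in a subclass of mlflow.pyfunc.PythonModel, which MLflow then saves and serves like any other model.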
class CustomModelWrapper(mlflow.pyfunc.PythonModel):
    def __init__(self, model):
        # Keep a reference to the fitted model; it is pickled together with the wrapper
        self.model = model

    def load_context(self, context):
        # Load additional data like vocabularies if needed
        pass

    def predict(self, context, model_input):
        # Custom logic: Preprocessing -> Prediction -> Postprocessing
        processed_input = model_input.apply(lambda x: x * 1.05)  # dummy preprocessing
        return self.model.predict(processed_input)

# Log the custom wrapper, passing in the fitted model from section 1
# mlflow.pyfunc.log_model(artifact_path="custom_model", python_model=CustomModelWrapper(model))