Model Serialization and Versioning
Introduction
Training a model can take minutes, hours, or even days. Once you have a model that performs well, you need to save it so you can reuse it later — for prediction, deployment, or sharing with teammates. This process is called serialization.
Serialization is like putting a cooked meal in the freezer. You spent hours preparing a complex stew (training the model). Instead of starting over each time, you freeze it (serialize) and reheat it (deserialize) when you need it. The dish retains all its flavors without having to cook again.
1. Why Serialize Models?
| Reason | Detail |
|---|---|
| Deployment | A model in Python memory cannot serve an API. It must be saved to disk. |
| Reproducibility | Being able to recreate the exact same predictions 6 months later. |
| Collaboration | Sharing a model with a colleague without asking them to retrain. |
| Versioning | Keeping multiple versions and being able to roll back. |
| Efficiency | Avoiding retraining an expensive model at every server restart. |
2. Pickle — The Python Standard
pickle is Python's native serialization module. It converts a Python object into a byte sequence and vice versa.
Save a Model with Pickle
import pickle
from sklearn.ensemble import RandomForestClassifier
# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# Serialize (save)
with open('model_rf.pkl', 'wb') as f:
    pickle.dump(model, f)
print("Model saved successfully!")
Load a Model with Pickle
# Deserialize (load)
with open('model_rf.pkl', 'rb') as f:
    loaded_model = pickle.load(f)
# Verify it works
predictions = loaded_model.predict(X_test)
print(f"Loaded model accuracy: {loaded_model.score(X_test, y_test):.4f}")
Save a Complete Pipeline
In production, you must save the complete pipeline (preprocessing + model), not just the model. Otherwise, you will have to manually reproduce the preprocessing steps, which is error-prone.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', RandomForestClassifier(n_estimators=100, random_state=42))
])
pipeline.fit(X_train, y_train)
# Save the ENTIRE pipeline
with open('pipeline_rf.pkl', 'wb') as f:
    pickle.dump(pipeline, f)
# Load and predict — no manual preprocessing needed!
with open('pipeline_rf.pkl', 'rb') as f:
    loaded_pipeline = pickle.load(f)
predictions = loaded_pipeline.predict(X_test) # scaler + model applied
Pickle Security Risks
NEVER load a pickle file from an untrusted source. pickle.load() can execute arbitrary code. A malicious pickle file can delete files, install malware, or steal data.
# ⚠️ EXAMPLE OF MALICIOUS PICKLE — DO NOT ACTUALLY USE
import pickle
import os
class MaliciousModel:
    def __reduce__(self):
        return (os.system, ('rm -rf /',))  # Deletes everything!
# If someone gives you this pickle file and you load it...
# pickle.load(malicious_file) → YOUR SYSTEM IS COMPROMISED
Security Rules:
- Never load a pickle from an unknown source
- Validate file integrity (SHA256 checksum)
- Use isolated environments (Docker containers)
- Consider safer alternatives (ONNX, joblib with caution)
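The checksum rule above can be enforced with a small helper that runs before any `pickle.load()` or `joblib.load()`. This is a minimal sketch using only the standard library; `verify_model_file` and the idea of a published expected digest are assumptions for illustration, not part of pickle itself:

```python
import hashlib

def sha256_of_file(path, chunk_size=8192):
    """Compute the SHA-256 digest of a file, reading in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

def verify_model_file(path, expected_hex):
    """Raise if the file's checksum does not match the published one."""
    actual = sha256_of_file(path)
    if actual != expected_hex:
        raise ValueError(f"Checksum mismatch for {path}: got {actual}")
    return path
```

Only after `verify_model_file` returns would you deserialize the file; a mismatch means it was corrupted or tampered with since the digest was published.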
3. Joblib — Optimized for Scientific Data
joblib is an alternative to pickle specifically optimized for objects containing large NumPy arrays (which is the case for most ML models).
Advantages of Joblib over Pickle
| Feature | Pickle | Joblib |
|---|---|---|
| Speed (large arrays) | Standard | ⚡ 2-10x faster |
| File size | Standard | 📦 Built-in compression |
| Large NumPy objects | Average performance | Optimized |
| Security | ⚠️ Risky | ⚠️ Similar to pickle |
| Compatibility | Any Python object | Any Python object |
Using Joblib
import joblib
# Save model (no compression)
joblib.dump(model, 'model_rf.joblib')
# Save with compression (smaller file, slightly slower)
joblib.dump(model, 'model_rf_compressed.joblib', compress=3)
# Load model
loaded_model = joblib.load('model_rf.joblib')
# Verify
print(f"Loaded model accuracy: {loaded_model.score(X_test, y_test):.4f}")
Comparison of File Sizes
import os
import pickle
import joblib
# Save with different methods
with open('model_pickle.pkl', 'wb') as f:
    pickle.dump(model, f)
joblib.dump(model, 'model_joblib.joblib')
joblib.dump(model, 'model_joblib_c3.joblib', compress=3)
joblib.dump(model, 'model_joblib_c9.joblib', compress=9)
# Compare sizes
files = ['model_pickle.pkl', 'model_joblib.joblib',
         'model_joblib_c3.joblib', 'model_joblib_c9.joblib']
for f in files:
    size_kb = os.path.getsize(f) / 1024
    print(f"{f:30s} → {size_kb:8.1f} KB")
Prefer joblib for scikit-learn models in general. The advantage is particularly significant for models with large internal arrays (Random Forest with many trees, large weight matrices, etc.).
4. ONNX — Cross-Platform Interoperability
ONNX (Open Neural Network Exchange) is an open format designed for model portability across frameworks and programming languages.
Convert a scikit-learn Model to ONNX
# pip install skl2onnx onnxruntime
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
# Define input shape
initial_type = [('float_input', FloatTensorType([None, X_train.shape[1]]))]
# Convert model to ONNX
onnx_model = convert_sklearn(model, initial_types=initial_type)
# Save ONNX model
with open('model_rf.onnx', 'wb') as f:
    f.write(onnx_model.SerializeToString())
print("ONNX model saved successfully!")
Use an ONNX Model for Inference
import onnxruntime as ort
import numpy as np
# Load ONNX model
session = ort.InferenceSession('model_rf.onnx')
# Get input name
input_name = session.get_inputs()[0].name
# Run inference
X_test_float = X_test.astype(np.float32)
onnx_predictions = session.run(None, {input_name: X_test_float})
predicted_labels = onnx_predictions[0]
print(f"ONNX predictions (first 5): {predicted_labels[:5]}")
Advantages and Limitations of ONNX
| Advantage | Detail |
|---|---|
| Portability | Same model in Python, C++, JavaScript, mobile |
| Performance | ONNX Runtime is often faster than native scikit-learn |
| Security | No arbitrary code execution (unlike pickle) |
| Standardization | Open format supported by Microsoft, Meta, AWS |
| Limitation | Detail |
|---|---|
| Conversion | Not all models/operations are supported |
| Complexity | More complex setup than pickle/joblib |
| Debugging | Harder to inspect the model |
| Pipeline | Complex pipelines may not convert easily |
5. Comparison Table — Serialization Formats
| Criterion | Pickle | Joblib | ONNX |
|---|---|---|---|
| Ease of use | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Performance (large models) | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Security | ⭐ | ⭐ | ⭐⭐⭐⭐⭐ |
| Portability | ⭐⭐ (Python only) | ⭐⭐ (Python only) | ⭐⭐⭐⭐⭐ |
| Compression | ❌ Manual | ✅ Built-in | N/A |
| Pipeline support | ✅ Full | ✅ Full | ⚠️ Partial |
| Fast inference | Standard | Standard | ⚡ Optimized |
| Ecosystem | Python standard | scikit-learn | Multi-framework |
| Use case | Rapid prototyping | Production sklearn models | Cross-platform deployment |
- Development / Prototyping → joblib (simple and efficient)
- Python production → joblib with compression
- Multi-language production → ONNX
- Avoid in production → raw pickle (prefer joblib)
6. Model Versioning Strategies
Why Version Models?
| Situation | Without Versioning | With Versioning |
|---|---|---|
| New model worse than the old one | 😱 Panic, impossible to revert | 😌 Rollback in 1 minute |
| Regulatory audit | ❌ Impossible to prove which model was active | ✅ Complete history |
| A/B Testing | ❌ Only one model possible | ✅ Compare v1 vs v2 in production |
| Bug in production | 😰 Which model is causing the bug? | 🔍 Exact version trace |
Naming Convention
models/
├── iris_classifier_v1.0.0_2025-01-15.joblib
├── iris_classifier_v1.1.0_2025-02-01.joblib
├── iris_classifier_v2.0.0_2025-03-10.joblib
├── metadata/
│ ├── iris_classifier_v1.0.0_metadata.json
│ ├── iris_classifier_v1.1.0_metadata.json
│ └── iris_classifier_v2.0.0_metadata.json
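With a strict convention like this, version discovery can be automated. A sketch tied to the exact pattern above (the regex and the helper names are assumptions for illustration):

```python
import re

# Matches names like "iris_classifier_v1.0.0_2025-01-15.joblib"
FILENAME_RE = re.compile(
    r'^(?P<name>.+)_v(?P<version>\d+\.\d+\.\d+)_(?P<date>\d{4}-\d{2}-\d{2})\.joblib$'
)

def parse_model_filename(filename):
    """Extract name, version tuple, and date from a conventional filename."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    version = tuple(int(part) for part in match.group('version').split('.'))
    return {'name': match.group('name'), 'version': version, 'date': match.group('date')}

def latest_model(filenames):
    """Return the parsed entry with the highest semantic version."""
    parsed = [p for p in map(parse_model_filename, filenames) if p is not None]
    return max(parsed, key=lambda p: p['version']) if parsed else None
```

Comparing versions as integer tuples (`(2, 0, 0) > (1, 10, 0)`) avoids the classic bug of sorting version strings lexicographically.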
Saving Models with Metadata
import json
import platform
import joblib
import sklearn
from datetime import datetime
from sklearn.metrics import accuracy_score, f1_score

def save_model_with_metadata(model, X_test, y_test, version, model_name):
    """Save a model alongside its metadata for tracking."""
    timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    filename = f"{model_name}_v{version}_{timestamp}.joblib"
    # Save model
    joblib.dump(model, filename)
    # Generate and save metadata
    y_pred = model.predict(X_test)
    metadata = {
        "model_name": model_name,
        "version": version,
        "timestamp": timestamp,
        "filename": filename,
        "metrics": {
            "accuracy": round(accuracy_score(y_test, y_pred), 4),
            "f1_score": round(f1_score(y_test, y_pred, average='weighted'), 4),
        },
        "hyperparameters": model.get_params() if hasattr(model, 'get_params') else {},
        "test_samples": len(X_test),
        "python_version": platform.python_version(),
        "sklearn_version": sklearn.__version__,
    }
    metadata_file = f"{model_name}_v{version}_metadata.json"
    with open(metadata_file, 'w') as f:
        json.dump(metadata, f, indent=2, default=str)
    print(f"✅ Model saved: {filename}")
    print(f"📋 Metadata saved: {metadata_file}")
    return filename, metadata
# Usage
save_model_with_metadata(
    model=best_model,
    X_test=X_test, y_test=y_test,
    version="1.0.0",
    model_name="iris_classifier"
)
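The metadata files also make model selection scriptable. A hedged sketch that assumes the JSON schema produced above; `best_model_by_accuracy` is a hypothetical helper, not a library function:

```python
import glob
import json
import os

def best_model_by_accuracy(metadata_dir):
    """Scan *_metadata.json files and return the metadata of the most accurate model."""
    best = None
    for path in glob.glob(os.path.join(metadata_dir, '*_metadata.json')):
        with open(path) as f:
            meta = json.load(f)
        if best is None or meta['metrics']['accuracy'] > best['metrics']['accuracy']:
            best = meta
    return best
```

The returned metadata's `filename` field then tells you which `.joblib` artifact to load.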
7. MLflow — Model Registry and Tracking
MLflow is an open-source platform for managing the ML lifecycle: experiment tracking, model versioning, and deployment.
Basic MLflow Tracking
# pip install mlflow
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
mlflow.set_experiment("iris-classification")
with mlflow.start_run(run_name="random_forest_v1"):
    # Log hyperparameters
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 5)
    mlflow.log_param("random_state", 42)
    # Train model
    model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
    model.fit(X_train, y_train)
    # Evaluate
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred, average='weighted')
    # Log metrics
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_metric("f1_score", f1)
    # Log model
    mlflow.sklearn.log_model(model, "model")
    print(f"Run ID: {mlflow.active_run().info.run_id}")
    print(f"Accuracy: {accuracy:.4f}, F1: {f1:.4f}")
Loading a Model from MLflow
# Load by run ID
run_id = "abc123def456"  # example run ID — use one printed by your own runs
model_uri = f"runs:/{run_id}/model"
loaded_model = mlflow.sklearn.load_model(model_uri)
# Load from model registry (production stage)
model_uri = "models:/iris-classifier/Production"
production_model = mlflow.sklearn.load_model(model_uri)
predictions = production_model.predict(X_new)
MLflow Model Lifecycle
| Stage | Description | Who accesses it |
|---|---|---|
| None | Model registered but not yet evaluated | Developer |
| Staging | Being validated, integration testing | QA Team |
| Production | Serving predictions in real time | API / Users |
| Archived | Retired, kept for audit and history | Archive |
8. Saving Complete Pipelines
The biggest source of bugs in ML production comes from inconsistency between training preprocessing and production preprocessing. Saving the complete pipeline eliminates this risk.
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
import joblib
# Define preprocessing for different column types
numeric_features = ['age', 'income', 'credit_score']
categorical_features = ['gender', 'education', 'employment']
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_features),
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
    ]
)
# Build complete pipeline
full_pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(n_estimators=100, random_state=42))
])
# Train on raw data
full_pipeline.fit(X_train_raw, y_train)
# Save EVERYTHING in one file
joblib.dump(full_pipeline, 'full_pipeline_v1.0.0.joblib')
# In production: load and predict on raw input
pipeline = joblib.load('full_pipeline_v1.0.0.joblib')
predictions = pipeline.predict(new_raw_data) # preprocessing handled automatically
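The skew risk described above can be demonstrated without sklearn at all. A minimal pure-Python illustration (the numbers are hypothetical; the point is that statistics fitted at training time must be reused at serving time, which is exactly what a saved pipeline guarantees):

```python
import statistics

train_values = [10.0, 20.0, 30.0, 40.0]
train_mean = statistics.mean(train_values)    # fitted once, at training time
train_std = statistics.pstdev(train_values)   # population standard deviation

def scale(x, mean, std):
    """Standardize a value with the given statistics."""
    return (x - mean) / std

sample = 30.0

# Correct: reuse the statistics fitted on the training data
correct_feature = scale(sample, train_mean, train_std)

# Skew: recomputing statistics on the serving batch silently changes the feature
serving_batch = [28.0, 30.0, 32.0]
skewed_feature = scale(sample, statistics.mean(serving_batch),
                       statistics.pstdev(serving_batch))

print(correct_feature, skewed_feature)  # same raw input, different model inputs
```

The model then sees a different input for the same raw value, and predictions degrade with no error message — which is why the pipeline, not just the model, must be serialized.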
9. File Size Considerations
| Model | Typical Size | Recommended Format |
|---|---|---|
| Logistic Regression | 1-10 KB | pickle / joblib |
| SVM (small dataset) | 10-100 KB | joblib |
| Random Forest (100 trees) | 1-50 MB | joblib compress=3 |
| Random Forest (1000 trees) | 50-500 MB | joblib compress=9 |
| Deep Learning (ResNet50) | 100-300 MB | ONNX |
| LLM (GPT-like) | 1-100 GB | Specialized formats |
import os
import joblib
# Check file size before deployment
model_file = 'full_pipeline_v1.0.0.joblib'
size_mb = os.path.getsize(model_file) / (1024 * 1024)
print(f"Model file size: {size_mb:.1f} MB")
if size_mb > 500:
    print("🚨 Model too large for most API deployments")
    print("   - Consider model distillation or pruning")
elif size_mb > 100:
    print("⚠️ Model is large. Consider:")
    print("   - Reducing n_estimators")
    print("   - Using compression: joblib.dump(model, file, compress=9)")
    print("   - Converting to ONNX for optimized runtime")
Summary
🔑 Key Takeaways
- Always serialize the complete pipeline (preprocessing + model), not just the model.
- Pickle: Simple but dangerous. Never load from an untrusted source.
- Joblib: Preferred for scikit-learn. Built-in compression, optimized for large arrays.
- ONNX: For cross-platform deployment and fast inference. More secure.
- Version your models with naming conventions and metadata (metrics, hyperparameters, date).
- MLflow: The reference tool for experiment tracking and model registry.
- File size: Check before deployment. Compress if necessary.
- Metadata: Always save dependency versions (Python, sklearn) with the model.
Further Reading
| Resource | Link |
|---|---|
| Python pickle documentation | docs.python.org/3/library/pickle |
| Joblib documentation | joblib.readthedocs.io |
| ONNX Runtime | onnxruntime.ai |
| skl2onnx documentation | onnx.ai/sklearn-onnx |
| MLflow documentation | mlflow.org/docs/latest |