
Model Serialization and Versioning

Theory · 45 min · Module 2

Introduction

Training a model can take minutes, hours, or even days. Once you have a model that performs well, you need to save it so you can reuse it later — for prediction, deployment, or sharing with teammates. This process is called serialization.

Real-World Analogy

Serialization is like putting a cooked meal in the freezer. You spent hours preparing a complex stew (training the model). Instead of starting over each time, you freeze it (serialize) and reheat it (deserialize) when you need it. The dish retains all its flavors without having to cook again.


1. Why Serialize Models?

| Reason | Detail |
| --- | --- |
| Deployment | A model in Python memory cannot serve an API. It must be saved to disk. |
| Reproducibility | Being able to recreate the exact same predictions 6 months later. |
| Collaboration | Sharing a model with a colleague without asking them to retrain. |
| Versioning | Keeping multiple versions and being able to roll back. |
| Efficiency | Avoiding retraining an expensive model at every server restart. |

2. Pickle — The Python Standard

pickle is Python's native serialization module. It converts a Python object into a byte sequence and vice versa.
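The round trip can be seen with any Python object before involving a model. A minimal sketch using only the standard library (the dict merely stands in for model state):

```python
import pickle

# Any picklable Python object — here a dict standing in for model state
state = {"weights": [0.2, 0.5, 0.3], "classes": ["setosa", "versicolor"]}

# Serialize: object → byte sequence
data = pickle.dumps(state)
print(type(data))  # <class 'bytes'>

# Deserialize: byte sequence → identical object
restored = pickle.loads(data)
assert restored == state
```

`pickle.dump`/`pickle.load` do the same thing against a file object instead of an in-memory byte string.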

Save a Model with Pickle

import pickle
from sklearn.ensemble import RandomForestClassifier

# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Serialize (save)
with open('model_rf.pkl', 'wb') as f:
    pickle.dump(model, f)

print("Model saved successfully!")

Load a Model with Pickle

# Deserialize (load)
with open('model_rf.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

# Verify it works
predictions = loaded_model.predict(X_test)
print(f"Loaded model accuracy: {loaded_model.score(X_test, y_test):.4f}")

Save a Complete Pipeline

Pipeline, Not Just the Model!

In production, you must save the complete pipeline (preprocessing + model), not just the model. Otherwise, you will have to manually reproduce the preprocessing steps, which is error-prone.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', RandomForestClassifier(n_estimators=100, random_state=42))
])
pipeline.fit(X_train, y_train)

# Save the ENTIRE pipeline
with open('pipeline_rf.pkl', 'wb') as f:
    pickle.dump(pipeline, f)

# Load and predict — no manual preprocessing needed!
with open('pipeline_rf.pkl', 'rb') as f:
    loaded_pipeline = pickle.load(f)

predictions = loaded_pipeline.predict(X_test)  # scaler + model applied

Pickle Security Risks

Critical Security Alert

NEVER load a pickle file from an untrusted source. pickle.load() can execute arbitrary code. A malicious pickle file can delete files, install malware, or steal data.

# ⚠️ EXAMPLE OF MALICIOUS PICKLE — DO NOT ACTUALLY USE
import pickle
import os

class MaliciousModel:
    def __reduce__(self):
        return (os.system, ('rm -rf /',))  # Deletes everything!

# If someone gives you this pickle file and you load it...
# pickle.load(malicious_file) → YOUR SYSTEM IS COMPROMISED

Security Rules:

  1. Never load a pickle from an unknown source
  2. Validate file integrity (SHA256 checksum)
  3. Use isolated environments (Docker containers)
  4. Consider safer alternatives (ONNX, joblib with caution)
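Rule 2 can be sketched with the standard library alone. `sha256_of_file` and `verify_checksum` are hypothetical helper names for illustration, not part of pickle:

```python
import hashlib

def sha256_of_file(path: str) -> str:
    """Compute the SHA256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            digest.update(chunk)
    return digest.hexdigest()

def verify_checksum(path: str, expected_hex: str) -> bool:
    """Return True only if the file matches the checksum published by its author."""
    return sha256_of_file(path) == expected_hex

# Only call pickle.load() on a file after verify_checksum() returns True.
```

Note that a checksum only proves the file was not altered in transit; it does not make pickle itself safe if the publisher is untrusted.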

3. Joblib — Optimized for Scientific Data

joblib is an alternative to pickle specifically optimized for objects containing large NumPy arrays (which is the case for most ML models).

Advantages of Joblib over Pickle

| Feature | Pickle | Joblib |
| --- | --- | --- |
| Speed (large arrays) | Standard | ⚡ 2-10x faster |
| File size | Standard | 📦 Built-in compression |
| Large NumPy objects | Average performance | Optimized |
| Security | ⚠️ Risky | ⚠️ Similar to pickle |
| Compatibility | Any Python object | Any Python object |

Using Joblib

import joblib

# Save model (no compression)
joblib.dump(model, 'model_rf.joblib')

# Save with compression (smaller file, slightly slower)
joblib.dump(model, 'model_rf_compressed.joblib', compress=3)

# Load model
loaded_model = joblib.load('model_rf.joblib')

# Verify
print(f"Loaded model accuracy: {loaded_model.score(X_test, y_test):.4f}")

Comparison of File Sizes

import os
import pickle
import joblib

# Save with different methods
with open('model_pickle.pkl', 'wb') as f:
    pickle.dump(model, f)

joblib.dump(model, 'model_joblib.joblib')
joblib.dump(model, 'model_joblib_c3.joblib', compress=3)
joblib.dump(model, 'model_joblib_c9.joblib', compress=9)

# Compare sizes
files = ['model_pickle.pkl', 'model_joblib.joblib',
         'model_joblib_c3.joblib', 'model_joblib_c9.joblib']

for f in files:
    size_kb = os.path.getsize(f) / 1024
    print(f"{f:30s}{size_kb:8.1f} KB")

When to Use Joblib?

Prefer joblib for scikit-learn models in general. The advantage is particularly significant for models with large internal arrays (Random Forest with many trees, large weight matrices, etc.).


4. ONNX — Cross-Platform Interoperability

ONNX (Open Neural Network Exchange) is an open format designed for model portability across frameworks and programming languages.

Convert a scikit-learn Model to ONNX

# pip install skl2onnx onnxruntime

from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# Define input shape
initial_type = [('float_input', FloatTensorType([None, X_train.shape[1]]))]

# Convert model to ONNX
onnx_model = convert_sklearn(model, initial_types=initial_type)

# Save ONNX model
with open('model_rf.onnx', 'wb') as f:
    f.write(onnx_model.SerializeToString())

print("ONNX model saved successfully!")

Use an ONNX Model for Inference

import onnxruntime as ort
import numpy as np

# Load ONNX model
session = ort.InferenceSession('model_rf.onnx')

# Get input name
input_name = session.get_inputs()[0].name

# Run inference
X_test_float = X_test.astype(np.float32)
onnx_predictions = session.run(None, {input_name: X_test_float})

predicted_labels = onnx_predictions[0]
print(f"ONNX predictions (first 5): {predicted_labels[:5]}")

Advantages and Limitations of ONNX

| Advantage | Detail |
| --- | --- |
| Portability | Same model in Python, C++, JavaScript, mobile |
| Performance | ONNX Runtime is often faster than native scikit-learn |
| Security | No arbitrary code execution (unlike pickle) |
| Standardization | Open format supported by Microsoft, Meta, AWS |

| Limitation | Detail |
| --- | --- |
| Conversion | Not all models/operations are supported |
| Complexity | More complex setup than pickle/joblib |
| Debugging | Harder to inspect the model |
| Pipeline | Complex pipelines may not convert easily |

5. Comparison Table — Serialization Formats

| Criterion | Pickle | Joblib | ONNX |
| --- | --- | --- | --- |
| Ease of use | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Performance (large models) | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Security | ⭐ | ⭐ | ⭐⭐⭐ |
| Portability | ⭐⭐ (Python only) | ⭐⭐ (Python only) | ⭐⭐⭐⭐⭐ |
| Compression | ❌ Manual | ✅ Built-in | N/A |
| Pipeline support | ✅ Full | ✅ Full | ⚠️ Partial |
| Fast inference | Standard | Standard | ⚡ Optimized |
| Ecosystem | Python standard | scikit-learn | Multi-framework |
| Use case | Rapid prototyping | Production sklearn models | Cross-platform deployment |
Recommendation
  • Development / Prototyping → joblib (simple and efficient)
  • Production Python → joblib with compression
  • Multi-language production → ONNX
  • Avoid in production → raw pickle (prefer joblib)

6. Model Versioning Strategies

Why Version Models?

| Situation | Without Versioning | With Versioning |
| --- | --- | --- |
| New model worse than the old one | 😱 Panic, impossible to revert | 😌 Rollback in 1 minute |
| Regulatory audit | ❌ Impossible to prove which model was active | ✅ Complete history |
| A/B Testing | ❌ Only one model possible | ✅ Compare v1 vs v2 in production |
| Bug in production | 😰 Which model is causing the bug? | 🔍 Exact version trace |

Naming Convention

models/
├── iris_classifier_v1.0.0_2025-01-15.joblib
├── iris_classifier_v1.1.0_2025-02-01.joblib
├── iris_classifier_v2.0.0_2025-03-10.joblib
└── metadata/
    ├── iris_classifier_v1.0.0_metadata.json
    ├── iris_classifier_v1.1.0_metadata.json
    └── iris_classifier_v2.0.0_metadata.json
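With this convention, the latest model can be selected programmatically. A minimal sketch (`latest_version` is a hypothetical helper; the regular expression assumes the exact `name_vX.Y.Z_date` pattern above):

```python
import re

def latest_version(filenames):
    """Return the filename with the highest semantic version (vX.Y.Z)."""
    def version_key(name):
        match = re.search(r'_v(\d+)\.(\d+)\.(\d+)_', name)
        if match is None:
            return (0, 0, 0)  # unversioned files sort first
        return tuple(int(part) for part in match.groups())
    return max(filenames, key=version_key)

files = [
    "iris_classifier_v1.0.0_2025-01-15.joblib",
    "iris_classifier_v1.1.0_2025-02-01.joblib",
    "iris_classifier_v2.0.0_2025-03-10.joblib",
]
print(latest_version(files))  # iris_classifier_v2.0.0_2025-03-10.joblib
```

Comparing integer tuples rather than raw strings matters: as strings, "v1.10.0" would incorrectly sort before "v1.9.0".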

Saving Models with Metadata

import json
import platform
import joblib
import sklearn
from datetime import datetime
from sklearn.metrics import accuracy_score, f1_score

def save_model_with_metadata(model, X_test, y_test, version, model_name):
    """Save a model alongside its metadata for tracking."""
    timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    filename = f"{model_name}_v{version}_{timestamp}.joblib"

    # Save model
    joblib.dump(model, filename)

    # Generate and save metadata
    y_pred = model.predict(X_test)
    metadata = {
        "model_name": model_name,
        "version": version,
        "timestamp": timestamp,
        "filename": filename,
        "metrics": {
            "accuracy": round(accuracy_score(y_test, y_pred), 4),
            "f1_score": round(f1_score(y_test, y_pred, average='weighted'), 4),
        },
        "hyperparameters": model.get_params() if hasattr(model, 'get_params') else {},
        "test_samples": len(X_test),
        "python_version": platform.python_version(),
        "sklearn_version": sklearn.__version__,
    }

    metadata_file = f"{model_name}_v{version}_metadata.json"
    with open(metadata_file, 'w') as f:
        json.dump(metadata, f, indent=2, default=str)

    print(f"✅ Model saved: {filename}")
    print(f"📋 Metadata saved: {metadata_file}")
    return filename, metadata

# Usage
save_model_with_metadata(
    model=best_model,
    X_test=X_test, y_test=y_test,
    version="1.0.0",
    model_name="iris_classifier"
)

7. MLflow — Model Registry and Tracking

MLflow is an open-source platform for managing the ML lifecycle: experiment tracking, model versioning, and deployment.

Basic MLflow Tracking

# pip install mlflow
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

mlflow.set_experiment("iris-classification")

with mlflow.start_run(run_name="random_forest_v1"):
    # Log hyperparameters
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 5)
    mlflow.log_param("random_state", 42)

    # Train model
    model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
    model.fit(X_train, y_train)

    # Evaluate
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred, average='weighted')

    # Log metrics
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_metric("f1_score", f1)

    # Log model
    mlflow.sklearn.log_model(model, "model")

    print(f"Run ID: {mlflow.active_run().info.run_id}")
    print(f"Accuracy: {accuracy:.4f}, F1: {f1:.4f}")

Loading a Model from MLflow

# Load by run ID
run_id = "abc123def456"
model_uri = f"runs:/{run_id}/model"
loaded_model = mlflow.sklearn.load_model(model_uri)

# Load from model registry (production stage)
model_uri = "models:/iris-classifier/Production"
production_model = mlflow.sklearn.load_model(model_uri)

predictions = production_model.predict(X_new)

MLflow Model Lifecycle

| Stage | Description | Who accesses it |
| --- | --- | --- |
| None | Model registered but not yet evaluated | Developer |
| Staging | Being validated, integration testing | QA Team |
| Production | Serving predictions in real time | API / Users |
| Archived | Retired, kept for audit and history | Archive |

8. Saving Complete Pipelines

Critical Point

The biggest source of bugs in ML production comes from inconsistency between training preprocessing and production preprocessing. Saving the complete pipeline eliminates this risk.

from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
import joblib

# Define preprocessing for different column types
numeric_features = ['age', 'income', 'credit_score']
categorical_features = ['gender', 'education', 'employment']

preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_features),
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
    ]
)

# Build complete pipeline
full_pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(n_estimators=100, random_state=42))
])

# Train on raw data
full_pipeline.fit(X_train_raw, y_train)

# Save EVERYTHING in one file
joblib.dump(full_pipeline, 'full_pipeline_v1.0.0.joblib')

# In production: load and predict on raw input
pipeline = joblib.load('full_pipeline_v1.0.0.joblib')
predictions = pipeline.predict(new_raw_data) # preprocessing handled automatically

9. File Size Considerations

| Model | Typical Size | Recommended Format |
| --- | --- | --- |
| Logistic Regression | 1-10 KB | pickle / joblib |
| SVM (small dataset) | 10-100 KB | joblib |
| Random Forest (100 trees) | 1-50 MB | joblib compress=3 |
| Random Forest (1000 trees) | 50-500 MB | joblib compress=9 |
| Deep Learning (ResNet50) | 100-300 MB | ONNX |
| LLM (GPT-like) | 1-100 GB | Specialized formats |

import os
import joblib

# Check file size before deployment
model_file = 'full_pipeline_v1.0.0.joblib'
size_mb = os.path.getsize(model_file) / (1024 * 1024)

print(f"Model file size: {size_mb:.1f} MB")

if size_mb > 500:
    print("🚨 Model too large for most API deployments")
    print("   - Consider model distillation or pruning")
elif size_mb > 100:
    print("⚠️ Model is large. Consider:")
    print("   - Reducing n_estimators")
    print("   - Using compression: joblib.dump(model, file, compress=9)")
    print("   - Converting to ONNX for optimized runtime")

Summary

🔑 Key Takeaways
  1. Always serialize the complete pipeline (preprocessing + model), not just the model.
  2. Pickle: Simple but dangerous. Never load from an untrusted source.
  3. Joblib: Preferred for scikit-learn. Built-in compression, optimized for large arrays.
  4. ONNX: For cross-platform deployment and fast inference. More secure.
  5. Version your models with naming conventions and metadata (metrics, hyperparameters, date).
  6. MLflow: The reference tool for experiment tracking and model registry.
  7. File size: Check before deployment. Compress if necessary.
  8. Metadata: Always save dependency versions (Python, sklearn) with the model.

Further Reading

| Resource | Link |
| --- | --- |
| Python pickle documentation | docs.python.org/3/library/pickle |
| Joblib documentation | joblib.readthedocs.io |
| ONNX Runtime | onnxruntime.ai |
| skl2onnx documentation | onnx.ai/sklearn-onnx |
| MLflow documentation | mlflow.org/docs/latest |