Model Serialization and Versioning
Introduction
Training a model can take minutes, hours, or even days. Once you have a model that performs well, you need to save it so you can reuse it later — for prediction, deployment, or sharing with teammates. This process is called serialization.
Serialization is like putting a cooked meal in the freezer. You spent hours preparing a complex stew (training the model). Instead of starting over each time, you freeze it (serialize) and reheat it (deserialize) when you need it. The dish retains all its flavors without having to cook again.
1. Why Serialize Models?
| Reason | Detail |
|---|---|
| Deployment | A model in Python memory cannot serve an API. It must be saved to disk. |
| Reproducibility | Being able to recreate the exact same predictions 6 months later. |
| Collaboration | Sharing a model with a colleague without asking them to retrain. |
| Versioning | Keeping multiple versions and being able to roll back. |
| Efficiency | Avoiding retraining an expensive model at every server restart. |
2. Pickle — The Python Standard
pickle is Python's native serialization module. It converts a Python object into a byte sequence and vice versa.
Save a Model with Pickle
import pickle
from sklearn.ensemble import RandomForestClassifier
# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# Serialize (save)
with open('model_rf.pkl', 'wb') as f:
    pickle.dump(model, f)
print("Model saved successfully!")
Load a Model with Pickle
# Deserialize (load)
with open('model_rf.pkl', 'rb') as f:
    loaded_model = pickle.load(f)
# Verify it works
predictions = loaded_model.predict(X_test)
print(f"Loaded model accuracy: {loaded_model.score(X_test, y_test):.4f}")
Save a Complete Pipeline
In production, you must save the complete pipeline (preprocessing + model), not just the model. Otherwise, you will have to manually reproduce the preprocessing steps, which is error-prone.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', RandomForestClassifier(n_estimators=100, random_state=42))
])
pipeline.fit(X_train, y_train)
# Save the ENTIRE pipeline
with open('pipeline_rf.pkl', 'wb') as f:
    pickle.dump(pipeline, f)
# Load and predict — no manual preprocessing needed!
with open('pipeline_rf.pkl', 'rb') as f:
    loaded_pipeline = pickle.load(f)
predictions = loaded_pipeline.predict(X_test) # scaler + model applied
Pickle Security Risks
NEVER load a pickle file from an untrusted source. pickle.load() can execute arbitrary code. A malicious pickle file can delete files, install malware, or steal data.
# ⚠️ EXAMPLE OF MALICIOUS PICKLE — DO NOT ACTUALLY USE
import pickle
import os
class MaliciousModel:
    def __reduce__(self):
        return (os.system, ('rm -rf /',))  # Deletes everything!
# If someone gives you this pickle file and you load it...
# pickle.load(malicious_file) → YOUR SYSTEM IS COMPROMISED
Security Rules:
- Never load a pickle from an unknown source
- Validate file integrity (SHA256 checksum)
- Use isolated environments (Docker containers)
- Consider safer alternatives (ONNX, joblib with caution)
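The checksum rule above can be enforced with a small helper that runs before any `pickle.load()` or `joblib.load()`. This is a minimal sketch using only the standard library; `verify_model_file` and the idea of a published expected digest are assumptions for illustration, not part of pickle itself:

```python
import hashlib

def sha256_of_file(path, chunk_size=8192):
    """Compute the SHA-256 digest of a file, reading in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

def verify_model_file(path, expected_hex):
    """Raise if the file's checksum does not match the published one."""
    actual = sha256_of_file(path)
    if actual != expected_hex:
        raise ValueError(f"Checksum mismatch for {path}: got {actual}")
    return path
```

Only after `verify_model_file` returns would you deserialize the file; a mismatch means it was corrupted or tampered with since the digest was published.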
3. Joblib — Optimized for Scientific Data
joblib is an alternative to pickle specifically optimized for objects containing large NumPy arrays (which is the case for most ML models).
Advantages of Joblib over Pickle
| Feature | Pickle | Joblib |
|---|---|---|
| Speed (large arrays) | Standard | ⚡ 2-10x faster |
| File size | Standard | 📦 Built-in compression |
| Large NumPy objects | Average performance | Optimized |
| Security | ⚠️ Risky | ⚠️ Similar to pickle |
| Compatibility | Any Python object | Any Python object |
Using Joblib
import joblib
# Save model (no compression)
joblib.dump(model, 'model_rf.joblib')
# Save with compression (smaller file, slightly slower)
joblib.dump(model, 'model_rf_compressed.joblib', compress=3)
# Load model
loaded_model = joblib.load('model_rf.joblib')
# Verify
print(f"Loaded model accuracy: {loaded_model.score(X_test, y_test):.4f}")
Comparison of File Sizes
import os
import pickle
import joblib
# Save with different methods
with open('model_pickle.pkl', 'wb') as f:
    pickle.dump(model, f)
joblib.dump(model, 'model_joblib.joblib')
joblib.dump(model, 'model_joblib_c3.joblib', compress=3)
joblib.dump(model, 'model_joblib_c9.joblib', compress=9)
# Compare sizes
files = ['model_pickle.pkl', 'model_joblib.joblib',
         'model_joblib_c3.joblib', 'model_joblib_c9.joblib']
for f in files:
    size_kb = os.path.getsize(f) / 1024
    print(f"{f:30s} → {size_kb:8.1f} KB")
Prefer joblib for scikit-learn models in general. The advantage is particularly significant for models with large internal arrays (Random Forest with many trees, large weight matrices, etc.).
4. ONNX — Cross-Platform Interoperability
ONNX (Open Neural Network Exchange) is an open format designed for model portability across frameworks and programming languages.
Convert a scikit-learn Model to ONNX
# pip install skl2onnx onnxruntime
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
# Define input shape
initial_type = [('float_input', FloatTensorType([None, X_train.shape[1]]))]
# Convert model to ONNX
onnx_model = convert_sklearn(model, initial_types=initial_type)
# Save ONNX model
with open('model_rf.onnx', 'wb') as f:
    f.write(onnx_model.SerializeToString())
print("ONNX model saved successfully!")
Use an ONNX Model for Inference
import onnxruntime as ort
import numpy as np
# Load ONNX model
session = ort.InferenceSession('model_rf.onnx')
# Get input name
input_name = session.get_inputs()[0].name
# Run inference
X_test_float = X_test.astype(np.float32)
onnx_predictions = session.run(None, {input_name: X_test_float})
predicted_labels = onnx_predictions[0]
print(f"ONNX predictions (first 5): {predicted_labels[:5]}")
Advantages and Limitations of ONNX
| Advantage | Detail |
|---|---|
| Portability | Same model in Python, C++, JavaScript, mobile |
| Performance | ONNX Runtime is often faster than native scikit-learn |
| Security | No arbitrary code execution (unlike pickle) |
| Standardization | Open format supported by Microsoft, Meta, AWS |
| Limitation | Detail |
|---|---|
| Conversion | Not all models/operations are supported |
| Complexity | More complex setup than pickle/joblib |
| Debugging | Harder to inspect the model |
| Pipeline | Complex pipelines may not convert easily |
5. Comparison Table — Serialization Formats
| Criterion | Pickle | Joblib | ONNX |
|---|---|---|---|
| Ease of use | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Performance (large models) | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Security | ⭐ | ⭐ | ⭐⭐⭐⭐⭐ |
| Portability | ⭐⭐ (Python only) | ⭐⭐ (Python only) | ⭐⭐⭐⭐⭐ |
| Compression | ❌ Manual | ✅ Built-in | N/A |
| Pipeline support | ✅ Full | ✅ Full | ⚠️ Partial |
| Fast inference | Standard | Standard | ⚡ Optimized |
| Ecosystem | Python standard | scikit-learn | Multi-framework |
| Use case | Rapid prototyping | Production sklearn models | Cross-platform deployment |
- Development / Prototyping → joblib (simple and efficient)
- Python production → joblib with compression
- Multi-language production → ONNX
- Avoid in production → raw pickle (prefer joblib)
6. Model Versioning Strategies
Why Version Models?
| Situation | Without Versioning | With Versioning |
|---|---|---|
| New model worse than the old one | 😱 Panic, impossible to revert | 😌 Rollback in 1 minute |
| Regulatory audit | ❌ Impossible to prove which model was active | ✅ Complete history |
| A/B Testing | ❌ Only one model possible | ✅ Compare v1 vs v2 in production |
| Bug in production | 😰 Which model is causing the bug? | 🔍 Exact version trace |
Naming Convention
models/
├── iris_classifier_v1.0.0_2025-01-15.joblib
├── iris_classifier_v1.1.0_2025-02-01.joblib
├── iris_classifier_v2.0.0_2025-03-10.joblib
├── metadata/
│ ├── iris_classifier_v1.0.0_metadata.json
│ ├── iris_classifier_v1.1.0_metadata.json
│ └── iris_classifier_v2.0.0_metadata.json
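With a strict convention like this, version discovery can be automated. A sketch tied to the exact pattern above (the regex and the helper names are assumptions for illustration):

```python
import re

# Matches names like "iris_classifier_v1.0.0_2025-01-15.joblib"
FILENAME_RE = re.compile(
    r'^(?P<name>.+)_v(?P<version>\d+\.\d+\.\d+)_(?P<date>\d{4}-\d{2}-\d{2})\.joblib$'
)

def parse_model_filename(filename):
    """Extract name, version tuple, and date from a conventional filename."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    version = tuple(int(part) for part in match.group('version').split('.'))
    return {'name': match.group('name'), 'version': version, 'date': match.group('date')}

def latest_model(filenames):
    """Return the parsed entry with the highest semantic version."""
    parsed = [p for p in map(parse_model_filename, filenames) if p is not None]
    return max(parsed, key=lambda p: p['version']) if parsed else None
```

Comparing versions as integer tuples (`(2, 0, 0) > (1, 10, 0)`) avoids the classic bug of sorting version strings lexicographically.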
Saving Models with Metadata
import json
import platform
import joblib
import sklearn
from datetime import datetime
from sklearn.metrics import accuracy_score, f1_score

def save_model_with_metadata(model, X_test, y_test, version, model_name):
    """Save a model alongside its metadata for tracking."""
    timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    filename = f"{model_name}_v{version}_{timestamp}.joblib"
    # Save model
    joblib.dump(model, filename)
    # Generate and save metadata
    y_pred = model.predict(X_test)
    metadata = {
        "model_name": model_name,
        "version": version,
        "timestamp": timestamp,
        "filename": filename,
        "metrics": {
            "accuracy": round(accuracy_score(y_test, y_pred), 4),
            "f1_score": round(f1_score(y_test, y_pred, average='weighted'), 4),
        },
        "hyperparameters": model.get_params() if hasattr(model, 'get_params') else {},
        "test_samples": len(X_test),
        "python_version": platform.python_version(),
        "sklearn_version": sklearn.__version__,
    }
    metadata_file = f"{model_name}_v{version}_metadata.json"
    with open(metadata_file, 'w') as f:
        json.dump(metadata, f, indent=2, default=str)
    print(f"✅ Model saved: {filename}")
    print(f"📋 Metadata saved: {metadata_file}")
    return filename, metadata
# Usage
save_model_with_metadata(
    model=best_model,
    X_test=X_test, y_test=y_test,
    version="1.0.0",
    model_name="iris_classifier"
)
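The metadata files also make model selection scriptable. A hedged sketch that assumes the JSON schema produced above; `best_model_by_accuracy` is a hypothetical helper, not a library function:

```python
import glob
import json
import os

def best_model_by_accuracy(metadata_dir):
    """Scan *_metadata.json files and return the metadata of the most accurate model."""
    best = None
    for path in glob.glob(os.path.join(metadata_dir, '*_metadata.json')):
        with open(path) as f:
            meta = json.load(f)
        if best is None or meta['metrics']['accuracy'] > best['metrics']['accuracy']:
            best = meta
    return best
```

The returned metadata's `filename` field then tells you which `.joblib` artifact to load.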
7. MLflow — Model Registry and Tracking
MLflow is an open-source platform for managing the ML lifecycle: experiment tracking, model versioning, and deployment.
Basic MLflow Tracking
# pip install mlflow
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
mlflow.set_experiment("iris-classification")
with mlflow.start_run(run_name="random_forest_v1"):
    # Log hyperparameters
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 5)
    mlflow.log_param("random_state", 42)
    # Train model
    model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
    model.fit(X_train, y_train)
    # Evaluate
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred, average='weighted')
    # Log metrics
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_metric("f1_score", f1)
    # Log model
    mlflow.sklearn.log_model(model, "model")
    print(f"Run ID: {mlflow.active_run().info.run_id}")
    print(f"Accuracy: {accuracy:.4f}, F1: {f1:.4f}")
Loading a Model from MLflow
# Load by run ID
run_id = "abc123def456"  # example run ID — use one printed by your own runs
model_uri = f"runs:/{run_id}/model"
loaded_model = mlflow.sklearn.load_model(model_uri)
# Load from model registry (production stage)
model_uri = "models:/iris-classifier/Production"
production_model = mlflow.sklearn.load_model(model_uri)
predictions = production_model.predict(X_new)
MLflow Model Lifecycle
| Stage | Description | Who accesses it |
|---|---|---|
| None | Model registered but not yet evaluated | Developer |
| Staging | Being validated, integration testing | QA Team |
| Production | Serving predictions in real time | API / Users |
| Archived | Retired, kept for audit and history | Archive |
8. Saving Complete Pipelines
The biggest source of bugs in ML production comes from inconsistency between training preprocessing and production preprocessing. Saving the complete pipeline eliminates this risk.
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
import joblib
# Define preprocessing for different column types
numeric_features = ['age', 'income', 'credit_score']
categorical_features = ['gender', 'education', 'employment']
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_features),
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
    ]
)
# Build complete pipeline
full_pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(n_estimators=100, random_state=42))
])
# Train on raw data
full_pipeline.fit(X_train_raw, y_train)
# Save EVERYTHING in one file
joblib.dump(full_pipeline, 'full_pipeline_v1.0.0.joblib')
# In production: load and predict on raw input
pipeline = joblib.load('full_pipeline_v1.0.0.joblib')
predictions = pipeline.predict(new_raw_data) # preprocessing handled automatically
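The skew risk described above can be demonstrated without sklearn at all. A minimal pure-Python illustration (the numbers are hypothetical; the point is that statistics fitted at training time must be reused at serving time, which is exactly what a saved pipeline guarantees):

```python
import statistics

train_values = [10.0, 20.0, 30.0, 40.0]
train_mean = statistics.mean(train_values)    # fitted once, at training time
train_std = statistics.pstdev(train_values)   # population standard deviation

def scale(x, mean, std):
    """Standardize a value with the given statistics."""
    return (x - mean) / std

sample = 30.0

# Correct: reuse the statistics fitted on the training data
correct_feature = scale(sample, train_mean, train_std)

# Skew: recomputing statistics on the serving batch silently changes the feature
serving_batch = [28.0, 30.0, 32.0]
skewed_feature = scale(sample, statistics.mean(serving_batch),
                       statistics.pstdev(serving_batch))

print(correct_feature, skewed_feature)  # same raw input, different model inputs
```

The model then sees a different input for the same raw value, and predictions degrade with no error message — which is why the pipeline, not just the model, must be serialized.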
9. File Size Considerations
| Model | Typical Size | Recommended Format |
|---|---|---|
| Logistic Regression | 1-10 KB | pickle / joblib |
| SVM (small dataset) | 10-100 KB | joblib |
| Random Forest (100 trees) | 1-50 MB | joblib compress=3 |
| Random Forest (1000 trees) | 50-500 MB | joblib compress=9 |
| Deep Learning (ResNet50) | 100-300 MB | ONNX |
| LLM (GPT-like) | 1-100 GB | Specialized formats |
import os
import joblib
# Check file size before deployment
model_file = 'full_pipeline_v1.0.0.joblib'
size_mb = os.path.getsize(model_file) / (1024 * 1024)
print(f"Model file size: {size_mb:.1f} MB")
if size_mb > 500:
    print("🚨 Model too large for most API deployments")
    print("   - Consider model distillation or pruning")
elif size_mb > 100:
    print("⚠️ Model is large. Consider:")
    print("   - Reducing n_estimators")
    print("   - Using compression: joblib.dump(model, file, compress=9)")
    print("   - Converting to ONNX for optimized runtime")
Summary
🔑 Key Takeaways
- Always serialize the complete pipeline (preprocessing + model), not just the model.
- Pickle: Simple but dangerous. Never load from an untrusted source.
- Joblib: Preferred for scikit-learn. Built-in compression, optimized for large arrays.
- ONNX: For cross-platform deployment and fast inference. More secure.
- Version your models with naming conventions and metadata (metrics, hyperparameters, date).
- MLflow: The reference tool for experiment tracking and model registry.
- File size: Check before deployment. Compress if necessary.
- Metadata: Always save dependency versions (Python, sklearn) with the model.
Further Reading
| Resource | Link |
|---|---|
| Python pickle documentation | docs.python.org/3/library/pickle |
| Joblib documentation | joblib.readthedocs.io |
| ONNX Runtime | onnxruntime.ai |
| skl2onnx documentation | onnx.ai/sklearn-onnx |
| MLflow documentation | mlflow.org/docs/latest |