AI Deployment Fundamental Concepts
What is ML Deployment?
From Notebook to Production
Most machine learning projects start in a Jupyter notebook — a comfortable sandbox where data scientists explore data, train models, and evaluate results. But a notebook is not a product. Deployment is the process of turning a trained model into a reliable service that real users can interact with.
The Bridge Analogy
Think of model development as designing a bridge on paper — you calculate load capacities, choose materials, and simulate wind resistance. Deployment is actually building the bridge so that thousands of cars can cross it safely every day.
| Development (Design) | Deployment (Construction) |
|---|---|
| Works on sample data | Handles real-world data |
| Runs on your laptop | Runs on servers 24/7 |
| Tolerates errors and retries | Must be fault-tolerant |
| One user (you) | Thousands of concurrent users |
| Speed doesn't matter much | Latency is critical |
| Manual execution | Automated pipeline |
Industry surveys routinely estimate that most ML projects (figures near 80% are often cited) never make it to production. The gap between a working notebook and a production service is where most projects fail. This course teaches you to cross that gap.
Development vs Deployment
Two Different Mindsets
| Aspect | Development | Deployment |
|---|---|---|
| Goal | Maximize accuracy | Maximize reliability + accuracy |
| Data | Static datasets (CSV, Parquet) | Live data streams |
| Code quality | "It works" is enough | Must be tested, documented, maintainable |
| Environment | Local machine, notebooks | Servers, containers, cloud |
| Versioning | Maybe Git for code | Code + model + data versioning |
| Monitoring | Manual evaluation | Automated alerts and dashboards |
| Error handling | Print statements | Structured logging, graceful degradation |
| Reproducibility | "It worked on my machine" | Must work everywhere, every time |
The MLOps Lifecycle
What is MLOps?
MLOps (Machine Learning Operations) is the set of practices that combines ML, DevOps, and Data Engineering to deploy and maintain ML systems in production reliably and efficiently.
Think of it as DevOps extended to the extra moving parts of ML: data, models, and experiments.
MLOps Maturity Levels
| Level | Name | Description | Example |
|---|---|---|---|
| 0 | No MLOps | Manual, script-driven process | Running notebooks by hand |
| 1 | DevOps but no MLOps | Automated CI/CD for code, manual ML | Git + tests, but manual model training |
| 2 | Automated Training | Automated training pipeline | Scheduled retraining with new data |
| 3 | Automated Deployment | CI/CD for models | Auto-deploy if metrics pass threshold |
| 4 | Full MLOps | Automated everything + monitoring | Complete pipeline with drift detection |
In this course, we aim to bring you to Levels 2-3 — you'll build automated training and deployment pipelines with proper testing.
Scope Definition
Why Define Scope First?
Before writing a single line of code, you must clearly define what your model will do, who will use it, and how it will be accessed. Without a clear scope, projects balloon in complexity and never ship.
The Project Brief
A project brief is a short document (1-2 pages) that answers these critical questions:
| Question | Example Answer |
|---|---|
| What problem does the model solve? | Predict customer churn within 30 days |
| Who is the end user? | Customer success team via a web dashboard |
| What data is needed? | Customer activity logs, subscription history |
| What is the expected input? | JSON with customer_id and recent activity |
| What is the expected output? | Churn probability (0-1) and risk level |
| What latency is acceptable? | < 200ms per prediction |
| How often is the model retrained? | Weekly with new data |
| What is the success metric? | AUC-ROC > 0.85, precision > 0.80 |
Scope Boundaries
Students often try to build "the perfect system" from day one. Start with a Minimum Viable Model (MVM) — a working model behind a simple API. You can always add complexity later.
Production Readiness
The Production Checklist
A model is production-ready when it meets these criteria:
| Category | Requirement | Status |
|---|---|---|
| Code | Code is in version control (Git) | ☐ |
| Code | Dependencies are pinned (requirements.txt) | ☐ |
| Code | Code passes linting and formatting | ☐ |
| Model | Model is serialized (pickle/joblib/ONNX) | ☐ |
| Model | Model version is tracked | ☐ |
| Testing | Unit tests pass | ☐ |
| Testing | Integration tests pass | ☐ |
| API | Endpoints are documented (Swagger) | ☐ |
| API | Error handling is implemented | ☐ |
| Monitoring | Logging is configured | ☐ |
| Monitoring | Health check endpoint exists | ☐ |
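Two of the checklist items, model serialization and version tracking, can be combined so that every saved model carries its own metadata. This is a minimal stdlib sketch using `pickle`; real projects often prefer `joblib` for large NumPy arrays or ONNX for framework-independent serving, and the metadata layout here is an illustrative choice, not a standard.

```python
# Sketch: serialize a model and record its version and checksum alongside it.
import hashlib
import json
import pickle
import tempfile
from pathlib import Path

def save_model(model, version: str, out_dir: str):
    """Write model_v<version>.pkl plus a JSON sidecar with version metadata."""
    path = Path(out_dir)
    path.mkdir(exist_ok=True)
    blob = pickle.dumps(model)
    model_file = path / f"model_v{version}.pkl"
    model_file.write_bytes(blob)
    metadata = {
        "version": version,
        "sha256": hashlib.sha256(blob).hexdigest(),  # detect corrupted artifacts
        "size_bytes": len(blob),
    }
    (path / f"model_v{version}.json").write_text(json.dumps(metadata, indent=2))
    return model_file, metadata

# Any picklable object works; a dict of coefficients stands in for a model.
model_file, meta = save_model(
    {"coef": [0.4, -1.2], "intercept": 0.1},
    version="1.0.0",
    out_dir=tempfile.mkdtemp(),
)
```

The checksum makes "which model is actually deployed?" answerable: compare the hash of the file on the server with the hash in your registry.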
The Restaurant Inspection Analogy
Imagine a restaurant inspection before opening day:
- Code quality = Kitchen cleanliness
- Testing = Food safety checks
- Documentation = Menu and allergen labels
- Monitoring = Smoke detectors and temperature logs
- Error handling = Emergency exits and first aid kits
You wouldn't open a restaurant without passing inspection. Don't deploy a model without passing your production checklist.
Data Dependencies
Data is the Fuel
A model is only as good as its data. In production, data issues, not code bugs or infrastructure problems, are among the most common causes of model failure.
Data Drift
Drift occurs when the statistical properties of the data, or the relationship between inputs and outputs, change over time, causing model performance to degrade. Several types are commonly distinguished:
| Type of Drift | Description | Example |
|---|---|---|
| Data drift | Input distribution changes | New customer demographics after marketing campaign |
| Concept drift | Relationship between input and output changes | COVID changed purchasing patterns |
| Label drift | Target variable distribution changes | Fraud patterns evolve with new techniques |
A credit scoring model trained on 2019 data performed poorly during COVID-19 because spending patterns (the input data) shifted dramatically. Income levels, payment behaviors, and spending categories all changed — this is a textbook case of both data drift and concept drift happening simultaneously.
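Input drift like this can be caught automatically by comparing the live feature distribution against the training distribution. One common heuristic is the Population Stability Index (PSI), with rough rules of thumb of PSI < 0.1 for stable, 0.1-0.25 for moderate shift, and > 0.25 for large shift. Below is a pure-Python sketch; production systems typically rely on monitoring libraries or statistical tests instead of hand-rolled code.

```python
# Sketch: detect input drift with the Population Stability Index (PSI).
import math
import random

def psi(expected, actual, bins=10):
    """PSI between a training sample (expected) and live data (actual)."""
    lo, hi = min(expected), max(expected)
    span = (hi - lo) or 1.0
    def fractions(values):
        counts = [0] * bins
        for v in values:
            i = int((v - lo) / span * bins)
            counts[min(max(i, 0), bins - 1)] += 1  # clamp out-of-range values
        # Tiny smoothing term avoids log(0) on empty buckets.
        return [(c + 1e-6) / len(values) for c in counts]
    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

random.seed(0)
train = [random.gauss(0, 1) for _ in range(5000)]          # training distribution
live_same = [random.gauss(0, 1) for _ in range(5000)]      # no drift
live_shifted = [random.gauss(1.0, 1) for _ in range(5000)] # mean shifted by 1 sigma

print(f"no drift: PSI = {psi(train, live_same):.3f}")
print(f"drifted:  PSI = {psi(train, live_shifted):.3f}")
```

Running such a check on each feature on a schedule, and alerting when PSI crosses a threshold, is a simple first line of defense against silent degradation.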
Feature Stores
A feature store is a centralized repository for storing, managing, and serving ML features. It ensures that the same features used during training are available at prediction time.
| Without Feature Store | With Feature Store |
|---|---|
| Features computed differently in training vs serving | Same feature computation everywhere |
| Duplicate feature code across teams | Single source of truth |
| No feature versioning | Full lineage and versioning |
| Inconsistent data transformations | Guaranteed consistency |
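The core guarantee in the left-vs-right comparison above is train/serve consistency: the exact same feature computation in both paths. A full feature store provides storage, versioning, and low-latency serving, but the minimal stand-in is a single shared function that both pipelines import. The feature names below are hypothetical.

```python
# Sketch: one source of truth for feature logic, shared by training and serving.
from datetime import date

def compute_features(customer: dict, as_of: date) -> dict:
    """Hypothetical churn features, computed identically everywhere."""
    tenure_days = (as_of - customer["signup_date"]).days
    return {
        "tenure_days": tenure_days,
        "logins_per_week": customer["logins_last_30d"] / (30 / 7),
        "is_new": tenure_days < 90,
    }

customer = {"signup_date": date(2024, 1, 1), "logins_last_30d": 12}

# The training pipeline and the serving path call the same function,
# so there is no train/serve skew by construction.
train_row = compute_features(customer, as_of=date(2024, 6, 1))
serve_row = compute_features(customer, as_of=date(2024, 6, 1))
assert train_row == serve_row
```

When feature logic is duplicated instead (SQL for training, Python for serving), the two implementations inevitably diverge; this skew is one of the hardest production bugs to diagnose.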
Data Pipelines
A data pipeline automates the flow of data from source to model:
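A pipeline can be thought of as a chain of small, independently testable stages: ingest, validate, transform, load. The sketch below uses hypothetical stages and in-memory data; real pipelines pull from databases or streams and are orchestrated with tools like Airflow or Prefect.

```python
# Sketch: a data pipeline as a chain of small, testable stages.
def ingest():
    # In production this would read from a database, stream, or object store.
    return [{"customer_id": "c1", "logins": 12}, {"customer_id": "c2", "logins": None}]

def validate(rows):
    # Drop (or quarantine) records that fail basic quality checks.
    return [r for r in rows if r["logins"] is not None]

def transform(rows):
    # Feature engineering step.
    return [{**r, "logins_per_week": r["logins"] / (30 / 7)} for r in rows]

def load(rows):
    # Hand off to training or a feature store; here we simply return the rows.
    return rows

pipeline_output = load(transform(validate(ingest())))
```

Because each stage is a plain function, each can be unit-tested in isolation, which is exactly the code-quality bar the deployment mindset demands.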
Infrastructure Planning
CPU vs GPU
| Criteria | CPU | GPU |
|---|---|---|
| Best for | Classical ML (sklearn, XGBoost) | Deep learning (PyTorch, TensorFlow) |
| Cost | Lower ($) | Higher ($$$) |
| Inference speed | Slower for large models | Much faster for neural networks |
| Availability | Always available | May need reservation |
| Typical use | Tabular data, small models | Images, NLP, large transformers |
If your model is a scikit-learn model (Random Forest, Logistic Regression, etc.), CPU is sufficient. GPU is only needed for deep learning models with millions of parameters.
Cloud vs On-Premises
| Factor | Cloud | On-Premises |
|---|---|---|
| Setup time | Minutes | Weeks/months |
| Upfront cost | None (pay-as-you-go) | Very high |
| Scalability | Instant | Limited by hardware |
| Maintenance | Provider handles it | Your responsibility |
| Data control | Provider's data centers | Your facilities |
| Compliance | May have restrictions | Full control |
Containers
Docker containers package your application and all its dependencies into a single, portable unit. This solves the infamous "it works on my machine" problem.
Deployment Patterns
Batch vs Real-Time
The Restaurant Analogy
- Batch prediction = A buffet. The kitchen prepares all dishes in advance. Customers serve themselves. Efficient for large volumes, but food isn't made to order.
- Real-time prediction = À la carte service. Each dish is prepared when ordered. Fresh and customized, but slower for large groups.
| Aspect | Batch | Real-Time |
|---|---|---|
| Latency | Minutes to hours | Milliseconds to seconds |
| Throughput | Very high | Lower per request |
| Infrastructure | Scheduled jobs (Cron, Airflow) | API server (FastAPI, Flask) |
| Use case | Email recommendations, reports | Fraud detection, chatbots |
| Cost | Lower (run during off-peak) | Higher (always running) |
| Freshness | Stale (hours old) | Real-time |
Shadow Mode
In shadow mode, the new model receives production traffic but its predictions are not shown to users. Instead, predictions are logged and compared with the existing model.
Shadow mode is ideal when you want to test a new model on real production data without any risk to users. It's the safest deployment strategy.
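The pattern fits in a few lines: the live model answers the user, while the candidate model's prediction on the same input is only logged. Both "models" below are hypothetical stand-in functions; the key property is that a failing shadow model can never affect the user-facing response.

```python
# Shadow mode sketch: users see only the live model; the candidate is logged.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("shadow")

def live_model(x: dict) -> float:
    return 0.30  # current production model (stand-in)

def shadow_model(x: dict) -> float:
    return 0.45  # candidate model under evaluation (stand-in)

def handle_request(x: dict) -> float:
    live_pred = live_model(x)
    try:
        shadow_pred = shadow_model(x)
        log.info("shadow_compare live=%.2f shadow=%.2f diff=%.2f",
                 live_pred, shadow_pred, abs(live_pred - shadow_pred))
    except Exception:
        # A broken shadow model is logged, never surfaced to the user.
        log.exception("shadow model failed")
    return live_pred  # users only ever see the live model's output

result = handle_request({"customer_id": "c1"})
```

Analyzing the accumulated comparison logs offline tells you whether the candidate is ready for a canary rollout.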
Canary Deployment
In a canary deployment, you gradually route a small percentage of traffic to the new model while monitoring for issues.
The process is gradual: 5% → 10% → 25% → 50% → 100%. At any point, if issues are detected, you roll back to 0%.
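The traffic split is usually done at the routing layer, but the idea can be sketched in a few lines. Hashing the customer ID (a common technique, assumed here) keeps each user pinned to one variant across requests, rather than flipping between models on every call.

```python
# Canary routing sketch: a configurable fraction of users hits the new model.
import hashlib

CANARY_FRACTION = 0.05  # start at 5%, then raise to 10% -> 25% -> 50% -> 100%

def route(customer_id: str) -> str:
    # Deterministic hash -> bucket 0..99; the same user always gets the same variant.
    bucket = int(hashlib.sha256(customer_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < CANARY_FRACTION * 100 else "stable"

counts = {"canary": 0, "stable": 0}
for i in range(10_000):
    counts[route(f"customer-{i}")] += 1
print(counts)  # roughly 5% of users land on the canary
```

Rolling back is then a one-line config change: set `CANARY_FRACTION` to 0.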
Blue-Green Deployment
In blue-green deployment, you maintain two identical production environments. At any time, one is "live" (Blue) and one is "idle" (Green).
Deployment Patterns Comparison
| Pattern | Risk | Complexity | Rollback Speed | Best For |
|---|---|---|---|---|
| Direct (Big Bang) | 🔴 High | Low | Slow | Small projects, non-critical |
| Shadow | 🟢 None | Medium | Instant (not live) | Validating new models |
| Canary | 🟡 Low | Medium | Fast | Gradual confidence building |
| Blue-Green | 🟡 Low | High | Instant | Zero-downtime required |
| A/B Testing | 🟡 Low | High | Fast | Comparing model variants |
Model Versioning
Why Version Models?
Just like you version code with Git, you must version your models. Without versioning:
- You can't reproduce past results
- You can't roll back to a previous model
- You don't know which model is in production
- Debugging becomes impossible
What to Version
| Artifact | Tool | Example |
|---|---|---|
| Code | Git | git commit -m "Add feature engineering" |
| Model | MLflow / Model Registry | model_v2.1.0.pkl |
| Data | DVC / Data versioning | training_data_2024-01.csv |
| Config | Git (YAML/JSON) | hyperparameters.yaml |
| Environment | Docker / requirements.txt | Dockerfile, requirements.txt |
Semantic Versioning for Models
Apply semantic versioning (MAJOR.MINOR.PATCH) to models:
| Version Change | When | Example |
|---|---|---|
| MAJOR (v1 → v2) | Breaking change: new features, different output format | Changed from binary to multi-class |
| MINOR (v1.0 → v1.1) | Improvement: retrained with more data, new algorithm | Better accuracy after retraining |
| PATCH (v1.0.0 → v1.0.1) | Bug fix: corrected preprocessing step | Fixed normalization bug |
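The table above can be encoded directly, so that version bumps are computed rather than typed by hand (a small illustrative helper, not a standard library API):

```python
# Sketch: bump a model version string according to the kind of change released.
def bump(version: str, change: str) -> str:
    major, minor, patch = map(int, version.split("."))
    if change == "major":   # breaking change, e.g. new output format
        return f"{major + 1}.0.0"
    if change == "minor":   # non-breaking improvement, e.g. retraining
        return f"{major}.{minor + 1}.0"
    if change == "patch":   # bug fix, e.g. corrected preprocessing
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")

print(bump("1.0.0", "patch"))  # 1.0.1
print(bump("1.0.1", "minor"))  # 1.1.0
print(bump("1.1.0", "major"))  # 2.0.0
```

Embedding the resulting version in the artifact filename (as in `model_v2.1.0.pkl` above) ties the registry entry to the file on disk.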
Summary
Key Takeaways
| # | Concept | Remember |
|---|---|---|
| 1 | Deployment ≠ Development | Different skills, tools, and mindset required |
| 2 | MLOps | Practices for reliable ML in production |
| 3 | Scope first | Define what you're building before writing code |
| 4 | Data drift | Your model will degrade over time — plan for it |
| 5 | Deployment patterns | Choose based on risk tolerance and requirements |
| 6 | Version everything | Code, model, data, config, environment |
In the next section, we'll dive deeper into Infrastructure Planning — setting up Python environments, Docker containers, and understanding cloud services for AI deployment.