AI Deployment Fundamental Concepts
What is ML Deployment?
From Notebook to Production
Most machine learning projects start in a Jupyter notebook — a comfortable sandbox where data scientists explore data, train models, and evaluate results. But a notebook is not a product. Deployment is the process of turning a trained model into a reliable service that real users can interact with.
The Bridge Analogy
Think of model development as designing a bridge on paper — you calculate load capacities, choose materials, and simulate wind resistance. Deployment is actually building the bridge so that thousands of cars can cross it safely every day.
| Development (Design) | Deployment (Construction) |
|---|---|
| Works on sample data | Handles real-world data |
| Runs on your laptop | Runs on servers 24/7 |
| Tolerates errors and retries | Must be fault-tolerant |
| One user (you) | Thousands of concurrent users |
| Speed doesn't matter much | Latency is critical |
| Manual execution | Automated pipeline |
Industry surveys routinely estimate that most ML projects (figures near 80% are often cited) never make it to production. The gap between a working notebook and a production service is where most projects fail. This course teaches you to cross that gap.
Development vs Deployment
Two Different Mindsets
| Aspect | Development | Deployment |
|---|---|---|
| Goal | Maximize accuracy | Maximize reliability + accuracy |
| Data | Static datasets (CSV, Parquet) | Live data streams |
| Code quality | "It works" is enough | Must be tested, documented, maintainable |
| Environment | Local machine, notebooks | Servers, containers, cloud |
| Versioning | Maybe Git for code | Code + model + data versioning |
| Monitoring | Manual evaluation | Automated alerts and dashboards |
| Error handling | Print statements | Structured logging, graceful degradation |
| Reproducibility | "It worked on my machine" | Must work everywhere, every time |
The MLOps Lifecycle
What is MLOps?
MLOps (Machine Learning Operations) is the set of practices that combines ML, DevOps, and Data Engineering to deploy and maintain ML systems in production reliably and efficiently.
Think of it as DevOps extended to the extra moving parts of ML: data, models, and experiments.
MLOps Maturity Levels
| Level | Name | Description | Example |
|---|---|---|---|
| 0 | No MLOps | Manual, script-driven process | Running notebooks by hand |
| 1 | DevOps but no MLOps | Automated CI/CD for code, manual ML | Git + tests, but manual model training |
| 2 | Automated Training | Automated training pipeline | Scheduled retraining with new data |
| 3 | Automated Deployment | CI/CD for models | Auto-deploy if metrics pass threshold |
| 4 | Full MLOps | Automated everything + monitoring | Complete pipeline with drift detection |
In this course, we aim to bring you to Levels 2-3 — you'll build automated training and deployment pipelines with proper testing.
Scope Definition
Why Define Scope First?
Before writing a single line of code, you must clearly define what your model will do, who will use it, and how it will be accessed. Without a clear scope, projects balloon in complexity and never ship.
The Project Brief
A project brief is a short document (1-2 pages) that answers these critical questions:
| Question | Example Answer |
|---|---|
| What problem does the model solve? | Predict customer churn within 30 days |
| Who is the end user? | Customer success team via a web dashboard |
| What data is needed? | Customer activity logs, subscription history |
| What is the expected input? | JSON with customer_id and recent activity |
| What is the expected output? | Churn probability (0-1) and risk level |
| What latency is acceptable? | < 200ms per prediction |
| How often is the model retrained? | Weekly with new data |
| What is the success metric? | AUC-ROC > 0.85, precision > 0.80 |
Scope Boundaries
Students often try to build "the perfect system" from day one. Start with a Minimum Viable Model (MVM) — a working model behind a simple API. You can always add complexity later.
Production Readiness
The Production Checklist
A model is production-ready when it meets these criteria:
| Category | Requirement | Status |
|---|---|---|
| Code | Code is in version control (Git) | ☐ |
| Code | Dependencies are pinned (requirements.txt) | ☐ |
| Code | Code passes linting and formatting | ☐ |
| Model | Model is serialized (pickle/joblib/ONNX) | ☐ |
| Model | Model version is tracked | ☐ |
| Testing | Unit tests pass | ☐ |
| Testing | Integration tests pass | ☐ |
| API | Endpoints are documented (Swagger) | ☐ |
| API | Error handling is implemented | ☐ |
| Monitoring | Logging is configured | ☐ |
| Monitoring | Health check endpoint exists | ☐ |
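Two of the checklist items, model serialization and version tracking, can be combined so that every saved model carries its own metadata. This is a minimal stdlib sketch using `pickle`; real projects often prefer `joblib` for large NumPy arrays or ONNX for framework-independent serving, and the metadata layout here is an illustrative choice, not a standard.

```python
# Sketch: serialize a model and record its version and checksum alongside it.
import hashlib
import json
import pickle
import tempfile
from pathlib import Path

def save_model(model, version: str, out_dir: str):
    """Write model_v<version>.pkl plus a JSON sidecar with version metadata."""
    path = Path(out_dir)
    path.mkdir(exist_ok=True)
    blob = pickle.dumps(model)
    model_file = path / f"model_v{version}.pkl"
    model_file.write_bytes(blob)
    metadata = {
        "version": version,
        "sha256": hashlib.sha256(blob).hexdigest(),  # detect corrupted artifacts
        "size_bytes": len(blob),
    }
    (path / f"model_v{version}.json").write_text(json.dumps(metadata, indent=2))
    return model_file, metadata

# Any picklable object works; a dict of coefficients stands in for a model.
model_file, meta = save_model(
    {"coef": [0.4, -1.2], "intercept": 0.1},
    version="1.0.0",
    out_dir=tempfile.mkdtemp(),
)
```

The checksum makes "which model is actually deployed?" answerable: compare the hash of the file on the server with the hash in your registry.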
The Restaurant Inspection Analogy
Imagine a restaurant inspection before opening day:
- Code quality = Kitchen cleanliness
- Testing = Food safety checks
- Documentation = Menu and allergen labels
- Monitoring = Smoke detectors and temperature logs
- Error handling = Emergency exits and first aid kits
You wouldn't open a restaurant without passing inspection. Don't deploy a model without passing your production checklist.
Data Dependencies
Data is the Fuel
A model is only as good as its data. In production, data issues, not code bugs or infrastructure problems, are among the most common causes of model failure.
Data Drift
Drift occurs when the statistical properties of the data, or the relationship between inputs and outputs, change over time, causing model performance to degrade. Several types are commonly distinguished:
| Type of Drift | Description | Example |
|---|---|---|
| Data drift | Input distribution changes | New customer demographics after marketing campaign |
| Concept drift | Relationship between input and output changes | COVID changed purchasing patterns |
| Label drift | Target variable distribution changes | Fraud patterns evolve with new techniques |
A credit scoring model trained on 2019 data performed poorly during COVID-19 because spending patterns (the input data) shifted dramatically. Income levels, payment behaviors, and spending categories all changed — this is a textbook case of both data drift and concept drift happening simultaneously.
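Input drift like this can be caught automatically by comparing the live feature distribution against the training distribution. One common heuristic is the Population Stability Index (PSI), with rough rules of thumb of PSI < 0.1 for stable, 0.1-0.25 for moderate shift, and > 0.25 for large shift. Below is a pure-Python sketch; production systems typically rely on monitoring libraries or statistical tests instead of hand-rolled code.

```python
# Sketch: detect input drift with the Population Stability Index (PSI).
import math
import random

def psi(expected, actual, bins=10):
    """PSI between a training sample (expected) and live data (actual)."""
    lo, hi = min(expected), max(expected)
    span = (hi - lo) or 1.0
    def fractions(values):
        counts = [0] * bins
        for v in values:
            i = int((v - lo) / span * bins)
            counts[min(max(i, 0), bins - 1)] += 1  # clamp out-of-range values
        # Tiny smoothing term avoids log(0) on empty buckets.
        return [(c + 1e-6) / len(values) for c in counts]
    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

random.seed(0)
train = [random.gauss(0, 1) for _ in range(5000)]          # training distribution
live_same = [random.gauss(0, 1) for _ in range(5000)]      # no drift
live_shifted = [random.gauss(1.0, 1) for _ in range(5000)] # mean shifted by 1 sigma

print(f"no drift: PSI = {psi(train, live_same):.3f}")
print(f"drifted:  PSI = {psi(train, live_shifted):.3f}")
```

Running such a check on each feature on a schedule, and alerting when PSI crosses a threshold, is a simple first line of defense against silent degradation.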
Feature Stores
A feature store is a centralized repository for storing, managing, and serving ML features. It ensures that the same features used during training are available at prediction time.
| Without Feature Store | With Feature Store |
|---|---|
| Features computed differently in training vs serving | Same feature computation everywhere |
| Duplicate feature code across teams | Single source of truth |
| No feature versioning | Full lineage and versioning |
| Inconsistent data transformations | Guaranteed consistency |
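The core guarantee in the left-vs-right comparison above is train/serve consistency: the exact same feature computation in both paths. A full feature store provides storage, versioning, and low-latency serving, but the minimal stand-in is a single shared function that both pipelines import. The feature names below are hypothetical.

```python
# Sketch: one source of truth for feature logic, shared by training and serving.
from datetime import date

def compute_features(customer: dict, as_of: date) -> dict:
    """Hypothetical churn features, computed identically everywhere."""
    tenure_days = (as_of - customer["signup_date"]).days
    return {
        "tenure_days": tenure_days,
        "logins_per_week": customer["logins_last_30d"] / (30 / 7),
        "is_new": tenure_days < 90,
    }

customer = {"signup_date": date(2024, 1, 1), "logins_last_30d": 12}

# The training pipeline and the serving path call the same function,
# so there is no train/serve skew by construction.
train_row = compute_features(customer, as_of=date(2024, 6, 1))
serve_row = compute_features(customer, as_of=date(2024, 6, 1))
assert train_row == serve_row
```

When feature logic is duplicated instead (SQL for training, Python for serving), the two implementations inevitably diverge; this skew is one of the hardest production bugs to diagnose.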
Data Pipelines
A data pipeline automates the flow of data from source to model:
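A pipeline can be thought of as a chain of small, independently testable stages: ingest, validate, transform, load. The sketch below uses hypothetical stages and in-memory data; real pipelines pull from databases or streams and are orchestrated with tools like Airflow or Prefect.

```python
# Sketch: a data pipeline as a chain of small, testable stages.
def ingest():
    # In production this would read from a database, stream, or object store.
    return [{"customer_id": "c1", "logins": 12}, {"customer_id": "c2", "logins": None}]

def validate(rows):
    # Drop (or quarantine) records that fail basic quality checks.
    return [r for r in rows if r["logins"] is not None]

def transform(rows):
    # Feature engineering step.
    return [{**r, "logins_per_week": r["logins"] / (30 / 7)} for r in rows]

def load(rows):
    # Hand off to training or a feature store; here we simply return the rows.
    return rows

pipeline_output = load(transform(validate(ingest())))
```

Because each stage is a plain function, each can be unit-tested in isolation, which is exactly the code-quality bar the deployment mindset demands.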
Infrastructure Planning
CPU vs GPU
| Criteria | CPU | GPU |
|---|---|---|
| Best for | Classical ML (sklearn, XGBoost) | Deep learning (PyTorch, TensorFlow) |
| Cost | Lower ($) | Higher ($$$) |
| Inference speed | Slower for large models | Much faster for neural networks |
| Availability | Always available | May need reservation |
| Typical use | Tabular data, small models | Images, NLP, large transformers |
If your model is a scikit-learn model (Random Forest, Logistic Regression, etc.), CPU is sufficient. GPU is only needed for deep learning models with millions of parameters.
Cloud vs On-Premises
| Factor | Cloud | On-Premises |
|---|---|---|
| Setup time | Minutes | Weeks/months |
| Upfront cost | None (pay-as-you-go) | Very high |
| Scalability | Instant | Limited by hardware |
| Maintenance | Provider handles it | Your responsibility |
| Data control | Provider's data centers | Your facilities |
| Compliance | May have restrictions | Full control |
Containers
Docker containers package your application and all its dependencies into a single, portable unit. This solves the infamous "it works on my machine" problem.
Deployment Patterns
Batch vs Real-Time
The Restaurant Analogy
- Batch prediction = A buffet. The kitchen prepares all dishes in advance. Customers serve themselves. Efficient for large volumes, but food isn't made to order.
- Real-time prediction = À la carte service. Each dish is prepared when ordered. Fresh and customized, but slower for large groups.
| Aspect | Batch | Real-Time |
|---|---|---|
| Latency | Minutes to hours | Milliseconds to seconds |
| Throughput | Very high | Lower per request |
| Infrastructure | Scheduled jobs (Cron, Airflow) | API server (FastAPI, Flask) |
| Use case | Email recommendations, reports | Fraud detection, chatbots |
| Cost | Lower (run during off-peak) | Higher (always running) |
| Freshness | Stale (hours old) | Real-time |
Shadow Mode
In shadow mode, the new model receives production traffic but its predictions are not shown to users. Instead, predictions are logged and compared with the existing model.
Shadow mode is ideal when you want to test a new model on real production data without any risk to users. It's the safest deployment strategy.
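The pattern fits in a few lines: the live model answers the user, while the candidate model's prediction on the same input is only logged. Both "models" below are hypothetical stand-in functions; the key property is that a failing shadow model can never affect the user-facing response.

```python
# Shadow mode sketch: users see only the live model; the candidate is logged.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("shadow")

def live_model(x: dict) -> float:
    return 0.30  # current production model (stand-in)

def shadow_model(x: dict) -> float:
    return 0.45  # candidate model under evaluation (stand-in)

def handle_request(x: dict) -> float:
    live_pred = live_model(x)
    try:
        shadow_pred = shadow_model(x)
        log.info("shadow_compare live=%.2f shadow=%.2f diff=%.2f",
                 live_pred, shadow_pred, abs(live_pred - shadow_pred))
    except Exception:
        # A broken shadow model is logged, never surfaced to the user.
        log.exception("shadow model failed")
    return live_pred  # users only ever see the live model's output

result = handle_request({"customer_id": "c1"})
```

Analyzing the accumulated comparison logs offline tells you whether the candidate is ready for a canary rollout.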
Canary Deployment
In a canary deployment, you gradually route a small percentage of traffic to the new model while monitoring for issues.
The process is gradual: 5% → 10% → 25% → 50% → 100%. At any point, if issues are detected, you roll back to 0%.
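The traffic split is usually done at the routing layer, but the idea can be sketched in a few lines. Hashing the customer ID (a common technique, assumed here) keeps each user pinned to one variant across requests, rather than flipping between models on every call.

```python
# Canary routing sketch: a configurable fraction of users hits the new model.
import hashlib

CANARY_FRACTION = 0.05  # start at 5%, then raise to 10% -> 25% -> 50% -> 100%

def route(customer_id: str) -> str:
    # Deterministic hash -> bucket 0..99; the same user always gets the same variant.
    bucket = int(hashlib.sha256(customer_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < CANARY_FRACTION * 100 else "stable"

counts = {"canary": 0, "stable": 0}
for i in range(10_000):
    counts[route(f"customer-{i}")] += 1
print(counts)  # roughly 5% of users land on the canary
```

Rolling back is then a one-line config change: set `CANARY_FRACTION` to 0.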
Blue-Green Deployment
In blue-green deployment, you maintain two identical production environments. At any time, one is "live" (Blue) and one is "idle" (Green).
Deployment Patterns Comparison
| Pattern | Risk | Complexity | Rollback Speed | Best For |
|---|---|---|---|---|
| Direct (Big Bang) | 🔴 High | Low | Slow | Small projects, non-critical |
| Shadow | 🟢 None | Medium | Instant (not live) | Validating new models |
| Canary | 🟡 Low | Medium | Fast | Gradual confidence building |
| Blue-Green | 🟡 Low | High | Instant | Zero-downtime required |
| A/B Testing | 🟡 Low | High | Fast | Comparing model variants |
Model Versioning
Why Version Models?
Just like you version code with Git, you must version your models. Without versioning:
- You can't reproduce past results
- You can't roll back to a previous model
- You don't know which model is in production
- Debugging becomes impossible
What to Version
| Artifact | Tool | Example |
|---|---|---|
| Code | Git | git commit -m "Add feature engineering" |
| Model | MLflow / Model Registry | model_v2.1.0.pkl |
| Data | DVC / Data versioning | training_data_2024-01.csv |
| Config | Git (YAML/JSON) | hyperparameters.yaml |
| Environment | Docker / requirements.txt | Dockerfile, requirements.txt |
Semantic Versioning for Models
Apply semantic versioning (MAJOR.MINOR.PATCH) to models:
| Version Change | When | Example |
|---|---|---|
| MAJOR (v1 → v2) | Breaking change: new features, different output format | Changed from binary to multi-class |
| MINOR (v1.0 → v1.1) | Improvement: retrained with more data, new algorithm | Better accuracy after retraining |
| PATCH (v1.0.0 → v1.0.1) | Bug fix: corrected preprocessing step | Fixed normalization bug |
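The table above can be encoded directly, so that version bumps are computed rather than typed by hand (a small illustrative helper, not a standard library API):

```python
# Sketch: bump a model version string according to the kind of change released.
def bump(version: str, change: str) -> str:
    major, minor, patch = map(int, version.split("."))
    if change == "major":   # breaking change, e.g. new output format
        return f"{major + 1}.0.0"
    if change == "minor":   # non-breaking improvement, e.g. retraining
        return f"{major}.{minor + 1}.0"
    if change == "patch":   # bug fix, e.g. corrected preprocessing
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")

print(bump("1.0.0", "patch"))  # 1.0.1
print(bump("1.0.1", "minor"))  # 1.1.0
print(bump("1.1.0", "major"))  # 2.0.0
```

Embedding the resulting version in the artifact filename (as in `model_v2.1.0.pkl` above) ties the registry entry to the file on disk.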
Summary
Key Takeaways
| # | Concept | Remember |
|---|---|---|
| 1 | Deployment ≠ Development | Different skills, tools, and mindset required |
| 2 | MLOps | Practices for reliable ML in production |
| 3 | Scope first | Define what you're building before writing code |
| 4 | Data drift | Your model will degrade over time — plan for it |
| 5 | Deployment patterns | Choose based on risk tolerance and requirements |
| 6 | Version everything | Code, model, data, config, environment |
In the next section, we'll dive deeper into Infrastructure Planning — setting up Python environments, Docker containers, and understanding cloud services for AI deployment.