Infrastructure Planning for AI
Why Infrastructure Matters
The Foundation Analogy
Infrastructure for AI deployment is like the foundation of a building. The most beautiful architecture is useless if the foundation is weak. Similarly, the most accurate model is worthless if it can't run reliably in production.
Python Virtual Environments
The Problem: Dependency Hell
Imagine you have two projects:
- Project A requires scikit-learn==1.2.0
- Project B requires scikit-learn==1.4.0

If both use your system Python, installing one version breaks the other. This is called dependency hell.
The Solution: Virtual Environments
A virtual environment is an isolated Python installation. Each project gets its own set of packages without interfering with others.
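A quick way to see the isolation in action (a stdlib-only sketch): inside a virtual environment, `sys.prefix` points at the environment's directory, while `sys.base_prefix` still points at the base Python installation. Outside any environment, the two are equal.

```python
# Detect whether this interpreter is running inside a virtual environment.
# Inside a venv, sys.prefix is redirected to the environment directory,
# while sys.base_prefix keeps pointing at the base installation.
import sys

def in_virtualenv() -> bool:
    """True if the interpreter is running inside a virtual environment."""
    return sys.prefix != sys.base_prefix

label = "Active venv:" if in_virtualenv() else "System Python:"
print(label, sys.prefix)
```

This is handy in a training or deployment script that should refuse to run against the system Python.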
venv — The Built-in Option
venv comes with Python and is the simplest option:
# Create a virtual environment
python -m venv .venv
# Activate it (Windows)
.venv\Scripts\activate
# Activate it (macOS/Linux)
source .venv/bin/activate
# Your terminal shows the active environment
(.venv) $ python --version
Python 3.11.5
# Install packages in isolation
(.venv) $ pip install scikit-learn pandas fastapi
# Deactivate when done
(.venv) $ deactivate
conda — The Data Science Option
Conda is a package manager popular in data science. It manages both Python packages and system-level dependencies (like CUDA for GPUs).
# Create a conda environment
conda create -n ml-project python=3.11
# Activate it
conda activate ml-project
# Install packages (can mix conda and pip)
conda install scikit-learn pandas
pip install fastapi
# Export environment
conda env export > environment.yml
# Recreate from file
conda env create -f environment.yml
venv vs conda
| Feature | venv | conda |
|---|---|---|
| Installation | Built-in (Python 3.3+) | Requires Anaconda/Miniconda |
| Package source | PyPI only | Conda channels + PyPI |
| Non-Python deps | Cannot manage | Can manage (CUDA, C libs) |
| Speed | Fast | Slower (dependency solving) |
| Reproducibility | requirements.txt | environment.yml |
| Disk space | Lightweight | Heavier |
| Best for | Web apps, APIs, CI/CD | Data science, GPU projects |
We use venv + pip throughout this course. It's simpler, faster, and sufficient for our API-focused deployment workflow. Use conda if you need GPU support or complex scientific libraries.
Dependency Management
requirements.txt — Pinning Versions
A requirements.txt file lists all your project's dependencies with pinned versions for reproducibility:
# Core ML
scikit-learn==1.4.2
pandas==2.2.0
numpy==1.26.4
joblib==1.3.2
# API Framework
fastapi==0.109.0
uvicorn==0.27.0
pydantic==2.5.3
# Testing
pytest==8.0.0
httpx==0.26.0
# Explainability
shap==0.44.1
lime==0.2.0.1
Never list a bare package name like scikit-learn (no version) in your requirements file. An unpinned dependency means your project can break tomorrow when a new version is released.
Generating requirements.txt
# Option 1: Freeze all installed packages
pip freeze > requirements.txt
# Option 2: Use pipreqs (only project imports)
pip install pipreqs
pipreqs . --force
# Install from requirements
pip install -r requirements.txt
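Pinned versions are only useful if the running environment actually matches them. A small stdlib-only sketch (the "name==version" format is the simple subset used above; real requirements files also allow extras and markers) can compare pins against what is installed:

```python
# Compare pinned versions from a requirements.txt-style text against
# the packages actually installed in the current environment.
from importlib import metadata

def parse_pins(text: str) -> dict:
    """Parse 'name==version' lines, skipping comments, blanks, and unpinned lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, version = line.partition("==")
        pins[name.strip().lower()] = version.strip()
    return pins

def check_pins(pins: dict):
    """Yield (package, pinned, installed) for every mismatch or missing package."""
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package not installed at all
        if installed != pinned:
            yield name, pinned, installed

sample = """
# Core ML
scikit-learn==1.4.2
pandas==2.2.0
"""
for name, pinned, installed in check_pins(parse_pins(sample)):
    print(f"{name}: pinned {pinned}, installed {installed}")
```

Running a check like this in CI catches the classic "works on my machine" drift between a developer's environment and the requirements file.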
The Lock File Pattern
For stricter reproducibility, modern tools create lock files that pin every sub-dependency:
| Tool | Config File | Lock File |
|---|---|---|
| pip | requirements.txt | requirements.txt (pinned by hand; no true lock) |
| pip-tools | requirements.in | requirements.txt (compiled) |
| Poetry | pyproject.toml | poetry.lock |
| Pipenv | Pipfile | Pipfile.lock |
# Using pip-tools for better dependency management
pip install pip-tools
# Write your direct dependencies in requirements.in
# Then compile the full locked file:
pip-compile requirements.in --output-file requirements.txt
Docker Basics for ML
What is Docker?
Docker packages your application, its dependencies, and a minimal operating-system layer into a single container — a lightweight, portable, self-sufficient unit that shares the host machine's kernel.
The Shipping Container Analogy
Before standardized shipping containers, every port had different cranes, trucks, and warehouses. Shipping was chaotic and slow. The standardized container revolutionized global trade.
Docker does the same for software:
| Shipping Container | Docker Container |
|---|---|
| Standard size fits any ship/truck/crane | Runs on any machine with Docker |
| Contents are isolated and sealed | App is isolated from host system |
| Stackable and composable | Multiple containers work together |
| Reusable across the world | Same image runs dev/staging/prod |
Dockerfile for an ML Project
A Dockerfile is a recipe for building a container image:
# Start from a Python base image
FROM python:3.11-slim
# Set working directory
WORKDIR /app
# Copy and install dependencies first (better caching)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the application code
COPY . .
# Expose the API port
EXPOSE 8000
# Start the FastAPI server
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
Key Docker Commands
# Build an image
docker build -t my-ml-api:v1.0 .
# Run a container
docker run -p 8000:8000 my-ml-api:v1.0
# Run in background
docker run -d -p 8000:8000 --name ml-api my-ml-api:v1.0
# Check running containers
docker ps
# View logs
docker logs ml-api
# Stop container
docker stop ml-api
Docker Layer Caching
Docker builds images in layers. Each instruction in the Dockerfile creates a layer. If a layer hasn't changed, Docker reuses the cached version.
Always copy requirements.txt and install dependencies before copying your code. This way, Docker only reinstalls packages when dependencies actually change, not when you edit a Python file.
.dockerignore
Just like .gitignore, a .dockerignore file excludes unnecessary files from the Docker build context:
__pycache__
*.pyc
.git
.venv
.env
*.ipynb_checkpoints
data/raw/
notebooks/
.pytest_cache
GPU vs CPU Considerations
When Do You Need a GPU?
Cost Comparison
| Instance Type | vCPUs | RAM | GPU | Price/hour (approx.) | Use Case |
|---|---|---|---|---|---|
| t3.medium | 2 | 4 GB | None | $0.04 | Simple sklearn models |
| c5.xlarge | 4 | 8 GB | None | $0.17 | XGBoost, feature-heavy models |
| g4dn.xlarge | 4 | 16 GB | 1x T4 | $0.53 | PyTorch inference |
| p3.2xlarge | 8 | 61 GB | 1x V100 | $3.06 | Training deep learning models |
| p4d.24xlarge | 96 | 1152 GB | 8x A100 | $32.77 | Large Language Models |
A GPU instance can cost 10-100x more than a CPU instance. Always start with CPU and only upgrade to GPU if latency requirements demand it. For this course, CPU instances are sufficient.
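The cost gap becomes concrete when you project hourly rates over a month of continuous serving. A back-of-the-envelope sketch, using the approximate on-demand prices from the table above (AWS EC2 instance names; 730 hours ≈ one month of 24/7 uptime):

```python
# Rough monthly cost of running an inference server 24/7, using the
# approximate hourly prices from the table above (illustrative only;
# real cloud prices vary by region and change over time).
PRICE_PER_HOUR = {
    "t3.medium": 0.04,    # CPU only
    "c5.xlarge": 0.17,    # CPU only
    "g4dn.xlarge": 0.53,  # 1x T4 GPU
    "p3.2xlarge": 3.06,   # 1x V100 GPU
}

HOURS_PER_MONTH = 730  # 24 * 365 / 12, i.e. always-on

def monthly_cost(instance: str, hours: float = HOURS_PER_MONTH) -> float:
    """Approximate cost of running one instance for the given hours."""
    return PRICE_PER_HOUR[instance] * hours

for name in PRICE_PER_HOUR:
    print(f"{name}: ~${monthly_cost(name):,.2f}/month")
```

A t3.medium comes out around $29/month, a p3.2xlarge around $2,200/month — which is why "CPU first" is the default advice.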
Training vs Inference
| Phase | Compute Needs | Duration | Cost Strategy |
|---|---|---|---|
| Training | High (GPU often) | Hours to days | Use spot instances (60-90% savings) |
| Inference | Lower (CPU often OK) | Continuous | Use reserved instances or serverless |
Cloud Services for ML
The Big Three
Cloud Services Comparison
| Feature | AWS SageMaker | GCP Vertex AI | Azure ML |
|---|---|---|---|
| Notebooks | SageMaker Studio | Vertex Workbench | Azure ML Studio |
| Training | Training Jobs | Custom Training | Training Pipelines |
| Deployment | Endpoints | Endpoints | Managed Endpoints |
| AutoML | Autopilot | AutoML | AutoML |
| MLOps | Pipelines | Pipelines | Designer + Pipelines |
| Containers | ECR + ECS/EKS | GCR + GKE/Cloud Run | ACR + ACI/AKS |
| Serverless | Lambda | Cloud Functions | Azure Functions |
| Pricing | Pay-as-you-go | Pay-as-you-go | Pay-as-you-go |
Simpler Deployment Options
For college projects and small services, you don't need the full power of SageMaker or Vertex AI:
| Platform | Best For | Free Tier | Complexity |
|---|---|---|---|
| Render | Simple API hosting | 750 hours/month | ⭐ Very Low |
| Railway | Python apps + DB | $5 credit/month | ⭐ Very Low |
| Fly.io | Docker containers | 3 shared VMs | ⭐⭐ Low |
| AWS Lambda | Serverless functions | 1M requests/month | ⭐⭐ Low |
| Google Cloud Run | Container-based APIs | 2M requests/month | ⭐⭐ Low |
| Heroku | Full-stack apps | Eco plan $5/month | ⭐⭐ Low |
We'll use local development (FastAPI + uvicorn) for most labs. For the final project, you may optionally deploy to a cloud platform.
CI/CD Basics for ML
What is CI/CD?
CI/CD stands for Continuous Integration / Continuous Deployment. It automates the process of testing and deploying code changes.
The Assembly Line Analogy
CI/CD is like a car assembly line:
- CI = Quality checks at every station (unit tests, linting, building)
- CD = The car rolls off the line and drives to the dealership (deployment)
Without CI/CD, it's like hand-building each car and manually driving it to the customer.
CI/CD for ML — What's Different?
Traditional CI/CD tests code. ML CI/CD must also test data and models:
| Traditional CI/CD | ML CI/CD |
|---|---|
| Unit tests pass? | Unit tests pass? |
| Code compiles? | Code compiles? |
| — | Data validation passes? |
| — | Model metrics above threshold? |
| — | No data drift detected? |
| — | Model size within limits? |
| Deploy application | Deploy model + application |
Example: GitHub Actions for ML
name: ML Pipeline

on:
  push:
    branches: [main]

jobs:
  test-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Run tests
        run: pytest tests/ -v

      - name: Check model metrics
        run: python scripts/validate_model.py

      - name: Build Docker image
        run: docker build -t ml-api:latest .
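The "Check model metrics" step assumes a gate script at scripts/validate_model.py. A minimal sketch of such a gate (the metric names, threshold values, and metrics-file path are illustrative, not a fixed course API) might look like:

```python
# Hypothetical CI gate: fail the job (non-zero exit) when any model
# metric falls below its agreed threshold.
import sys

THRESHOLDS = {"accuracy": 0.85, "f1": 0.80}  # illustrative values

def failing_metrics(metrics: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return the names of metrics that miss their threshold (missing counts as 0)."""
    return [name for name, floor in thresholds.items()
            if metrics.get(name, 0.0) < floor]

# In the real script these would be loaded from a file written by the
# training step, e.g. json.load(open("models/metrics.json")) (assumed path).
metrics = {"accuracy": 0.91, "f1": 0.87}

failures = failing_metrics(metrics)
if failures:
    print(f"Model validation failed: {failures}")
    sys.exit(1)
print("Model metrics above threshold.")
```

Because the script exits non-zero on failure, GitHub Actions marks the step red and the Docker image is never built — exactly the "model metrics above threshold?" check from the table above.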
Environment Reproducibility
The Reproducibility Pyramid
Minimum Reproducibility Checklist
| File | Purpose | Required? |
|---|---|---|
| requirements.txt | Python dependencies with versions | ✅ Yes |
| Dockerfile | Complete environment definition | ✅ Yes (for deployment) |
| .dockerignore | Exclude unnecessary files | ✅ Yes |
| .gitignore | Exclude generated files from Git | ✅ Yes |
| README.md | Setup and run instructions | ✅ Yes |
| pyproject.toml | Project metadata and tool config | Recommended |
| .env.example | Template for environment variables | Recommended |
| Makefile | Shortcuts for common commands | Optional |
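The .env.example pattern works because the application reads its settings from environment variables with sensible defaults. A stdlib-only sketch (the variable names here are illustrative, not a fixed course convention):

```python
# Read app configuration from environment variables, falling back to
# defaults when a variable is unset — the same keys a .env.example
# file would document for teammates.
import os

def get_settings() -> dict:
    """Collect settings from the environment (names are illustrative)."""
    return {
        "model_path": os.getenv("MODEL_PATH", "models/model_v1.0.0.pkl"),
        "api_port": int(os.getenv("API_PORT", "8000")),
        "log_level": os.getenv("LOG_LEVEL", "info"),
    }

print(get_settings())
```

The matching .env.example would list MODEL_PATH, API_PORT, and LOG_LEVEL with placeholder values, while the real .env (gitignored) holds the actual ones.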
Standard Project Structure
ml-project/
├── app/
│ ├── __init__.py
│ ├── main.py # FastAPI application
│ ├── model.py # Model loading and prediction
│ └── schemas.py # Pydantic request/response models
├── models/
│ └── model_v1.0.0.pkl # Serialized model
├── data/
│ ├── raw/ # Original data (gitignored)
│ └── processed/ # Cleaned data
├── tests/
│ ├── __init__.py
│ ├── test_api.py # API endpoint tests
│ └── test_model.py # Model prediction tests
├── notebooks/
│ └── exploration.ipynb # Data exploration (gitignored in prod)
├── scripts/
│ └── train.py # Training script
├── .gitignore
├── .dockerignore
├── Dockerfile
├── requirements.txt
├── README.md
└── pyproject.toml
Every lab in this course follows this project structure. You'll build it incrementally — starting with the environment setup in TP1, adding the model in Module 2, the API in Module 3, and tests in Module 5.
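If you want to bootstrap this layout in one step rather than creating each file by hand, a small stdlib-only scaffolding sketch (the file list mirrors the tree above; this is a convenience, not part of the official labs):

```python
# Create the standard ml-project skeleton: directories plus empty
# placeholder files, matching the tree shown above.
from pathlib import Path

DIRS = ["app", "models", "data/raw", "data/processed",
        "tests", "notebooks", "scripts"]
FILES = ["app/__init__.py", "app/main.py", "app/model.py", "app/schemas.py",
         "tests/__init__.py", "tests/test_api.py", "tests/test_model.py",
         "scripts/train.py", "requirements.txt", "README.md",
         ".gitignore", ".dockerignore", "Dockerfile", "pyproject.toml"]

def scaffold(root: str = "ml-project") -> Path:
    """Create the directory tree and empty placeholder files; return the root."""
    root_path = Path(root)
    for d in DIRS:
        (root_path / d).mkdir(parents=True, exist_ok=True)
    for f in FILES:
        (root_path / f).touch()  # create empty file if missing
    return root_path

# Usage: scaffold("my-new-project") creates the skeleton in that directory.
```

mkdir(parents=True, exist_ok=True) makes the script safe to re-run on an existing project.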
Summary
Infrastructure Decision Tree
Key Takeaways
| # | Concept | Remember |
|---|---|---|
| 1 | Virtual environments | Always isolate project dependencies |
| 2 | Pin versions | requirements.txt with exact versions |
| 3 | Docker | Package everything for reproducibility |
| 4 | CPU first | Only use GPU if deep learning demands it |
| 5 | Cloud options | Simple platforms (Render, Cloud Run) for small projects |
| 6 | CI/CD | Automate testing and deployment |
| 7 | Project structure | Follow conventions for maintainability |
In TP1, you'll put these concepts into practice by setting up your project environment from scratch — creating a virtual environment, installing dependencies, and building the standard project structure.