# TP1 - Project Brief and Environment Setup

## Learning Objectives

By the end of this lab, you will be able to:

- Create a structured project directory for ML deployment
- Configure a Python virtual environment with `venv`
- Install and manage dependencies with `pip` and `requirements.txt`
- Write a project brief for an ML project
- Verify that the environment is functional with a test script
## Project Context
Throughout this course, you will build a complete ML prediction service — from model training to API deployment. This first lab lays the foundations.
## Prerequisites
| Item | Description |
|---|---|
| Python 3.10+ | Installed and accessible via terminal (python --version) |
| pip | Python package manager (included with Python) |
| Git | For code versioning |
| Terminal | PowerShell (Windows) or bash (macOS/Linux) |
| Code editor | VS Code, Cursor, or PyCharm |
Before starting, run these commands in your terminal:

```bash
python --version   # Should display Python 3.10+
pip --version      # Should display pip 23+
git --version      # Should display git 2+
```

If a command fails, install the missing component before continuing.
## Step 1: Create the project structure

### 1.1 Create the root directory

Open your terminal and create the project directory:

```bash
# Create the main directory
mkdir ml-deployment-project
cd ml-deployment-project
```
### 1.2 Create the full directory tree

Create the standard structure for an ML project:

```bash
# Create the directories
mkdir -p app
mkdir -p models
mkdir -p data/raw
mkdir -p data/processed
mkdir -p tests
mkdir -p scripts
mkdir -p notebooks
mkdir -p docs
```
### 1.3 Create the initialization files

```bash
# __init__.py files for Python packages (macOS/Linux)
touch app/__init__.py
touch tests/__init__.py
```

On Windows PowerShell, replace `touch` with `New-Item`:

```powershell
New-Item -ItemType File -Path app/__init__.py
New-Item -ItemType File -Path tests/__init__.py
```

Or simply use your code editor to create the files.
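As a cross-platform alternative, the structure from steps 1.2 and 1.3 can also be created with Python's `pathlib`. A minimal sketch, to be run from the project root (directory names taken from this lab):

```python
from pathlib import Path

# Directories from step 1.2 (run this from the project root)
for d in ["app", "models", "data/raw", "data/processed",
          "tests", "scripts", "notebooks", "docs"]:
    Path(d).mkdir(parents=True, exist_ok=True)  # behaves like mkdir -p

# Empty __init__.py files mark app/ and tests/ as Python packages
for pkg in ["app", "tests"]:
    (Path(pkg) / "__init__.py").touch()
```

`exist_ok=True` makes the script safe to re-run: existing directories are left untouched.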
### 1.4 Verify the structure

Your project should look like this:

```
ml-deployment-project/
├── app/
│   └── __init__.py
├── models/
├── data/
│   ├── raw/
│   └── processed/
├── tests/
│   └── __init__.py
├── scripts/
├── notebooks/
└── docs/
```
**Verification — Command to display the directory tree**

```bash
# macOS/Linux
find . -type f -o -type d | head -20

# Windows PowerShell
Get-ChildItem -Recurse -Depth 2 | Select-Object FullName
```

You should see all directories and the `__init__.py` files.
## Step 2: Configure the virtual environment

### 2.1 Create the virtual environment

```bash
# From the project root
python -m venv .venv
```

### 2.2 Activate the environment

```bash
# Windows (PowerShell)
.venv\Scripts\Activate.ps1

# Windows (CMD)
.venv\Scripts\activate.bat

# macOS / Linux
source .venv/bin/activate
```

If you get an error on PowerShell, run this first:

```powershell
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```

Then try activating again.
### 2.3 Verify activation

```bash
# The prompt should show (.venv)
(.venv) $ python --version
Python 3.11.x

# Verify that pip points to the virtual environment
(.venv) $ pip --version
pip 23.x.x from .../ml-deployment-project/.venv/lib/...
```

**Verification — How to know if the environment is active**

- Your prompt displays `(.venv)` at the beginning
- `which python` (Linux/macOS) or `Get-Command python` (PowerShell) points to `.venv/`
- `pip list` shows very few packages (only pip and setuptools)

### 2.4 Update pip

```bash
python -m pip install --upgrade pip
```
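You can also check from inside Python whether the running interpreter belongs to a virtual environment. A small sketch based on the standard `sys.prefix` / `sys.base_prefix` convention:

```python
import sys

def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points into .venv/ while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix

print(f"Virtual environment active: {in_virtualenv()}")
print(f"Interpreter location: {sys.prefix}")
```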
## Step 3: Install dependencies

### 3.1 Create the requirements.txt file

Create a `requirements.txt` file at the project root with the following content:

```
# ===========================================
# ML Deployment Project - Dependencies
# ===========================================

# --- Core ML ---
scikit-learn==1.4.2
pandas==2.2.0
numpy==1.26.4
joblib==1.3.2

# --- API Framework ---
fastapi==0.109.0
uvicorn[standard]==0.27.0
pydantic==2.5.3

# --- Testing ---
pytest==8.0.0
pytest-cov==4.1.0
httpx==0.26.0

# --- Model Explainability ---
shap==0.44.1
lime==0.2.0.1

# --- Utilities ---
python-dotenv==1.0.0
requests==2.31.0
```

### 3.2 Install the dependencies

```bash
pip install -r requirements.txt
```
Installing SHAP and its dependencies can take 2-5 minutes. This is normal — SHAP compiles C extensions in the background.
If SHAP installation fails, you can temporarily comment it out in requirements.txt and install it later.
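In your own code, the same idea can be applied with a guarded import, so the rest of the project keeps working when SHAP is absent. A sketch of the pattern (the `explain` helper is hypothetical, not part of the lab deliverables):

```python
# Guarded optional import: the project still runs without SHAP installed.
try:
    import shap
    HAS_SHAP = True
except ImportError:
    shap = None
    HAS_SHAP = False

def explain(model, X):
    """Return SHAP values for X, or fail with a clear message."""
    if not HAS_SHAP:
        raise RuntimeError("shap is not installed; run: pip install shap")
    explainer = shap.Explainer(model)  # auto-selects an explainer for the model
    return explainer(X)

print(f"SHAP available: {HAS_SHAP}")
```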
### 3.3 Verify the installation

```bash
# Verify that the main packages are installed (macOS/Linux)
pip list | grep -i "scikit-learn\|fastapi\|pandas\|pytest\|shap"
```

**Verification — Expected output**

You should see something like:

```
fastapi         0.109.0
pandas          2.2.0
pytest          8.0.0
scikit-learn    1.4.2
shap            0.44.1
```

If a package is missing, check the installation errors and try again with:

```bash
pip install <package-name>
```
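The `grep` pipeline above is macOS/Linux-only. A portable alternative is to query installed versions from Python with the standard-library `importlib.metadata` (names below are the distribution names from `requirements.txt`):

```python
from importlib.metadata import version, PackageNotFoundError

# Query each distribution; missing packages raise PackageNotFoundError
for pkg in ["scikit-learn", "fastapi", "pandas", "pytest", "shap"]:
    try:
        print(f"{pkg:15} {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg:15} NOT INSTALLED")
```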
## Step 4: Create configuration files

### 4.1 .gitignore file

Create a `.gitignore` file at the root:

```
# Python
__pycache__/
*.py[cod]
*.egg-info/
dist/
build/
*.egg

# Virtual environment
.venv/
venv/
ENV/

# IDE
.vscode/
.idea/
*.swp
*.swo

# Data files (too large for Git)
data/raw/
*.csv
*.parquet
*.h5

# Model files (too large for Git)
models/*.pkl
models/*.joblib
models/*.onnx

# Environment variables
.env

# Jupyter
.ipynb_checkpoints/

# OS
.DS_Store
Thumbs.db

# Testing
.pytest_cache/
htmlcov/
.coverage
```
### 4.2 README.md file

Create a `README.md` file at the root:

# ML Deployment Project

A complete machine learning deployment pipeline — from model training to production API.

## Quick Start

```bash
# Clone the repository
git clone <repo-url>
cd ml-deployment-project

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows

# Install dependencies
pip install -r requirements.txt

# Run tests
pytest tests/ -v

# Start the API server
uvicorn app.main:app --reload
```

## Project Structure

```
ml-deployment-project/
├── app/         # FastAPI application
├── models/      # Serialized ML models
├── data/        # Datasets (raw + processed)
├── tests/       # Unit and integration tests
├── scripts/     # Training and utility scripts
├── notebooks/   # Jupyter notebooks for exploration
└── docs/        # Project documentation
```

## Technology Stack

- **ML**: scikit-learn, pandas, NumPy
- **API**: FastAPI, uvicorn
- **Testing**: pytest, httpx
- **Explainability**: SHAP, LIME
### 4.3 pyproject.toml file
Create a `pyproject.toml` file at the root:
```toml
[project]
name = "ml-deployment-project"
version = "0.1.0"
description = "ML model deployment with FastAPI"
requires-python = ">=3.10"

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
python_functions = ["test_*"]
addopts = "-v --tb=short"

[tool.ruff]
line-length = 88
target-version = "py311"
```
## Step 5: Write the project brief

### 5.1 Create the document

Create the file `docs/project_brief.md` with the following content. Adapt it to your project — this template is for a customer churn prediction model.
# Project Brief — Customer Churn Prediction Service
## 1. Problem Statement
Predict whether a customer will cancel their subscription within the next 30 days,
based on their usage patterns and account information.
## 2. End Users
- **Primary**: Customer Success team (via web dashboard)
- **Secondary**: Marketing team (batch reports)
## 3. Data Requirements
| Feature | Type | Source |
|---------|------|--------|
| tenure_months | int | CRM database |
| monthly_charges | float | Billing system |
| total_charges | float | Billing system |
| contract_type | categorical | CRM database |
| internet_service | categorical | Service DB |
| tech_support_calls | int | Support tickets |
## 4. Model Requirements
| Metric | Target |
|--------|--------|
| AUC-ROC | > 0.85 |
| Precision | > 0.80 |
| Recall | > 0.70 |
| Inference latency | < 200ms |
## 5. Input / Output
**Input (API request):**
```json
{
  "tenure_months": 24,
  "monthly_charges": 65.5,
  "total_charges": 1572.0,
  "contract_type": "month-to-month",
  "internet_service": "fiber_optic",
  "tech_support_calls": 3
}
```

**Output (API response):**

```json
{
  "churn_probability": 0.73,
  "risk_level": "high",
  "model_version": "1.0.0"
}
```
## 6. Scope

### In Scope

- Binary classification model (churn / no churn)
- REST API with FastAPI
- Unit tests with pytest
- Model explainability (SHAP)
- API documentation (Swagger)

### Out of Scope

- Real-time streaming predictions
- Mobile application
- Multi-tenant architecture
- GPU inference
## 7. Timeline
| Week | Deliverable |
|---|---|
| 1-2 | Environment setup + project brief |
| 3-5 | Model training + evaluation |
| 6-8 | API development + documentation |
| 9-10 | Testing + explainability |
| 11-15 | Integration + final project |
:::tip[Customize your brief]
You can choose another ML problem if you prefer:
- **Spam detection** (text classification)
- **House price prediction** (regression)
- **Image classification** (if you're comfortable with deep learning)
The important thing is to clearly document the inputs, outputs, metrics, and scope.
:::
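The input/output contract in the brief translates directly into Pydantic models, which you will use when building the FastAPI service in a later module. A sketch with the field names from the brief (the class names `ChurnRequest` and `ChurnResponse` are illustrative, not prescribed by the lab):

```python
from pydantic import BaseModel

class ChurnRequest(BaseModel):
    tenure_months: int
    monthly_charges: float
    total_charges: float
    contract_type: str
    internet_service: str
    tech_support_calls: int

class ChurnResponse(BaseModel):
    churn_probability: float
    risk_level: str
    model_version: str

# Pydantic validates the example payload from the brief
req = ChurnRequest(
    tenure_months=24, monthly_charges=65.5, total_charges=1572.0,
    contract_type="month-to-month", internet_service="fiber_optic",
    tech_support_calls=3,
)
print(req.model_dump())  # pydantic v2 API, matching the pinned version
```

Writing the schema down early keeps the brief, the API code, and the tests aligned on one contract.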
---
## Step 6: Verification script
### 6.1 Create the test script
Create the file `scripts/verify_setup.py`:
```python
"""
Verification script for ML Deployment Project setup.
Run this script to confirm your environment is correctly configured.
"""
import sys


def check_python_version():
    version = sys.version_info
    assert version.major == 3 and version.minor >= 10, (
        f"Python 3.10+ required, got {version.major}.{version.minor}"
    )
    print(f"  Python {version.major}.{version.minor}.{version.micro}")


def check_import(module_name, display_name=None):
    display = display_name or module_name
    try:
        mod = __import__(module_name)
        version = getattr(mod, "__version__", "unknown")
        print(f"  {display} {version}")
        return True
    except ImportError:
        print(f"  {display} — NOT INSTALLED")
        return False


def check_project_structure():
    from pathlib import Path

    required_dirs = ["app", "models", "data", "tests", "scripts", "docs"]
    required_files = [
        "requirements.txt",
        "app/__init__.py",
        "tests/__init__.py",
    ]
    project_root = Path(__file__).parent.parent
    missing = []
    for d in required_dirs:
        if not (project_root / d).is_dir():
            missing.append(f"Directory: {d}/")
    for f in required_files:
        if not (project_root / f).is_file():
            missing.append(f"File: {f}")
    if missing:
        print("  MISSING:")
        for m in missing:
            print(f"    - {m}")
        return False
    print(f"  All {len(required_dirs)} directories present")
    print(f"  All {len(required_files)} required files present")
    return True


def main():
    print("=" * 50)
    print("ML Deployment Project — Setup Verification")
    print("=" * 50)
    all_ok = True

    print("\n[1/3] Python Version")
    try:
        check_python_version()
    except AssertionError as e:
        print(f"  FAIL: {e}")
        all_ok = False

    print("\n[2/3] Package Imports")
    packages = [
        ("sklearn", "scikit-learn"),
        ("pandas", None),
        ("numpy", None),
        ("fastapi", None),
        ("uvicorn", None),
        ("pydantic", None),
        ("pytest", None),
        ("httpx", None),
        ("joblib", None),
    ]
    for module_name, display_name in packages:
        if not check_import(module_name, display_name):
            all_ok = False

    # SHAP and LIME are optional (may fail on some systems)
    print("\n  Optional packages:")
    check_import("shap")
    check_import("lime")

    print("\n[3/3] Project Structure")
    if not check_project_structure():
        all_ok = False

    print("\n" + "=" * 50)
    if all_ok:
        print("ALL CHECKS PASSED — Your environment is ready!")
    else:
        print("SOME CHECKS FAILED — Review the errors above.")
    print("=" * 50)
    return 0 if all_ok else 1


if __name__ == "__main__":
    sys.exit(main())
```
### 6.2 Run the verification

```bash
python scripts/verify_setup.py
```

**Verification — Expected output**

```
==================================================
ML Deployment Project — Setup Verification
==================================================

[1/3] Python Version
  Python 3.11.5

[2/3] Package Imports
  scikit-learn 1.4.2
  pandas 2.2.0
  numpy 1.26.4
  fastapi 0.109.0
  uvicorn 0.27.0
  pydantic 2.5.3
  pytest 8.0.0
  httpx 0.26.0
  joblib 1.3.2

  Optional packages:
  shap 0.44.1
  lime 0.2.0.1

[3/3] Project Structure
  All 6 directories present
  All 3 required files present

==================================================
ALL CHECKS PASSED — Your environment is ready!
==================================================
```
## Step 7: Create a first test

### 7.1 Write a unit test

Create the file `tests/test_setup.py`:

```python
"""Basic tests to verify the project environment."""
import importlib


def test_python_version():
    import sys
    assert sys.version_info >= (3, 10), "Python 3.10+ required"


def test_sklearn_import():
    sklearn = importlib.import_module("sklearn")
    assert hasattr(sklearn, "__version__")


def test_fastapi_import():
    fastapi = importlib.import_module("fastapi")
    assert hasattr(fastapi, "FastAPI")


def test_pandas_import():
    pd = importlib.import_module("pandas")
    assert hasattr(pd, "DataFrame")


def test_project_structure():
    from pathlib import Path
    root = Path(__file__).parent.parent
    assert (root / "app").is_dir()
    assert (root / "models").is_dir()
    assert (root / "tests").is_dir()
    assert (root / "requirements.txt").is_file()
```
### 7.2 Run the tests

```bash
pytest tests/test_setup.py -v
```

**Verification — Expected output**

```
========================= test session starts =========================
collected 5 items

tests/test_setup.py::test_python_version PASSED       [ 20%]
tests/test_setup.py::test_sklearn_import PASSED       [ 40%]
tests/test_setup.py::test_fastapi_import PASSED       [ 60%]
tests/test_setup.py::test_pandas_import PASSED        [ 80%]
tests/test_setup.py::test_project_structure PASSED    [100%]

========================= 5 passed in 0.42s =========================
```
All tests must pass (5/5). If a test fails, verify that the corresponding package is installed.
## Step 8: Initialize Git

### 8.1 Initialize the repository

```bash
git init
git add .
git commit -m "Initial project setup: structure, dependencies, and verification"
```

### 8.2 Verify the status

```bash
git status
git log --oneline
```

**Verification — What Git should ignore**

Verify that `.gitignore` works correctly:

```bash
git status
```

You should NOT see:

- `.venv/` (virtual environment)
- `__pycache__/` (compiled files)
- `data/raw/` (raw data)

If these files appear, check your `.gitignore`.
## Lab Summary

### What you have accomplished
| Step | Description | Status |
|---|---|---|
| 1 | Project structure created | ☐ |
| 2 | Virtual environment configured | ☐ |
| 3 | Dependencies installed | ☐ |
| 4 | Configuration files created (.gitignore, README, pyproject.toml) | ☐ |
| 5 | Project brief written | ☐ |
| 6 | Verification script run successfully | ☐ |
| 7 | First unit test written and passed | ☐ |
| 8 | Git repository initialized | ☐ |
### Final project structure

```
ml-deployment-project/
├── app/
│   └── __init__.py
├── models/
├── data/
│   ├── raw/
│   └── processed/
├── tests/
│   ├── __init__.py
│   └── test_setup.py
├── scripts/
│   └── verify_setup.py
├── notebooks/
├── docs/
│   └── project_brief.md
├── .gitignore
├── .venv/            (not in Git)
├── pyproject.toml
├── README.md
└── requirements.txt
```
### Essential commands to remember

| Action | Command |
|---|---|
| Activate environment | `source .venv/bin/activate` or `.venv\Scripts\activate` |
| Install dependencies | `pip install -r requirements.txt` |
| Run tests | `pytest tests/ -v` |
| Verify environment | `python scripts/verify_setup.py` |
| Start API (Module 3) | `uvicorn app.main:app --reload` |
In Lab 2 (Module 2), you will train a classification model with scikit-learn and serialize it for deployment. The structure you just created will serve as the foundation for the rest of the course.