TP1 - Project Brief and Environment Setup

Practical Lab 60 min Beginner

Learning Objectives

By the end of this lab, you will be able to:

  • Create a structured project directory for ML deployment
  • Configure a Python virtual environment with venv
  • Install and manage dependencies with pip and requirements.txt
  • Write a project brief for an ML project
  • Verify that the environment is functional with a test script

Project Context

Throughout this course, you will build a complete ML prediction service — from model training to API deployment. This first lab lays the foundations.

Prerequisites

| Item | Description |
|------|-------------|
| Python 3.10+ | Installed and accessible via terminal (python --version) |
| pip | Python package manager (included with Python) |
| Git | For code versioning |
| Terminal | PowerShell (Windows) or bash (macOS/Linux) |
| Code editor | VS Code, Cursor, or PyCharm |

Verify your Python installation

Before starting, run these commands in your terminal:

python --version    # Should display Python 3.10+
pip --version # Should display pip 23+
git --version # Should display git 2+

If a command fails, install the missing component before continuing.
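The same check can be scripted. The sketch below is one possible approach, not part of the course material: it looks for the three tools on the PATH with the standard library's `shutil.which`. The names `REQUIRED_TOOLS`, `find_missing_tools`, and `python_is_recent_enough` are illustrative. On some systems the interpreter is exposed as `python3` rather than `python`, so a "missing" result for `python` may only mean the name differs.

```python
import shutil
import sys

# Tool names taken from the prerequisites table (illustrative constant).
REQUIRED_TOOLS = ["python", "pip", "git"]


def find_missing_tools(tools):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_is_recent_enough(minimum=(3, 10)):
    """Compare the interpreter running this script with the course minimum."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("python, pip, and git were all found on PATH")
    print("Python 3.10+:", python_is_recent_enough())
```

Note that `python_is_recent_enough` inspects the interpreter that runs the script, which may differ from the one your terminal resolves for `python`.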


Step 1: Create the project structure

1.1 Create the root directory

Open your terminal and create the project directory:

# Create the main directory
mkdir ml-deployment-project
cd ml-deployment-project

1.2 Create the full directory tree

Create the standard structure for an ML project:

# Create the directories
mkdir -p app
mkdir -p models
mkdir -p data/raw
mkdir -p data/processed
mkdir -p tests
mkdir -p scripts
mkdir -p notebooks
mkdir -p docs

1.3 Create the initialization files

# __init__.py files for Python packages
touch app/__init__.py
touch tests/__init__.py
On Windows (PowerShell)

Replace touch with New-Item:

New-Item -ItemType File -Path app/__init__.py
New-Item -ItemType File -Path tests/__init__.py

Or simply use your code editor to create the files.
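If you prefer a single approach that behaves the same on Windows, macOS, and Linux, steps 1.2 and 1.3 can also be sketched in Python with `pathlib`. This is an optional alternative under the assumption that Python is already installed. The function name `create_structure` and the two constants are illustrative.

```python
from pathlib import Path

# Directories from step 1.2 and package directories from step 1.3.
DIRECTORIES = [
    "app",
    "models",
    "data/raw",
    "data/processed",
    "tests",
    "scripts",
    "notebooks",
    "docs",
]
PACKAGE_DIRS = ["app", "tests"]


def create_structure(root):
    """Create the lab's directory tree and empty __init__.py files under root."""
    root = Path(root)
    for name in DIRECTORIES:
        # parents=True mirrors `mkdir -p`; exist_ok=True makes reruns harmless.
        (root / name).mkdir(parents=True, exist_ok=True)
    for name in PACKAGE_DIRS:
        (root / name / "__init__.py").touch(exist_ok=True)
    return root


# Usage, from inside ml-deployment-project/:
# create_structure(".")
```

Running it twice is safe: existing directories and files are left untouched.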

1.4 Verify the structure

Your project should look like this:

ml-deployment-project/
├── app/
│ └── __init__.py
├── models/
├── data/
│ ├── raw/
│ └── processed/
├── tests/
│ └── __init__.py
├── scripts/
├── notebooks/
└── docs/
Verification — Command to display the directory tree
# macOS/Linux
find . -type f -o -type d | head -20

# Windows PowerShell
Get-ChildItem -Recurse -Depth 2 | Select-Object FullName

You should see all directories and the __init__.py files.
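A platform-independent way to produce the same listing is sketched below. It is an illustration only: `list_tree` and `SKIPPED` are hypothetical names, and the choice to skip `.venv`, `.git`, and `__pycache__` is an assumption made to keep the output short once those directories exist.

```python
from pathlib import Path

# Directories left out of the listing (an assumption, adjust as needed).
SKIPPED = {".venv", ".git", "__pycache__"}


def list_tree(root, max_depth=2):
    """Return relative paths under root, at most max_depth levels deep."""
    root = Path(root)
    entries = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        if SKIPPED.intersection(relative.parts):
            continue
        if len(relative.parts) > max_depth:
            continue
        # Mark directories with a trailing slash, as in the tree above.
        suffix = "/" if path.is_dir() else ""
        entries.append(relative.as_posix() + suffix)
    return entries


# Usage, from the project root:
# print("\n".join(list_tree(".")))
```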


Step 2: Configure the virtual environment

2.1 Create the virtual environment

# From the project root
python -m venv .venv

2.2 Activate the environment

# Windows (PowerShell)
.venv\Scripts\Activate.ps1

# Windows (CMD)
.venv\Scripts\activate.bat

# macOS / Linux
source .venv/bin/activate
PowerShell — Execution policy

If you get an error on PowerShell, run first:

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

Then try activating again.

2.3 Verify activation

# The prompt should show (.venv)
(.venv) $ python --version
Python 3.11.x

# Verify that pip points to the virtual environment
(.venv) $ pip --version
pip 23.x.x from .../ml-deployment-project/.venv/lib/...
Verification — How to know if the environment is active?
  1. Your prompt displays (.venv) at the beginning
  2. which python (Linux/macOS) or Get-Command python (PowerShell) points to .venv/
  3. pip list shows very few packages (typically only pip, plus setuptools on Python 3.11 and earlier)
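The checks above can also be made from inside Python. A minimal sketch, assuming the environment was created with `venv` as in step 2.1: in such an environment `sys.prefix` points at `.venv/` while `sys.base_prefix` still points at the base installation. The function name `in_virtual_environment` is illustrative.

```python
import sys


def in_virtual_environment():
    """True when the running interpreter belongs to a venv-style environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtual_environment())
```

When the environment is active, the interpreter path printed first should sit under the project's `.venv/` directory.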

2.4 Update pip

python -m pip install --upgrade pip

Step 3: Install dependencies

3.1 Create the requirements.txt file

Create a requirements.txt file at the project root with the following content:

# ===========================================
# ML Deployment Project - Dependencies
# ===========================================

# --- Core ML ---
scikit-learn==1.4.2
pandas==2.2.0
numpy==1.26.4
joblib==1.3.2

# --- API Framework ---
fastapi==0.109.0
uvicorn[standard]==0.27.0
pydantic==2.5.3

# --- Testing ---
pytest==8.0.0
pytest-cov==4.1.0
httpx==0.26.0

# --- Model Explainability ---
shap==0.44.1
lime==0.2.0.1

# --- Utilities ---
python-dotenv==1.0.0
requests==2.31.0

3.2 Install the dependencies

pip install -r requirements.txt
Installation taking a long time?

Installing SHAP and its dependencies can take 2-5 minutes. This is normal: if no prebuilt wheel is available for your platform and Python version, pip builds SHAP's C extensions from source.

If SHAP installation fails, you can temporarily comment it out in requirements.txt and install it later.

3.3 Verify the installation

# Verify that the main packages are installed (macOS/Linux)
pip list | grep -iE "scikit-learn|fastapi|pandas|pytest|shap"

# Windows PowerShell
pip list | Select-String "scikit-learn|fastapi|pandas|pytest|shap"
Verification — Expected output

You should see something like:

fastapi          0.109.0
pandas           2.2.0
pytest           8.0.0
pytest-cov       4.1.0
scikit-learn     1.4.2
shap             0.44.1

If a package is missing, check the installation errors and try again with:

pip install <package-name>
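To compare every pinned version at once rather than reading the pip list output by eye, something like the following could be used. It is a sketch that only understands simple `name==version` lines such as those in the requirements.txt above. The names `parse_pins` and `find_mismatches` are illustrative, and the lookup relies on the standard library's `importlib.metadata`.

```python
from importlib import metadata


def parse_pins(text):
    """Extract {package: version} from 'name==version' lines."""
    pins = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        name = name.split("[", 1)[0].strip()  # drop extras such as [standard]
        pins[name] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (pinned, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Usage, from the project root with the environment active:
# from pathlib import Path
# print(find_mismatches(parse_pins(Path("requirements.txt").read_text())))
```

An empty dictionary means every pinned package is installed at the pinned version.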

Step 4: Create configuration files

4.1 .gitignore file

Create a .gitignore file at the root:

# Python
__pycache__/
*.py[cod]
*.egg-info/
dist/
build/
*.egg

# Virtual environment
.venv/
venv/
ENV/

# IDE
.vscode/
.idea/
*.swp
*.swo

# Data files (too large for Git)
data/raw/
*.csv
*.parquet
*.h5

# Model files (too large for Git)
models/*.pkl
models/*.joblib
models/*.onnx

# Environment variables
.env

# Jupyter
.ipynb_checkpoints/

# OS
.DS_Store
Thumbs.db

# Testing
.pytest_cache/
htmlcov/
.coverage

4.2 README.md file

Create a README.md file at the root:

# ML Deployment Project

A complete machine learning deployment pipeline — from model training to production API.

## Quick Start

```bash
# Clone the repository
git clone <repo-url>
cd ml-deployment-project

# Create virtual environment
python -m venv .venv
source .venv/bin/activate # or .venv\Scripts\activate on Windows

# Install dependencies
pip install -r requirements.txt

# Run tests
pytest tests/ -v

# Start the API server
uvicorn app.main:app --reload
```

## Project Structure

    ml-deployment-project/
    ├── app/          # FastAPI application
    ├── models/       # Serialized ML models
    ├── data/         # Datasets (raw + processed)
    ├── tests/        # Unit and integration tests
    ├── scripts/      # Training and utility scripts
    ├── notebooks/    # Jupyter notebooks for exploration
    └── docs/         # Project documentation

## Technology Stack

- ML: scikit-learn, pandas, NumPy
- API: FastAPI, uvicorn
- Testing: pytest, httpx
- Explainability: SHAP, LIME

4.3 pyproject.toml file

Create a pyproject.toml file at the root:

[project]
name = "ml-deployment-project"
version = "0.1.0"
description = "ML model deployment with FastAPI"
requires-python = ">=3.10"

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
python_functions = ["test_*"]
addopts = "-v --tb=short"

[tool.ruff]
line-length = 88
target-version = "py310"

Step 5: Write the project brief

5.1 Create the document

Create the file docs/project_brief.md with the following content. Adapt it to your project — this template is for a customer churn prediction model.

# Project Brief — Customer Churn Prediction Service

## 1. Problem Statement

Predict whether a customer will cancel their subscription within the next 30 days,
based on their usage patterns and account information.

## 2. End Users

- **Primary**: Customer Success team (via web dashboard)
- **Secondary**: Marketing team (batch reports)

## 3. Data Requirements

| Feature | Type | Source |
|---------|------|--------|
| tenure_months | int | CRM database |
| monthly_charges | float | Billing system |
| total_charges | float | Billing system |
| contract_type | categorical | CRM database |
| internet_service | categorical | Service DB |
| tech_support_calls | int | Support tickets |

## 4. Model Requirements

| Metric | Target |
|--------|--------|
| AUC-ROC | > 0.85 |
| Precision | > 0.80 |
| Recall | > 0.70 |
| Inference latency | < 200ms |

## 5. Input / Output

**Input (API request):**
```json
{
  "tenure_months": 24,
  "monthly_charges": 65.5,
  "total_charges": 1572.0,
  "contract_type": "month-to-month",
  "internet_service": "fiber_optic",
  "tech_support_calls": 3
}
```

**Output (API response):**
```json
{
  "churn_probability": 0.73,
  "risk_level": "high",
  "model_version": "1.0.0"
}
```

## 6. Scope

### In Scope

- Binary classification model (churn / no churn)
- REST API with FastAPI
- Unit tests with pytest
- Model explainability (SHAP)
- API documentation (Swagger)

### Out of Scope

- Real-time streaming predictions
- Mobile application
- Multi-tenant architecture
- GPU inference

## 7. Timeline

| Week | Deliverable |
|------|-------------|
| 1-2 | Environment setup + project brief |
| 3-5 | Model training + evaluation |
| 6-8 | API development + documentation |
| 9-10 | Testing + explainability |
| 11-15 | Integration + final project |
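The input and output contract from section 5 of the brief can be written down as typed structures early on, which makes the brief easier to check against later code. The sketch below uses only standard-library dataclasses. The class names, the non-negativity checks, and the probability thresholds in `risk_level_for` are all assumptions made for illustration; the brief itself only shows that 0.73 maps to "high". Later modules of the course may well express the same contract with Pydantic.

```python
from dataclasses import asdict, dataclass


@dataclass
class ChurnRequest:
    """Fields of the API request shown in the brief."""

    tenure_months: int
    monthly_charges: float
    total_charges: float
    contract_type: str
    internet_service: str
    tech_support_calls: int

    def __post_init__(self):
        # Assumed sanity checks; the brief does not specify validation rules.
        if self.tenure_months < 0 or self.tech_support_calls < 0:
            raise ValueError("counts must be non-negative")
        if self.monthly_charges < 0 or self.total_charges < 0:
            raise ValueError("charges must be non-negative")


@dataclass
class ChurnResponse:
    """Fields of the API response shown in the brief."""

    churn_probability: float
    risk_level: str
    model_version: str


def risk_level_for(probability, medium=0.4, high=0.7):
    """Map a probability to a label; both thresholds are hypothetical."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= high:
        return "high"
    if probability >= medium:
        return "medium"
    return "low"


if __name__ == "__main__":
    request = ChurnRequest(24, 65.5, 1572.0, "month-to-month", "fiber_optic", 3)
    response = ChurnResponse(0.73, risk_level_for(0.73), "1.0.0")
    print(asdict(request))
    print(asdict(response))
```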

Customize your brief

You can choose another ML problem if you prefer:

  • Spam detection (text classification)
  • House price prediction (regression)
  • Image classification (if you're comfortable with deep learning)

The important thing is to clearly document the inputs, outputs, metrics, and scope.
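Documented metrics are most useful when they can be checked mechanically. As a rough illustration, the precision and recall targets from the template brief (> 0.80 and > 0.70) can be compared with values computed from confusion-matrix counts. The counts in the example are made up, the function names are illustrative, and AUC-ROC and latency are left out because they need model scores and timing data.

```python
# Targets copied from section 4 of the template brief.
TARGETS = {"precision": 0.80, "recall": 0.70}


def precision_recall(tp, fp, fn):
    """Compute precision and recall from true/false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"precision": precision, "recall": recall}


def unmet_targets(scores, targets=TARGETS):
    """List the metrics whose score does not strictly exceed its target."""
    return [
        name for name, target in targets.items()
        if scores.get(name, 0.0) <= target
    ]


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    scores = precision_recall(tp=90, fp=10, fn=30)
    print(scores)
    print("Unmet targets:", unmet_targets(scores))
```

The comparison is strict because the brief states the targets as "greater than", so a precision of exactly 0.80 counts as unmet.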

Step 6: Verification script

6.1 Create the test script

Create the file scripts/verify_setup.py:

"""
Verification script for ML Deployment Project setup.
Run this script to confirm your environment is correctly configured.
"""

import sys


def check_python_version():
    version = sys.version_info
    assert version.major == 3 and version.minor >= 10, (
        f"Python 3.10+ required, got {version.major}.{version.minor}"
    )
    print(f" Python {version.major}.{version.minor}.{version.micro}")


def check_import(module_name, display_name=None):
    display = display_name or module_name
    try:
        mod = __import__(module_name)
        version = getattr(mod, "__version__", "unknown")
        print(f" {display} {version}")
        return True
    except ImportError:
        print(f" {display} — NOT INSTALLED")
        return False


def check_project_structure():
    from pathlib import Path

    required_dirs = ["app", "models", "data", "tests", "scripts", "docs"]
    required_files = [
        "requirements.txt",
        "app/__init__.py",
        "tests/__init__.py",
    ]

    project_root = Path(__file__).parent.parent
    missing = []

    for d in required_dirs:
        if not (project_root / d).is_dir():
            missing.append(f"Directory: {d}/")

    for f in required_files:
        if not (project_root / f).is_file():
            missing.append(f"File: {f}")

    if missing:
        print(" MISSING:")
        for m in missing:
            print(f" - {m}")
        return False

    print(f" All {len(required_dirs)} directories present")
    print(f" All {len(required_files)} required files present")
    return True


def main():
    print("=" * 50)
    print("ML Deployment Project — Setup Verification")
    print("=" * 50)
    all_ok = True

    print("\n[1/3] Python Version")
    try:
        check_python_version()
    except AssertionError as e:
        print(f" FAIL: {e}")
        all_ok = False

    print("\n[2/3] Package Imports")
    packages = [
        ("sklearn", "scikit-learn"),
        ("pandas", None),
        ("numpy", None),
        ("fastapi", None),
        ("uvicorn", None),
        ("pydantic", None),
        ("pytest", None),
        ("httpx", None),
        ("joblib", None),
    ]

    for module_name, display_name in packages:
        if not check_import(module_name, display_name):
            all_ok = False

    # SHAP and LIME are optional (may fail on some systems)
    print("\n Optional packages:")
    check_import("shap")
    check_import("lime")

    print("\n[3/3] Project Structure")
    if not check_project_structure():
        all_ok = False

    print("\n" + "=" * 50)
    if all_ok:
        print("ALL CHECKS PASSED — Your environment is ready!")
    else:
        print("SOME CHECKS FAILED — Review the errors above.")
    print("=" * 50)

    return 0 if all_ok else 1


if __name__ == "__main__":
    sys.exit(main())

6.2 Run the verification

python scripts/verify_setup.py
Verification — Expected output
==================================================
ML Deployment Project — Setup Verification
==================================================

[1/3] Python Version
Python 3.11.5

[2/3] Package Imports
scikit-learn 1.4.2
pandas 2.2.0
numpy 1.26.4
fastapi 0.109.0
uvicorn 0.27.0
pydantic 2.5.3
pytest 8.0.0
httpx 0.26.0
joblib 1.3.2

Optional packages:
shap 0.44.1
lime 0.2.0.1

[3/3] Project Structure
All 6 directories present
All 3 required files present

==================================================
ALL CHECKS PASSED — Your environment is ready!
==================================================
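Because the script ends with sys.exit(main()) and main() returns 0 on success and 1 on failure, its exit status can be consumed by other tooling without parsing the printed report. A minimal sketch of that idea, with the hypothetical helper name `command_succeeds`:

```python
import subprocess
import sys


def command_succeeds(command):
    """Run a command and report whether it exited with status 0."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode == 0


# Usage, from the project root with the environment active:
# if not command_succeeds([sys.executable, "scripts/verify_setup.py"]):
#     print("Environment verification failed")
```

Using `sys.executable` keeps the check inside the same interpreter, and therefore the same virtual environment, that runs the calling script.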

Step 7: Create a first test

7.1 Write a unit test

Create the file tests/test_setup.py:

"""Basic tests to verify the project environment."""

import importlib


def test_python_version():
    import sys
    assert sys.version_info >= (3, 10), "Python 3.10+ required"


def test_sklearn_import():
    sklearn = importlib.import_module("sklearn")
    assert hasattr(sklearn, "__version__")


def test_fastapi_import():
    fastapi = importlib.import_module("fastapi")
    assert hasattr(fastapi, "FastAPI")


def test_pandas_import():
    pd = importlib.import_module("pandas")
    assert hasattr(pd, "DataFrame")


def test_project_structure():
    from pathlib import Path

    root = Path(__file__).parent.parent
    assert (root / "app").is_dir()
    assert (root / "models").is_dir()
    assert (root / "tests").is_dir()
    assert (root / "requirements.txt").is_file()

7.2 Run the tests

pytest tests/test_setup.py -v
Verification — Expected output
========================= test session starts =========================
collected 5 items

tests/test_setup.py::test_python_version PASSED [ 20%]
tests/test_setup.py::test_sklearn_import PASSED [ 40%]
tests/test_setup.py::test_fastapi_import PASSED [ 60%]
tests/test_setup.py::test_pandas_import PASSED [ 80%]
tests/test_setup.py::test_project_structure PASSED [100%]

========================= 5 passed in 0.42s =========================

All tests must pass (5/5). If a test fails, verify that the corresponding package is installed.


Step 8: Initialize Git

8.1 Initialize the repository

git init
git add .
git commit -m "Initial project setup: structure, dependencies, and verification"

8.2 Verify the status

git status
git log --oneline
Verification — What Git should ignore

Verify that .gitignore works correctly:

git status

You should NOT see:

  • .venv/ (virtual environment)
  • __pycache__/ (compiled files)
  • data/raw/ (raw data)

If these files appear, check your .gitignore.
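Reading git status output works, but Git can also be asked directly whether a given path is ignored. The sketch below wraps the `git check-ignore -q` command, which exits with 0 when the path is ignored and 1 when it is not. The helper name `is_ignored` is illustrative, and the code assumes Git is installed and that `repo_dir` is inside an initialized repository.

```python
import subprocess


def is_ignored(path, repo_dir="."):
    """Ask Git whether path is covered by the repository's ignore rules."""
    result = subprocess.run(
        ["git", "check-ignore", "-q", path],
        cwd=repo_dir,
        capture_output=True,
    )
    # Exit status 0 means ignored, 1 means not ignored; anything else is an error.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.decode(errors="replace").strip())
    return result.returncode == 0


# Usage, from the project root after `git init`:
# for candidate in [".venv/pyvenv.cfg", ".env", "app/__init__.py"]:
#     print(candidate, "ignored" if is_ignored(candidate) else "tracked or untracked")
```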


Lab Summary

What you have accomplished

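| Step | Description | Status |
|------|-------------|--------|
| 1 | Project structure created | Done |
| 2 | Virtual environment configured | Done |
| 3 | Dependencies installed | Done |
| 4 | Configuration files created (.gitignore, README, pyproject.toml) | Done |
| 5 | Project brief written | Done |
| 6 | Verification script run successfully | Done |
| 7 | First unit test written and passed | Done |
| 8 | Git repository initialized | Done |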

Final project structure

ml-deployment-project/
├── app/
│ └── __init__.py
├── models/
├── data/
│ ├── raw/
│ └── processed/
├── tests/
│ ├── __init__.py
│ └── test_setup.py
├── scripts/
│ └── verify_setup.py
├── notebooks/
├── docs/
│ └── project_brief.md
├── .gitignore
├── .venv/ (not in Git)
├── pyproject.toml
├── README.md
└── requirements.txt

Essential commands to remember

| Action | Command |
|--------|---------|
| Activate environment | source .venv/bin/activate or .venv\Scripts\activate |
| Install dependencies | pip install -r requirements.txt |
| Run tests | pytest tests/ -v |
| Verify environment | python scripts/verify_setup.py |
| Start API (Module 3) | uvicorn app.main:app --reload |

Next lab

In Lab 2 (Module 2), you will train a classification model with scikit-learn and serialize it for deployment. The structure you just created will serve as the foundation for the rest of the course.