TP3 - Build a Prediction API with FastAPI
Objectives
By the end of this lab, you will be able to:
- Load a serialized ML model from Module 2 into a FastAPI application
- Define Pydantic schemas for request validation and response serialization
- Implement a `/predict` endpoint that serves real-time predictions
- Implement a `/health` endpoint for service monitoring
- Add proper error handling for common failure scenarios
- Test the API using `uvicorn` and the auto-generated Swagger UI
Prerequisites
- Completed TP2 (you should have a serialized model file `model_v1.joblib`)
- Python 3.10+ installed
- Basic understanding of REST APIs (Module 3 concepts)
If you haven't completed TP2, run this script to create a sample model:
# create_sample_model.py
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(
    n_samples=1000, n_features=5, n_informative=4,
    n_redundant=1, random_state=42,
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# joblib.dump does not create missing directories, so make sure models/ exists
Path("models").mkdir(exist_ok=True)
joblib.dump(model, "models/model_v1.joblib")
print("Model saved to models/model_v1.joblib")
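Serialization round-trips are worth checking before wiring the model into an API. This small sanity-check sketch dumps and reloads a model through joblib in a temporary directory (self-contained; it does not touch your `models/` folder):

```python
# Sanity-check sketch: round-trip a model through joblib.
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100, n_features=5, random_state=42)
model = RandomForestClassifier(n_estimators=5, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model_v1.joblib"
    joblib.dump(model, path)          # serialize to disk
    restored = joblib.load(path)      # deserialize back

print(restored.n_estimators)  # 5 — hyperparameters survive the round trip
```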
Architecture Overview
The application loads the serialized model once at startup, wraps it in an MLService class, and exposes it through two main endpoints: POST /api/v1/predict for inference and GET /health for monitoring.
Step 1 — Project Setup
1.1 Create the Project Structure
mkdir -p fastapi-ml-api/app
mkdir -p fastapi-ml-api/models
cd fastapi-ml-api
1.2 Create a Virtual Environment
python -m venv venv
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate
1.3 Install Dependencies
pip install fastapi uvicorn pydantic scikit-learn joblib numpy
Create requirements.txt:
fastapi>=0.100.0
uvicorn>=0.23.0
pydantic>=2.0.0
scikit-learn>=1.3.0
joblib>=1.3.0
numpy>=1.24.0
1.4 Copy Your Model
Copy the model file from TP2 into the models/ directory:
cp /path/to/tp2/model_v1.joblib models/model_v1.joblib
Step 2 — Define Pydantic Schemas
Create app/schemas.py:
from pydantic import BaseModel, Field
from datetime import datetime
from typing import Optional


class PredictionInput(BaseModel):
    """Input features for the ML model."""

    age: int = Field(
        ...,
        ge=18,
        le=120,
        description="Applicant age in years",
        examples=[35],
    )
    income: float = Field(
        ...,
        gt=0,
        description="Annual income in USD",
        examples=[55000.0],
    )
    credit_score: int = Field(
        ...,
        ge=300,
        le=850,
        description="Credit score (FICO)",
        examples=[720],
    )
    employment_years: float = Field(
        ...,
        ge=0,
        description="Years of employment",
        examples=[8.5],
    )
    loan_amount: float = Field(
        ...,
        gt=0,
        description="Requested loan amount in USD",
        examples=[25000.0],
    )

    # Pydantic v2 style (replaces the deprecated `class Config`)
    model_config = {
        "json_schema_extra": {
            "example": {
                "age": 35,
                "income": 55000.0,
                "credit_score": 720,
                "employment_years": 8.5,
                "loan_amount": 25000.0,
            }
        }
    }


class PredictionOutput(BaseModel):
    """Prediction result from the ML model."""

    prediction: str = Field(..., description="Predicted class label")
    probability: float = Field(
        ...,
        ge=0,
        le=1,
        description="Prediction confidence (0 to 1)",
    )
    model_version: str = Field(..., description="Version of the model used")
    timestamp: datetime = Field(
        default_factory=datetime.utcnow,
        description="UTC timestamp of the prediction",
    )


class HealthResponse(BaseModel):
    """Health check response."""

    status: str = Field(..., description="Service status")
    model_loaded: bool = Field(..., description="Whether the model is loaded")
    model_version: str = Field(..., description="Current model version")
    timestamp: str = Field(..., description="Current UTC time")


class ErrorResponse(BaseModel):
    """Standard error response."""

    error_code: str = Field(..., description="Machine-readable error code")
    message: str = Field(..., description="Human-readable error message")
    details: Optional[list] = Field(None, description="Additional error details")
Defining these schemas up front gives you three things for free:
- Validation: FastAPI automatically rejects requests that don't match the schema
- Documentation: Swagger UI displays field descriptions, types, and constraints
- Serialization: Response data is automatically formatted to match the output schema
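You can see the validation layer work in isolation, without a running server. The sketch below mirrors `PredictionInput` from `app/schemas.py` inline (rather than importing it) so it runs standalone, assuming Pydantic v2:

```python
# Standalone sketch of the validation behavior (mirrors app/schemas.py).
from pydantic import BaseModel, Field, ValidationError

class PredictionInput(BaseModel):
    age: int = Field(..., ge=18, le=120)
    income: float = Field(..., gt=0)
    credit_score: int = Field(..., ge=300, le=850)
    employment_years: float = Field(..., ge=0)
    loan_amount: float = Field(..., gt=0)

# Valid payload: parses cleanly
ok = PredictionInput(age=35, income=55000.0, credit_score=720,
                     employment_years=8.5, loan_amount=25000.0)
print(ok.model_dump()["age"])  # 35

# Invalid payload: age violates ge=18, so Pydantic raises ValidationError
try:
    PredictionInput(age=-5, income=55000.0, credit_score=720,
                    employment_years=8.5, loan_amount=25000.0)
except ValidationError as exc:
    print(exc.error_count())  # 1
```

This is exactly what FastAPI does on every incoming request; a ValidationError is what becomes the 422 response you will test in Step 5.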
Step 3 — Create the ML Service
Create app/ml_service.py:
import joblib
import numpy as np
from pathlib import Path


class MLService:
    """Handles model loading and inference."""

    def __init__(self):
        self.model = None
        self.model_version = "unknown"
        self.feature_names = [
            "age", "income", "credit_score",
            "employment_years", "loan_amount",
        ]

    def load_model(self, model_path: str) -> None:
        """Load a serialized model from disk."""
        path = Path(model_path)
        if not path.exists():
            raise FileNotFoundError(
                f"Model file not found: {model_path}"
            )
        self.model = joblib.load(path)
        self.model_version = path.stem
        print(f"[MLService] Model loaded: {self.model_version}")

    def predict(self, features: dict) -> dict:
        """
        Run inference on input features.
        Returns prediction label and probability.
        """
        if self.model is None:
            raise RuntimeError("Model is not loaded")
        feature_array = np.array([[
            features["age"],
            features["income"],
            features["credit_score"],
            features["employment_years"],
            features["loan_amount"],
        ]])
        prediction = self.model.predict(feature_array)[0]
        probabilities = self.model.predict_proba(feature_array)[0]
        confidence = float(max(probabilities))
        label = "approved" if prediction == 1 else "denied"
        return {
            "prediction": label,
            "probability": round(confidence, 4),
            "model_version": self.model_version,
        }

    @property
    def is_ready(self) -> bool:
        return self.model is not None


# Singleton instance
ml_service = MLService()
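You can sanity-check the inference logic without FastAPI at all. The sketch below trains a small throwaway model in memory instead of loading `models/model_v1.joblib`, but the feature ordering and prediction code match what `MLService.predict` does:

```python
# In-memory sketch of MLService.predict (trains a small model instead of
# loading model_v1.joblib from disk).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, n_informative=4,
                           n_redundant=1, random_state=42)
model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)

features = {"age": 35, "income": 55000.0, "credit_score": 720,
            "employment_years": 8.5, "loan_amount": 25000.0}
order = ["age", "income", "credit_score", "employment_years", "loan_amount"]
feature_array = np.array([[features[name] for name in order]])

# Same logic as MLService.predict: class label plus max class probability
label = "approved" if model.predict(feature_array)[0] == 1 else "denied"
confidence = float(max(model.predict_proba(feature_array)[0]))
print(label, round(confidence, 4))
```

Note that the toy model here is trained on synthetic data, so the label itself is meaningless; what matters is that the shapes and the label/confidence extraction work.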
Step 4 — Build the FastAPI Application
Create app/main.py:
from contextlib import asynccontextmanager
from datetime import datetime

from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware

from app.schemas import (
    PredictionInput,
    PredictionOutput,
    HealthResponse,
    ErrorResponse,
)
from app.ml_service import ml_service


# --- Lifespan: load model at startup ---
@asynccontextmanager
async def lifespan(app: FastAPI):
    try:
        ml_service.load_model("models/model_v1.joblib")
    except FileNotFoundError as e:
        print(f"[WARNING] {e}. API will start in degraded mode.")
    yield
    print("[INFO] Shutting down API...")


# --- FastAPI App ---
app = FastAPI(
    title="Loan Prediction API",
    description="ML-powered loan approval prediction service built in TP3",
    version="1.0.0",
    lifespan=lifespan,
    openapi_tags=[
        {
            "name": "Predictions",
            "description": "Submit features and receive ML predictions",
        },
        {
            "name": "System",
            "description": "Health checks and service monitoring",
        },
    ],
)

# --- CORS Middleware ---
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)


# --- Health Check ---
@app.get(
    "/health",
    response_model=HealthResponse,
    tags=["System"],
    summary="Check service health",
)
def health_check():
    """Returns the current health status of the API and model."""
    return HealthResponse(
        status="healthy" if ml_service.is_ready else "degraded",
        model_loaded=ml_service.is_ready,
        model_version=ml_service.model_version,
        timestamp=datetime.utcnow().isoformat(),
    )


# --- Prediction Endpoint ---
@app.post(
    "/api/v1/predict",
    response_model=PredictionOutput,
    responses={
        422: {"model": ErrorResponse, "description": "Validation error"},
        503: {"model": ErrorResponse, "description": "Model not available"},
        500: {"model": ErrorResponse, "description": "Prediction failed"},
    },
    tags=["Predictions"],
    summary="Get a loan approval prediction",
)
def predict(input_data: PredictionInput):
    """
    Submit loan application features and receive a prediction.

    The model returns:
    - **prediction**: "approved" or "denied"
    - **probability**: confidence score between 0 and 1
    - **model_version**: which model version produced the result
    """
    # Check model availability
    if not ml_service.is_ready:
        raise HTTPException(
            status_code=503,
            detail="Model is not loaded. Service is in degraded mode.",
        )
    # Run prediction
    try:
        features = input_data.model_dump()
        result = ml_service.predict(features)
        return PredictionOutput(
            prediction=result["prediction"],
            probability=result["probability"],
            model_version=result["model_version"],
            timestamp=datetime.utcnow(),
        )
    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Prediction failed: {str(e)}",
        )


# --- Root ---
@app.get("/", tags=["System"])
def root():
    """API root: returns basic service information."""
    return {
        "service": "Loan Prediction API",
        "version": "1.0.0",
        "docs": "/docs",
        "health": "/health",
    }
Step 5 — Run and Test
5.1 Start the Server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
You should see:
[MLService] Model loaded: model_v1
INFO: Uvicorn running on http://0.0.0.0:8000
INFO: Started reloader process
5.2 Access Swagger Documentation
Open your browser and navigate to: http://localhost:8000/docs
You should see the interactive Swagger UI with:
- Predictions tag → `POST /api/v1/predict`
- System tag → `GET /health`, `GET /`
5.3 Test the Health Endpoint
curl http://localhost:8000/health
Expected response:
{
"status": "healthy",
"model_loaded": true,
"model_version": "model_v1",
"timestamp": "2026-02-23T14:30:00.000000"
}
5.4 Test the Prediction Endpoint
curl -X POST http://localhost:8000/api/v1/predict \
-H "Content-Type: application/json" \
-d '{
"age": 35,
"income": 55000,
"credit_score": 720,
"employment_years": 8.5,
"loan_amount": 25000
}'
Expected response:
{
"prediction": "approved",
"probability": 0.87,
"model_version": "model_v1",
"timestamp": "2026-02-23T14:30:05.123456"
}
5.5 Test Validation Errors
Send invalid data to verify Pydantic validation:
curl -X POST http://localhost:8000/api/v1/predict \
-H "Content-Type: application/json" \
-d '{
"age": -5,
"income": 55000,
"credit_score": 720,
"employment_years": 8,
"loan_amount": 25000
}'
Expected: 422 Unprocessable Entity with details about the age field.
curl -X POST http://localhost:8000/api/v1/predict \
-H "Content-Type: application/json" \
-d '{
"age": 35,
"income": 55000
}'
Expected: 422 with details about missing required fields.
Step 6 — Test with Swagger UI
- Open http://localhost:8000/docs in your browser
- Click on POST /api/v1/predict
- Click Try it out
- The example JSON is pre-filled from your schema
- Click Execute
- Observe the response code, body, and headers
During development, Swagger UI is faster than writing curl commands. Use it to:
- Test different inputs quickly
- See exact request/response formats
- Verify error responses
- Share API documentation with teammates
Step 7 — Final Project Structure
Your completed project should look like:
fastapi-ml-api/
├── app/
│ ├── __init__.py # empty
│ ├── main.py # FastAPI application
│ ├── schemas.py # Pydantic models
│ └── ml_service.py # Model loading & inference
├── models/
│ └── model_v1.joblib # Serialized ML model
├── requirements.txt
└── venv/
Verification Checklist
Before marking this lab as complete, verify:
- `uvicorn` starts without errors
- `GET /health` returns `{"status": "healthy", "model_loaded": true}`
- `POST /api/v1/predict` with valid data returns a prediction
- Invalid data (negative age, missing fields) returns 422
- Swagger UI at `/docs` shows all endpoints with schemas
- Response includes `model_version` and `timestamp`
Bonus Challenges
Challenge 1: Add a batch prediction endpoint
Add a POST /api/v1/predict/batch endpoint that accepts a list of inputs:
from typing import List


class BatchInput(BaseModel):
    inputs: List[PredictionInput] = Field(..., min_length=1, max_length=50)


class BatchOutput(BaseModel):
    predictions: List[PredictionOutput]
    total: int


@app.post("/api/v1/predict/batch", response_model=BatchOutput, tags=["Predictions"])
def predict_batch(batch: BatchInput):
    results = []
    for item in batch.inputs:
        features = item.model_dump()
        result = ml_service.predict(features)
        results.append(PredictionOutput(
            prediction=result["prediction"],
            probability=result["probability"],
            model_version=result["model_version"],
        ))
    return BatchOutput(predictions=results, total=len(results))
Challenge 2: Add request timing middleware
import time
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request


class TimingMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        start = time.perf_counter()
        response = await call_next(request)
        ms = (time.perf_counter() - start) * 1000
        response.headers["X-Response-Time-Ms"] = f"{ms:.2f}"
        return response


app.add_middleware(TimingMiddleware)
Challenge 3: Add API key authentication
from fastapi import Depends, Header, HTTPException

API_KEYS = {"sk_test_abc123", "sk_test_def456"}


async def verify_api_key(x_api_key: str = Header(...)):
    if x_api_key not in API_KEYS:
        raise HTTPException(status_code=401, detail="Invalid API key")
    return x_api_key


# Add the dependency to the existing predict route's decorator
# (don't declare a second route with the same path):
@app.post("/api/v1/predict", dependencies=[Depends(verify_api_key)])
def predict(input_data: PredictionInput):
    ...
Test with:
curl -X POST http://localhost:8000/api/v1/predict \
-H "Content-Type: application/json" \
-H "X-API-Key: sk_test_abc123" \
-d '{"age": 35, "income": 55000, "credit_score": 720, "employment_years": 8, "loan_amount": 25000}'
Common Issues
| Issue | Solution |
|---|---|
| `ModuleNotFoundError: app.schemas` | Make sure `app/__init__.py` exists (can be empty) |
| `FileNotFoundError: model_v1.joblib` | Check that the model file is in `models/` relative to where you run `uvicorn` |
| Port 8000 already in use | Use `--port 8001` or kill the existing process |
| Changes not reflected | Ensure the `--reload` flag is set with `uvicorn` |
| 422 errors on valid-looking data | Check field types: Pydantic is strict (e.g., `"35"` is not an int) |