
Building APIs with FastAPI

Theory · 60 min

Why FastAPI?

FastAPI is a modern Python web framework designed for building APIs. It was created by Sebastián Ramírez in 2018 and quickly became the go-to framework for serving ML models in production.

Key advantages

| Feature | Description | Why it matters for ML |
|---|---|---|
| Asynchronous support | Built on Starlette, supports async/await | Handle many simultaneous prediction requests |
| Type hints | Uses native Python type annotations | Self-documenting code, IDE autocompletion |
| Pydantic validation | Automatic request/response validation | Reject invalid model inputs before inference |
| Auto-generated docs | Swagger UI + ReDoc built in | Clients can explore and test your API instantly |
| High performance | One of the fastest Python frameworks | Low latency for real-time predictions |
| Standards-based | Built on OpenAPI and JSON Schema | Easy integration with any client or tool |
ASGI vs WSGI
  • WSGI (Web Server Gateway Interface): synchronous; each worker handles one request at a time (used by Flask)
  • ASGI (Asynchronous Server Gateway Interface): asynchronous; a single worker can handle many requests concurrently (used by FastAPI)

For ML APIs receiving many simultaneous prediction requests, ASGI significantly improves throughput.
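
As a minimal sketch of what that means in practice (the /slow-io path and the one-second delay are illustrative, not part of any real API), an async endpoint awaits non-blocking I/O, so the same worker keeps serving other requests while it waits:

import asyncio
from fastapi import FastAPI

app = FastAPI()

@app.get("/slow-io")
async def slow_io():
    # Stand-in for a non-blocking call (database query, downstream HTTP request, ...).
    # While this coroutine awaits, the event loop is free to handle other requests.
    await asyncio.sleep(1)
    return {"done": True}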


Installation and setup

Installing dependencies

pip install fastapi uvicorn pydantic
pip install scikit-learn joblib numpy pandas

Project structure

A well-organized ML API project follows this structure:

ml-api/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI application entry point
│   ├── models/
│   │   ├── __init__.py
│   │   └── schemas.py       # Pydantic request/response models
│   ├── routers/
│   │   ├── __init__.py
│   │   └── predictions.py   # Prediction route handlers
│   ├── services/
│   │   ├── __init__.py
│   │   └── ml_service.py    # Model loading and inference logic
│   └── core/
│       ├── __init__.py
│       └── config.py        # Configuration settings
├── models/
│   └── model_v1.joblib      # Serialized ML model
├── requirements.txt
└── README.md

Your first FastAPI application

Minimal example

from fastapi import FastAPI

app = FastAPI(
    title="ML Prediction API",
    description="API for serving machine learning predictions",
    version="1.0.0",
)

@app.get("/")
def root():
    return {"message": "ML Prediction API is running"}

@app.get("/health")
def health_check():
    return {"status": "healthy"}

Run the application:

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

This gives you:

  • API available at http://localhost:8000
  • Swagger UI at http://localhost:8000/docs
  • ReDoc at http://localhost:8000/redoc

Pydantic models for request/response validation

Pydantic is the foundation of FastAPI's data validation. You define Python classes with type annotations, and Pydantic automatically validates incoming data against them.

Defining input schemas

from pydantic import BaseModel, Field
from typing import Optional, List
from enum import Enum

class LoanPurpose(str, Enum):
    home = "home"
    car = "car"
    education = "education"
    personal = "personal"

class PredictionInput(BaseModel):
    """Input features for loan approval prediction."""

    age: int = Field(
        ...,
        ge=18,
        le=120,
        description="Applicant age in years",
        example=35,
    )
    income: float = Field(
        ...,
        gt=0,
        description="Annual income in USD",
        example=55000.0,
    )
    credit_score: int = Field(
        ...,
        ge=300,
        le=850,
        description="Credit score (FICO)",
        example=720,
    )
    employment_years: float = Field(
        ...,
        ge=0,
        description="Years of employment",
        example=8.5,
    )
    loan_amount: float = Field(
        ...,
        gt=0,
        description="Requested loan amount in USD",
        example=25000.0,
    )
    loan_purpose: LoanPurpose = Field(
        ...,
        description="Purpose of the loan",
        example="home",
    )

    class Config:
        json_schema_extra = {
            "example": {
                "age": 35,
                "income": 55000.0,
                "credit_score": 720,
                "employment_years": 8.5,
                "loan_amount": 25000.0,
                "loan_purpose": "home",
            }
        }

Defining output schemas

from datetime import datetime

class PredictionOutput(BaseModel):
    """Prediction result from the ML model."""

    prediction: str = Field(..., description="Predicted class label")
    probability: float = Field(
        ..., ge=0, le=1, description="Prediction confidence"
    )
    model_version: str = Field(..., description="Model version used")
    timestamp: datetime = Field(
        default_factory=datetime.utcnow,
        description="Prediction timestamp",
    )

class ErrorResponse(BaseModel):
    """Standard error response."""

    error_code: str
    message: str
    details: Optional[List[str]] = None

Field validation cheat sheet

| Constraint | Use | Example |
|---|---|---|
| ... (Ellipsis) | Required field | Field(...) |
| default= | Default value | Field(default=0.5) |
| ge=, gt= | Greater than or equal / strictly greater | Field(ge=0) |
| le=, lt= | Less than or equal / strictly less | Field(le=100) |
| min_length= | Minimum string length | Field(min_length=1) |
| max_length= | Maximum string length | Field(max_length=255) |
| pattern= (regex= in Pydantic v1) | Pattern matching | Field(pattern=r"^[a-z]+$") |
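
To see this validation in action, here is a small sketch (reusing the PredictionInput schema defined above) of how Pydantic rejects an out-of-range value. When the same invalid payload arrives over HTTP, FastAPI returns a 422 Unprocessable Entity response with equivalent error details:

from pydantic import ValidationError

try:
    PredictionInput(
        age=15,              # violates the ge=18 constraint
        income=55000.0,
        credit_score=720,
        employment_years=8.5,
        loan_amount=25000.0,
        loan_purpose="home",
    )
except ValidationError as exc:
    print(exc)  # reports that age must be greater than or equal to 18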

Loading and serving an ML model

The ML service

Create a service class that loads the model once at startup and reuses it for every request:

import joblib
import numpy as np
from pathlib import Path

class MLService:
    """Handles model loading and inference."""

    def __init__(self):
        self.model = None
        self.model_version = "unknown"
        self.feature_names = [
            "age", "income", "credit_score",
            "employment_years", "loan_amount",
        ]

    def load_model(self, model_path: str):
        """Load a serialized model from disk."""
        path = Path(model_path)
        if not path.exists():
            raise FileNotFoundError(f"Model not found: {model_path}")

        self.model = joblib.load(path)
        self.model_version = path.stem
        return self

    def predict(self, features: dict) -> dict:
        """Run inference on input features."""
        if self.model is None:
            raise RuntimeError("Model not loaded")

        feature_array = np.array([[
            features["age"],
            features["income"],
            features["credit_score"],
            features["employment_years"],
            features["loan_amount"],
        ]])

        prediction = self.model.predict(feature_array)[0]
        probabilities = self.model.predict_proba(feature_array)[0]

        return {
            "prediction": "approved" if prediction == 1 else "denied",
            "probability": float(max(probabilities)),
            "model_version": self.model_version,
        }

ml_service = MLService()

Wiring it into FastAPI with lifespan events

from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    """Load model at startup, clean up at shutdown."""
    ml_service.load_model("models/model_v1.joblib")
    print(f"Model loaded: {ml_service.model_version}")
    yield
    print("Shutting down, releasing resources...")

app = FastAPI(
    title="ML Prediction API",
    version="1.0.0",
    lifespan=lifespan,
)

Load the model ONCE

Never load the model inside a request handler. Deserializing a model from disk on every request adds massive latency. Use the lifespan event or a global singleton to load it once at startup.


Creating the prediction endpoint

Complete prediction route

from fastapi import FastAPI, HTTPException
from datetime import datetime

@app.post(
    "/api/v1/predict",
    response_model=PredictionOutput,
    summary="Get a loan approval prediction",
    tags=["Predictions"],
)
def predict(input_data: PredictionInput):
    """
    Submit loan application features and receive
    an approval/denial prediction with confidence score.
    """
    try:
        features = input_data.model_dump(exclude={"loan_purpose"})
        result = ml_service.predict(features)

        return PredictionOutput(
            prediction=result["prediction"],
            probability=result["probability"],
            model_version=result["model_version"],
            timestamp=datetime.utcnow(),
        )

    except RuntimeError as e:
        raise HTTPException(
            status_code=503,
            detail=f"Model not available: {str(e)}",
        )
    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Prediction failed: {str(e)}",
        )

Health check endpoint

@app.get("/health", tags=["System"])
def health_check():
"""Check if the API and model are ready."""
model_loaded = ml_service.model is not None
return {
"status": "healthy" if model_loaded else "degraded",
"model_loaded": model_loaded,
"model_version": ml_service.model_version,
"timestamp": datetime.utcnow().isoformat(),
}

Dependency injection

FastAPI's dependency injection system lets you share logic across endpoints in a clean way. It is useful for authentication, database connections, and model access.

from fastapi import Depends, Header, HTTPException

async def verify_api_key(x_api_key: str = Header(...)):
    """Validate the API key from request headers."""
    valid_keys = {"sk_live_abc123", "sk_live_def456"}
    if x_api_key not in valid_keys:
        raise HTTPException(
            status_code=401,
            detail="Invalid API key",
        )
    return x_api_key

@app.post("/api/v1/predict", dependencies=[Depends(verify_api_key)])
def predict(input_data: PredictionInput):
    # Only reached if API key is valid
    ...

Error handling

Custom exception handlers

from fastapi import Request
from fastapi.responses import JSONResponse

class ModelNotLoadedError(Exception):
    pass

class PredictionError(Exception):
    def __init__(self, detail: str):
        self.detail = detail

@app.exception_handler(ModelNotLoadedError)
async def model_not_loaded_handler(request: Request, exc: ModelNotLoadedError):
    return JSONResponse(
        status_code=503,
        content={
            "error_code": "MODEL_NOT_LOADED",
            "message": "The ML model is not available. Please try again later.",
        },
    )

@app.exception_handler(PredictionError)
async def prediction_error_handler(request: Request, exc: PredictionError):
    return JSONResponse(
        status_code=500,
        content={
            "error_code": "PREDICTION_FAILED",
            "message": exc.detail,
        },
    )

Middleware

Middleware runs before every request and after every response. It is ideal for logging, timing, and adding headers.

Request timing middleware

import time

from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware

class TimingMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        start_time = time.perf_counter()
        response = await call_next(request)
        duration_ms = (time.perf_counter() - start_time) * 1000
        response.headers["X-Response-Time-Ms"] = f"{duration_ms:.2f}"
        return response

app.add_middleware(TimingMiddleware)

CORS middleware

from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=[
        "http://localhost:3000",
        "https://myapp.example.com",
    ],
    allow_credentials=True,
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)

Async vs sync endpoints

FastAPI supports both synchronous and asynchronous endpoints. The right choice depends on what your endpoint does.

| Scenario | Use | Why |
|---|---|---|
| ML inference (CPU-bound) | def (sync) | scikit-learn is not async; FastAPI runs it in a thread pool |
| Database queries (I/O-bound) | async def | Non-blocking I/O, better concurrency |
| File operations | async def with aiofiles | Does not block the event loop |
| External API calls | async def with httpx | Concurrent HTTP requests |

import asyncio

# Sync: FastAPI automatically runs this handler in a thread pool
@app.post("/api/v1/predict")
def predict_sync(input_data: PredictionInput):
    result = ml_service.predict(input_data.model_dump())
    return result

# Async: runs on the event loop; do not do heavy CPU work here directly
@app.post("/api/v1/predict-async")
async def predict_async(input_data: PredictionInput):
    loop = asyncio.get_running_loop()
    # Offload the CPU-bound prediction to the default thread pool executor
    result = await loop.run_in_executor(
        None, ml_service.predict, input_data.model_dump()
    )
    return result

Common mistake

If you declare async def but then call a blocking function inside it (such as joblib.load() or model.predict()), you block the event loop and freeze every other request. Use def (sync) for CPU-bound ML inference, or explicitly run it in an executor.


File upload endpoint

For models that process images, audio, or documents, you need file upload support.

from fastapi import UploadFile, File
import io
from PIL import Image

@app.post("/api/v1/predict/image", tags=["Predictions"])
async def predict_image(
    file: UploadFile = File(..., description="Image file for classification"),
):
    if file.content_type not in ["image/jpeg", "image/png"]:
        raise HTTPException(
            status_code=400,
            detail="Only JPEG and PNG images are supported",
        )

    contents = await file.read()
    image = Image.open(io.BytesIO(contents))

    # Preprocess and predict (simplified)
    result = image_model.predict(image)

    return {
        "filename": file.filename,
        "prediction": result["label"],
        "confidence": result["confidence"],
    }

Batch prediction endpoint

For efficiency, let clients submit several inputs in a single request.

from typing import List

class BatchInput(BaseModel):
    inputs: List[PredictionInput] = Field(
        ..., min_length=1, max_length=100,
        description="List of prediction inputs (max 100)",
    )

class BatchOutput(BaseModel):
    predictions: List[PredictionOutput]
    total: int
    processing_time_ms: float

@app.post("/api/v1/predict/batch", response_model=BatchOutput, tags=["Predictions"])
def predict_batch(batch: BatchInput):
    start = time.perf_counter()
    results = []

    for item in batch.inputs:
        features = item.model_dump(exclude={"loan_purpose"})
        result = ml_service.predict(features)
        results.append(PredictionOutput(
            prediction=result["prediction"],
            probability=result["probability"],
            model_version=result["model_version"],
        ))

    duration = (time.perf_counter() - start) * 1000
    return BatchOutput(
        predictions=results,
        total=len(results),
        processing_time_ms=round(duration, 2),
    )

Complete application: putting it all together

from contextlib import asynccontextmanager
from datetime import datetime
from fastapi import FastAPI, HTTPException, Depends, Header
from fastapi.middleware.cors import CORSMiddleware
import time
import joblib
import numpy as np
from pydantic import BaseModel, Field
from typing import Optional

# --- Schemas ---
class PredictionInput(BaseModel):
    age: int = Field(..., ge=18, le=120)
    income: float = Field(..., gt=0)
    credit_score: int = Field(..., ge=300, le=850)
    employment_years: float = Field(..., ge=0)
    loan_amount: float = Field(..., gt=0)

class PredictionOutput(BaseModel):
    prediction: str
    probability: float
    model_version: str
    timestamp: datetime = Field(default_factory=datetime.utcnow)

# --- ML Service ---
class MLService:
    def __init__(self):
        self.model = None
        self.version = "unknown"

    def load(self, path: str):
        self.model = joblib.load(path)
        self.version = "v1.0"

    def predict(self, features: dict) -> dict:
        arr = np.array([[
            features["age"], features["income"],
            features["credit_score"], features["employment_years"],
            features["loan_amount"],
        ]])
        pred = self.model.predict(arr)[0]
        proba = self.model.predict_proba(arr)[0]
        return {
            "prediction": "approved" if pred == 1 else "denied",
            "probability": float(max(proba)),
            "model_version": self.version,
        }

ml = MLService()

# --- Lifespan ---
@asynccontextmanager
async def lifespan(app: FastAPI):
    ml.load("models/model_v1.joblib")
    yield

# --- App ---
app = FastAPI(title="Loan Prediction API", version="1.0.0", lifespan=lifespan)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.get("/health", tags=["System"])
def health():
    return {"status": "healthy", "model": ml.version}

@app.post("/api/v1/predict", response_model=PredictionOutput, tags=["Predictions"])
def predict(data: PredictionInput):
    try:
        result = ml.predict(data.model_dump())
        return PredictionOutput(**result)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

Run it:

uvicorn app.main:app --reload --port 8000

Test it:

curl -X POST http://localhost:8000/api/v1/predict \
-H "Content-Type: application/json" \
-d '{"age": 35, "income": 55000, "credit_score": 720, "employment_years": 8, "loan_amount": 25000}'
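
You can also exercise the endpoint from Python with FastAPI's TestClient (a sketch assuming the project layout shown earlier and that models/model_v1.joblib exists; TestClient is backed by the httpx package):

from fastapi.testclient import TestClient
from app.main import app

# Using the client as a context manager ensures the lifespan handler runs,
# so the model is loaded before the request is made.
with TestClient(app) as client:
    payload = {
        "age": 35,
        "income": 55000,
        "credit_score": 720,
        "employment_years": 8,
        "loan_amount": 25000,
    }
    response = client.post("/api/v1/predict", json=payload)
    print(response.status_code)  # 200 when the model is loaded and the input is valid
    print(response.json())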

Lifecycle of a FastAPI request: recap
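
In outline, each request passes through the following stages:

  • Uvicorn (the ASGI server) receives the HTTP request and hands it to the FastAPI app
  • Middleware runs on the way in (CORS, timing, logging)
  • The route is matched and its dependencies (e.g. API key verification) are resolved
  • Pydantic validates the request body against the input schema; invalid input gets a 422 response
  • The path operation function runs (sync handlers in a thread pool, async handlers on the event loop)
  • The return value is serialized through the response_model schema
  • Middleware runs on the way out and the response is sent back to the client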


Summary

| Topic | Key takeaway |
|---|---|
| FastAPI | Modern, fast, type-safe Python framework for APIs |
| Pydantic | Automatic input/output validation with clear error messages |
| Lifespan events | Load the model once at startup, not per request |
| Dependencies | Reusable logic for authentication, model access, etc. |
| Middleware | Cross-cutting concerns (CORS, timing, logging) |
| Sync vs async | Use def for CPU-bound ML inference |
| File uploads | UploadFile for image/document prediction APIs |
| Batch predictions | Process multiple inputs in a single request |

FastAPI quick reference

| Action | Code |
|---|---|
| Create the app | app = FastAPI(title="...", version="...") |
| GET endpoint | @app.get("/path") |
| POST endpoint | @app.post("/path", response_model=Schema) |
| Run the server | uvicorn app.main:app --reload |
| Open the docs | http://localhost:8000/docs |
| Validate input | Define a BaseModel subclass |
| Add middleware | app.add_middleware(MiddlewareClass, ...) |
| Dependency injection | @app.post("/", dependencies=[Depends(fn)]) |
| Raise an HTTP error | raise HTTPException(status_code=400, detail="...") |