TP7 - Testing Your AI API

Practical Lab · 60 min · Intermediate

Objectives

By the end of this lab, you will be able to:

  • Write unit tests for prediction logic and data validation schemas
  • Write integration tests for API endpoints using FastAPI's TestClient
  • Test edge cases and error handling systematically
  • Measure code coverage with pytest-cov
  • Set up a basic test pipeline configuration

Prerequisites

  • Completed TP3 (FastAPI prediction API)
  • Python 3.10+ with pip
  • Your model file models/model_v1.joblib from Module 2
No API from TP3?

If you haven't completed TP3, use the minimal project structure below. All code needed is provided in this lab.


Architecture Overview


Step 1 — Project Setup

1.1 Install Test Dependencies

pip install pytest pytest-cov httpx

1.2 Create the Project Structure

project/
├── app/
│   ├── __init__.py
│   ├── main.py
│   ├── schemas.py
│   └── ml/
│       ├── __init__.py
│       └── model_service.py
├── models/
│   └── model_v1.joblib
├── tests/
│   ├── __init__.py
│   ├── conftest.py
│   ├── unit/
│   │   ├── __init__.py
│   │   ├── test_schemas.py
│   │   └── test_model.py
│   └── integration/
│       ├── __init__.py
│       ├── test_api.py
│       ├── test_health.py
│       └── test_edge_cases.py
├── pytest.ini
└── requirements.txt
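
The tree lists a requirements.txt that the lab never shows. A plausible, unpinned version covering everything used in this lab (pin versions as your project requires; uvicorn is only needed if you serve the API outside the tests):

```
fastapi
uvicorn        # only needed to run the API outside TestClient
pydantic
scikit-learn
joblib
numpy
pytest
pytest-cov
httpx
```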

1.3 Application Code (Reference)

If you don't have a working API from TP3, create these files:

app/schemas.py

from pydantic import BaseModel, Field, field_validator
from typing import List


class PredictionRequest(BaseModel):
    features: List[float] = Field(
        ...,
        min_length=5,
        max_length=5,
        description="List of 5 numerical features",
    )

    @field_validator("features")
    @classmethod
    def validate_features(cls, v):
        import math
        for i, val in enumerate(v):
            if math.isnan(val) or math.isinf(val):
                raise ValueError(
                    f"Feature at index {i} contains invalid value: {val}"
                )
        return v


class PredictionResponse(BaseModel):
    prediction: int
    confidence: float = Field(ge=0.0, le=1.0)
    model_version: str
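
To see the validator behave outside the API, here is a standalone sketch (a local copy of the schema, not an import of app.schemas), assuming pydantic v2 is installed:

```python
import math
from typing import List

from pydantic import BaseModel, Field, ValidationError, field_validator


class PredictionRequest(BaseModel):
    features: List[float] = Field(..., min_length=5, max_length=5)

    @field_validator("features")
    @classmethod
    def validate_features(cls, v):
        # NaN and infinity are valid Python floats, so length/type checks
        # alone would let them through; reject them explicitly.
        for i, val in enumerate(v):
            if math.isnan(val) or math.isinf(val):
                raise ValueError(f"Feature at index {i} is invalid: {val}")
        return v


ok = PredictionRequest(features=[1, 2, 3, 4, 5])  # ints are coerced to floats
print(ok.features)

try:
    PredictionRequest(features=[float("nan"), 1.0, 2.0, 3.0, 4.0])
    rejected = False
except ValidationError:
    rejected = True
print("NaN rejected:", rejected)
```

Note that coercion runs before the custom validator, which is why the integer input succeeds while the NaN input raises ValidationError.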

app/ml/model_service.py

import joblib
import numpy as np
from pathlib import Path

MODEL_PATH = Path("models/model_v1.joblib")
MODEL_VERSION = "1.0.0"


class ModelService:
    def __init__(self):
        self.model = None
        self.version = MODEL_VERSION

    def load_model(self):
        if not MODEL_PATH.exists():
            raise FileNotFoundError(f"Model not found at {MODEL_PATH}")
        self.model = joblib.load(MODEL_PATH)

    def predict(self, features: list[float]) -> dict:
        if self.model is None:
            raise RuntimeError("Model not loaded")

        X = np.array([features])
        prediction = int(self.model.predict(X)[0])
        probabilities = self.model.predict_proba(X)[0]
        confidence = float(max(probabilities))

        return {
            "prediction": prediction,
            "confidence": confidence,
            "model_version": self.version,
        }


model_service = ModelService()

app/main.py

from fastapi import FastAPI, HTTPException
from contextlib import asynccontextmanager
from app.schemas import PredictionRequest, PredictionResponse
from app.ml.model_service import model_service


@asynccontextmanager
async def lifespan(app: FastAPI):
    model_service.load_model()
    yield


app = FastAPI(title="AI Prediction API", lifespan=lifespan)


@app.get("/health")
def health_check():
    return {
        "status": "healthy",
        "model_loaded": model_service.model is not None,
        "model_version": model_service.version,
    }


@app.post("/api/v1/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    try:
        result = model_service.predict(request.features)
        return PredictionResponse(**result)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

1.4 Create a Sample Model (if needed)

# create_sample_model.py
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
import joblib
import numpy as np
from pathlib import Path

X, y = make_classification(
    n_samples=1000, n_features=5, n_informative=4,
    n_redundant=1, random_state=42,
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

Path("models").mkdir(exist_ok=True)
Path("data").mkdir(exist_ok=True)  # needed before saving the arrays below
joblib.dump(model, "models/model_v1.joblib")
np.save("data/X_train.npy", X[:800])
np.save("data/X_test.npy", X[800:])
np.save("data/y_test.npy", y[800:])
print("Model and data saved successfully")

1.5 Configure pytest

# pytest.ini
[pytest]
testpaths = tests
markers =
    unit: Unit tests (fast, no external dependencies)
    integration: Integration tests (requires API/model)
    slow: Slow tests (skip with -m "not slow")
addopts = -v --tb=short

Step 2 — Shared Fixtures (conftest.py)

Create the shared fixtures that all test files will use:

# tests/conftest.py

import pytest
import numpy as np
from fastapi.testclient import TestClient

from app.main import app
from app.ml.model_service import model_service


@pytest.fixture(scope="session", autouse=True)
def load_model():
    """Load the ML model once for the entire test session."""
    model_service.load_model()
    yield
    model_service.model = None


@pytest.fixture
def client():
    """Create a FastAPI test client.

    Note: used without a `with` block, the app's lifespan hook does
    not run; the autouse fixture above loads the model instead.
    """
    return TestClient(app)


@pytest.fixture
def valid_features():
    """Return valid input features."""
    return [5.1, 3.5, 1.4, 0.2, 2.3]


@pytest.fixture
def valid_payload(valid_features):
    """Return a valid prediction payload."""
    return {"features": valid_features}


@pytest.fixture
def sample_array(valid_features):
    """Return features as a NumPy array."""
    return np.array([valid_features])


@pytest.fixture
def trained_model():
    """Return the loaded model."""
    return model_service.model


@pytest.fixture
def multiple_samples():
    """Return several test samples."""
    return [
        [5.1, 3.5, 1.4, 0.2, 2.3],
        [6.7, 3.0, 5.2, 2.3, 1.1],
        [4.9, 2.4, 3.3, 1.0, 0.5],
        [7.2, 3.6, 6.1, 2.5, 0.8],
        [4.6, 3.1, 1.5, 0.2, 1.9],
    ]

Step 3 — Unit Tests

3.1 Test Schemas

# tests/unit/test_schemas.py

import pytest
from pydantic import ValidationError
from app.schemas import PredictionRequest, PredictionResponse


class TestPredictionRequest:
    """Test the PredictionRequest schema validation."""

    @pytest.mark.unit
    def test_valid_request(self):
        req = PredictionRequest(features=[1.0, 2.0, 3.0, 4.0, 5.0])
        assert len(req.features) == 5
        assert all(isinstance(f, float) for f in req.features)

    @pytest.mark.unit
    def test_integer_features_are_coerced(self):
        req = PredictionRequest(features=[1, 2, 3, 4, 5])
        assert all(isinstance(f, float) for f in req.features)

    @pytest.mark.unit
    def test_rejects_empty_features(self):
        with pytest.raises(ValidationError) as exc_info:
            PredictionRequest(features=[])
        error_text = str(exc_info.value).lower()
        assert "min_length" in error_text or "too_short" in error_text

    @pytest.mark.unit
    def test_rejects_too_few_features(self):
        with pytest.raises(ValidationError):
            PredictionRequest(features=[1.0, 2.0])

    @pytest.mark.unit
    def test_rejects_too_many_features(self):
        with pytest.raises(ValidationError):
            PredictionRequest(features=[1.0] * 10)

    @pytest.mark.unit
    def test_rejects_string_values(self):
        with pytest.raises(ValidationError):
            PredictionRequest(features=["a", "b", "c", "d", "e"])

    @pytest.mark.unit
    def test_rejects_nan_values(self):
        with pytest.raises(ValidationError):
            PredictionRequest(features=[float("nan"), 1.0, 2.0, 3.0, 4.0])

    @pytest.mark.unit
    def test_rejects_infinity(self):
        with pytest.raises(ValidationError):
            PredictionRequest(features=[float("inf"), 1.0, 2.0, 3.0, 4.0])

    @pytest.mark.unit
    def test_rejects_missing_features_key(self):
        with pytest.raises(ValidationError):
            PredictionRequest()

    @pytest.mark.unit
    def test_accepts_negative_values(self):
        req = PredictionRequest(features=[-1.0, -2.0, -3.0, -4.0, -5.0])
        assert req.features[0] == -1.0

    @pytest.mark.unit
    def test_accepts_zero_values(self):
        req = PredictionRequest(features=[0.0, 0.0, 0.0, 0.0, 0.0])
        assert all(f == 0.0 for f in req.features)


class TestPredictionResponse:
    """Test the PredictionResponse schema."""

    @pytest.mark.unit
    def test_valid_response(self):
        resp = PredictionResponse(
            prediction=1, confidence=0.95, model_version="1.0.0"
        )
        assert resp.prediction == 1
        assert resp.confidence == 0.95

    @pytest.mark.unit
    def test_confidence_upper_bound(self):
        resp = PredictionResponse(
            prediction=0, confidence=1.0, model_version="1.0.0"
        )
        assert resp.confidence == 1.0

    @pytest.mark.unit
    def test_confidence_lower_bound(self):
        resp = PredictionResponse(
            prediction=0, confidence=0.0, model_version="1.0.0"
        )
        assert resp.confidence == 0.0

    @pytest.mark.unit
    def test_rejects_confidence_above_one(self):
        with pytest.raises(ValidationError):
            PredictionResponse(
                prediction=1, confidence=1.5, model_version="1.0.0"
            )

    @pytest.mark.unit
    def test_rejects_negative_confidence(self):
        with pytest.raises(ValidationError):
            PredictionResponse(
                prediction=1, confidence=-0.1, model_version="1.0.0"
            )

3.2 Test Model Service

# tests/unit/test_model.py

import pytest
from app.ml.model_service import ModelService, model_service


class TestModelService:
    """Test the model prediction logic."""

    @pytest.mark.unit
    def test_predict_returns_dict(self, trained_model, valid_features):
        result = model_service.predict(valid_features)
        assert isinstance(result, dict)

    @pytest.mark.unit
    def test_predict_has_required_keys(self, trained_model, valid_features):
        result = model_service.predict(valid_features)
        assert "prediction" in result
        assert "confidence" in result
        assert "model_version" in result

    @pytest.mark.unit
    def test_prediction_is_integer(self, trained_model, valid_features):
        result = model_service.predict(valid_features)
        assert isinstance(result["prediction"], int)

    @pytest.mark.unit
    def test_prediction_is_valid_class(self, trained_model, valid_features):
        result = model_service.predict(valid_features)
        assert result["prediction"] in [0, 1]

    @pytest.mark.unit
    def test_confidence_is_float(self, trained_model, valid_features):
        result = model_service.predict(valid_features)
        assert isinstance(result["confidence"], float)

    @pytest.mark.unit
    def test_confidence_in_range(self, trained_model, valid_features):
        result = model_service.predict(valid_features)
        assert 0.0 <= result["confidence"] <= 1.0

    @pytest.mark.unit
    @pytest.mark.parametrize("features", [
        [5.1, 3.5, 1.4, 0.2, 2.3],
        [6.7, 3.0, 5.2, 2.3, 1.1],
        [4.9, 2.4, 3.3, 1.0, 0.5],
    ])
    def test_predict_multiple_inputs(self, trained_model, features):
        result = model_service.predict(features)
        assert result["prediction"] in [0, 1]
        assert 0.0 <= result["confidence"] <= 1.0

    @pytest.mark.unit
    def test_predict_raises_without_model(self):
        service = ModelService()
        with pytest.raises(RuntimeError, match="Model not loaded"):
            service.predict([1.0, 2.0, 3.0, 4.0, 5.0])

    @pytest.mark.unit
    def test_model_version_format(self, trained_model, valid_features):
        result = model_service.predict(valid_features)
        parts = result["model_version"].split(".")
        assert len(parts) == 3
        assert all(part.isdigit() for part in parts)
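
The tests above still need the real joblib artifact on disk. If you want truly dependency-free unit tests, you can inject a stub estimator instead. A minimal self-contained sketch (ModelService is copied inline here for illustration; FakeModel is a hypothetical stand-in, not part of the lab code):

```python
import numpy as np


class FakeModel:
    """Deterministic stand-in for the trained RandomForest."""

    def predict(self, X):
        return np.array([1])

    def predict_proba(self, X):
        return np.array([[0.2, 0.8]])


class ModelService:  # same logic as app/ml/model_service.py
    def __init__(self):
        self.model = None
        self.version = "1.0.0"

    def predict(self, features):
        if self.model is None:
            raise RuntimeError("Model not loaded")
        X = np.array([features])
        prediction = int(self.model.predict(X)[0])
        confidence = float(max(self.model.predict_proba(X)[0]))
        return {"prediction": prediction, "confidence": confidence,
                "model_version": self.version}


service = ModelService()
service.model = FakeModel()  # inject the stub instead of calling load_model()
result = service.predict([1.0, 2.0, 3.0, 4.0, 5.0])
print(result)  # prediction 1 with confidence 0.8, from the stub
```

Because the stub's outputs are fixed, such tests can assert exact values rather than just ranges, and they run without models/model_v1.joblib.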

Step 4 — Integration Tests

4.1 Test Health Endpoint

# tests/integration/test_health.py

import pytest


class TestHealthEndpoint:
    """Test the /health endpoint."""

    @pytest.mark.integration
    def test_health_returns_200(self, client):
        response = client.get("/health")
        assert response.status_code == 200

    @pytest.mark.integration
    def test_health_returns_json(self, client):
        response = client.get("/health")
        assert response.headers["content-type"] == "application/json"

    @pytest.mark.integration
    def test_health_status_healthy(self, client):
        response = client.get("/health")
        data = response.json()
        assert data["status"] == "healthy"

    @pytest.mark.integration
    def test_health_model_loaded(self, client):
        response = client.get("/health")
        data = response.json()
        assert data["model_loaded"] is True

    @pytest.mark.integration
    def test_health_has_model_version(self, client):
        response = client.get("/health")
        data = response.json()
        assert "model_version" in data
        assert isinstance(data["model_version"], str)

4.2 Test Prediction Endpoint

# tests/integration/test_api.py

import pytest


class TestPredictEndpoint:
    """Test the /api/v1/predict endpoint."""

    @pytest.mark.integration
    def test_predict_returns_200(self, client, valid_payload):
        response = client.post("/api/v1/predict", json=valid_payload)
        assert response.status_code == 200

    @pytest.mark.integration
    def test_predict_returns_json(self, client, valid_payload):
        response = client.post("/api/v1/predict", json=valid_payload)
        assert response.headers["content-type"] == "application/json"

    @pytest.mark.integration
    def test_predict_response_schema(self, client, valid_payload):
        response = client.post("/api/v1/predict", json=valid_payload)
        data = response.json()
        assert "prediction" in data
        assert "confidence" in data
        assert "model_version" in data

    @pytest.mark.integration
    def test_predict_valid_class(self, client, valid_payload):
        response = client.post("/api/v1/predict", json=valid_payload)
        data = response.json()
        assert data["prediction"] in [0, 1]

    @pytest.mark.integration
    def test_predict_confidence_range(self, client, valid_payload):
        response = client.post("/api/v1/predict", json=valid_payload)
        data = response.json()
        assert 0.0 <= data["confidence"] <= 1.0

    @pytest.mark.integration
    def test_predict_multiple_samples(self, client, multiple_samples):
        for features in multiple_samples:
            payload = {"features": features}
            response = client.post("/api/v1/predict", json=payload)
            assert response.status_code == 200
            data = response.json()
            assert data["prediction"] in [0, 1]

    @pytest.mark.integration
    def test_predict_consistent_results(self, client, valid_payload):
        """Same input should produce same output (deterministic model)."""
        results = []
        for _ in range(5):
            response = client.post("/api/v1/predict", json=valid_payload)
            results.append(response.json())

        predictions = [r["prediction"] for r in results]
        confidences = [r["confidence"] for r in results]
        assert len(set(predictions)) == 1
        assert len(set(confidences)) == 1


class TestPredictErrors:
    """Test error handling for the prediction endpoint."""

    @pytest.mark.integration
    def test_missing_body(self, client):
        response = client.post("/api/v1/predict")
        assert response.status_code == 422

    @pytest.mark.integration
    def test_empty_json(self, client):
        response = client.post("/api/v1/predict", json={})
        assert response.status_code == 422

    @pytest.mark.integration
    def test_wrong_key_name(self, client):
        response = client.post(
            "/api/v1/predict", json={"data": [1.0, 2.0, 3.0, 4.0, 5.0]}
        )
        assert response.status_code == 422

    @pytest.mark.integration
    def test_string_features(self, client):
        response = client.post(
            "/api/v1/predict", json={"features": "not a list"}
        )
        assert response.status_code == 422

    @pytest.mark.integration
    def test_too_few_features(self, client):
        response = client.post(
            "/api/v1/predict", json={"features": [1.0, 2.0]}
        )
        assert response.status_code == 422

    @pytest.mark.integration
    def test_too_many_features(self, client):
        response = client.post(
            "/api/v1/predict", json={"features": [1.0] * 10}
        )
        assert response.status_code == 422

    @pytest.mark.integration
    def test_invalid_json_body(self, client):
        response = client.post(
            "/api/v1/predict",
            content=b"this is not json",
            headers={"Content-Type": "application/json"},
        )
        assert response.status_code == 422

Step 5 — Edge Case Tests

# tests/integration/test_edge_cases.py

import pytest


class TestEdgeCases:
    """Test boundary conditions and unusual inputs."""

    @pytest.mark.integration
    def test_empty_features_list(self, client):
        response = client.post("/api/v1/predict", json={"features": []})
        assert response.status_code == 422

    @pytest.mark.integration
    def test_null_features(self, client):
        response = client.post("/api/v1/predict", json={"features": None})
        assert response.status_code == 422

    @pytest.mark.integration
    def test_nan_in_features(self, client):
        payload = {"features": [float("nan"), 1.0, 2.0, 3.0, 4.0]}
        response = client.post("/api/v1/predict", json=payload)
        assert response.status_code == 422

    @pytest.mark.integration
    def test_infinity_in_features(self, client):
        payload = {"features": [float("inf"), 1.0, 2.0, 3.0, 4.0]}
        response = client.post("/api/v1/predict", json=payload)
        assert response.status_code == 422

    @pytest.mark.integration
    def test_all_zeros(self, client):
        payload = {"features": [0.0, 0.0, 0.0, 0.0, 0.0]}
        response = client.post("/api/v1/predict", json=payload)
        assert response.status_code == 200

    @pytest.mark.integration
    def test_negative_values(self, client):
        payload = {"features": [-10.0, -5.0, -2.5, -1.0, -0.5]}
        response = client.post("/api/v1/predict", json=payload)
        assert response.status_code == 200
        data = response.json()
        assert data["prediction"] in [0, 1]

    @pytest.mark.integration
    def test_very_large_values(self, client):
        payload = {"features": [1e6, 1e6, 1e6, 1e6, 1e6]}
        response = client.post("/api/v1/predict", json=payload)
        assert response.status_code == 200

    @pytest.mark.integration
    def test_very_small_values(self, client):
        payload = {"features": [1e-10, 1e-10, 1e-10, 1e-10, 1e-10]}
        response = client.post("/api/v1/predict", json=payload)
        assert response.status_code == 200

    @pytest.mark.integration
    def test_mixed_types_in_features(self, client):
        """Pydantic should coerce integers to floats."""
        payload = {"features": [1, 2.0, 3, 4.0, 5]}
        response = client.post("/api/v1/predict", json=payload)
        assert response.status_code == 200

    @pytest.mark.integration
    def test_rapid_sequential_requests(self, client):
        """API should handle 20 rapid requests without errors."""
        payload = {"features": [5.1, 3.5, 1.4, 0.2, 2.3]}
        responses = [
            client.post("/api/v1/predict", json=payload) for _ in range(20)
        ]
        assert all(r.status_code == 200 for r in responses)

    @pytest.mark.integration
    def test_nonexistent_endpoint(self, client):
        response = client.get("/api/v1/nonexistent")
        assert response.status_code == 404

    @pytest.mark.integration
    def test_wrong_http_method(self, client):
        response = client.get("/api/v1/predict")
        assert response.status_code == 405

Step 6 — Measure Code Coverage

6.1 Run Tests with Coverage

# Run all tests with coverage
pytest --cov=app --cov-report=term-missing -v

Expected output:

tests/unit/test_schemas.py::TestPredictionRequest::test_valid_request PASSED
tests/unit/test_schemas.py::TestPredictionRequest::test_rejects_empty_features PASSED
tests/unit/test_model.py::TestModelService::test_predict_returns_dict PASSED
tests/integration/test_health.py::TestHealthEndpoint::test_health_returns_200 PASSED
tests/integration/test_api.py::TestPredictEndpoint::test_predict_returns_200 PASSED
tests/integration/test_edge_cases.py::TestEdgeCases::test_all_zeros PASSED
...

---------- coverage: platform linux, python 3.11 ----------
Name                      Stmts   Miss  Cover   Missing
-------------------------------------------------------
app/__init__.py               0      0   100%
app/main.py                  18      1    94%   35
app/schemas.py               20      0   100%
app/ml/__init__.py            0      0   100%
app/ml/model_service.py      22      1    95%   15
-------------------------------------------------------
TOTAL                        60      2    97%

6.2 Generate HTML Report

pytest --cov=app --cov-report=html -v

Open htmlcov/index.html in your browser to see a detailed visual report.

6.3 Set a Coverage Threshold

# Fail if coverage drops below 85%
pytest --cov=app --cov-fail-under=85 -v
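
If you want the threshold enforced on every run rather than remembered on the command line, the same flags can live in pytest.ini. A sketch (this would replace the addopts line from Step 1.5, not sit alongside it):

```ini
# pytest.ini (excerpt)
addopts = -v --tb=short --cov=app --cov-fail-under=85
```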

6.4 Run Tests by Category

# Only unit tests
pytest -m unit -v

# Only integration tests
pytest -m integration -v

# Exclude slow tests
pytest -m "not slow" -v

Step 7 — Basic Test Pipeline

7.1 Create a Makefile

# Makefile
# Note: Make requires recipe lines to be indented with a TAB character.

.PHONY: test test-unit test-integration test-coverage test-all

test-unit:
	pytest tests/unit -v -m unit

test-integration:
	pytest tests/integration -v -m integration

test-coverage:
	pytest --cov=app --cov-report=term-missing --cov-report=html --cov-fail-under=85 -v

test-all: test-unit test-integration test-coverage

test:
	pytest -v

Run:

make test-all

7.2 GitHub Actions Pipeline

# .github/workflows/tests.yml

name: Test Suite

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          pip install pytest pytest-cov httpx

      - name: Run unit tests
        run: pytest tests/unit -v -m unit

      - name: Run integration tests
        run: pytest tests/integration -v -m integration

      - name: Coverage check
        run: pytest --cov=app --cov-fail-under=85 --cov-report=xml

      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          file: ./coverage.xml

Validation Checklist

Before moving to the next lab, verify:

  • conftest.py contains shared fixtures (client, valid_features, trained_model)
  • Unit tests pass: pytest tests/unit -v → all PASSED
  • Integration tests pass: pytest tests/integration -v → all PASSED
  • Edge case tests cover: NaN, infinity, empty, null, wrong types
  • Coverage is ≥ 85%: pytest --cov=app --cov-fail-under=85
  • CI configuration is in place (pytest.ini + GitHub Actions)

Commands Summary

Command                                        Description
pytest -v                                      Run all tests in verbose mode
pytest tests/unit -v                           Run unit tests only
pytest tests/integration -v                    Run integration tests
pytest -m "not slow" -v                        Exclude slow tests
pytest --cov=app --cov-report=term-missing     Coverage with missing lines
pytest --cov=app --cov-report=html             HTML coverage report
pytest --cov=app --cov-fail-under=85           Fail if coverage < 85%
pytest -x                                      Stop at first failure
pytest -k "test_predict"                       Run tests containing "test_predict"

Well done!

You now have a complete test suite for your prediction API. In the next lab, you will test the same API with Postman for a complementary approach.