TP7 - Testing Your AI API

Practical Lab · 60 min · Intermediate

Objectives

By the end of this lab, you will be able to:

  • Write unit tests for prediction logic and data validation schemas
  • Write integration tests for API endpoints using FastAPI's TestClient
  • Test edge cases and error handling systematically
  • Measure code coverage with pytest-cov
  • Set up a basic test pipeline configuration

Prerequisites

  • Completed TP3 (FastAPI prediction API)
  • Python 3.10+ with pip
  • Your model file models/model_v1.joblib from Module 2
No API from TP3?

If you haven't completed TP3, use the minimal project structure below. All code needed is provided in this lab.


Architecture Overview


Step 1 — Project Setup

1.1 Install Test Dependencies

pip install pytest pytest-cov httpx

1.2 Create the Project Structure

project/
├── app/
│   ├── __init__.py
│   ├── main.py
│   ├── schemas.py
│   └── ml/
│       ├── __init__.py
│       └── model_service.py
├── models/
│   └── model_v1.joblib
├── tests/
│   ├── __init__.py
│   ├── conftest.py
│   ├── unit/
│   │   ├── __init__.py
│   │   ├── test_schemas.py
│   │   └── test_model.py
│   └── integration/
│       ├── __init__.py
│       ├── test_api.py
│       ├── test_health.py
│       └── test_edge_cases.py
├── pytest.ini
└── requirements.txt
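
The tree lists a requirements.txt that the lab never shows. A plausible, unpinned version covering everything used in this lab (pin versions as your project requires; uvicorn is only needed if you serve the API outside the tests):

```
fastapi
uvicorn        # only needed to run the API outside TestClient
pydantic
scikit-learn
joblib
numpy
pytest
pytest-cov
httpx
```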

1.3 Application Code (Reference)

If you don't have a working API from TP3, create these files:

app/schemas.py

from pydantic import BaseModel, Field, field_validator
from typing import List


class PredictionRequest(BaseModel):
    features: List[float] = Field(
        ...,
        min_length=5,
        max_length=5,
        description="List of 5 numerical features",
    )

    @field_validator("features")
    @classmethod
    def validate_features(cls, v):
        import math
        for i, val in enumerate(v):
            if math.isnan(val) or math.isinf(val):
                raise ValueError(
                    f"Feature at index {i} contains invalid value: {val}"
                )
        return v


class PredictionResponse(BaseModel):
    prediction: int
    confidence: float = Field(ge=0.0, le=1.0)
    model_version: str
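
To see the validator behave outside the API, here is a standalone sketch (a local copy of the schema, not an import of app.schemas), assuming pydantic v2 is installed:

```python
import math
from typing import List

from pydantic import BaseModel, Field, ValidationError, field_validator


class PredictionRequest(BaseModel):
    features: List[float] = Field(..., min_length=5, max_length=5)

    @field_validator("features")
    @classmethod
    def validate_features(cls, v):
        # NaN and infinity are valid Python floats, so length/type checks
        # alone would let them through; reject them explicitly.
        for i, val in enumerate(v):
            if math.isnan(val) or math.isinf(val):
                raise ValueError(f"Feature at index {i} is invalid: {val}")
        return v


ok = PredictionRequest(features=[1, 2, 3, 4, 5])  # ints are coerced to floats
print(ok.features)

try:
    PredictionRequest(features=[float("nan"), 1.0, 2.0, 3.0, 4.0])
    rejected = False
except ValidationError:
    rejected = True
print("NaN rejected:", rejected)
```

Note that coercion runs before the custom validator, which is why the integer input succeeds while the NaN input raises ValidationError.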

app/ml/model_service.py

import joblib
import numpy as np
from pathlib import Path

MODEL_PATH = Path("models/model_v1.joblib")
MODEL_VERSION = "1.0.0"


class ModelService:
    def __init__(self):
        self.model = None
        self.version = MODEL_VERSION

    def load_model(self):
        if not MODEL_PATH.exists():
            raise FileNotFoundError(f"Model not found at {MODEL_PATH}")
        self.model = joblib.load(MODEL_PATH)

    def predict(self, features: list[float]) -> dict:
        if self.model is None:
            raise RuntimeError("Model not loaded")

        X = np.array([features])
        prediction = int(self.model.predict(X)[0])
        probabilities = self.model.predict_proba(X)[0]
        confidence = float(max(probabilities))

        return {
            "prediction": prediction,
            "confidence": confidence,
            "model_version": self.version,
        }


model_service = ModelService()

app/main.py

from fastapi import FastAPI, HTTPException
from contextlib import asynccontextmanager
from app.schemas import PredictionRequest, PredictionResponse
from app.ml.model_service import model_service


@asynccontextmanager
async def lifespan(app: FastAPI):
    model_service.load_model()
    yield


app = FastAPI(title="AI Prediction API", lifespan=lifespan)


@app.get("/health")
def health_check():
    return {
        "status": "healthy",
        "model_loaded": model_service.model is not None,
        "model_version": model_service.version,
    }


@app.post("/api/v1/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    try:
        result = model_service.predict(request.features)
        return PredictionResponse(**result)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

1.4 Create a Sample Model (if needed)

# create_sample_model.py
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
import joblib
import numpy as np
from pathlib import Path

X, y = make_classification(
    n_samples=1000, n_features=5, n_informative=4,
    n_redundant=1, random_state=42,
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

Path("models").mkdir(exist_ok=True)
Path("data").mkdir(exist_ok=True)  # needed before saving the arrays below
joblib.dump(model, "models/model_v1.joblib")
np.save("data/X_train.npy", X[:800])
np.save("data/X_test.npy", X[800:])
np.save("data/y_test.npy", y[800:])
print("Model and data saved successfully")

1.5 Configure pytest

# pytest.ini
[pytest]
testpaths = tests
markers =
    unit: Unit tests (fast, no external dependencies)
    integration: Integration tests (requires API/model)
    slow: Slow tests (skip with -m "not slow")
addopts = -v --tb=short

Step 2 — Shared Fixtures (conftest.py)

Create the shared fixtures that all test files will use:

# tests/conftest.py

import pytest
import numpy as np
from fastapi.testclient import TestClient

from app.main import app
from app.ml.model_service import model_service


@pytest.fixture(scope="session", autouse=True)
def load_model():
    """Load the ML model once for the entire test session."""
    model_service.load_model()
    yield
    model_service.model = None


@pytest.fixture
def client():
    """Create a FastAPI test client.

    Note: used without a `with` block, the app's lifespan hook does
    not run; the autouse fixture above loads the model instead.
    """
    return TestClient(app)


@pytest.fixture
def valid_features():
    """Return valid input features."""
    return [5.1, 3.5, 1.4, 0.2, 2.3]


@pytest.fixture
def valid_payload(valid_features):
    """Return a valid prediction payload."""
    return {"features": valid_features}


@pytest.fixture
def sample_array(valid_features):
    """Return features as a NumPy array."""
    return np.array([valid_features])


@pytest.fixture
def trained_model():
    """Return the loaded model."""
    return model_service.model


@pytest.fixture
def multiple_samples():
    """Return several test samples."""
    return [
        [5.1, 3.5, 1.4, 0.2, 2.3],
        [6.7, 3.0, 5.2, 2.3, 1.1],
        [4.9, 2.4, 3.3, 1.0, 0.5],
        [7.2, 3.6, 6.1, 2.5, 0.8],
        [4.6, 3.1, 1.5, 0.2, 1.9],
    ]

Step 3 — Unit Tests

3.1 Test Schemas

# tests/unit/test_schemas.py

import pytest
from pydantic import ValidationError
from app.schemas import PredictionRequest, PredictionResponse


class TestPredictionRequest:
    """Test the PredictionRequest schema validation."""

    @pytest.mark.unit
    def test_valid_request(self):
        req = PredictionRequest(features=[1.0, 2.0, 3.0, 4.0, 5.0])
        assert len(req.features) == 5
        assert all(isinstance(f, float) for f in req.features)

    @pytest.mark.unit
    def test_integer_features_are_coerced(self):
        req = PredictionRequest(features=[1, 2, 3, 4, 5])
        assert all(isinstance(f, float) for f in req.features)

    @pytest.mark.unit
    def test_rejects_empty_features(self):
        with pytest.raises(ValidationError) as exc_info:
            PredictionRequest(features=[])
        error_text = str(exc_info.value).lower()
        assert "min_length" in error_text or "too_short" in error_text

    @pytest.mark.unit
    def test_rejects_too_few_features(self):
        with pytest.raises(ValidationError):
            PredictionRequest(features=[1.0, 2.0])

    @pytest.mark.unit
    def test_rejects_too_many_features(self):
        with pytest.raises(ValidationError):
            PredictionRequest(features=[1.0] * 10)

    @pytest.mark.unit
    def test_rejects_string_values(self):
        with pytest.raises(ValidationError):
            PredictionRequest(features=["a", "b", "c", "d", "e"])

    @pytest.mark.unit
    def test_rejects_nan_values(self):
        with pytest.raises(ValidationError):
            PredictionRequest(features=[float("nan"), 1.0, 2.0, 3.0, 4.0])

    @pytest.mark.unit
    def test_rejects_infinity(self):
        with pytest.raises(ValidationError):
            PredictionRequest(features=[float("inf"), 1.0, 2.0, 3.0, 4.0])

    @pytest.mark.unit
    def test_rejects_missing_features_key(self):
        with pytest.raises(ValidationError):
            PredictionRequest()

    @pytest.mark.unit
    def test_accepts_negative_values(self):
        req = PredictionRequest(features=[-1.0, -2.0, -3.0, -4.0, -5.0])
        assert req.features[0] == -1.0

    @pytest.mark.unit
    def test_accepts_zero_values(self):
        req = PredictionRequest(features=[0.0, 0.0, 0.0, 0.0, 0.0])
        assert all(f == 0.0 for f in req.features)


class TestPredictionResponse:
    """Test the PredictionResponse schema."""

    @pytest.mark.unit
    def test_valid_response(self):
        resp = PredictionResponse(
            prediction=1, confidence=0.95, model_version="1.0.0"
        )
        assert resp.prediction == 1
        assert resp.confidence == 0.95

    @pytest.mark.unit
    def test_confidence_upper_bound(self):
        resp = PredictionResponse(
            prediction=0, confidence=1.0, model_version="1.0.0"
        )
        assert resp.confidence == 1.0

    @pytest.mark.unit
    def test_confidence_lower_bound(self):
        resp = PredictionResponse(
            prediction=0, confidence=0.0, model_version="1.0.0"
        )
        assert resp.confidence == 0.0

    @pytest.mark.unit
    def test_rejects_confidence_above_one(self):
        with pytest.raises(ValidationError):
            PredictionResponse(
                prediction=1, confidence=1.5, model_version="1.0.0"
            )

    @pytest.mark.unit
    def test_rejects_negative_confidence(self):
        with pytest.raises(ValidationError):
            PredictionResponse(
                prediction=1, confidence=-0.1, model_version="1.0.0"
            )

3.2 Test Model Service

# tests/unit/test_model.py

import pytest
from app.ml.model_service import ModelService, model_service


class TestModelService:
    """Test the model prediction logic."""

    @pytest.mark.unit
    def test_predict_returns_dict(self, trained_model, valid_features):
        result = model_service.predict(valid_features)
        assert isinstance(result, dict)

    @pytest.mark.unit
    def test_predict_has_required_keys(self, trained_model, valid_features):
        result = model_service.predict(valid_features)
        assert "prediction" in result
        assert "confidence" in result
        assert "model_version" in result

    @pytest.mark.unit
    def test_prediction_is_integer(self, trained_model, valid_features):
        result = model_service.predict(valid_features)
        assert isinstance(result["prediction"], int)

    @pytest.mark.unit
    def test_prediction_is_valid_class(self, trained_model, valid_features):
        result = model_service.predict(valid_features)
        assert result["prediction"] in [0, 1]

    @pytest.mark.unit
    def test_confidence_is_float(self, trained_model, valid_features):
        result = model_service.predict(valid_features)
        assert isinstance(result["confidence"], float)

    @pytest.mark.unit
    def test_confidence_in_range(self, trained_model, valid_features):
        result = model_service.predict(valid_features)
        assert 0.0 <= result["confidence"] <= 1.0

    @pytest.mark.unit
    @pytest.mark.parametrize("features", [
        [5.1, 3.5, 1.4, 0.2, 2.3],
        [6.7, 3.0, 5.2, 2.3, 1.1],
        [4.9, 2.4, 3.3, 1.0, 0.5],
    ])
    def test_predict_multiple_inputs(self, trained_model, features):
        result = model_service.predict(features)
        assert result["prediction"] in [0, 1]
        assert 0.0 <= result["confidence"] <= 1.0

    @pytest.mark.unit
    def test_predict_raises_without_model(self):
        service = ModelService()
        with pytest.raises(RuntimeError, match="Model not loaded"):
            service.predict([1.0, 2.0, 3.0, 4.0, 5.0])

    @pytest.mark.unit
    def test_model_version_format(self, trained_model, valid_features):
        result = model_service.predict(valid_features)
        parts = result["model_version"].split(".")
        assert len(parts) == 3
        assert all(part.isdigit() for part in parts)
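
The tests above still need the real joblib artifact on disk. If you want truly dependency-free unit tests, you can inject a stub estimator instead. A minimal self-contained sketch (ModelService is copied inline here for illustration; FakeModel is a hypothetical stand-in, not part of the lab code):

```python
import numpy as np


class FakeModel:
    """Deterministic stand-in for the trained RandomForest."""

    def predict(self, X):
        return np.array([1])

    def predict_proba(self, X):
        return np.array([[0.2, 0.8]])


class ModelService:  # same logic as app/ml/model_service.py
    def __init__(self):
        self.model = None
        self.version = "1.0.0"

    def predict(self, features):
        if self.model is None:
            raise RuntimeError("Model not loaded")
        X = np.array([features])
        prediction = int(self.model.predict(X)[0])
        confidence = float(max(self.model.predict_proba(X)[0]))
        return {"prediction": prediction, "confidence": confidence,
                "model_version": self.version}


service = ModelService()
service.model = FakeModel()  # inject the stub instead of calling load_model()
result = service.predict([1.0, 2.0, 3.0, 4.0, 5.0])
print(result)  # prediction 1 with confidence 0.8, from the stub
```

Because the stub's outputs are fixed, such tests can assert exact values rather than just ranges, and they run without models/model_v1.joblib.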

Step 4 — Integration Tests

4.1 Test Health Endpoint

# tests/integration/test_health.py

import pytest


class TestHealthEndpoint:
    """Test the /health endpoint."""

    @pytest.mark.integration
    def test_health_returns_200(self, client):
        response = client.get("/health")
        assert response.status_code == 200

    @pytest.mark.integration
    def test_health_returns_json(self, client):
        response = client.get("/health")
        assert response.headers["content-type"] == "application/json"

    @pytest.mark.integration
    def test_health_status_healthy(self, client):
        response = client.get("/health")
        data = response.json()
        assert data["status"] == "healthy"

    @pytest.mark.integration
    def test_health_model_loaded(self, client):
        response = client.get("/health")
        data = response.json()
        assert data["model_loaded"] is True

    @pytest.mark.integration
    def test_health_has_model_version(self, client):
        response = client.get("/health")
        data = response.json()
        assert "model_version" in data
        assert isinstance(data["model_version"], str)

4.2 Test Prediction Endpoint

# tests/integration/test_api.py

import pytest


class TestPredictEndpoint:
    """Test the /api/v1/predict endpoint."""

    @pytest.mark.integration
    def test_predict_returns_200(self, client, valid_payload):
        response = client.post("/api/v1/predict", json=valid_payload)
        assert response.status_code == 200

    @pytest.mark.integration
    def test_predict_returns_json(self, client, valid_payload):
        response = client.post("/api/v1/predict", json=valid_payload)
        assert response.headers["content-type"] == "application/json"

    @pytest.mark.integration
    def test_predict_response_schema(self, client, valid_payload):
        response = client.post("/api/v1/predict", json=valid_payload)
        data = response.json()
        assert "prediction" in data
        assert "confidence" in data
        assert "model_version" in data

    @pytest.mark.integration
    def test_predict_valid_class(self, client, valid_payload):
        response = client.post("/api/v1/predict", json=valid_payload)
        data = response.json()
        assert data["prediction"] in [0, 1]

    @pytest.mark.integration
    def test_predict_confidence_range(self, client, valid_payload):
        response = client.post("/api/v1/predict", json=valid_payload)
        data = response.json()
        assert 0.0 <= data["confidence"] <= 1.0

    @pytest.mark.integration
    def test_predict_multiple_samples(self, client, multiple_samples):
        for features in multiple_samples:
            payload = {"features": features}
            response = client.post("/api/v1/predict", json=payload)
            assert response.status_code == 200
            data = response.json()
            assert data["prediction"] in [0, 1]

    @pytest.mark.integration
    def test_predict_consistent_results(self, client, valid_payload):
        """Same input should produce same output (deterministic model)."""
        results = []
        for _ in range(5):
            response = client.post("/api/v1/predict", json=valid_payload)
            results.append(response.json())

        predictions = [r["prediction"] for r in results]
        confidences = [r["confidence"] for r in results]
        assert len(set(predictions)) == 1
        assert len(set(confidences)) == 1


class TestPredictErrors:
    """Test error handling for the prediction endpoint."""

    @pytest.mark.integration
    def test_missing_body(self, client):
        response = client.post("/api/v1/predict")
        assert response.status_code == 422

    @pytest.mark.integration
    def test_empty_json(self, client):
        response = client.post("/api/v1/predict", json={})
        assert response.status_code == 422

    @pytest.mark.integration
    def test_wrong_key_name(self, client):
        response = client.post(
            "/api/v1/predict", json={"data": [1.0, 2.0, 3.0, 4.0, 5.0]}
        )
        assert response.status_code == 422

    @pytest.mark.integration
    def test_string_features(self, client):
        response = client.post(
            "/api/v1/predict", json={"features": "not a list"}
        )
        assert response.status_code == 422

    @pytest.mark.integration
    def test_too_few_features(self, client):
        response = client.post(
            "/api/v1/predict", json={"features": [1.0, 2.0]}
        )
        assert response.status_code == 422

    @pytest.mark.integration
    def test_too_many_features(self, client):
        response = client.post(
            "/api/v1/predict", json={"features": [1.0] * 10}
        )
        assert response.status_code == 422

    @pytest.mark.integration
    def test_invalid_json_body(self, client):
        response = client.post(
            "/api/v1/predict",
            content=b"this is not json",
            headers={"Content-Type": "application/json"},
        )
        assert response.status_code == 422

Step 5 — Edge Case Tests

# tests/integration/test_edge_cases.py

import pytest


class TestEdgeCases:
    """Test boundary conditions and unusual inputs."""

    @pytest.mark.integration
    def test_empty_features_list(self, client):
        response = client.post("/api/v1/predict", json={"features": []})
        assert response.status_code == 422

    @pytest.mark.integration
    def test_null_features(self, client):
        response = client.post("/api/v1/predict", json={"features": None})
        assert response.status_code == 422

    @pytest.mark.integration
    def test_nan_in_features(self, client):
        payload = {"features": [float("nan"), 1.0, 2.0, 3.0, 4.0]}
        response = client.post("/api/v1/predict", json=payload)
        assert response.status_code == 422

    @pytest.mark.integration
    def test_infinity_in_features(self, client):
        payload = {"features": [float("inf"), 1.0, 2.0, 3.0, 4.0]}
        response = client.post("/api/v1/predict", json=payload)
        assert response.status_code == 422

    @pytest.mark.integration
    def test_all_zeros(self, client):
        payload = {"features": [0.0, 0.0, 0.0, 0.0, 0.0]}
        response = client.post("/api/v1/predict", json=payload)
        assert response.status_code == 200

    @pytest.mark.integration
    def test_negative_values(self, client):
        payload = {"features": [-10.0, -5.0, -2.5, -1.0, -0.5]}
        response = client.post("/api/v1/predict", json=payload)
        assert response.status_code == 200
        data = response.json()
        assert data["prediction"] in [0, 1]

    @pytest.mark.integration
    def test_very_large_values(self, client):
        payload = {"features": [1e6, 1e6, 1e6, 1e6, 1e6]}
        response = client.post("/api/v1/predict", json=payload)
        assert response.status_code == 200

    @pytest.mark.integration
    def test_very_small_values(self, client):
        payload = {"features": [1e-10, 1e-10, 1e-10, 1e-10, 1e-10]}
        response = client.post("/api/v1/predict", json=payload)
        assert response.status_code == 200

    @pytest.mark.integration
    def test_mixed_types_in_features(self, client):
        """Pydantic should coerce integers to floats."""
        payload = {"features": [1, 2.0, 3, 4.0, 5]}
        response = client.post("/api/v1/predict", json=payload)
        assert response.status_code == 200

    @pytest.mark.integration
    def test_rapid_sequential_requests(self, client):
        """API should handle 20 rapid requests without errors."""
        payload = {"features": [5.1, 3.5, 1.4, 0.2, 2.3]}
        responses = [
            client.post("/api/v1/predict", json=payload) for _ in range(20)
        ]
        assert all(r.status_code == 200 for r in responses)

    @pytest.mark.integration
    def test_nonexistent_endpoint(self, client):
        response = client.get("/api/v1/nonexistent")
        assert response.status_code == 404

    @pytest.mark.integration
    def test_wrong_http_method(self, client):
        response = client.get("/api/v1/predict")
        assert response.status_code == 405

Step 6 — Measure Code Coverage

6.1 Run Tests with Coverage

# Run all tests with coverage
pytest --cov=app --cov-report=term-missing -v

Expected output:

tests/unit/test_schemas.py::TestPredictionRequest::test_valid_request PASSED
tests/unit/test_schemas.py::TestPredictionRequest::test_rejects_empty_features PASSED
tests/unit/test_model.py::TestModelService::test_predict_returns_dict PASSED
tests/integration/test_health.py::TestHealthEndpoint::test_health_returns_200 PASSED
tests/integration/test_api.py::TestPredictEndpoint::test_predict_returns_200 PASSED
tests/integration/test_edge_cases.py::TestEdgeCases::test_all_zeros PASSED
...

---------- coverage: platform linux, python 3.11 ----------
Name                      Stmts   Miss  Cover   Missing
-------------------------------------------------------
app/__init__.py               0      0   100%
app/main.py                  18      1    94%   35
app/schemas.py               20      0   100%
app/ml/__init__.py            0      0   100%
app/ml/model_service.py      22      1    95%   15
-------------------------------------------------------
TOTAL                        60      2    97%

6.2 Generate HTML Report

pytest --cov=app --cov-report=html -v

Open htmlcov/index.html in your browser to see a detailed visual report.

6.3 Set a Coverage Threshold

# Fail if coverage drops below 85%
pytest --cov=app --cov-fail-under=85 -v
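
If you want the threshold enforced on every run rather than remembered on the command line, the same flags can live in pytest.ini. A sketch (this would replace the addopts line from Step 1.5, not sit alongside it):

```ini
# pytest.ini (excerpt)
addopts = -v --tb=short --cov=app --cov-fail-under=85
```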

6.4 Run Tests by Category

# Only unit tests
pytest -m unit -v

# Only integration tests
pytest -m integration -v

# Exclude slow tests
pytest -m "not slow" -v

Step 7 — Basic Test Pipeline

7.1 Create a Makefile

# Makefile
# Note: Make requires recipe lines to be indented with a TAB character.

.PHONY: test test-unit test-integration test-coverage test-all

test-unit:
	pytest tests/unit -v -m unit

test-integration:
	pytest tests/integration -v -m integration

test-coverage:
	pytest --cov=app --cov-report=term-missing --cov-report=html --cov-fail-under=85 -v

test-all: test-unit test-integration test-coverage

test:
	pytest -v

Run:

make test-all

7.2 GitHub Actions Pipeline

# .github/workflows/tests.yml

name: Test Suite

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          pip install pytest pytest-cov httpx

      - name: Run unit tests
        run: pytest tests/unit -v -m unit

      - name: Run integration tests
        run: pytest tests/integration -v -m integration

      - name: Coverage check
        run: pytest --cov=app --cov-fail-under=85 --cov-report=xml

      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          file: ./coverage.xml

Validation Checklist

Before moving to the next lab, verify:

  • conftest.py contains shared fixtures (client, valid_features, trained_model)
  • Unit tests pass: pytest tests/unit -v → all PASSED
  • Integration tests pass: pytest tests/integration -v → all PASSED
  • Edge case tests cover: NaN, infinity, empty, null, wrong types
  • Coverage is ≥ 85%: pytest --cov=app --cov-fail-under=85
  • CI configuration is in place (pytest.ini + GitHub Actions)

Commands Summary

Command                                        Description
pytest -v                                      Run all tests in verbose mode
pytest tests/unit -v                           Run unit tests only
pytest tests/integration -v                    Run integration tests
pytest -m "not slow" -v                        Exclude slow tests
pytest --cov=app --cov-report=term-missing     Coverage with missing lines
pytest --cov=app --cov-report=html             HTML coverage report
pytest --cov=app --cov-fail-under=85           Fail if coverage < 85%
pytest -x                                      Stop at first failure
pytest -k "test_predict"                       Run tests containing "test_predict"

Well done!

You now have a complete test suite for your prediction API. In the next lab, you will test the same API with Postman for a complementary approach.