
REST API Concepts for AI

Theory 45 min

What is an API?

An API (Application Programming Interface) is a contract that defines how two software components communicate with each other. In the context of AI deployment, an API is the bridge between your trained model and the outside world — applications, users, and other services that want to consume predictions.

The Restaurant Analogy

The most intuitive way to understand an API is to think of a restaurant:

| Restaurant | API World |
| --- | --- |
| Customer | Client application (web app, mobile app, another service) |
| Menu | API documentation (available endpoints, expected inputs) |
| Order | HTTP request with input data (JSON payload) |
| Waiter | API server (receives requests, routes them, returns responses) |
| Kitchen | ML model (processes input, generates prediction) |
| Dish served | HTTP response with prediction results |
| Receipt | Response status code (200 OK, 400 Bad Request, etc.) |

Key Insight
Key Insight

Just like a waiter doesn't need to know how to cook, an API doesn't need to expose the internal workings of your model. The client only needs to know what to send and what to expect back.


REST Architecture

REST (Representational State Transfer) is an architectural style for designing networked applications. A REST API follows a set of constraints that make it scalable, stateless, and easy to understand.

REST Principles

| Principle | Description | AI API Example |
| --- | --- | --- |
| Stateless | Each request contains all information needed to process it | Every prediction request includes the full input features |
| Client-Server | Separation between the consumer and the provider | Web app (client) is separate from the model server |
| Uniform Interface | Standard HTTP methods and URI conventions | POST /api/v1/predict for predictions |
| Resource-Based | Everything is a resource identified by a URI | /models, /predictions, /health |
| Cacheable | Responses can be cached when appropriate | Cache repeated predictions for identical inputs |
| Layered System | Client cannot tell if connected directly or via intermediary | Load balancer sits between client and API |

REST API Architecture for AI


HTTP Methods

HTTP methods define the action you want to perform on a resource. For AI APIs, some methods are more common than others.

| Method | Action | Idempotent | Safe | AI API Usage |
| --- | --- | --- | --- | --- |
| GET | Retrieve data | ✅ Yes | ✅ Yes | Get model info, health check, list available models |
| POST | Create/Submit data | ❌ No | ❌ No | Submit features for prediction, upload training data |
| PUT | Replace entirely | ✅ Yes | ❌ No | Replace a model version |
| PATCH | Partial update | ❌ No | ❌ No | Update model configuration |
| DELETE | Remove resource | ✅ Yes | ❌ No | Remove a deployed model |

Common AI API Endpoints

GET    /api/v1/health            → Check if the service is running
GET    /api/v1/models            → List available models
GET    /api/v1/models/{id}       → Get details about a specific model
POST   /api/v1/predict           → Submit features, receive prediction
POST   /api/v1/predict/batch     → Submit multiple inputs for batch prediction
GET    /api/v1/predict/{id}      → Retrieve a past prediction result
DELETE /api/v1/models/{id}       → Remove a deployed model
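
To make the health-check endpoint above concrete, here is a minimal sketch of GET /api/v1/health using only Python's standard library. A real deployment would use a framework like FastAPI (covered next); the handler logic is purely illustrative.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/api/v1/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)  # 200 OK: service is up
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)  # unknown resource
            self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

# Bind to an ephemeral port and serve in a background thread
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Act as the client: one GET request to the health endpoint
with urllib.request.urlopen(f"http://127.0.0.1:{port}/api/v1/health") as resp:
    status = resp.status
    payload = json.loads(resp.read())

server.shutdown()
```

Everything a framework does later (routing, JSON serialization, status codes) is visible here by hand, which is exactly the boilerplate FastAPI and Flask remove.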
Why POST for predictions?

Even though a prediction doesn't "create" a resource in the traditional sense, we use POST because:

  1. Input features can be complex (nested objects, arrays) — too large for URL parameters
  2. The request has a body (JSON payload)
  3. Predictions may have side effects (logging, billing)
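
The first point is easy to see in code: a nested feature payload serializes naturally into a JSON request body, while a GET would force you to flatten it into query parameters. The payload below reuses the loan example and is purely illustrative.

```python
import json
from urllib.parse import urlencode

# Nested input — fits naturally in a POST body
payload = {
    "features": {"age": 35, "income": 55000},
    "options": {"explain": True, "threshold": 0.5},
}
body = json.dumps(payload)  # ready to send as the request body

# The same data as GET query parameters loses all nesting
flat = urlencode({"age": 35, "income": 55000, "explain": "true", "threshold": 0.5})
# flat == "age=35&income=55000&explain=true&threshold=0.5"
```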

HTTP Status Codes

Status codes tell the client what happened with their request. They are grouped by category.

Status Code Families

| Range | Category | Meaning |
| --- | --- | --- |
| 1xx | Informational | Request received, processing continues |
| 2xx | Success | Request successfully processed |
| 3xx | Redirection | Further action needed |
| 4xx | Client Error | Problem with the request |
| 5xx | Server Error | Problem on the server |

Essential Status Codes for AI APIs

| Code | Name | When to Use | AI API Example |
| --- | --- | --- | --- |
| 200 | OK | Request succeeded | Prediction returned successfully |
| 201 | Created | Resource created | New model uploaded and registered |
| 204 | No Content | Success, no body | Model deleted successfully |
| 400 | Bad Request | Invalid input format | JSON syntax error in request body |
| 401 | Unauthorized | Missing authentication | No API key provided |
| 403 | Forbidden | Insufficient permissions | API key lacks prediction access |
| 404 | Not Found | Resource doesn't exist | Model ID not found |
| 422 | Unprocessable Entity | Validation failed | Feature values out of expected range |
| 429 | Too Many Requests | Rate limit exceeded | Client sent too many prediction requests |
| 500 | Internal Server Error | Unexpected server failure | Model crashed during inference |
| 503 | Service Unavailable | Server not ready | Model still loading at startup |
422 vs 400
  • 400 Bad Request: The JSON itself is malformed (syntax error)
  • 422 Unprocessable Entity: The JSON is valid, but the data doesn't pass validation (e.g., negative age, missing required field)

FastAPI uses 422 by default for validation errors from Pydantic models.
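
The 400 vs 422 decision can be sketched in plain Python with a hypothetical `classify_request` helper; frameworks like FastAPI make this decision for you, but the logic is the same.

```python
import json

def classify_request(raw_body: str) -> int:
    """Return the status code a server might choose for a raw request body."""
    try:
        data = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400  # malformed JSON -> 400 Bad Request
    age = data.get("age")
    if not isinstance(age, int) or not (18 <= age <= 120):
        return 422  # well-formed JSON, invalid data -> 422 Unprocessable Entity
    return 200

classify_request('{"age": 35')   # 400: syntax error (unclosed brace)
classify_request('{"age": -5}')  # 422: parses fine, fails validation
classify_request('{"age": 35}')  # 200: valid request
```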


JSON Request/Response Format

REST APIs communicate using JSON (JavaScript Object Notation). For AI APIs, you need to design clear input/output schemas.

Prediction Request

{
  "features": {
    "age": 35,
    "income": 55000,
    "credit_score": 720,
    "employment_years": 8,
    "loan_amount": 25000
  },
  "options": {
    "explain": true,
    "threshold": 0.5
  }
}

Prediction Response

{
  "prediction": "approved",
  "probability": 0.87,
  "confidence": "high",
  "model_version": "loan-classifier-v2.1",
  "timestamp": "2026-02-23T14:30:00Z",
  "explanation": {
    "top_features": [
      {"feature": "credit_score", "importance": 0.42},
      {"feature": "income", "importance": 0.31},
      {"feature": "employment_years", "importance": 0.15}
    ]
  }
}

Error Response

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Invalid input features",
    "details": [
      {
        "field": "age",
        "message": "Value must be between 18 and 120",
        "received": -5
      }
    ]
  },
  "timestamp": "2026-02-23T14:31:00Z",
  "request_id": "req_abc123"
}
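
On the server side, a small helper can produce this shape consistently for every failure. The function below is a sketch: the field names follow the example above, and the request-ID generation is simplified.

```python
import uuid
from datetime import datetime, timezone

def error_response(code: str, message: str, details: list) -> dict:
    """Build an error body matching the structure shown above."""
    return {
        "error": {"code": code, "message": message, "details": details},
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "request_id": f"req_{uuid.uuid4().hex[:6]}",  # simplified ID scheme
    }

resp = error_response(
    "VALIDATION_ERROR",
    "Invalid input features",
    [{"field": "age", "message": "Value must be between 18 and 120", "received": -5}],
)
```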

API Design Best Practices

1. Endpoint Naming Conventions

| Convention | Good | Bad |
| --- | --- | --- |
| Use nouns, not verbs | /api/v1/predictions | /api/v1/makePrediction |
| Use plural nouns | /api/v1/models | /api/v1/model |
| Use kebab-case | /api/v1/model-versions | /api/v1/modelVersions |
| Version your API | /api/v1/predict | /predict |
| Use hierarchy for relations | /api/v1/models/{id}/predictions | /api/v1/model-predictions |

2. Request Validation

Always validate input data before sending it to your model:

from pydantic import BaseModel, Field, validator

class PredictionInput(BaseModel):
    age: int = Field(..., ge=18, le=120, description="Customer age")
    income: float = Field(..., gt=0, description="Annual income in USD")
    credit_score: int = Field(..., ge=300, le=850)

    @validator("income")
    def income_must_be_reasonable(cls, v):
        if v > 10_000_000:
            raise ValueError("Income seems unrealistically high")
        return v
Why validate?
  • Prevents your model from receiving nonsensical inputs
  • Returns clear error messages to clients
  • Avoids silent failures (model returns a prediction for garbage input)
  • Protects against injection attacks

3. Consistent Response Format

Always return responses in a consistent envelope:

{
  "status": "success",        # or "error"
  "data": { ... },            # response payload
  "meta": {                   # metadata
    "model_version": "v2.1",
    "response_time_ms": 45,
    "request_id": "req_abc123"
  }
}
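
One way to enforce the envelope is a small helper that every endpoint calls before returning. The function and field names below mirror the envelope above but are otherwise hypothetical.

```python
import time
import uuid

def make_envelope(data: dict, model_version: str, started_at: float) -> dict:
    """Wrap a payload in the success envelope described above."""
    return {
        "status": "success",
        "data": data,
        "meta": {
            "model_version": model_version,
            "response_time_ms": round((time.perf_counter() - started_at) * 1000),
            "request_id": f"req_{uuid.uuid4().hex[:6]}",
        },
    }

start = time.perf_counter()
result = make_envelope({"prediction": "approved"}, "v2.1", start)
```

Centralizing this in one function means clients can always read `status`, `data`, and `meta` in the same place, regardless of which endpoint they hit.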

Authentication and Security

Protecting your AI API is critical — you don't want unauthorized users running predictions (which consume compute resources and may access sensitive models).

API Keys

The simplest authentication method. The client includes a secret key in request headers.

GET /api/v1/models HTTP/1.1
Host: api.example.com
X-API-Key: sk_live_abc123def456
| Pros | Cons |
| --- | --- |
| Simple to implement | No built-in expiration |
| Easy for clients to use | Hard to manage permissions per key |
| Works for server-to-server | Vulnerable if exposed in client-side code |
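
The server-side check can be sketched in a few lines. Note the constant-time comparison via `hmac.compare_digest`, which avoids leaking information about the key through response timing; the key value is the placeholder from the example above.

```python
import hmac

EXPECTED_KEY = "sk_live_abc123def456"  # in practice, load from a secret store

def is_authorized(headers: dict) -> bool:
    """Compare the X-API-Key header against the expected key in constant time."""
    provided = headers.get("X-API-Key", "")
    return hmac.compare_digest(provided, EXPECTED_KEY)

is_authorized({"X-API-Key": "sk_live_abc123def456"})  # True
is_authorized({"X-API-Key": "wrong-key"})             # False -> respond 401
```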

JWT (JSON Web Tokens) — Overview

JWT is a more advanced authentication mechanism where the server issues a signed token that the client includes in subsequent requests.

A JWT token has three parts: Header (algorithm), Payload (claims/permissions), and Signature (verification).
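
In practice you would issue and verify tokens with a library such as PyJWT, but the three parts are easy to see by constructing an HS256 token by hand with the standard library. The secret and claim values below are illustrative.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

secret = b"server-side-secret"  # known only to the server

# Part 1: header (algorithm), Part 2: payload (claims/permissions)
header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
payload = b64url(json.dumps({"sub": "user-42", "scope": "predict"}).encode())

# Part 3: signature over "header.payload", so tampering is detectable
signature = b64url(
    hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
)

token = f"{header}.{payload}.{signature}"  # the three dot-separated parts
```

The client sends this token in an `Authorization: Bearer <token>` header; the server recomputes the signature with its secret and rejects the request if it does not match.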

When to use what?
  • API Keys: Simple internal services, prototyping, server-to-server
  • JWT: Multi-user applications, fine-grained permissions, token expiration needed
  • OAuth 2.0: Third-party access, delegated authorization

CORS (Cross-Origin Resource Sharing)

When a web application at https://myapp.com tries to call your API at https://api.myml.com, the browser blocks it by default. CORS headers tell the browser which origins are allowed.

CORS Configuration Example

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://myapp.com", "http://localhost:3000"],
    allow_credentials=True,
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)
Security

Never use allow_origins=["*"] in production. This allows any website to call your API, which can lead to abuse and data leaks.


Rate Limiting

Rate limiting controls how many requests a client can make in a given time window. This is essential for AI APIs because each prediction consumes compute resources (CPU/GPU time, memory).

| Strategy | Description | Use Case |
| --- | --- | --- |
| Fixed Window | X requests per minute/hour | Simple API key quotas |
| Sliding Window | Smoothed rate over rolling window | Prevents burst abuse |
| Token Bucket | Allows short bursts up to a limit | APIs with variable traffic |
| Per-Endpoint | Different limits for different endpoints | /predict = 100/min, /health = unlimited |
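
The token-bucket strategy from the table can be sketched in a few lines. This is a single-process, in-memory version for illustration; production services typically rate-limit in a shared store (e.g. Redis) or at an API gateway.

```python
import time

class TokenBucket:
    """Allow short bursts up to `capacity`, refilling at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # bucket starts full
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to the time elapsed since the last check
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond with 429 Too Many Requests

bucket = TokenBucket(rate=1.0, capacity=3)
results = [bucket.allow() for _ in range(4)]  # burst of 4 back-to-back requests
# results == [True, True, True, False]: the burst drains the bucket
```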

Rate Limit Response

When a client exceeds the limit, return a 429 Too Many Requests response with helpful headers:

HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1708700000

{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "You have exceeded 100 requests per minute. Please retry after 30 seconds."
  }
}

REST vs GraphQL vs gRPC

When building AI APIs, REST is the most common choice, but it's worth understanding the alternatives.

| Feature | REST | GraphQL | gRPC |
| --- | --- | --- | --- |
| Protocol | HTTP/1.1 or HTTP/2 | HTTP/1.1 or HTTP/2 | HTTP/2 |
| Data Format | JSON | JSON | Protocol Buffers (binary) |
| Schema | OpenAPI (optional) | Required (SDL) | Required (.proto) |
| Learning Curve | Low | Medium | High |
| Performance | Good | Good | Excellent |
| Browser Support | Native | Native | Limited (needs proxy) |
| Streaming | Limited | Subscriptions | Bidirectional |
| Use Case | General APIs, web | Flexible queries, mobile | Microservices, low-latency |
| AI Relevance | Most common for ML APIs | Complex multi-model queries | High-throughput inference |
For this course

We focus on REST APIs because they are the most widely used, easiest to test, and best supported by tools like Swagger and Postman. If you need extremely low-latency inference between microservices, consider gRPC as a next step.


The Request/Response Lifecycle

Understanding the full lifecycle of an API request helps you debug issues and optimize performance.


Summary

| Concept | Key Takeaway |
| --- | --- |
| REST API | Standard way to expose ML models via HTTP |
| HTTP Methods | POST for predictions, GET for info/health |
| Status Codes | 200 = success, 422 = validation error, 500 = server error |
| JSON | Universal data format for request/response |
| Authentication | API keys (simple) or JWT (advanced) |
| CORS | Required for browser-based clients |
| Rate Limiting | Protects compute resources from abuse |
| REST vs alternatives | REST for most AI APIs, gRPC for internal high-throughput |

What's Next?

Now that you understand REST API concepts, you'll learn to implement them using two Python frameworks:

  1. FastAPI — Modern, async, auto-documented (next section)
  2. Flask — Lightweight, flexible, widely used
Vocabulary Quick Reference
| Term | Definition |
| --- | --- |
| Endpoint | A specific URL path that accepts requests (e.g., /api/v1/predict) |
| Payload | The data sent in the body of a request or response |
| Serialization | Converting data structures to a transferable format (JSON) |
| Idempotent | Making the same request multiple times has the same effect as making it once |
| Stateless | Server doesn't remember previous requests |
| Middleware | Code that runs between receiving a request and returning a response |
| Schema | A formal description of the expected data structure |