AI Service API Reference
Complete REST API documentation for AI service endpoints.
Base URL
All AI endpoints are prefixed with /ai:
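The examples on this page assume a local development server, so the effective base URL is:

```text
http://localhost:8000/ai
```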
Endpoints Overview
| Method | Endpoint | Description |
|---|---|---|
| POST | `/ai/chat` | Send chat message and receive AI response |
| POST | `/ai/chat/stream` | Stream AI responses with Server-Sent Events |
| GET | `/ai/conversations` | List user conversations with metadata |
| GET | `/ai/conversations/{id}` | Get conversation with full message history |
| GET | `/ai/health` | Check AI service health status |
| GET | `/ai/version` | Get service version and capabilities |
Chat Endpoints
POST /ai/chat
Send a chat message and receive AI response.
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| `message` | string | ✅ Yes | User's chat message |
| `conversation_id` | string \| null | ❌ No | Existing conversation ID (creates a new conversation if null) |
| `user_id` | string | ❌ No | User identifier (default: `"api-user"`) |
Response:
```json
{
  "message_id": "uuid",
  "content": "AI response text",
  "conversation_id": "uuid",
  "response_time_ms": 1234.5
}
```
Examples:
```python
import httpx

response = httpx.post(  # (1)!
    "http://localhost:8000/ai/chat",
    json={
        "message": "What is async/await in Python?",  # (2)!
        "user_id": "my-user"  # (3)!
    }
)

data = response.json()
print(f"AI: {data['content']}")  # (4)!
print(f"Conversation: {data['conversation_id']}")  # (5)!
```
1. POST request to the chat endpoint
2. The user's message - this is what gets sent to the AI
3. User identifier for conversation tracking (optional, defaults to "api-user")
4. Extract and print the AI's response text
5. Save this conversation_id to continue the conversation in future requests
```javascript
const response = await fetch('http://localhost:8000/ai/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    message: 'How do I handle errors in async functions?',
    user_id: 'web-user'
  })
});

const data = await response.json();
console.log(`AI: ${data.content}`);
```
Continue Conversation:
```bash
# First message
curl -X POST http://localhost:8000/ai/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is FastAPI?"}' \
  | jq -r '.conversation_id' > conv_id.txt

# Follow-up message (maintains context)
curl -X POST http://localhost:8000/ai/chat \
  -H "Content-Type: application/json" \
  -d "{
    \"message\": \"Can you show me an example?\",
    \"conversation_id\": \"$(cat conv_id.txt)\"
  }"
```
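The same multi-turn flow in Python, mirroring the curl example above (a sketch; assumes the local server used throughout this page):

```python
import httpx

base = "http://localhost:8000/ai"

# First message: omitting conversation_id starts a new conversation
first = httpx.post(f"{base}/chat", json={"message": "What is FastAPI?"}).json()

# Follow-up: passing conversation_id back preserves context
follow_up = httpx.post(
    f"{base}/chat",
    json={
        "message": "Can you show me an example?",
        "conversation_id": first["conversation_id"],
    },
).json()
print(follow_up["content"])
```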
POST /ai/chat/stream
Stream chat response with Server-Sent Events (SSE).
Request Body:
Same as `/ai/chat`.
Response:
Server-Sent Events stream with the following event types:
Event: connect
Event: chunk (repeated for each content chunk)
```text
event: chunk
data: {
  "content": "text delta",
  "is_final": false,
  "is_delta": true,
  "message_id": "uuid",
  "conversation_id": "uuid",
  "timestamp": "2024-01-15T10:30:00Z"
}
```
Event: final
```text
event: final
data: {
  "content": "complete response",
  "is_final": true,
  "is_delta": false,
  "message_id": "uuid",
  "conversation_id": "uuid",
  "timestamp": "2024-01-15T10:30:05Z",
  "response_time_ms": 1234.5,
  "provider": "groq",
  "model": "llama-3.1-70b-versatile"
}
```
Event: complete
Event: error (on error)
Examples:
```javascript
// Note: EventSource always issues GET requests, so this snippet assumes
// the streaming endpoint also accepts its parameters as a query string.
const eventSource = new EventSource( // (1)!
  '/ai/chat/stream?' + new URLSearchParams({
    message: 'Explain async programming',
    user_id: 'web-user'
  })
);

let fullResponse = '';

eventSource.addEventListener('chunk', (e) => { // (2)!
  const data = JSON.parse(e.data);
  fullResponse += data.content;
  updateUI(fullResponse); // (3)!
});

eventSource.addEventListener('final', (e) => { // (4)!
  const data = JSON.parse(e.data);
  console.log('Complete response:', data.content);
  console.log('Response time:', data.response_time_ms);
});

eventSource.addEventListener('error', (e) => { // (5)!
  if (!e.data) return; // built-in connection errors carry no data payload
  const data = JSON.parse(e.data);
  console.error('Error:', data.detail);
  eventSource.close();
});

eventSource.addEventListener('complete', (e) => { // (6)!
  console.log('Stream complete');
  eventSource.close();
});
```
1. Create EventSource connection to the streaming endpoint
2. Handle each streamed chunk as it arrives
3. Update UI in real-time as tokens stream in
4. Handle final event with complete response and timing
5. Handle errors and close connection
6. Clean up connection when stream completes
```python
import httpx
import json

url = "http://localhost:8000/ai/chat/stream"
payload = {"message": "Explain decorators in Python", "user_id": "my-user"}

event_type = None
with httpx.stream("POST", url, json=payload) as response:  # (1)!
    for line in response.iter_lines():  # (2)!
        if line.startswith('event:'):
            event_type = line.split(':', 1)[1].strip()
        elif line.startswith('data:'):
            data = json.loads(line.split(':', 1)[1])
            if event_type == 'chunk':  # (3)!
                print(data['content'], end='', flush=True)
            elif event_type == 'final':  # (4)!
                print(f"\n\nResponse time: {data['response_time_ms']}ms")
```
1. Open streaming connection with context manager
2. Iterate through Server-Sent Events line by line
3. Print each chunk as it arrives for real-time output
4. Show final response metadata when stream completes
Conversation Management
GET /ai/conversations
List conversations for a user.
Query Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `user_id` | string | ❌ No | `"api-user"` | User identifier |
| `limit` | integer | ❌ No | 50 | Maximum conversations to return |
Response:
```json
[
  {
    "id": "uuid",
    "title": "Conversation title",
    "message_count": 5,
    "last_activity": "2024-01-15T10:30:00Z",
    "provider": "groq",
    "model": "llama-3.1-70b-versatile"
  }
]
```
Example:
```bash
# List conversations
curl "http://localhost:8000/ai/conversations?user_id=my-user&limit=10"
```

```python
# With Python
import httpx

response = httpx.get(
    "http://localhost:8000/ai/conversations",
    params={"user_id": "my-user", "limit": 10}
)
conversations = response.json()
for conv in conversations:
    print(f"{conv['id']}: {conv['title']} ({conv['message_count']} messages)")
```
GET /ai/conversations/{conversation_id}
Get a specific conversation with full message history.
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
| `conversation_id` | string | Conversation UUID |

Query Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `user_id` | string | ❌ No | `"api-user"` | User identifier for access control |
Response:
```json
{
  "id": "uuid",
  "title": "Conversation title",
  "provider": "groq",
  "model": "llama-3.1-70b-versatile",
  "created_at": "2024-01-15T10:00:00Z",
  "updated_at": "2024-01-15T10:30:00Z",
  "message_count": 5,
  "messages": [
    {
      "id": "msg-uuid-1",
      "role": "user",
      "content": "What is FastAPI?",
      "timestamp": "2024-01-15T10:00:00Z"
    },
    {
      "id": "msg-uuid-2",
      "role": "assistant",
      "content": "FastAPI is a modern web framework...",
      "timestamp": "2024-01-15T10:00:02Z"
    }
  ],
  "metadata": {
    "user_id": "my-user",
    "last_response_time_ms": 1234.5
  }
}
```
Example:
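A minimal retrieval in Python (a sketch; the conversation ID shown is a hypothetical value obtained from an earlier `/ai/chat` response):

```python
import httpx

conv_id = "123e4567-e89b-12d3-a456-426614174000"  # hypothetical ID from /ai/chat
conversation = httpx.get(
    f"http://localhost:8000/ai/conversations/{conv_id}",
    params={"user_id": "my-user"},
).json()

print(f"{conversation['title']} ({conversation['message_count']} messages)")
for msg in conversation["messages"]:
    print(f"[{msg['role']}] {msg['content']}")
```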
Service Status
GET /ai/health
AI service health status and configuration.
Response:
```json
{
  "service": "ai",
  "status": "healthy",
  "enabled": true,
  "provider": "groq",
  "model": "llama-3.1-70b-versatile",
  "agent_ready": true,
  "total_conversations": 42,
  "configuration_valid": true,
  "validation_errors": []
}
```
Status Values:
- healthy - Service operational and properly configured
- unhealthy - Configuration issues or service errors
- error - Critical service failure
Example:
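A simple readiness probe against the fields above (a sketch; assumes the local server):

```python
import httpx

health = httpx.get("http://localhost:8000/ai/health").json()
if health["status"] == "healthy" and health["agent_ready"]:
    print(f"{health['provider']}/{health['model']} ready")
else:
    print("Service degraded:", health["validation_errors"])
```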
GET /ai/version
Service version and feature information.
Response:
```json
{
  "service": "ai",
  "engine": "pydantic-ai",
  "version": "1.0",
  "features": [
    "chat",
    "conversation_management",
    "multi_provider_support",
    "health_monitoring",
    "api_endpoints",
    "cli_commands"
  ],
  "providers_supported": [
    "openai",
    "anthropic",
    "google",
    "groq",
    "mistral",
    "cohere"
  ]
}
```
Example:
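Feature detection in Python using the payload above (a sketch; assumes the local server):

```python
import httpx

info = httpx.get("http://localhost:8000/ai/version").json()
if "chat" in info["features"]:
    print(f"{info['engine']} v{info['version']}, providers:",
          ", ".join(info["providers_supported"]))
```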
Usage Analytics
GET /ai/usage/stats
Note
Requires a database backend (`ai[sqlite]` or `ai[postgres]`). Not available with the in-memory backend.
Get usage statistics with token counts, costs, and model breakdown.
Query Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `user_id` | string | ❌ No | all users | Filter by user |
| `start_time` | datetime | ❌ No | all time | Start of time range |
| `end_time` | datetime | ❌ No | now | End of time range |
| `recent_limit` | integer | ❌ No | 10 | Number of recent activities |
Response:
```json
{
  "total_tokens": 45230,
  "input_tokens": 32100,
  "output_tokens": 13130,
  "total_cost": 0.47,
  "total_requests": 23,
  "success_rate": 95.6,
  "models": [
    {
      "model_id": "gpt-4o",
      "vendor": "OpenAI",
      "requests": 15,
      "tokens": 30000,
      "cost": 0.35,
      "percentage": 65.2
    }
  ],
  "recent_activity": [
    {
      "timestamp": "2024-01-15T10:30:00Z",
      "model": "gpt-4o",
      "input_tokens": 1500,
      "output_tokens": 800,
      "cost": 0.02,
      "success": true,
      "action": "chat"
    }
  ]
}
```
Example:
```bash
# All-time stats
curl http://localhost:8000/ai/usage/stats | jq

# Per-user stats
curl "http://localhost:8000/ai/usage/stats?user_id=my-user"

# Time-range query
curl "http://localhost:8000/ai/usage/stats?start_time=2024-01-01T00:00:00Z&recent_limit=20"
```
LLM Catalog Endpoints
All LLM catalog endpoints are prefixed with /llm. See LLM Catalog for full documentation.
GET /llm/status
Get catalog statistics.
```json
{
  "vendor_count": 32,
  "model_count": 1847,
  "deployment_count": 2103,
  "price_count": 1952,
  "top_vendors": [
    {"name": "OpenAI", "model_count": 156},
    {"name": "Google", "model_count": 89}
  ]
}
```
GET /llm/vendors
List all vendors with model counts.
GET /llm/modalities
List modalities (text, image, audio, video) with model counts.
GET /llm/models
Search and filter models.
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `pattern` | string | null | Search pattern for model ID/title |
| `vendor` | string | null | Filter by vendor name |
| `modality` | string | null | Filter by modality |
| `limit` | integer | 50 | Max results (1-200) |
| `include_disabled` | boolean | false | Include disabled models |
```bash
# Search models
curl "http://localhost:8000/llm/models?pattern=gpt-4&vendor=openai"

# Filter by modality
curl "http://localhost:8000/llm/models?modality=image&limit=20"
```
Response:
```json
[
  {
    "model_id": "gpt-4o",
    "vendor": "OpenAI",
    "context_window": 128000,
    "input_price": 2.50,
    "output_price": 10.00,
    "released_on": "2024-05-13"
  }
]
```
GET /llm/current
Get current active LLM configuration enriched with catalog data.
```json
{
  "provider": "openai",
  "model": "gpt-4o",
  "temperature": 0.7,
  "max_tokens": 1000,
  "context_window": 128000,
  "input_price": 2.50,
  "output_price": 10.00,
  "modalities": ["text", "image"]
}
```
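The price fields make per-request cost estimates straightforward. A minimal sketch, assuming catalog prices are quoted in USD per million tokens (verify against your catalog before relying on it):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Estimate request cost, assuming prices are USD per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Using the gpt-4o prices from the payload above
print(f"${estimate_cost(1500, 800, 2.50, 10.00):.4f}")  # → $0.0118
```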
RAG Endpoints
All RAG endpoints are prefixed with /rag. See RAG for full documentation.
POST /rag/index
Index documents from a path.
Request:
```json
{
  "path": "./app",
  "collection_name": "code",
  "extensions": [".py", ".ts"],
  "exclude_patterns": ["**/test_*"]
}
```
Response:
```json
{
  "collection_name": "code",
  "documents_added": 1523,
  "total_documents": 1523,
  "duration_ms": 8300.5
}
```
POST /rag/search
Semantic search across indexed documents.
Request:
```json
{
  "query": "how does authentication work",
  "collection_name": "code",
  "top_k": 5,
  "filter_metadata": null
}
```
Response:
```json
{
  "query": "how does authentication work",
  "collection_name": "code",
  "results": [
    {
      "content": "class AuthService:\n    ...",
      "metadata": {"source": "app/services/auth/service.py", "file_name": "service.py"},
      "score": 0.8932,
      "rank": 1
    }
  ],
  "result_count": 5
}
```
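Invoking the search from Python (a sketch; assumes a `code` collection was indexed via `/rag/index` as shown above):

```python
import httpx

resp = httpx.post(
    "http://localhost:8000/rag/search",
    json={"query": "how does authentication work",
          "collection_name": "code", "top_k": 5},
).json()
for hit in resp["results"]:
    print(f"{hit['rank']}. {hit['metadata']['source']} (score {hit['score']:.3f})")
```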
GET /rag/collections
List all collection names.
GET /rag/collections/{name}
Get collection info (name, document count, metadata).
GET /rag/collections/{name}/files
List indexed files with chunk counts.
```json
{
  "collection_name": "code",
  "files": [
    {"source": "app/services/ai/service.py", "chunks": 45},
    {"source": "app/services/auth/service.py", "chunks": 23}
  ],
  "total_files": 87,
  "total_chunks": 1523
}
```
DELETE /rag/collections/{name}
Delete a collection and all its documents.
GET /rag/health
RAG service health status including configuration and validation.
Voice Endpoints
All voice endpoints are prefixed with /voice. See Voice for full documentation.
TTS Catalog
| Endpoint | Description |
|---|---|
| `GET /voice/catalog/tts/providers` | List TTS providers |
| `GET /voice/catalog/tts/{provider_id}/models` | List models for provider |
| `GET /voice/catalog/tts/{provider_id}/voices` | List voices for provider |
STT Catalog
| Endpoint | Description |
|---|---|
| `GET /voice/catalog/stt/providers` | List STT providers |
| `GET /voice/catalog/stt/{provider_id}/models` | List STT models |
Settings & Preview
| Endpoint | Description |
|---|---|
| `GET /voice/settings` | Get current voice settings |
| `POST /voice/settings` | Update voice settings |
| `POST /voice/preview` | Generate voice preview (returns audio/mpeg) |
| `GET /voice/preview/{voice_id}` | Browser-friendly voice preview |
| `GET /voice/catalog/summary` | Full catalog summary |
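Fetching a preview via the browser-friendly endpoint in Python (a sketch; the voice ID is a hypothetical value — list real ones via the TTS catalog endpoints above):

```python
import httpx

voice_id = "example-voice"  # hypothetical; see /voice/catalog/tts/{provider_id}/voices
audio = httpx.get(f"http://localhost:8000/voice/preview/{voice_id}")
with open(f"{voice_id}.mp3", "wb") as f:
    f.write(audio.content)  # audio/mpeg bytes, per the table above
```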
Error Handling
HTTP Status Codes
| Status | Description |
|---|---|
| 200 | Success |
| 400 | Bad request (invalid conversation_id, missing required fields) |
| 403 | Forbidden (conversation access denied) |
| 404 | Not found (conversation/collection doesn't exist) |
| 500 | Internal server error |
| 502 | Bad gateway (AI provider error) |
| 503 | Service unavailable (AI service disabled or misconfigured) |
Error Response Format
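Error responses share a single `detail` field, as in the samples below:

```json
{
  "detail": "Human-readable error description"
}
```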
Common Errors
AI Service Disabled:
Missing API Key:
```json
{
  "detail": "AI service error: Missing API key for openai provider. Set OPENAI_API_KEY environment variable."
}
```
Provider Error:
Conversation Not Found:
Collection Not Found:
Next Steps:
- LLM Catalog - Full catalog documentation
- RAG - Full RAG documentation
- Cost Tracking - Usage analytics
- Voice - Voice capabilities
- Service Layer - Integration patterns and architecture
- CLI Commands - Command-line interface reference
- Examples - Real-world usage patterns