# AI Service API Reference

Complete REST API documentation for AI service endpoints.
## Base URL

All AI endpoints are prefixed with `/ai`:
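```
http://localhost:8000/ai
```

All examples in this reference assume a local server at `http://localhost:8000`; adjust the host for your deployment.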
## Endpoints Overview

| Method | Endpoint | Description |
|---|---|---|
| POST | `/ai/chat` | Send a chat message and receive an AI response |
| POST | `/ai/chat/stream` | Stream AI responses with Server-Sent Events |
| GET | `/ai/conversations` | List user conversations with metadata |
| GET | `/ai/conversations/{id}` | Get a conversation with full message history |
| GET | `/ai/health` | Check AI service health status |
| GET | `/ai/version` | Get service version and capabilities |
## Chat Endpoints

### POST /ai/chat

Send a chat message and receive an AI response.
**Request Body:**

| Field | Type | Required | Description |
|---|---|---|---|
| `message` | string | ✅ Yes | User's chat message |
| `conversation_id` | string \| null | ❌ No | Existing conversation ID (creates a new conversation if null) |
| `user_id` | string | ❌ No | User identifier (default: `"api-user"`) |
**Response:**

```json
{
  "message_id": "uuid",
  "content": "AI response text",
  "conversation_id": "uuid",
  "response_time_ms": 1234.5
}
```
**Examples:**

```python
import httpx

response = httpx.post(  # (1)!
    "http://localhost:8000/ai/chat",
    json={
        "message": "What is async/await in Python?",  # (2)!
        "user_id": "my-user"  # (3)!
    }
)

data = response.json()
print(f"AI: {data['content']}")  # (4)!
print(f"Conversation: {data['conversation_id']}")  # (5)!
```

1. POST request to the chat endpoint
2. The user's message - this is what gets sent to the AI
3. User identifier for conversation tracking (optional, defaults to `"api-user"`)
4. Extract and print the AI's response text
5. Save this `conversation_id` to continue the conversation in future requests
```javascript
const response = await fetch('http://localhost:8000/ai/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    message: 'How do I handle errors in async functions?',
    user_id: 'web-user'
  })
});

const data = await response.json();
console.log(`AI: ${data.content}`);
```
**Continue Conversation:**

```bash
# First message
curl -X POST http://localhost:8000/ai/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is FastAPI?"}' \
  | jq -r '.conversation_id' > conv_id.txt

# Follow-up message (maintains context)
curl -X POST http://localhost:8000/ai/chat \
  -H "Content-Type: application/json" \
  -d "{
    \"message\": \"Can you show me an example?\",
    \"conversation_id\": \"$(cat conv_id.txt)\"
  }"
```
### POST /ai/chat/stream

Stream a chat response with Server-Sent Events (SSE).

**Request Body:**

Same as `/ai/chat`.
**Response:**

Server-Sent Events stream with the following event types:

**Event: `connect`**

**Event: `chunk`** (repeated for each content chunk)

```
event: chunk
data: {
  "content": "text delta",
  "is_final": false,
  "is_delta": true,
  "message_id": "uuid",
  "conversation_id": "uuid",
  "timestamp": "2024-01-15T10:30:00Z"
}
```

**Event: `final`**

```
event: final
data: {
  "content": "complete response",
  "is_final": true,
  "is_delta": false,
  "message_id": "uuid",
  "conversation_id": "uuid",
  "timestamp": "2024-01-15T10:30:05Z",
  "response_time_ms": 1234.5,
  "provider": "groq",
  "model": "llama-3.1-70b-versatile"
}
```

**Event: `complete`**

**Event: `error`** (on error)
Examples:
const eventSource = new EventSource( // (1)!
'/ai/chat/stream?' + new URLSearchParams({
message: 'Explain async programming',
user_id: 'web-user'
})
);
let fullResponse = '';
eventSource.addEventListener('chunk', (e) => { // (2)!
const data = JSON.parse(e.data);
fullResponse += data.content;
updateUI(fullResponse); // (3)!
});
eventSource.addEventListener('final', (e) => { // (4)!
const data = JSON.parse(e.data);
console.log('Complete response:', data.content);
console.log('Response time:', data.response_time_ms);
});
eventSource.addEventListener('error', (e) => { // (5)!
const data = JSON.parse(e.data);
console.error('Error:', data.detail);
eventSource.close();
});
eventSource.addEventListener('complete', (e) => { // (6)!
console.log('Stream complete');
eventSource.close();
});
- Create EventSource connection to the streaming endpoint
- Handle each streamed chunk as it arrives
- Update UI in real-time as tokens stream in
- Handle final event with complete response and timing
- Handle errors and close connection
- Clean up connection when stream completes
```python
import httpx
import json

url = "http://localhost:8000/ai/chat/stream"
payload = {"message": "Explain decorators in Python", "user_id": "my-user"}

event_type = None
with httpx.stream("POST", url, json=payload) as response:  # (1)!
    for line in response.iter_lines():  # (2)!
        if line.startswith('event:'):
            event_type = line.split(':', 1)[1].strip()
        elif line.startswith('data:'):
            data = json.loads(line.split(':', 1)[1])
            if event_type == 'chunk':  # (3)!
                print(data['content'], end='', flush=True)
            elif event_type == 'final':  # (4)!
                print(f"\n\nResponse time: {data['response_time_ms']}ms")
```

1. Open a streaming connection with a context manager
2. Iterate through the Server-Sent Events line by line
3. Print each chunk as it arrives for real-time output
4. Show final response metadata when the stream completes
## Conversation Management

### GET /ai/conversations

List conversations for a user.

**Query Parameters:**

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `user_id` | string | ❌ No | `"api-user"` | User identifier |
| `limit` | integer | ❌ No | 50 | Maximum number of conversations to return |
**Response:**

```json
[
  {
    "id": "uuid",
    "title": "Conversation title",
    "message_count": 5,
    "last_activity": "2024-01-15T10:30:00Z",
    "provider": "groq",
    "model": "llama-3.1-70b-versatile"
  }
]
```
Example:
# List conversations
curl "http://localhost:8000/ai/conversations?user_id=my-user&limit=10"
# With Python
import httpx
response = httpx.get(
"http://localhost:8000/ai/conversations",
params={"user_id": "my-user", "limit": 10}
)
conversations = response.json()
for conv in conversations:
print(f"{conv['id']}: {conv['title']} ({conv['message_count']} messages)")
### GET /ai/conversations/{conversation_id}

Get a specific conversation with full message history.

**Path Parameters:**

| Parameter | Type | Description |
|---|---|---|
| `conversation_id` | string | Conversation UUID |

**Query Parameters:**

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `user_id` | string | ❌ No | `"api-user"` | User identifier for access control |
**Response:**

```json
{
  "id": "uuid",
  "title": "Conversation title",
  "provider": "groq",
  "model": "llama-3.1-70b-versatile",
  "created_at": "2024-01-15T10:00:00Z",
  "updated_at": "2024-01-15T10:30:00Z",
  "message_count": 5,
  "messages": [
    {
      "id": "msg-uuid-1",
      "role": "user",
      "content": "What is FastAPI?",
      "timestamp": "2024-01-15T10:00:00Z"
    },
    {
      "id": "msg-uuid-2",
      "role": "assistant",
      "content": "FastAPI is a modern web framework...",
      "timestamp": "2024-01-15T10:00:02Z"
    }
  ],
  "metadata": {
    "user_id": "my-user",
    "last_response_time_ms": 1234.5
  }
}
```
**Example:**
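A minimal fetch with httpx, assuming a `conversation_id` returned by a previous `/ai/chat` call (the UUID below is a placeholder):

```python
import httpx

# Placeholder: use a conversation_id from a prior /ai/chat response
conv_id = "123e4567-e89b-12d3-a456-426614174000"

response = httpx.get(
    f"http://localhost:8000/ai/conversations/{conv_id}",
    params={"user_id": "my-user"}
)

conversation = response.json()
for msg in conversation["messages"]:
    print(f"[{msg['role']}] {msg['content']}")
```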
## Service Status

### GET /ai/health

Get AI service health status and configuration.
**Response:**

```json
{
  "service": "ai",
  "status": "healthy",
  "enabled": true,
  "provider": "groq",
  "model": "llama-3.1-70b-versatile",
  "agent_ready": true,
  "total_conversations": 42,
  "configuration_valid": true,
  "validation_errors": []
}
```
**Status Values:**

- `healthy` - Service operational and properly configured
- `unhealthy` - Configuration issues or service errors
- `error` - Critical service failure
**Example:**
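A quick health probe, sketched with httpx against the documented response fields:

```python
import httpx

response = httpx.get("http://localhost:8000/ai/health")
health = response.json()

if health["status"] == "healthy":
    print(f"AI service ready: {health['provider']} / {health['model']}")
else:
    # validation_errors lists configuration problems when unhealthy
    print(f"AI service {health['status']}: {health['validation_errors']}")
```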
### GET /ai/version

Get service version and feature information.
**Response:**

```json
{
  "service": "ai",
  "engine": "pydantic-ai",
  "version": "1.0",
  "features": [
    "chat",
    "conversation_management",
    "multi_provider_support",
    "health_monitoring",
    "api_endpoints",
    "cli_commands"
  ],
  "providers_supported": [
    "openai",
    "anthropic",
    "google",
    "groq",
    "mistral",
    "cohere"
  ]
}
```
**Example:**
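A short capability check along the same lines, using only the fields shown above:

```python
import httpx

response = httpx.get("http://localhost:8000/ai/version")
info = response.json()

print(f"{info['service']} v{info['version']} ({info['engine']})")
print("Features:", ", ".join(info["features"]))
print("Providers:", ", ".join(info["providers_supported"]))
```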
## Error Handling

### HTTP Status Codes

| Status | Description |
|---|---|
| 200 | Success |
| 400 | Bad request (invalid conversation_id, missing required fields) |
| 403 | Forbidden (conversation access denied) |
| 404 | Not found (conversation doesn't exist) |
| 500 | Internal server error |
| 502 | Bad gateway (AI provider error) |
| 503 | Service unavailable (AI service disabled or misconfigured) |
### Error Response Format
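Errors are returned as a JSON object with a single `detail` field, matching the shape of the examples below:

```json
{
  "detail": "Human-readable error description"
}
```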
### Common Errors

**AI Service Disabled:**

**Missing API Key:**

```json
{
  "detail": "AI service error: Missing API key for openai provider. Set OPENAI_API_KEY environment variable."
}
```

**Provider Error:**

**Conversation Not Found:**

**Access Denied:**
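A sketch of client-side handling for these errors with httpx; `send_chat` is an illustrative helper, not part of the API, and the status-code mapping follows the table above:

```python
import httpx

def send_chat(message: str, conversation_id: str | None = None) -> dict:
    """Send a chat message, raising a descriptive error on failure."""
    response = httpx.post(
        "http://localhost:8000/ai/chat",
        json={"message": message, "conversation_id": conversation_id},
    )
    if response.status_code != 200:
        # All error responses carry a JSON body with a "detail" field
        detail = response.json().get("detail", "unknown error")
        if response.status_code == 404:
            raise ValueError(f"Conversation not found: {detail}")
        if response.status_code == 503:
            raise RuntimeError(f"AI service unavailable: {detail}")
        raise RuntimeError(f"AI request failed ({response.status_code}): {detail}")
    return response.json()
```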
**Next Steps:**

- Service Layer - Integration patterns and architecture
- CLI Commands - Command-line interface reference
- Examples - Real-world usage patterns