API Reference
MĀRGA provides an OpenAI-compatible API with additional monitoring and configuration endpoints.
Base URL
https://marga-449012790678.asia-southeast1.run.app
Authentication
All protected endpoints require authentication via API key in the request header:
Authorization: Bearer your-api-key
# OR
X-API-Key: your-api-key
Endpoints
Chat Completions
The primary endpoint for LLM interactions, compatible with OpenAI’s chat completions API.
Endpoint: POST /v1/chat/completions
Headers:
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
Request Body
{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "Hello, world!"
    }
  ],
  "max_tokens": 100,
  "temperature": 0.7,
  "top_p": 1.0,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "stop": ["\\n"],
  "stream": false,
  "user": "user-123"
}
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | ✅ | Model identifier (e.g., gpt-4, claude-3-sonnet, llama3.1:8b) |
| messages | array | ✅ | Array of conversation messages |
| max_tokens | integer | ❌ | Maximum tokens to generate |
| temperature | float | ❌ | Sampling temperature (0.0-2.0) |
| top_p | float | ❌ | Nucleus sampling parameter |
| frequency_penalty | float | ❌ | Frequency penalty (-2.0 to 2.0) |
| presence_penalty | float | ❌ | Presence penalty (-2.0 to 2.0) |
| stop | array | ❌ | Stop sequences |
| stream | boolean | ❌ | Enable streaming response |
| user | string | ❌ | User identifier for tracking |
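The required fields and numeric ranges in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of MĀRGA itself: the function name `validate_chat_request` is hypothetical, and rejecting an empty `messages` array is an assumption rather than documented behavior.

```python
from typing import Any


def validate_chat_request(body: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a chat completions request body."""
    problems: list[str] = []

    # `model` and `messages` are the only parameters marked as required.
    if not isinstance(body.get("model"), str) or not body["model"]:
        problems.append("model is required and must be a non-empty string")
    messages = body.get("messages")
    # Assumption: an empty messages array is treated as invalid.
    if not isinstance(messages, list) or not messages:
        problems.append("messages is required and must be a non-empty array")

    # Documented ranges: temperature 0.0-2.0, penalties -2.0 to 2.0.
    ranges = {
        "temperature": (0.0, 2.0),
        "frequency_penalty": (-2.0, 2.0),
        "presence_penalty": (-2.0, 2.0),
    }
    for name, (low, high) in ranges.items():
        value = body.get(name)
        if value is None:
            continue
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} must be a number")
        elif not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}")

    return problems
```

An empty list means the body passed these checks; the server may still reject it for other reasons.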
Message Object
{
  "role": "user|assistant|system",
  "content": "Message content",
  "name": "optional-name"
}
Response
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699649600,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 15,
    "total_tokens": 25
  }
}
cURL Example
curl -X POST https://marga-449012790678.asia-southeast1.run.app/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "max_tokens": 50,
    "temperature": 0.7
  }'
Health Check
Check the service health and provider status.
Endpoint: GET /health
Authentication: Not required
Response
{
  "status": "ok",
  "service": "marga",
  "version": "0.1.0",
  "providers": 3
}
cURL Example
curl https://marga-449012790678.asia-southeast1.run.app/health
Configuration
Get the current router configuration and provider status.
Endpoint: GET /v1/config
Authentication: Required
Response
{
  "service": "marga",
  "version": "0.1.0",
  "providers": [
    {
      "name": "openai",
      "type": "openai",
      "enabled": true,
      "healthy": true,
      "models": ["gpt-4o", "gpt-4o-mini", "gpt-3.5-turbo"],
      "priority": 1
    },
    {
      "name": "anthropic",
      "type": "anthropic",
      "enabled": true,
      "healthy": true,
      "models": ["claude-3-5-sonnet-20241022", "claude-3-5-haiku-20241022"],
      "priority": 2
    }
  ],
  "routing": {
    "strategy": "failover",
    "model_mappings": {
      "gpt-4": "openai/gpt-4o",
      "claude-3-sonnet": "anthropic/claude-3-5-sonnet-20241022"
    }
  }
}
cURL Example
curl -H "Authorization: Bearer your-api-key" \
  https://marga-449012790678.asia-southeast1.run.app/v1/config
Metrics
Get Prometheus-style metrics for monitoring and alerting.
Endpoint: GET /v1/metrics
Authentication: Not required
Response
# HELP marga_requests_total Total number of requests processed by MĀRGA
# TYPE marga_requests_total counter
marga_requests_total 1234
# HELP marga_request_duration_seconds Request duration in seconds
# TYPE marga_request_duration_seconds histogram
marga_request_duration_seconds_bucket{le="0.1"} 100
marga_request_duration_seconds_bucket{le="0.5"} 500
marga_request_duration_seconds_bucket{le="1.0"} 800
marga_request_duration_seconds_bucket{le="+Inf"} 1234
marga_request_duration_seconds_sum 456.78
marga_request_duration_seconds_count 1234
# HELP marga_provider_requests_total Total requests per provider
# TYPE marga_provider_requests_total counter
marga_provider_requests_total{provider="openai",model="gpt-4o",status="success"} 800
marga_provider_requests_total{provider="anthropic",model="claude-3-5-sonnet",status="success"} 400
Model Mappings
MĀRGA supports automatic model mapping to provide a unified interface:
| Request Model | Maps To | Provider |
|---|---|---|
| gpt-4 | gpt-4o | OpenAI |
| gpt-3.5-turbo | gpt-3.5-turbo | OpenAI |
| claude-3-sonnet | claude-3-5-sonnet-20241022 | Anthropic |
| claude-3-haiku | claude-3-5-haiku-20241022 | Anthropic |
| llama-8b | llama3.1:8b | Ollama |
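To make the table concrete, the lookup can be pictured as a dictionary from the requested name to a provider and upstream model. This is an illustrative sketch only: `MODEL_MAPPINGS` and `resolve_model` are hypothetical names, the lowercase provider keys are an assumption, and how MĀRGA treats names that are not in the table is not documented here (the sketch simply raises).

```python
# Requested model name -> (provider, upstream model), copied from the table.
MODEL_MAPPINGS: dict[str, tuple[str, str]] = {
    "gpt-4": ("openai", "gpt-4o"),
    "gpt-3.5-turbo": ("openai", "gpt-3.5-turbo"),
    "claude-3-sonnet": ("anthropic", "claude-3-5-sonnet-20241022"),
    "claude-3-haiku": ("anthropic", "claude-3-5-haiku-20241022"),
    "llama-8b": ("ollama", "llama3.1:8b"),
}


def resolve_model(requested: str) -> tuple[str, str]:
    """Return (provider, upstream_model) for a requested model name."""
    try:
        return MODEL_MAPPINGS[requested]
    except KeyError:
        # Assumption: unmapped names are rejected rather than passed through.
        raise LookupError(f"The model '{requested}' does not exist") from None
```

For example, `resolve_model("gpt-4")` returns `("openai", "gpt-4o")`, which matches the `model` field shown in the chat completions response.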
Error Responses
All errors follow the OpenAI error format:
{
  "error": {
    "message": "The model 'invalid-model' does not exist",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}
Error Types
| Status Code | Error Type | Description |
|---|---|---|
| 400 | invalid_request_error | Malformed request |
| 401 | authentication_error | Invalid or missing API key |
| 404 | invalid_request_error | Model not found |
| 429 | rate_limit_exceeded | Too many requests |
| 500 | internal_error | Internal server error |
| 502 | upstream_error | Provider error |
| 503 | service_unavailable | No healthy providers |
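A client usually wants to know which of these statuses are worth retrying. The table does not say, so the sketch below assumes that 429, 502, and 503 are transient and everything else is not. The names `should_retry` and `backoff_delay`, and the 60-second cap, are illustrative choices rather than MĀRGA behavior.

```python
# Assumption: these statuses signal transient conditions worth retrying.
RETRYABLE_STATUSES = frozenset({429, 502, 503})


def should_retry(status_code: int) -> bool:
    """Return True if a request with this status may succeed when retried."""
    return status_code in RETRYABLE_STATUSES


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff: base * 2**attempt seconds, capped at `cap`."""
    return min(cap, base * (2 ** attempt))
```

A retry loop would call `should_retry` on each failed response and sleep for `backoff_delay(attempt)` before the next attempt.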
Common Error Codes
| Code | Description |
|---|---|
| missing_api_key | API key not provided |
| invalid_api_key | API key is invalid |
| missing_model | Model parameter not provided |
| model_not_found | Requested model not available |
| no_provider_available | No healthy providers for model |
| rate_limit_exceeded | Request rate limit exceeded |
| streaming_not_supported | Streaming not yet implemented |
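Because every error shares the same envelope, a client can turn a failed response into a single exception type and branch on `code`. The following sketch assumes only the `error.message`, `error.type`, and `error.code` fields shown above; `MargaAPIError` and `parse_error` are hypothetical names, and the `unknown_error` fallback is an invention of this example for bodies that are not in the documented format.

```python
import json
from typing import Optional


class MargaAPIError(Exception):
    """Client-side representation of the documented error envelope."""

    def __init__(self, status: int, message: str, error_type: str,
                 code: Optional[str] = None) -> None:
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.message = message
        self.error_type = error_type
        self.code = code


def parse_error(status: int, body: str) -> MargaAPIError:
    """Build a MargaAPIError from an HTTP status and raw response body."""
    try:
        err = json.loads(body)["error"]
        return MargaAPIError(status, err["message"], err["type"], err.get("code"))
    except (ValueError, KeyError, TypeError):
        # Not the documented format (for example, a proxy's plain-text page).
        return MargaAPIError(status, body, "unknown_error")
```

Callers can then write checks such as `if err.code == "model_not_found"` instead of matching on message text.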
Rate Limiting
MĀRGA implements both global and per-client rate limiting:
- Global: 10,000 requests/minute by default
- Per-client: 1,000 requests/minute by default
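A client that expects to approach the per-client limit can throttle itself instead of waiting to be rejected. The sketch below is a simple sliding-window limiter defaulting to 1,000 requests per 60 seconds. Whether MĀRGA counts requests in fixed or sliding windows is not documented here, so treat this as an approximation; `SlidingWindowLimiter` is a hypothetical name.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls in any `window_seconds` span."""

    def __init__(self, max_requests: int = 1000, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._sent: Deque[float] = deque()  # timestamps of recent requests

    def try_acquire(self) -> bool:
        """Record a request and return True, or return False if over the limit."""
        now = self._clock()
        # Forget requests that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False
```

Call `try_acquire()` before each request and delay the request when it returns `False`. The injectable `clock` exists only to make the limiter easy to test.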
When rate limits are exceeded, you’ll receive a 429 response:
{
  "error": {
    "message": "Rate limit exceeded. Try again in 60 seconds.",
    "type": "rate_limit_exceeded",
    "code": "rate_limit_exceeded"
  }
}
Supported Features
✅ Currently Supported
- Chat completions
- Model selection and mapping
- Automatic failover
- Provider health checks
- Metrics collection
- Rate limiting
- Authentication
🚧 Coming Soon
- Streaming responses
- Function calling
- Embeddings endpoint
- Fine-tuned model support
- Batch processing
📋 Full OpenAI Compatibility
MĀRGA aims for 100% compatibility with the OpenAI API. Currently supported parameters:
- ✅ model - Model selection with mapping
- ✅ messages - Full conversation history
- ✅ max_tokens - Token limit control
- ✅ temperature - Response randomness
- ✅ top_p - Nucleus sampling
- ✅ frequency_penalty - Repetition penalty
- ✅ presence_penalty - Topic diversity
- ✅ stop - Stop sequences
- ✅ user - User tracking
- 🚧 stream - Streaming (planned)
- 🚧 tools - Function calling (planned)
- 🚧 response_format - JSON mode (planned)
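Code written against the full OpenAI API may send parameters that are still marked as planned. One way to handle this on the client is to drop them before the request is sent. This is a sketch under assumptions: `drop_planned_params` is a hypothetical helper, and it keeps `"stream": false` because that value appears in the documented request example.

```python
from typing import Any

# Parameters marked as planned in the list above.
PLANNED_PARAMS = frozenset({"stream", "tools", "response_format"})


def drop_planned_params(body: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Return (cleaned_body, dropped_keys) without modifying `body`."""
    cleaned: dict[str, Any] = {}
    dropped: list[str] = []
    for key, value in body.items():
        # "stream": false is shown in the request example, so only other
        # values of `stream` are treated as unsupported.
        if key in PLANNED_PARAMS and not (key == "stream" and value is False):
            dropped.append(key)
        else:
            cleaned[key] = value
    return cleaned, dropped
```

Logging the `dropped` list makes it visible when a caller relies on a feature that is not yet available.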
SDKs and Libraries
MĀRGA works with any OpenAI-compatible SDK by changing the base URL:
Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    base_url="https://marga-449012790678.asia-southeast1.run.app/v1",
    api_key="your-api-key",
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
Node.js
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://marga-449012790678.asia-southeast1.run.app/v1',
  apiKey: 'your-api-key',
});

const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello!' }],
});
Go
package main

import (
	"context"
	"log"
	openai "github.com/sashabaranov/go-openai"
)

func main() {
	config := openai.DefaultConfig("your-api-key")
	config.BaseURL = "https://marga-449012790678.asia-southeast1.run.app/v1"
	client := openai.NewClientWithConfig(config)
	resp, err := client.CreateChatCompletion(
		context.Background(),
		openai.ChatCompletionRequest{
			Model: "gpt-4",
			Messages: []openai.ChatCompletionMessage{
				{Role: "user", Content: "Hello!"},
			},
		},
	)
	if err != nil {
		log.Fatal(err)
	}
	log.Println(resp.Choices[0].Message.Content)
}