API Reference

MĀRGA provides an OpenAI-compatible API, plus additional monitoring and configuration endpoints. Streaming, function calling, and JSON mode are planned but not yet available (see Supported Features).

Base URL

https://marga-449012790678.asia-southeast1.run.app

Authentication

All protected endpoints require an API key, sent in either of the following request headers:

Authorization: Bearer your-api-key
# OR
X-API-Key: your-api-key
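
The two header styles above can be produced by a small helper. This is a minimal sketch: the function name `auth_headers` and the `scheme` argument are illustrative and not part of MĀRGA itself.

```python
def auth_headers(api_key: str, scheme: str = "bearer") -> dict[str, str]:
    """Build the request header for one of the two documented auth styles."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    if scheme == "bearer":
        # Authorization: Bearer your-api-key
        return {"Authorization": f"Bearer {api_key}"}
    if scheme == "x-api-key":
        # X-API-Key: your-api-key
        return {"X-API-Key": api_key}
    raise ValueError(f"unknown scheme: {scheme!r}")


print(auth_headers("your-api-key"))
print(auth_headers("your-api-key", scheme="x-api-key"))
```

Only one of the two headers needs to be sent with a request.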

Endpoints

Chat Completions

The primary endpoint for LLM interactions, compatible with OpenAI’s chat completions API for the parameters listed below.

Endpoint: POST /v1/chat/completions

Headers:

Content-Type: application/json
Authorization: Bearer YOUR_API_KEY

Request Body

{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "Hello, world!"
    }
  ],
  "max_tokens": 100,
  "temperature": 0.7,
  "top_p": 1.0,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "stop": ["\n"],
  "stream": false,
  "user": "user-123"
}

Request Parameters

Parameter	Type	Required	Description
`model`	string	✅	Model identifier (e.g., `gpt-4`, `claude-3-sonnet`, `llama3.1:8b`)
`messages`	array	✅	Array of conversation messages
`max_tokens`	integer	❌	Maximum tokens to generate
`temperature`	float	❌	Sampling temperature (0.0-2.0)
`top_p`	float	❌	Nucleus sampling parameter
`frequency_penalty`	float	❌	Frequency penalty (-2.0 to 2.0)
`presence_penalty`	float	❌	Presence penalty (-2.0 to 2.0)
`stop`	array	❌	Stop sequences
`stream`	boolean	❌	Enable streaming response (planned; not yet implemented)
`user`	string	❌	User identifier for tracking

Message Object

{
  "role": "user|assistant|system",
  "content": "Message content",
  "name": "optional-name"
}
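
Before sending a request, the documented constraints (required fields, message roles, numeric ranges) can be checked on the client side. The sketch below is an assumption about how such a check might look; `validate_chat_request` is a hypothetical helper, and the server may apply different or additional rules.

```python
VALID_ROLES = {"user", "assistant", "system"}

# Numeric ranges taken from the parameter table above.
RANGES = {
    "temperature": (0.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def validate_chat_request(body: dict) -> list[str]:
    """Return a list of problems; an empty list means the body looks valid."""
    problems = []
    model = body.get("model")
    if not isinstance(model, str) or not model:
        problems.append("model is required and must be a non-empty string")

    messages = body.get("messages")
    if not isinstance(messages, list):
        problems.append("messages is required and must be an array")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or msg.get("role") not in VALID_ROLES:
                problems.append(f"messages[{i}].role must be one of {sorted(VALID_ROLES)}")
            elif not isinstance(msg.get("content"), str):
                problems.append(f"messages[{i}].content must be a string")

    for name, (low, high) in RANGES.items():
        value = body.get(name)
        if value is None:
            continue  # optional parameter, not supplied
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not low <= value <= high:
            problems.append(f"{name} must be a number between {low} and {high}")
    return problems


request = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello, world!"}],
    "temperature": 0.7,
}
print(validate_chat_request(request))
```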

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699649600,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 15,
    "total_tokens": 25
  }
}
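
A response of this shape can be reduced to the fields most callers need. The following sketch parses the sample response shown above; `summarize_completion` is an illustrative name, and it assumes at least one entry in `choices`.

```python
import json


def summarize_completion(raw: str) -> dict:
    """Extract the reply text, finish reason, and token usage from a response body."""
    data = json.loads(raw)
    choice = data["choices"][0]  # assumes at least one choice is present
    usage = data.get("usage", {})
    return {
        "model": data["model"],
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
    }


SAMPLE = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699649600,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 10, "completion_tokens": 15, "total_tokens": 25}
}
"""

summary = summarize_completion(SAMPLE)
print(summary["text"])
```

Note that the `model` field in the response reports the mapped model (`gpt-4o` in the sample), not necessarily the name sent in the request.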

cURL Example

curl -X POST https://marga-449012790678.asia-southeast1.run.app/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "max_tokens": 50,
    "temperature": 0.7
  }'

Health Check

Check the service health and provider status.

Endpoint: GET /health

Authentication: Not required

Response

{
  "status": "ok",
  "service": "marga",
  "version": "0.1.0",
  "providers": 3
}

cURL Example

curl https://marga-449012790678.asia-southeast1.run.app/health
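
A readiness probe could interpret the health payload as sketched below. This assumes that `providers` is a count of available providers and that a value of zero means the service cannot route requests; the source does not state this, so treat `is_ready` as a hypothetical helper.

```python
def is_ready(health: dict) -> bool:
    """Return True when the payload reports status "ok" and at least one provider.

    Assumption: "providers" is a count, as in the sample response above.
    """
    return health.get("status") == "ok" and health.get("providers", 0) > 0


sample = {"status": "ok", "service": "marga", "version": "0.1.0", "providers": 3}
print(is_ready(sample))
```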

Configuration

Get the current router configuration and provider status.

Endpoint: GET /v1/config

Authentication: Required

Response

{
  "service": "marga",
  "version": "0.1.0",
  "providers": [
    {
      "name": "openai",
      "type": "openai",
      "enabled": true,
      "healthy": true,
      "models": ["gpt-4o", "gpt-4o-mini", "gpt-3.5-turbo"],
      "priority": 1
    },
    {
      "name": "anthropic",
      "type": "anthropic",
      "enabled": true,
      "healthy": true,
      "models": ["claude-3-5-sonnet-20241022", "claude-3-5-haiku-20241022"],
      "priority": 2
    }
  ],
  "routing": {
    "strategy": "failover",
    "model_mappings": {
      "gpt-4": "openai/gpt-4o",
      "claude-3-sonnet": "anthropic/claude-3-5-sonnet-20241022"
    }
  }
}

cURL Example

curl -H "Authorization: Bearer your-api-key" \
  https://marga-449012790678.asia-southeast1.run.app/v1/config
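
The configuration payload can be used to list the providers that are currently usable. The sketch below assumes that a lower `priority` value means the provider is tried first under the `failover` strategy; the source does not confirm the ordering, and `usable_providers` is an illustrative name.

```python
def usable_providers(config: dict) -> list[str]:
    """Names of enabled, healthy providers, ordered by ascending priority value."""
    usable = [
        p for p in config.get("providers", [])
        if p.get("enabled") and p.get("healthy")
    ]
    # Assumption: priority 1 is preferred over priority 2.
    return [p["name"] for p in sorted(usable, key=lambda p: p["priority"])]


config = {
    "providers": [
        {"name": "anthropic", "enabled": True, "healthy": True, "priority": 2},
        {"name": "openai", "enabled": True, "healthy": True, "priority": 1},
        # Hypothetical third entry, added here to show the filtering.
        {"name": "ollama", "enabled": True, "healthy": False, "priority": 3},
    ]
}
print(usable_providers(config))
```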

Metrics

Get Prometheus-style metrics for monitoring and alerting.

Endpoint: GET /v1/metrics

Authentication: Not required

Response

# HELP marga_requests_total Total number of requests processed by MĀRGA
# TYPE marga_requests_total counter
marga_requests_total 1234

# HELP marga_request_duration_seconds Request duration in seconds
# TYPE marga_request_duration_seconds histogram
marga_request_duration_seconds_bucket{le="0.1"} 100
marga_request_duration_seconds_bucket{le="0.5"} 500
marga_request_duration_seconds_bucket{le="1.0"} 800
marga_request_duration_seconds_bucket{le="+Inf"} 1234
marga_request_duration_seconds_sum 456.78
marga_request_duration_seconds_count 1234

# HELP marga_provider_requests_total Total requests per provider
# TYPE marga_provider_requests_total counter
marga_provider_requests_total{provider="openai",model="gpt-4o",status="success"} 800
marga_provider_requests_total{provider="anthropic",model="claude-3-5-sonnet",status="success"} 400
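
For quick checks without a Prometheus server, the text output can be parsed directly. The sketch below handles only the simple `series value` lines shown in the sample (no timestamps) and derives the mean request duration from the histogram's `_sum` and `_count` series. The helper names are illustrative.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Map each series (name plus labels) to its value, skipping comments."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        # Assumes "series value" with no trailing timestamp.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def mean_latency(samples: dict[str, float]) -> float:
    """Mean request duration in seconds: histogram sum divided by count."""
    count = samples["marga_request_duration_seconds_count"]
    if count == 0:
        return 0.0
    return samples["marga_request_duration_seconds_sum"] / count


SAMPLE = """
# TYPE marga_requests_total counter
marga_requests_total 1234
# TYPE marga_request_duration_seconds histogram
marga_request_duration_seconds_bucket{le="0.5"} 500
marga_request_duration_seconds_bucket{le="+Inf"} 1234
marga_request_duration_seconds_sum 456.78
marga_request_duration_seconds_count 1234
"""

samples = parse_metrics(SAMPLE)
print(f"mean latency: {mean_latency(samples):.3f}s")
```

For production monitoring, a Prometheus scrape job or an existing exposition-format parser is likely a better fit than hand-rolled parsing.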

Model Mappings

MĀRGA supports automatic model mapping to provide a unified interface:

Request Model	Maps To	Provider
`gpt-4`	`gpt-4o`	OpenAI
`gpt-3.5-turbo`	`gpt-3.5-turbo`	OpenAI
`claude-3-sonnet`	`claude-3-5-sonnet-20241022`	Anthropic
`claude-3-haiku`	`claude-3-5-haiku-20241022`	Anthropic
`llama-8b`	`llama3.1:8b`	Ollama
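
The mapping table can be mirrored on the client side, for example to predict which upstream model a request will reach. This is a sketch that hard-codes the table above; the live mappings are reported by the configuration endpoint and may differ. Returning `None` for unmapped names is a choice made here, not documented server behavior.

```python
from typing import Optional

# Request model -> (upstream model, provider), copied from the table above.
MODEL_MAP = {
    "gpt-4": ("gpt-4o", "OpenAI"),
    "gpt-3.5-turbo": ("gpt-3.5-turbo", "OpenAI"),
    "claude-3-sonnet": ("claude-3-5-sonnet-20241022", "Anthropic"),
    "claude-3-haiku": ("claude-3-5-haiku-20241022", "Anthropic"),
    "llama-8b": ("llama3.1:8b", "Ollama"),
}


def resolve_model(requested: str) -> Optional[tuple[str, str]]:
    """Return (upstream model, provider) for a request model, or None if unmapped."""
    return MODEL_MAP.get(requested)


print(resolve_model("claude-3-sonnet"))
```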

Error Responses

All errors follow the OpenAI error format:

{
  "error": {
    "message": "The model 'invalid-model' does not exist",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}

Error Types

Status Code	Error Type	Description
400	`invalid_request_error`	Malformed request
401	`authentication_error`	Invalid or missing API key
404	`invalid_request_error`	Model not found
429	`rate_limit_exceeded`	Too many requests
500	`internal_error`	Internal server error
502	`upstream_error`	Provider error
503	`service_unavailable`	No healthy providers

Common Error Codes

Code	Description
`missing_api_key`	API key not provided
`invalid_api_key`	API key is invalid
`missing_model`	Model parameter not provided
`model_not_found`	Requested model not available
`no_provider_available`	No healthy providers for model
`rate_limit_exceeded`	Request rate limit exceeded
`streaming_not_supported`	Streaming not yet implemented
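
Client code can turn an error body into a structured value and decide whether a retry is worthwhile. In the sketch below, the set of retryable status codes is an assumption (rate limits and server-side failures), and `ApiError` and `parse_error` are illustrative names.

```python
from dataclasses import dataclass

# Assumption: these statuses are transient and worth retrying.
RETRYABLE_STATUSES = {429, 500, 502, 503}


@dataclass
class ApiError:
    status: int
    type: str
    code: str
    message: str

    @property
    def retryable(self) -> bool:
        return self.status in RETRYABLE_STATUSES


def parse_error(status: int, body: dict) -> ApiError:
    """Build an ApiError from an HTTP status and an OpenAI-format error body."""
    err = body.get("error", {})
    return ApiError(
        status=status,
        type=err.get("type", "unknown"),
        code=err.get("code", "unknown"),
        message=err.get("message", ""),
    )


body = {
    "error": {
        "message": "The model 'invalid-model' does not exist",
        "type": "invalid_request_error",
        "code": "model_not_found",
    }
}
error = parse_error(404, body)
print(error.code, error.retryable)
```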

Rate Limiting

MĀRGA implements both global and per-client rate limiting:

Global: 10,000 requests/minute by default
Per-client: 1,000 requests/minute by default

When rate limits are exceeded, you’ll receive a 429 response:

{
  "error": {
    "message": "Rate limit exceeded. Try again in 60 seconds.",
    "type": "rate_limit_exceeded",
    "code": "rate_limit_exceeded"
  }
}
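
One common way to handle a 429 is to retry with exponential backoff. The sketch below is transport-agnostic: `send` is a caller-supplied function returning `(status_code, body)`, and the attempt count and delays are placeholder values. Given the "Try again in 60 seconds" message above, a longer base delay may be appropriate in practice.

```python
import time
from typing import Callable


def call_with_backoff(
    send: Callable[[], tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, dict]:
    """Call send() until it returns a non-429 status or attempts run out.

    Waits base_delay, then 2x, 4x, ... between attempts.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body  # still rate limited after the final attempt


# Usage with a fake transport: two 429 replies, then success.
replies = iter([(429, {}), (429, {}), (200, {"ok": True})])
delays = []
result = call_with_backoff(lambda: next(replies), sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.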

Supported Features

✅ Currently Supported

Chat completions
Model selection and mapping
Automatic failover
Provider health checks
Metrics collection
Rate limiting
Authentication

🚧 Coming Soon

Streaming responses
Function calling
Embeddings endpoint
Fine-tuned model support
Batch processing

📋 Full OpenAI Compatibility

MĀRGA aims for 100% compatibility with the OpenAI API. Currently supported parameters:

✅ model - Model selection with mapping
✅ messages - Full conversation history
✅ max_tokens - Token limit control
✅ temperature - Response randomness
✅ top_p - Nucleus sampling
✅ frequency_penalty - Repetition penalty
✅ presence_penalty - Topic diversity
✅ stop - Stop sequences
✅ user - User tracking
🚧 stream - Streaming (planned)
🚧 tools - Function calling (planned)
🚧 response_format - JSON mode (planned)
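
When porting existing OpenAI client code, it can help to flag requests that rely on parameters still marked as planned. This sketch is an illustration only: `planned_features_used` is a hypothetical helper, and it treats `"stream": false` as harmless because the sample request above includes it.

```python
# Parameters marked as planned in the list above.
PLANNED = ("stream", "tools", "response_format")


def planned_features_used(body: dict) -> list[str]:
    """Return the planned parameters that a request body actually switches on."""
    # A falsy value (False, None, empty list) does not activate the feature.
    return [name for name in PLANNED if body.get(name)]


request = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,
    "tools": [{"type": "function", "function": {"name": "lookup"}}],
}
print(planned_features_used(request))
```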

SDKs and Libraries

MĀRGA works with any OpenAI-compatible SDK by changing the base URL:

Python (OpenAI SDK)

from openai import OpenAI
 
client = OpenAI(
    base_url="https://marga-449012790678.asia-southeast1.run.app/v1",
    api_key="your-api-key"
)
 
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)

Node.js

import OpenAI from 'openai';
 
const openai = new OpenAI({
  baseURL: 'https://marga-449012790678.asia-southeast1.run.app/v1',
  apiKey: 'your-api-key',
});
 
const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello!' }],
});

Go

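package main

import (
    "context"
    "fmt"
    "log"

    openai "github.com/sashabaranov/go-openai"
)

func main() {
    // Point the client at MĀRGA instead of the default OpenAI endpoint.
    config := openai.DefaultConfig("your-api-key")
    config.BaseURL = "https://marga-449012790678.asia-southeast1.run.app/v1"
    client := openai.NewClientWithConfig(config)

    resp, err := client.CreateChatCompletion(
        context.Background(),
        openai.ChatCompletionRequest{
            Model: "gpt-4",
            Messages: []openai.ChatCompletionMessage{
                {Role: openai.ChatMessageRoleUser, Content: "Hello!"},
            },
        },
    )
    if err != nil {
        log.Fatalf("chat completion failed: %v", err)
    }
    fmt.Println(resp.Choices[0].Message.Content)
}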

Quick Start Configuration