API Reference

MĀRGA provides an OpenAI-compatible API, plus additional endpoints for monitoring and configuration.

Base URL

https://marga-449012790678.asia-southeast1.run.app

Authentication

Protected endpoints (chat completions and configuration) require an API key, sent in either of the following request headers:

Authorization: Bearer your-api-key
# OR
X-API-Key: your-api-key
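A minimal Python sketch of choosing between the two header styles. `auth_headers` is a hypothetical helper, not part of MĀRGA or any SDK; it only builds the header dictionary, so it can be passed to whichever HTTP client you use:

```python
from typing import Dict


def auth_headers(api_key: str, style: str = "bearer") -> Dict[str, str]:
    """Return the authentication header for a MĀRGA request.

    style="bearer" uses the Authorization header; style="x-api-key" uses X-API-Key.
    """
    if not api_key:
        raise ValueError("api_key must not be empty")
    if style == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if style == "x-api-key":
        return {"X-API-Key": api_key}
    raise ValueError(f"unknown header style: {style!r}")


print(auth_headers("your-api-key"))
# {'Authorization': 'Bearer your-api-key'}
```

Only one of the two headers is needed per request.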

Endpoints

Chat Completions

The primary endpoint for LLM interactions. It accepts requests and returns responses in the format of OpenAI’s Chat Completions API.

Endpoint: POST /v1/chat/completions

Headers:

  • Content-Type: application/json
  • Authorization: Bearer YOUR_API_KEY

Request Body

{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "Hello, world!"
    }
  ],
  "max_tokens": 100,
  "temperature": 0.7,
  "top_p": 1.0,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "stop": ["\n"],
  "stream": false,
  "user": "user-123"
}

Request Parameters

Parameter          Type     Required  Description
model              string   Yes       Model identifier (e.g., gpt-4, claude-3-sonnet, llama3.1:8b)
messages           array    Yes       Array of conversation messages
max_tokens         integer  No        Maximum number of tokens to generate
temperature        float    No        Sampling temperature (0.0 to 2.0)
top_p              float    No        Nucleus sampling parameter
frequency_penalty  float    No        Frequency penalty (-2.0 to 2.0)
presence_penalty   float    No        Presence penalty (-2.0 to 2.0)
stop               array    No        Stop sequences
stream             boolean  No        Enable streaming response (not yet supported)
user               string   No        User identifier for tracking
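A client-side pre-check can catch obviously malformed requests before they are sent. The sketch below is a hypothetical helper (`check_chat_request` is not part of MĀRGA); it checks that `model` and `messages` are present and that `temperature` and the penalty parameters fall within the ranges in the table above. Rejecting `stream: true` mirrors the `streaming_not_supported` error code. The server remains the authority on validation.

```python
from typing import Any, Dict, List


def check_chat_request(body: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a chat completion request body (empty if none)."""
    problems: List[str] = []
    if not body.get("model"):
        problems.append("model is required")
    messages = body.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty array")

    # Documented ranges for the numeric sampling parameters.
    ranges = {
        "temperature": (0.0, 2.0),
        "frequency_penalty": (-2.0, 2.0),
        "presence_penalty": (-2.0, 2.0),
    }
    for name, (low, high) in ranges.items():
        value = body.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name} must be between {low} and {high}")

    max_tokens = body.get("max_tokens")
    if max_tokens is not None and (not isinstance(max_tokens, int) or max_tokens <= 0):
        problems.append("max_tokens must be a positive integer")

    if body.get("stream"):
        # Streaming is not implemented yet (streaming_not_supported).
        problems.append("stream=true is not supported yet")
    return problems


print(check_chat_request({"model": "gpt-4", "messages": [{"role": "user", "content": "Hi"}], "temperature": 2.5}))
# ['temperature must be between 0.0 and 2.0']
```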

Message Object

{
  "role": "user|assistant|system",
  "content": "Message content",
  "name": "optional-name"
}
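A small sketch of building message objects in Python. `make_message` is a hypothetical helper; it accepts only the three roles listed above and omits `name` unless one is given:

```python
from typing import Dict, Optional

VALID_ROLES = {"system", "user", "assistant"}


def make_message(role: str, content: str, name: Optional[str] = None) -> Dict[str, str]:
    """Build one chat message, validating the role against the documented values."""
    if role not in VALID_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    message = {"role": role, "content": content}
    if name is not None:
        message["name"] = name  # optional field; left out when not provided
    return message


conversation = [
    make_message("system", "You are a helpful assistant."),
    make_message("user", "What is the capital of France?"),
]
```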

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699649600,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 15,
    "total_tokens": 25
  }
}
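A sketch of reading the fields most clients need from this response, using only the Python standard library. `ChatResult` and `parse_chat_response` are hypothetical names. Note that the `model` field reports the model that actually served the request (here `gpt-4o`, which is what `gpt-4` maps to), so it is worth reading from the response rather than assuming it matches the request.

```python
import json
from dataclasses import dataclass


@dataclass
class ChatResult:
    content: str
    finish_reason: str
    model: str
    total_tokens: int

    @property
    def truncated(self) -> bool:
        # In the OpenAI format, "length" means generation stopped at a token limit.
        return self.finish_reason == "length"


def parse_chat_response(raw: str) -> ChatResult:
    """Extract the first choice and the token usage from a chat completion response body."""
    data = json.loads(raw)
    choice = data["choices"][0]
    return ChatResult(
        content=choice["message"]["content"],
        finish_reason=choice["finish_reason"],
        model=data["model"],
        total_tokens=data.get("usage", {}).get("total_tokens", 0),
    )
```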

cURL Example

curl -X POST https://marga-449012790678.asia-southeast1.run.app/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "max_tokens": 50,
    "temperature": 0.7
  }'
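For environments without an OpenAI SDK, the same request can be built with Python's standard library. This is a sketch: `build_chat_request` is a hypothetical helper, and the request is only constructed here; the commented lines show how it could be sent.

```python
import json
import urllib.request

BASE_URL = "https://marga-449012790678.asia-southeast1.run.app"


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the same POST /v1/chat/completions request as the cURL example (not sent)."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


payload = {
    "model": "gpt-4",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "max_tokens": 50,
    "temperature": 0.7,
}
req = build_chat_request("your-api-key", payload)

# To send it:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```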

Health Check

Check service health. The response reports the service status, name, version, and number of providers.

Endpoint: GET /health

Authentication: Not required

Response

{
  "status": "ok",
  "service": "marga",
  "version": "0.1.0",
  "providers": 3
}
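A sketch of a readiness check built on this response, for example for a deployment probe or a client start-up check. `is_ready` is a hypothetical helper, and it rests on two assumptions the reference does not spell out: that `providers` is a count, and that a count of zero means requests cannot be served.

```python
import json


def is_ready(raw: str) -> bool:
    """Treat the service as ready when status is "ok" and at least one provider is reported."""
    data = json.loads(raw)
    return data.get("status") == "ok" and data.get("providers", 0) > 0
```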

cURL Example

curl https://marga-449012790678.asia-southeast1.run.app/health

Configuration

Get the current router configuration and provider status.

Endpoint: GET /v1/config

Authentication: Required

Response

{
  "service": "marga",
  "version": "0.1.0",
  "providers": [
    {
      "name": "openai",
      "type": "openai",
      "enabled": true,
      "healthy": true,
      "models": ["gpt-4o", "gpt-4o-mini", "gpt-3.5-turbo"],
      "priority": 1
    },
    {
      "name": "anthropic",
      "type": "anthropic",
      "enabled": true,
      "healthy": true,
      "models": ["claude-3-5-sonnet-20241022", "claude-3-5-haiku-20241022"],
      "priority": 2
    }
  ],
  "routing": {
    "strategy": "failover",
    "model_mappings": {
      "gpt-4": "openai/gpt-4o",
      "claude-3-sonnet": "anthropic/claude-3-5-sonnet-20241022"
    }
  }
}
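To predict where a request would go, a client can read this configuration. The hypothetical `resolve_model` below applies `model_mappings`, keeps only providers that are both `enabled` and `healthy`, and prefers the lowest `priority` value. Reading priority 1 as "tried first" is an assumption based on the example, and this is a client-side illustration, not MĀRGA's actual routing code.

```python
from typing import Any, Dict, List, Optional, Tuple


def resolve_model(config: Dict[str, Any], requested: str) -> Optional[Tuple[str, str]]:
    """Pick (provider_name, model) for a requested model from a /v1/config response.

    Returns None when no enabled, healthy provider can serve the model.
    """
    providers: List[Dict[str, Any]] = config.get("providers", [])
    mappings: Dict[str, str] = config.get("routing", {}).get("model_mappings", {})

    if requested in mappings:
        # Mapping targets are assumed to use the "provider/model" form shown above.
        provider_name, _, model = mappings[requested].partition("/")
        candidates = [(provider_name, model)]
    else:
        # No alias: any provider that lists the model directly is a candidate.
        candidates = [(p["name"], requested) for p in providers if requested in p.get("models", [])]

    # Only enabled, healthy providers are usable; lower priority numbers come first (assumed).
    usable = {p["name"]: p.get("priority", 0) for p in providers if p.get("enabled") and p.get("healthy")}
    ordered = sorted((c for c in candidates if c[0] in usable), key=lambda c: usable[c[0]])
    return ordered[0] if ordered else None
```

With the example configuration, `resolve_model(config, "gpt-4")` would return `("openai", "gpt-4o")`.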

cURL Example

curl -H "Authorization: Bearer your-api-key" \
  https://marga-449012790678.asia-southeast1.run.app/v1/config

Metrics

Get Prometheus-style metrics for monitoring and alerting.

Endpoint: GET /v1/metrics

Authentication: Not required

Response

# HELP marga_requests_total Total number of requests processed by MĀRGA
# TYPE marga_requests_total counter
marga_requests_total 1234

# HELP marga_request_duration_seconds Request duration in seconds
# TYPE marga_request_duration_seconds histogram
marga_request_duration_seconds_bucket{le="0.1"} 100
marga_request_duration_seconds_bucket{le="0.5"} 500
marga_request_duration_seconds_bucket{le="1.0"} 800
marga_request_duration_seconds_bucket{le="+Inf"} 1234
marga_request_duration_seconds_sum 456.78
marga_request_duration_seconds_count 1234

# HELP marga_provider_requests_total Total requests per provider
# TYPE marga_provider_requests_total counter
marga_provider_requests_total{provider="openai",model="gpt-4o",status="success"} 800
marga_provider_requests_total{provider="anthropic",model="claude-3-5-sonnet",status="success"} 400

Model Mappings

MĀRGA supports automatic model mapping to provide a unified interface:

Request Model    Maps To                     Provider
gpt-4            gpt-4o                      OpenAI
gpt-3.5-turbo    gpt-3.5-turbo               OpenAI
claude-3-sonnet  claude-3-5-sonnet-20241022  Anthropic
claude-3-haiku   claude-3-5-haiku-20241022   Anthropic
llama-8b         llama3.1:8b                 Ollama

Error Responses

All errors follow the OpenAI error format:

{
  "error": {
    "message": "The model 'invalid-model' does not exist",
    "type": "invalid_request_error", 
    "code": "model_not_found"
  }
}

Error Types

Status Code  Error Type             Description
400          invalid_request_error  Malformed request
401          authentication_error   Invalid or missing API key
404          invalid_request_error  Model not found
429          rate_limit_exceeded    Too many requests
500          internal_error         Internal server error
502          upstream_error         Provider error
503          service_unavailable    No healthy providers

Common Error Codes

Code                     Description
missing_api_key          API key not provided
invalid_api_key          API key is invalid
missing_model            Model parameter not provided
model_not_found          Requested model not available
no_provider_available    No healthy providers for model
rate_limit_exceeded      Request rate limit exceeded
streaming_not_supported  Streaming not yet implemented
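A sketch of turning an error response into something a client can branch on. `MargaError`, `parse_error`, and `is_retryable` are hypothetical names. Which statuses count as retryable is an assumption: 429, 502, and 503 describe rate limits or provider availability, which may clear up, while 400, 401, and 404 will fail the same way until the request itself changes.

```python
import json
from typing import NamedTuple


class MargaError(NamedTuple):
    status: int
    type: str
    code: str
    message: str


# Assumption: rate limits and upstream/availability failures are transient and worth
# retrying; malformed requests, bad keys, and unknown models are not.
RETRYABLE_STATUSES = {429, 502, 503}


def parse_error(status: int, raw_body: str) -> MargaError:
    """Turn an HTTP status and an OpenAI-format error body into a MargaError."""
    error = json.loads(raw_body).get("error", {})
    return MargaError(
        status=status,
        type=error.get("type", ""),
        code=error.get("code", ""),
        message=error.get("message", ""),
    )


def is_retryable(error: MargaError) -> bool:
    return error.status in RETRYABLE_STATUSES
```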

Rate Limiting

MĀRGA implements both global and per-client rate limiting:

  • Global: 10,000 requests/minute by default
  • Per-client: 1,000 requests/minute by default

When rate limits are exceeded, you’ll receive a 429 response:

{
  "error": {
    "message": "Rate limit exceeded. Try again in 60 seconds.",
    "type": "rate_limit_exceeded",
    "code": "rate_limit_exceeded"
  }
}
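One way to handle 429 responses is to retry with exponential backoff. The sketch below is a hypothetical helper: `send` is any function that performs the request and returns `(status, body)`, and `sleep` is injectable so the retry schedule can be tested without waiting. Delays double from `base_delay` (1 s, 2 s, 4 s, ...) and are capped at 60 seconds, matching the wait suggested in the example message. In production, adding random jitter helps keep many clients from retrying in lockstep.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-429 status or the attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, body = send()
        attempt += 1
        if status != 429 or attempt >= max_attempts:
            return status, body
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ... up to max_delay.
        sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
```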

Supported Features

✅ Currently Supported

  • Chat completions
  • Model selection and mapping
  • Automatic failover
  • Provider health checks
  • Metrics collection
  • Rate limiting
  • Authentication

🚧 Coming Soon

  • Streaming responses
  • Function calling
  • Embeddings endpoint
  • Fine-tuned model support
  • Batch processing

📋 OpenAI Compatibility

MĀRGA aims for 100% compatibility with the OpenAI API. Currently supported parameters:

  • model - Model selection with mapping
  • messages - Full conversation history
  • max_tokens - Token limit control
  • temperature - Response randomness
  • top_p - Nucleus sampling
  • frequency_penalty - Repetition penalty
  • presence_penalty - Topic diversity
  • stop - Stop sequences
  • user - User tracking
  • 🚧 stream - Streaming (planned)
  • 🚧 tools - Function calling (planned)
  • 🚧 response_format - JSON mode (planned)
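Code written for the OpenAI API may send parameters that are still planned here. A hypothetical helper such as `split_supported` can separate them out before a request is sent, returning the dropped names so the caller can decide whether losing them (for example `response_format`) is acceptable. Dropping `"stream": false` is harmless, since non-streaming is the OpenAI API's default.

```python
from typing import Any, Dict, List, Tuple

# Parameters listed as currently supported; stream, tools, and response_format are planned.
SUPPORTED_PARAMS = {
    "model", "messages", "max_tokens", "temperature", "top_p",
    "frequency_penalty", "presence_penalty", "stop", "user",
}


def split_supported(body: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (body with only supported parameters, sorted names of dropped parameters)."""
    kept = {key: value for key, value in body.items() if key in SUPPORTED_PARAMS}
    dropped = sorted(key for key in body if key not in SUPPORTED_PARAMS)
    return kept, dropped
```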

SDKs and Libraries

MĀRGA works with any OpenAI-compatible SDK: set the client's base URL to the MĀRGA base URL with the /v1 suffix, as in the examples below.

Python (OpenAI SDK)

from openai import OpenAI
 
client = OpenAI(
    base_url="https://marga-449012790678.asia-southeast1.run.app/v1",
    api_key="your-api-key"
)
 
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)

Node.js

import OpenAI from 'openai';
 
const openai = new OpenAI({
  baseURL: 'https://marga-449012790678.asia-southeast1.run.app/v1',
  apiKey: 'your-api-key',
});
 
const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello!' }],
});

Go

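package main

import (
	"context"
	"fmt"
	"log"

	"github.com/sashabaranov/go-openai"
)

func main() {
	// Point the client at MĀRGA instead of the default OpenAI endpoint.
	config := openai.DefaultConfig("your-api-key")
	config.BaseURL = "https://marga-449012790678.asia-southeast1.run.app/v1"
	client := openai.NewClientWithConfig(config)

	resp, err := client.CreateChatCompletion(
		context.Background(),
		openai.ChatCompletionRequest{
			Model: "gpt-4",
			Messages: []openai.ChatCompletionMessage{
				{Role: openai.ChatMessageRoleUser, Content: "Hello!"},
			},
		},
	)
	if err != nil {
		log.Fatalf("chat completion failed: %v", err)
	}
	fmt.Println(resp.Choices[0].Message.Content)
}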