Configuration Reference

This guide covers all configuration options for MĀRGA, including server settings, provider configuration, routing rules, and monitoring setup.

Configuration File

MĀRGA uses a YAML configuration file (default: config.yaml). You can specify a custom location with the CONFIG_FILE environment variable.

# Full configuration example
server:
  port: 8080
  host: 0.0.0.0
  timeout: 30s
  max_request_size: 10MB
 
logging:
  level: info
  format: json
 
metrics:
  enabled: true
  path: /v1/metrics
  datadog:
    enabled: true
    service_name: marga
    environment: production
 
providers:
  - name: openai
    type: openai
    enabled: true
    # ... provider config
 
routing:
  strategy: failover
  # ... routing config
 
rate_limit:
  enabled: true
  # ... rate limit config
 
security:
  api_key_required: true
  # ... security config
 
health:
  enabled: true
  # ... health config

Server Configuration

Controls the HTTP server behavior.

server:
  port: 8080                    # Port to listen on
  host: 0.0.0.0                # Host to bind to (0.0.0.0 for all interfaces)
  timeout: 30s                 # Request timeout
  max_request_size: 10MB       # Maximum request body size

Environment Variable Overrides

ConfigEnvironment VariableDefault
portPORT8080
hostHOST0.0.0.0
timeoutSERVER_TIMEOUT30s

Logging Configuration

Controls logging behavior and output format.

logging:
  level: info                  # Log level: debug, info, warn, error
  format: json                 # Output format: json, text

Log Levels

LevelDescriptionWhen to Use
debugDetailed debugging infoDevelopment, troubleshooting
infoGeneral operational infoProduction default
warnWarning conditionsProduction monitoring
errorError conditions onlyMinimal production logging

Environment Variable Overrides

ConfigEnvironment VariableDefault
levelLOG_LEVELinfo
formatLOG_FORMATjson

Metrics Configuration

Controls Prometheus metrics collection and Datadog integration.

metrics:
  enabled: true                # Enable metrics collection
  path: /v1/metrics           # Metrics endpoint path
  datadog:
    enabled: true             # Enable Datadog integration
    service_name: marga       # Service name in Datadog
    environment: production   # Environment tag

Available Metrics

MetricTypeDescription
marga_requests_totalCounterTotal requests processed
marga_request_duration_secondsHistogramRequest latency distribution
marga_requests_in_flightGaugeCurrent concurrent requests
marga_provider_requests_totalCounterRequests per provider
marga_provider_errors_totalCounterErrors per provider
marga_provider_healthGaugeProvider health status
marga_model_requests_totalCounterRequests per model

Provider Configuration

Configure LLM providers and their settings.

OpenAI Provider

providers:
  - name: openai
    type: openai
    enabled: true
    endpoint: https://api.openai.com/v1
    api_key_env: OPENAI_API_KEY
    models:
      - gpt-4o
      - gpt-4o-mini
      - gpt-3.5-turbo
    priority: 1
    rate_limit:
      requests_per_minute: 3000
      tokens_per_minute: 150000
    timeout: 30s

Anthropic Provider

providers:
  - name: anthropic
    type: anthropic
    enabled: true
    endpoint: https://api.anthropic.com/v1
    api_key_env: ANTHROPIC_API_KEY
    models:
      - claude-3-5-sonnet-20241022
      - claude-3-5-haiku-20241022
      - claude-3-opus-20240229
    priority: 2
    rate_limit:
      requests_per_minute: 4000
      tokens_per_minute: 400000
    timeout: 60s

Ollama Provider (Local Models)

providers:
  - name: ollama
    type: ollama
    enabled: false
    endpoint: http://localhost:11434
    models:
      - llama3.1:8b
      - llama3.1:70b
      - mistral:7b
    priority: 3
    rate_limit:
      requests_per_minute: 100
    timeout: 120s

Together AI Provider

providers:
  - name: together
    type: openai  # OpenAI-compatible
    enabled: false
    endpoint: https://api.together.xyz/v1
    api_key_env: TOGETHER_API_KEY
    models:
      - meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
      - meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
    priority: 4
    rate_limit:
      requests_per_minute: 600
      tokens_per_minute: 1000000
    timeout: 60s

Provider Fields Reference

FieldTypeRequiredDescription
namestringUnique provider identifier
typestringProvider type: openai, anthropic, ollama
enabledbooleanWhether provider is active
endpointstringAPI endpoint URL
api_key_envstringEnvironment variable for API key
modelsarrayList of supported models
priorityintegerLower = higher priority
rate_limitobjectProvider-specific rate limits
timeoutdurationRequest timeout

Routing Configuration

Controls how requests are routed to providers.

routing:
  strategy: failover           # Routing strategy
  
  # Model mappings for transparent routing
  model_mappings:
    gpt-4: openai/gpt-4o
    claude-3-sonnet: anthropic/claude-3-5-sonnet-20241022
    llama-8b: ollama/llama3.1:8b
 
  # Failover configuration
  failover:
    max_retries: 3
    retry_delay: 1s
    health_check_interval: 30s
 
  # Load balancing (when strategy is load_balance)
  load_balance:
    algorithm: round_robin     # round_robin, weighted, least_connections
    health_aware: true
 
  # Cost optimization (when strategy is cost_optimize)
  cost_optimize:
    prefer_cheaper: true
    cost_threshold: 0.01       # USD per 1K tokens

Routing Strategies

StrategyDescriptionUse Case
failoverTry providers in priority orderHigh availability
load_balanceDistribute requests across providersHigh throughput
cost_optimizeRoute to cheapest available providerCost efficiency

Model Mappings

Model mappings provide a unified interface by mapping generic model names to specific provider models:

model_mappings:
  # Generic name: provider/specific-model
  gpt-4: openai/gpt-4o
  gpt-3.5: openai/gpt-3.5-turbo
  claude-3: anthropic/claude-3-5-sonnet-20241022
  llama-8b: ollama/llama3.1:8b
  llama-70b: together/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo

This allows clients to request gpt-4 and automatically get routed to the best available GPT-4 variant.

Rate Limiting Configuration

Control request rates globally and per client.

rate_limit:
  enabled: true                # Enable rate limiting
  
  # Global limits across all clients
  global:
    requests_per_minute: 10000
    burst: 100
  
  # Per-client limits (by API key or IP)
  per_client:
    requests_per_minute: 1000
    burst: 50

Rate Limit Fields

FieldDescriptionExample
requests_per_minuteMaximum requests per minute1000
tokens_per_minuteMaximum tokens per minute100000
burstBurst capacity for short spikes50

Rate Limiting Algorithms

MĀRGA uses a token bucket algorithm for smooth rate limiting:

  • Bucket Size: Set by burst parameter
  • Refill Rate: Set by requests_per_minute
  • Overflow Handling: Requests are rejected with HTTP 429

Security Configuration

Configure authentication, CORS, and access controls.

security:
  api_key_required: true       # Require API key for access
  api_key_header: X-API-Key   # Header name for API key
  
  # CORS configuration
  allowed_origins:
    - "https://myapp.com"
    - "https://admin.myapp.com"
  cors:
    enabled: true
    credentials: false

API Key Authentication

When api_key_required is true, requests must include an API key in:

  • Authorization: Bearer YOUR_KEY header, or
  • Custom header specified by api_key_header
# Using Authorization header
curl -H "Authorization: Bearer your-api-key" \
  https://marga.example.com/v1/chat/completions
 
# Using custom header
curl -H "X-API-Key: your-api-key" \
  https://marga.example.com/v1/chat/completions

CORS Configuration

FieldDescriptionExample
allowed_originsAllowed origin domains["https://myapp.com", "*"]
cors.enabledEnable CORS middlewaretrue
cors.credentialsAllow credentials in CORSfalse

Health Check Configuration

Configure provider health monitoring.

health:
  enabled: true               # Enable health checks
  path: /health              # Health endpoint path
  check_providers: true      # Check individual providers
  timeout: 10s              # Health check timeout

Health Check Behavior

When enabled, MĀRGA:

  1. Exposes /health endpoint for load balancer checks
  2. Periodically checks provider health (if check_providers: true)
  3. Removes unhealthy providers from routing
  4. Automatically re-adds providers when they recover

Environment Variables

All configuration can be overridden with environment variables:

Server Variables

PORT=8080
HOST=0.0.0.0
SERVER_TIMEOUT=30s
MAX_REQUEST_SIZE=10MB

Logging Variables

LOG_LEVEL=info
LOG_FORMAT=json

Provider API Keys

OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
TOGETHER_API_KEY=your-together-key
OLLAMA_ENDPOINT=http://localhost:11434

Security Variables

MARGA_API_KEY=your-secure-api-key
API_KEY_REQUIRED=true
ALLOWED_ORIGINS=https://myapp.com,https://admin.myapp.com

Monitoring Variables

METRICS_ENABLED=true
DD_API_KEY=your-datadog-key
DD_ENV=production
DD_SERVICE=marga
DD_VERSION=0.1.0

Configuration Examples

Development Configuration

# config-dev.yaml - Development setup
server:
  port: 8080
  timeout: 30s
 
logging:
  level: debug
  format: text
 
metrics:
  enabled: true
  datadog:
    enabled: false
 
providers:
  - name: openai
    type: openai
    enabled: true
    endpoint: https://api.openai.com/v1
    api_key_env: OPENAI_API_KEY
    models: [gpt-3.5-turbo]
    priority: 1
 
routing:
  strategy: failover
 
security:
  api_key_required: false
  cors:
    enabled: true

Production Configuration

# config-prod.yaml - Production setup
server:
  port: 8080
  host: 0.0.0.0
  timeout: 30s
  max_request_size: 10MB
 
logging:
  level: info
  format: json
 
metrics:
  enabled: true
  datadog:
    enabled: true
    service_name: marga
    environment: production
 
providers:
  - name: openai
    type: openai
    enabled: true
    endpoint: https://api.openai.com/v1
    api_key_env: OPENAI_API_KEY
    models: [gpt-4o, gpt-4o-mini]
    priority: 1
    rate_limit:
      requests_per_minute: 3000
    
  - name: anthropic
    type: anthropic
    enabled: true
    endpoint: https://api.anthropic.com/v1
    api_key_env: ANTHROPIC_API_KEY
    models: [claude-3-5-sonnet-20241022]
    priority: 2
    rate_limit:
      requests_per_minute: 4000
 
routing:
  strategy: failover
  model_mappings:
    gpt-4: openai/gpt-4o
    claude-3-sonnet: anthropic/claude-3-5-sonnet-20241022
  failover:
    max_retries: 3
    retry_delay: 2s
 
rate_limit:
  enabled: true
  global:
    requests_per_minute: 10000
  per_client:
    requests_per_minute: 1000
 
security:
  api_key_required: true
  allowed_origins:
    - https://myapp.com
  cors:
    enabled: true
 
health:
  enabled: true
  check_providers: true
  timeout: 10s

High-Availability Configuration

# config-ha.yaml - High availability setup
routing:
  strategy: failover
  model_mappings:
    gpt-4: openai/gpt-4o
    claude-3-sonnet: anthropic/claude-3-5-sonnet-20241022
    llama-70b: together/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
  failover:
    max_retries: 5
    retry_delay: 1s
    health_check_interval: 15s
 
providers:
  - name: openai-primary
    type: openai
    enabled: true
    endpoint: https://api.openai.com/v1
    api_key_env: OPENAI_API_KEY_PRIMARY
    models: [gpt-4o, gpt-4o-mini]
    priority: 1
    
  - name: openai-secondary
    type: openai
    enabled: true
    endpoint: https://api.openai.com/v1
    api_key_env: OPENAI_API_KEY_SECONDARY
    models: [gpt-4o, gpt-4o-mini]
    priority: 2
    
  - name: anthropic
    type: anthropic
    enabled: true
    endpoint: https://api.anthropic.com/v1
    api_key_env: ANTHROPIC_API_KEY
    models: [claude-3-5-sonnet-20241022]
    priority: 3
    
  - name: together
    type: openai
    enabled: true
    endpoint: https://api.together.xyz/v1
    api_key_env: TOGETHER_API_KEY
    models: [meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo]
    priority: 4
 
health:
  enabled: true
  check_providers: true
  timeout: 5s

Cost-Optimized Configuration

# config-cost.yaml - Cost optimization setup
routing:
  strategy: cost_optimize
  cost_optimize:
    prefer_cheaper: true
    cost_threshold: 0.01
  model_mappings:
    gpt-4: openai/gpt-4o-mini  # Use mini for cost savings
    claude-3: anthropic/claude-3-5-haiku-20241022  # Use Haiku
 
providers:
  - name: together
    type: openai
    enabled: true
    endpoint: https://api.together.xyz/v1
    api_key_env: TOGETHER_API_KEY
    models: [meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo]
    priority: 1  # Cheapest first
    
  - name: openai
    type: openai
    enabled: true
    endpoint: https://api.openai.com/v1
    api_key_env: OPENAI_API_KEY
    models: [gpt-4o-mini, gpt-3.5-turbo]
    priority: 2
    
  - name: anthropic
    type: anthropic
    enabled: true
    endpoint: https://api.anthropic.com/v1
    api_key_env: ANTHROPIC_API_KEY
    models: [claude-3-5-haiku-20241022]
    priority: 3

Configuration Validation

MĀRGA validates configuration on startup and reports errors:

Common Validation Errors

# Missing required fields
Error: Provider 'openai' missing required field 'type'
 
# Invalid values
Error: Invalid log level 'invalid', must be: debug, info, warn, error
 
# Duplicate names
Error: Duplicate provider name 'openai'
 
# Invalid endpoints
Error: Provider 'openai' has invalid endpoint URL

Configuration Testing

Test your configuration before deploying:

# Dry run to validate config
./marga --config config.yaml --validate-only
 
# Check specific provider
./marga --config config.yaml --test-provider openai