Configuration Reference

This guide covers all configuration options for MĀRGA, including server settings, provider configuration, routing rules, and monitoring setup.

Configuration File

MĀRGA reads its settings from a YAML configuration file (default: config.yaml). To use a different file, set the CONFIG_FILE environment variable or pass --config, as in the commands under Configuration Testing.

# Full configuration example
server:
  port: 8080
  host: 0.0.0.0
  timeout: 30s
  max_request_size: 10MB
 
logging:
  level: info
  format: json
 
metrics:
  enabled: true
  path: /v1/metrics
  datadog:
    enabled: true
    service_name: marga
    environment: production
 
providers:
  - name: openai
    type: openai
    enabled: true
    # ... provider config
 
routing:
  strategy: failover
  # ... routing config
 
rate_limit:
  enabled: true
  # ... rate limit config
 
security:
  api_key_required: true
  # ... security config
 
health:
  enabled: true
  # ... health config

Server Configuration

Controls the HTTP server behavior.

server:
  port: 8080                    # Port to listen on
  host: 0.0.0.0                # Host to bind to (0.0.0.0 for all interfaces)
  timeout: 30s                 # Request timeout
  max_request_size: 10MB       # Maximum request body size

Environment Variable Overrides

Config	Environment Variable	Default
`port`	`PORT`	`8080`
`host`	`HOST`	`0.0.0.0`
`timeout`	`SERVER_TIMEOUT`	`30s`
`max_request_size`	`MAX_REQUEST_SIZE`	`10MB`
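The override table can be read as a precedence rule: a set environment variable wins over the value in the YAML file, which in turn wins over the default. The Python sketch below illustrates that rule only. The function name `resolve_server_settings` is hypothetical, and the exact precedence order is an assumption rather than something taken from MĀRGA's source.

```python
import os

# Config key -> (environment variable, default), using the names listed in this guide.
SERVER_OVERRIDES = {
    "port": ("PORT", "8080"),
    "host": ("HOST", "0.0.0.0"),
    "timeout": ("SERVER_TIMEOUT", "30s"),
    "max_request_size": ("MAX_REQUEST_SIZE", "10MB"),
}

def resolve_server_settings(file_values, environ=None):
    """Return effective settings: env var > YAML value > default (assumed order)."""
    environ = os.environ if environ is None else environ
    resolved = {}
    for key, (env_name, default) in SERVER_OVERRIDES.items():
        if env_name in environ:
            resolved[key] = environ[env_name]       # override from the environment
        elif key in file_values:
            resolved[key] = str(file_values[key])   # value from the server: block
        else:
            resolved[key] = default                 # documented default
    return resolved
```

For example, `resolve_server_settings({"port": 9000}, {"PORT": "7000"})` yields port `"7000"`, because the environment variable takes precedence in this sketch.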

Logging Configuration

Controls logging behavior and output format.

logging:
  level: info                  # Log level: debug, info, warn, error
  format: json                 # Output format: json, text

Log Levels

Level	Description	When to Use
`debug`	Detailed debugging info	Development, troubleshooting
`info`	General operational info	Production default
`warn`	Warning conditions	Production monitoring
`error`	Error conditions only	Minimal production logging

Environment Variable Overrides

Config	Environment Variable	Default
`level`	`LOG_LEVEL`	`info`
`format`	`LOG_FORMAT`	`json`

Metrics Configuration

Controls Prometheus metrics collection and Datadog integration.

metrics:
  enabled: true                # Enable metrics collection
  path: /v1/metrics           # Metrics endpoint path
  datadog:
    enabled: true             # Enable Datadog integration
    service_name: marga       # Service name in Datadog
    environment: production   # Environment tag

Available Metrics

Metric	Type	Description
`marga_requests_total`	Counter	Total requests processed
`marga_request_duration_seconds`	Histogram	Request latency distribution
`marga_requests_in_flight`	Gauge	Current concurrent requests
`marga_provider_requests_total`	Counter	Requests per provider
`marga_provider_errors_total`	Counter	Errors per provider
`marga_provider_health`	Gauge	Provider health status
`marga_model_requests_total`	Counter	Requests per model
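To show how these metrics might be consumed, here is a small Python sketch that parses Prometheus text-format output and derives a per-provider error rate. The sample values and the `provider` label name are illustrative assumptions; the labels MĀRGA actually attaches are not documented here. The parser is deliberately minimal and ignores optional timestamps.

```python
def parse_prometheus_text(text):
    """Parse simple Prometheus text-format samples into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")  # the value is the last field
        samples[series] = float(value)
    return samples

# Illustrative payload; numbers and label names are made up for the example.
SAMPLE = """\
# TYPE marga_requests_total counter
marga_requests_total 1200
# TYPE marga_provider_errors_total counter
marga_provider_errors_total{provider="openai"} 6
# TYPE marga_provider_requests_total counter
marga_provider_requests_total{provider="openai"} 300
"""

samples = parse_prometheus_text(SAMPLE)
error_rate = (
    samples['marga_provider_errors_total{provider="openai"}']
    / samples['marga_provider_requests_total{provider="openai"}']
)
```

In practice the text would come from an HTTP GET against the configured metrics path (`/v1/metrics` in the example configuration).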

Provider Configuration

Configure LLM providers and their settings.

OpenAI Provider

providers:
  - name: openai
    type: openai
    enabled: true
    endpoint: https://api.openai.com/v1
    api_key_env: OPENAI_API_KEY
    models:
      - gpt-4o
      - gpt-4o-mini
      - gpt-3.5-turbo
    priority: 1
    rate_limit:
      requests_per_minute: 3000
      tokens_per_minute: 150000
    timeout: 30s

Anthropic Provider

providers:
  - name: anthropic
    type: anthropic
    enabled: true
    endpoint: https://api.anthropic.com/v1
    api_key_env: ANTHROPIC_API_KEY
    models:
      - claude-3-5-sonnet-20241022
      - claude-3-5-haiku-20241022
      - claude-3-opus-20240229
    priority: 2
    rate_limit:
      requests_per_minute: 4000
      tokens_per_minute: 400000
    timeout: 60s

Ollama Provider (Local Models)

providers:
  - name: ollama
    type: ollama
    enabled: false
    endpoint: http://localhost:11434
    models:
      - llama3.1:8b
      - llama3.1:70b
      - mistral:7b
    priority: 3
    rate_limit:
      requests_per_minute: 100
    timeout: 120s

Together AI Provider

providers:
  - name: together
    type: openai  # OpenAI-compatible
    enabled: false
    endpoint: https://api.together.xyz/v1
    api_key_env: TOGETHER_API_KEY
    models:
      - meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
      - meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
    priority: 4
    rate_limit:
      requests_per_minute: 600
      tokens_per_minute: 1000000
    timeout: 60s

Provider Fields Reference

Field	Type	Required	Description
`name`	string	✅	Unique provider identifier
`type`	string	✅	Provider type: `openai`, `anthropic`, `ollama`
`enabled`	boolean	✅	Whether provider is active
`endpoint`	string	✅	API endpoint URL
`api_key_env`	string	❌	Environment variable for API key
`models`	array	✅	List of supported models
`priority`	integer	✅	Routing priority; lower numbers are tried first
`rate_limit`	object	❌	Provider-specific rate limits
`timeout`	duration	❌	Request timeout

Routing Configuration

Controls how requests are routed to providers.

routing:
  strategy: failover           # Routing strategy
  
  # Model mappings for transparent routing
  model_mappings:
    gpt-4: openai/gpt-4o
    claude-3-sonnet: anthropic/claude-3-5-sonnet-20241022
    llama-8b: ollama/llama3.1:8b
 
  # Failover configuration
  failover:
    max_retries: 3
    retry_delay: 1s
    health_check_interval: 30s
 
  # Load balancing (when strategy is load_balance)
  load_balance:
    algorithm: round_robin     # round_robin, weighted, least_connections
    health_aware: true
 
  # Cost optimization (when strategy is cost_optimize)
  cost_optimize:
    prefer_cheaper: true
    cost_threshold: 0.01       # USD per 1K tokens

Routing Strategies

Strategy	Description	Use Case
`failover`	Try providers in priority order	High availability
`load_balance`	Distribute requests across providers	High throughput
`cost_optimize`	Route to cheapest available provider	Cost efficiency
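The `failover` strategy can be pictured with a short Python sketch: enabled providers are tried in ascending `priority`, and each one is retried before moving to the next. This is an illustration under stated assumptions, not MĀRGA's implementation. In particular, applying `max_retries` per provider and retrying on any exception are guesses, and `route_with_failover` and `send` are hypothetical names.

```python
import time

def route_with_failover(providers, send, max_retries=3, retry_delay=0.0):
    """Try enabled providers in ascending priority; retry each up to max_retries."""
    ordered = sorted(
        (p for p in providers if p["enabled"]),
        key=lambda p: p["priority"],  # lower number = tried first
    )
    errors = []
    for provider in ordered:
        for attempt in range(1, max_retries + 1):
            try:
                return provider["name"], send(provider)
            except Exception as exc:  # a real router would retry only transient errors
                errors.append((provider["name"], attempt, str(exc)))
                if attempt < max_retries:
                    time.sleep(retry_delay)
    raise RuntimeError(f"all providers failed: {errors}")
```

Here `send` is any callable that forwards the request to one provider and raises on failure. The function returns the name of the provider that answered together with its response.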

Model Mappings

Model mappings provide a unified interface by mapping generic model names to specific provider models:

model_mappings:
  # Generic name: provider/specific-model
  gpt-4: openai/gpt-4o
  gpt-3.5: openai/gpt-3.5-turbo
  claude-3: anthropic/claude-3-5-sonnet-20241022
  llama-8b: ollama/llama3.1:8b
  llama-70b: together/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo

This lets clients request a generic name such as gpt-4 and be routed to the mapped target (openai/gpt-4o in the example above) without changing client code when the target changes.
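One detail of the `provider/specific-model` format is worth spelling out: some model identifiers contain slashes themselves, so the target has to be split on the first slash only. The Python sketch below shows one way to do that. The function name `resolve_model` and the behavior for unmapped names are assumptions made for illustration.

```python
MODEL_MAPPINGS = {
    "gpt-4": "openai/gpt-4o",
    "llama-70b": "together/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
}

def resolve_model(requested, mappings=MODEL_MAPPINGS):
    """Return (provider, model) for a mapped name, or (None, requested) if unmapped."""
    target = mappings.get(requested)
    if target is None:
        return None, requested  # assumption: unmapped names pass through unchanged
    provider, _, model = target.partition("/")  # split on the first slash only
    return provider, model
```

With this approach `llama-70b` resolves to provider `together` and model `meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo`, keeping the slash inside the model name intact.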

Rate Limiting Configuration

Control request rates globally and per client.

rate_limit:
  enabled: true                # Enable rate limiting
  
  # Global limits across all clients
  global:
    requests_per_minute: 10000
    burst: 100
  
  # Per-client limits (by API key or IP)
  per_client:
    requests_per_minute: 1000
    burst: 50

Rate Limit Fields

Field	Description	Example
`requests_per_minute`	Maximum requests per minute	`1000`
`tokens_per_minute`	Maximum tokens per minute	`100000`
`burst`	Burst capacity for short spikes	`50`

Rate Limiting Algorithms

MĀRGA uses a token bucket algorithm for smooth rate limiting:

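Bucket Size: Set by the burst parameter
Refill Rate: Set by requests_per_minute (for example, 1000 requests per minute refills about 16.7 tokens per second)
Overflow Handling: Requests that arrive when the bucket is empty are rejected with HTTP 429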
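The three parameters above map directly onto a textbook token bucket. The following Python sketch is a generic version of the algorithm, not MĀRGA's code. The class name `TokenBucket` is hypothetical, and starting with a full bucket is an assumption.

```python
class TokenBucket:
    """Generic token bucket: capacity = burst, refill = requests_per_minute / 60."""

    def __init__(self, requests_per_minute, burst, now=0.0):
        self.capacity = float(burst)
        self.refill_per_second = requests_per_minute / 60.0
        self.tokens = float(burst)  # assumption: the bucket starts full
        self.updated = now

    def allow(self, now):
        """Return True if the request may proceed, False if it should get HTTP 429."""
        elapsed = max(0.0, now - self.updated)
        # Add the tokens earned since the last call, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The clock is passed in explicitly so the behavior is easy to test; a server would typically pass `time.monotonic()`. With `requests_per_minute=60` and `burst=2`, two requests at the same instant pass, a third is rejected, and one more is admitted a second later.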

Security Configuration

Configure authentication, CORS, and access controls.

security:
  api_key_required: true       # Require API key for access
  api_key_header: X-API-Key   # Header name for API key
  
  # CORS configuration
  allowed_origins:
    - "https://myapp.com"
    - "https://admin.myapp.com"
  cors:
    enabled: true
    credentials: false

API Key Authentication

When api_key_required is true, each request must include an API key in one of the following:

The Authorization: Bearer YOUR_KEY header, or
The custom header named by api_key_header (X-API-Key in the example above)

# Using Authorization header
curl -H "Authorization: Bearer your-api-key" \
  https://marga.example.com/v1/chat/completions
 
# Using custom header
curl -H "X-API-Key: your-api-key" \
  https://marga.example.com/v1/chat/completions
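On the server side, accepting either header amounts to a small lookup. The Python sketch below shows one plausible shape for it. Checking the Authorization header first, the helper names `extract_api_key` and `is_authorized`, and comparing against a single expected key are all assumptions for illustration.

```python
import hmac

def extract_api_key(headers, api_key_header="X-API-Key"):
    """Return the key from a Bearer Authorization header or the custom header."""
    lowered = {name.lower(): value for name, value in headers.items()}
    scheme, _, token = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()
    return lowered.get(api_key_header.lower()) or None

def is_authorized(headers, expected_key, api_key_required=True):
    """Apply the api_key_required switch and compare keys in constant time."""
    if not api_key_required:
        return True
    presented = extract_api_key(headers)
    if presented is None:
        return False
    return hmac.compare_digest(presented.encode(), expected_key.encode())
```

Header names are matched case-insensitively, as HTTP requires, and `hmac.compare_digest` avoids leaking key contents through timing differences.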

CORS Configuration

Field	Description	Example
`allowed_origins`	Allowed origin domains	`["https://myapp.com", "*"]`
`cors.enabled`	Enable CORS middleware	`true`
`cors.credentials`	Allow credentials in CORS	`false`
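The three CORS fields interact in a way that a sketch makes easier to see. The Python function below decides which response headers to emit for a given request Origin. It is a simplified illustration, with `cors_headers` as a hypothetical name, and it omits preflight handling. The one hard rule it encodes comes from the CORS protocol itself: browsers reject a wildcard origin on credentialed requests.

```python
def cors_headers(origin, allowed_origins, enabled=True, credentials=False):
    """Return CORS response headers for a request Origin, or {} if not allowed."""
    if not enabled or origin is None:
        return {}
    if origin in allowed_origins:
        allow = origin  # echo the exact origin that matched
    elif "*" in allowed_origins and not credentials:
        allow = "*"     # a wildcard is only valid without credentials
    else:
        return {}
    headers = {"Access-Control-Allow-Origin": allow}
    if allow != "*":
        headers["Vary"] = "Origin"  # the response differs per origin
    if credentials:
        headers["Access-Control-Allow-Credentials"] = "true"
    return headers
```

An origin that is not listed gets no CORS headers at all, so the browser blocks the cross-origin read.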

Health Check Configuration

Configure provider health monitoring.

health:
  enabled: true               # Enable health checks
  path: /health              # Health endpoint path
  check_providers: true      # Check individual providers
  timeout: 10s              # Health check timeout

Health Check Behavior

When enabled, MĀRGA:

Exposes a /health endpoint for load balancer checks
Periodically checks provider health (if check_providers: true)
Removes unhealthy providers from routing
Automatically re-adds providers when they recover
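The remove-and-re-add behavior described above can be sketched as a small piece of state that the router consults. In this Python illustration a single failed probe marks a provider unhealthy and a single success restores it. That threshold, like the `HealthTracker` name, is an assumption, since the guide does not document how many probes are needed.

```python
class HealthTracker:
    """Track provider health from periodic probe results."""

    def __init__(self, provider_names):
        self.healthy = {name: True for name in provider_names}

    def record(self, name, probe_ok):
        """Store the latest probe result for one provider."""
        self.healthy[name] = bool(probe_ok)

    def routable(self, providers):
        """Return healthy providers in priority order (lower number first)."""
        live = [p for p in providers if self.healthy.get(p["name"], False)]
        return sorted(live, key=lambda p: p["priority"])
```

A background task would call `record` after each probe, and the router would call `routable` before choosing a provider.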

Environment Variables

The following environment variables override the corresponding values in the configuration file.

Server Variables

PORT=8080
HOST=0.0.0.0
SERVER_TIMEOUT=30s
MAX_REQUEST_SIZE=10MB

Logging Variables

LOG_LEVEL=info
LOG_FORMAT=json

Provider API Keys and Endpoints

OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
TOGETHER_API_KEY=your-together-key
OLLAMA_ENDPOINT=http://localhost:11434

Security Variables

MARGA_API_KEY=your-secure-api-key
API_KEY_REQUIRED=true
ALLOWED_ORIGINS=https://myapp.com,https://admin.myapp.com
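Environment variables are always strings, so values such as `API_KEY_REQUIRED=true` and the comma-separated `ALLOWED_ORIGINS` need converting before they can replace typed YAML values. A Python sketch of that conversion follows. The accepted boolean spellings and the helper names are assumptions, not documented MĀRGA behavior.

```python
def parse_bool(value):
    """Interpret common truthy spellings (assumed set); anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}

def parse_origins(value):
    """Split a comma-separated origin list, dropping blanks and stray spaces."""
    return [item.strip() for item in value.split(",") if item.strip()]

def security_overrides(environ):
    """Return only the security settings that are actually set in the environment."""
    overrides = {}
    if "API_KEY_REQUIRED" in environ:
        overrides["api_key_required"] = parse_bool(environ["API_KEY_REQUIRED"])
    if "ALLOWED_ORIGINS" in environ:
        overrides["allowed_origins"] = parse_origins(environ["ALLOWED_ORIGINS"])
    return overrides
```

Returning only the variables that are set leaves the file's values untouched for everything else.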

Monitoring Variables

METRICS_ENABLED=true
DD_API_KEY=your-datadog-key
DD_ENV=production
DD_SERVICE=marga
DD_VERSION=0.1.0

Configuration Examples

Development Configuration

# config-dev.yaml - Development setup
server:
  port: 8080
  timeout: 30s
 
logging:
  level: debug
  format: text
 
metrics:
  enabled: true
  datadog:
    enabled: false
 
providers:
  - name: openai
    type: openai
    enabled: true
    endpoint: https://api.openai.com/v1
    api_key_env: OPENAI_API_KEY
    models: [gpt-3.5-turbo]
    priority: 1
 
routing:
  strategy: failover
 
security:
  api_key_required: false
  cors:
    enabled: true

Production Configuration

# config-prod.yaml - Production setup
server:
  port: 8080
  host: 0.0.0.0
  timeout: 30s
  max_request_size: 10MB
 
logging:
  level: info
  format: json
 
metrics:
  enabled: true
  datadog:
    enabled: true
    service_name: marga
    environment: production
 
providers:
  - name: openai
    type: openai
    enabled: true
    endpoint: https://api.openai.com/v1
    api_key_env: OPENAI_API_KEY
    models: [gpt-4o, gpt-4o-mini]
    priority: 1
    rate_limit:
      requests_per_minute: 3000
    
  - name: anthropic
    type: anthropic
    enabled: true
    endpoint: https://api.anthropic.com/v1
    api_key_env: ANTHROPIC_API_KEY
    models: [claude-3-5-sonnet-20241022]
    priority: 2
    rate_limit:
      requests_per_minute: 4000
 
routing:
  strategy: failover
  model_mappings:
    gpt-4: openai/gpt-4o
    claude-3-sonnet: anthropic/claude-3-5-sonnet-20241022
  failover:
    max_retries: 3
    retry_delay: 2s
 
rate_limit:
  enabled: true
  global:
    requests_per_minute: 10000
  per_client:
    requests_per_minute: 1000
 
security:
  api_key_required: true
  allowed_origins:
    - https://myapp.com
  cors:
    enabled: true
 
health:
  enabled: true
  check_providers: true
  timeout: 10s

High-Availability Configuration

# config-ha.yaml - High availability setup
routing:
  strategy: failover
  model_mappings:
    gpt-4: openai-primary/gpt-4o  # must name a provider defined below
    claude-3-sonnet: anthropic/claude-3-5-sonnet-20241022
    llama-70b: together/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
  failover:
    max_retries: 5
    retry_delay: 1s
    health_check_interval: 15s
 
providers:
  - name: openai-primary
    type: openai
    enabled: true
    endpoint: https://api.openai.com/v1
    api_key_env: OPENAI_API_KEY_PRIMARY
    models: [gpt-4o, gpt-4o-mini]
    priority: 1
    
  - name: openai-secondary
    type: openai
    enabled: true
    endpoint: https://api.openai.com/v1
    api_key_env: OPENAI_API_KEY_SECONDARY
    models: [gpt-4o, gpt-4o-mini]
    priority: 2
    
  - name: anthropic
    type: anthropic
    enabled: true
    endpoint: https://api.anthropic.com/v1
    api_key_env: ANTHROPIC_API_KEY
    models: [claude-3-5-sonnet-20241022]
    priority: 3
    
  - name: together
    type: openai
    enabled: true
    endpoint: https://api.together.xyz/v1
    api_key_env: TOGETHER_API_KEY
    models: [meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo]
    priority: 4
 
health:
  enabled: true
  check_providers: true
  timeout: 5s

Cost-Optimized Configuration

# config-cost.yaml - Cost optimization setup
routing:
  strategy: cost_optimize
  cost_optimize:
    prefer_cheaper: true
    cost_threshold: 0.01
  model_mappings:
    gpt-4: openai/gpt-4o-mini  # Use mini for cost savings
    claude-3: anthropic/claude-3-5-haiku-20241022  # Use Haiku
 
providers:
  - name: together
    type: openai
    enabled: true
    endpoint: https://api.together.xyz/v1
    api_key_env: TOGETHER_API_KEY
    models: [meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo]
    priority: 1  # Cheapest first
    
  - name: openai
    type: openai
    enabled: true
    endpoint: https://api.openai.com/v1
    api_key_env: OPENAI_API_KEY
    models: [gpt-4o-mini, gpt-3.5-turbo]
    priority: 2
    
  - name: anthropic
    type: anthropic
    enabled: true
    endpoint: https://api.anthropic.com/v1
    api_key_env: ANTHROPIC_API_KEY
    models: [claude-3-5-haiku-20241022]
    priority: 3

Configuration Validation

MĀRGA validates the configuration on startup and reports any errors it finds.

Common Validation Errors

# Missing required fields
Error: Provider 'openai' missing required field 'type'
 
# Invalid values
Error: Invalid log level 'invalid', must be: debug, info, warn, error
 
# Duplicate names
Error: Duplicate provider name 'openai'
 
# Invalid endpoints
Error: Provider 'openai' has invalid endpoint URL
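The checks behind these messages are simple to reproduce in a pre-deploy script. The Python sketch below runs the same four kinds of check against an already-parsed configuration dictionary. It is an approximation written for this guide, not the validator MĀRGA ships. The required-field list is taken from the Provider Fields Reference, and the function name `validate_config` is hypothetical.

```python
from urllib.parse import urlparse

REQUIRED_PROVIDER_FIELDS = ("name", "type", "enabled", "endpoint", "models", "priority")
LOG_LEVELS = ("debug", "info", "warn", "error")

def validate_config(config):
    """Return a list of error strings; an empty list means the config passed."""
    errors = []
    level = config.get("logging", {}).get("level", "info")
    if level not in LOG_LEVELS:
        errors.append(f"Invalid log level '{level}', must be: {', '.join(LOG_LEVELS)}")
    seen = set()
    for provider in config.get("providers", []):
        name = provider.get("name", "<unnamed>")
        for field in REQUIRED_PROVIDER_FIELDS:
            if field not in provider:
                errors.append(f"Provider '{name}' missing required field '{field}'")
        if name in seen:
            errors.append(f"Duplicate provider name '{name}'")
        seen.add(name)
        if "endpoint" in provider:
            url = urlparse(str(provider["endpoint"]))
            if url.scheme not in ("http", "https") or not url.netloc:
                errors.append(f"Provider '{name}' has invalid endpoint URL")
    return errors
```

The dictionary would normally come from a YAML loader such as PyYAML's `yaml.safe_load`, which is a third-party dependency and is left out so the sketch stays standard-library only.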

Configuration Testing

Test your configuration before deploying:

# Dry run to validate config
./marga --config config.yaml --validate-only
 
# Check specific provider
./marga --config config.yaml --test-provider openai
