Deployment Guide

This guide covers deploying MĀRGA to production environments, including Docker, Kubernetes, Google Cloud Run, AWS ECS, and traditional server deployments.

Self-hosted vs Managed

Feature | Self-hosted | Managed (Avyay Cloud)
--- | --- | ---
Control | Full control over infrastructure | Managed infrastructure
Scaling | Manual scaling setup | Auto-scaling included
Monitoring | Setup required | Built-in dashboards
Updates | Manual updates | Automatic updates
Support | Community support | 24/7 enterprise support
SLA | Self-managed | 99.9% uptime SLA
Cost | Infrastructure + ops time | Usage-based pricing

Docker Deployment

Single Container

Simple deployment for small-scale production:

# Pull latest image
docker pull ghcr.io/gaurav21/marga:latest
 
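# Create production environment file
# (EOF is unquoted, so $(openssl rand -hex 32) runs and its output is written to the file)
cat > .env << EOF
OPENAI_API_KEY=sk-your-production-key
ANTHROPIC_API_KEY=sk-ant-your-production-key
MARGA_API_KEY=$(openssl rand -hex 32)
DD_API_KEY=your-datadog-key
DD_ENV=production
LOG_LEVEL=info
EOF
chmod 600 .env  # the file holds production credentials; keep it readable by the owner only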
 
# Run in production mode
docker run -d \
  --name marga-prod \
  --restart unless-stopped \
  -p 8080:8080 \
  --env-file .env \
  -v /var/log/marga:/app/logs \
  ghcr.io/gaurav21/marga:latest
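
Before you start the container, confirm that `.env` contains every key the service needs. Otherwise a missing key may only surface later as failed provider calls. Below is a minimal pre-flight sketch. `check_env_file` is a hypothetical helper, not part of MĀRGA, and the key list is taken from the `.env` file above:

```shell
# Hypothetical pre-flight helper (not part of MĀRGA): returns non-zero if any
# listed key is missing from, or empty in, a Docker --env-file.
check_env_file() {
  env_file=$1
  shift
  missing=0
  for key in "$@"; do
    # Use the last KEY=... line if a key appears more than once
    value=$(grep "^${key}=" "$env_file" 2>/dev/null | tail -n 1 | cut -d= -f2-)
    if [ -z "$value" ]; then
      echo "missing or empty: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, before `docker run --env-file .env ...`:
# check_env_file .env OPENAI_API_KEY ANTHROPIC_API_KEY MARGA_API_KEY DD_API_KEY
```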

Docker Compose Production

For multi-service deployments with monitoring:

# docker-compose.prod.yml
 
services:
  marga:
    image: ghcr.io/gaurav21/marga:latest
    restart: unless-stopped
    ports:
      - "8080:8080"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - MARGA_API_KEY=${MARGA_API_KEY}
      - DD_API_KEY=${DD_API_KEY}
      - DD_ENV=production
      - LOG_LEVEL=info
    volumes:
      - ./config/prod-config.yaml:/app/config.yaml:ro
      - marga-logs:/app/logs
    networks:
      - marga-net
    depends_on:
      - prometheus
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--spider", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
 
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro
      - nginx-logs:/var/log/nginx
    networks:
      - marga-net
    depends_on:
      - marga
 
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    networks:
      - marga-net
 
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
    volumes:
      - grafana-data:/var/lib/grafana
      - ./monitoring/grafana:/etc/grafana/provisioning
    networks:
      - marga-net
 
volumes:
  marga-logs:
  nginx-logs:
  prometheus-data:
  grafana-data:
 
networks:
  marga-net:
    driver: bridge
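
Compose fills the `${...}` references above from a `.env` file in the project directory. However, the `.env` created for the single-container setup does not define `GRAFANA_PASSWORD`. One way to add it without overwriting existing values is an idempotent append. `ensure_env_var` is a hypothetical helper used only for illustration:

```shell
# Hypothetical helper: append KEY=VALUE to an env file only when KEY is not
# already defined, so re-running setup never replaces existing credentials.
ensure_env_var() {
  env_file=$1 key=$2 value=$3
  if ! grep -q "^${key}=" "$env_file" 2>/dev/null; then
    printf '%s=%s\n' "$key" "$value" >> "$env_file"
  fi
}

# Usage:
# ensure_env_var .env GRAFANA_PASSWORD "$(openssl rand -hex 16)"
# docker compose -f docker-compose.prod.yml up -d
```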

Kubernetes Deployment

Namespace and ConfigMap

# k8s/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: marga-system
 
---
# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: marga-config
  namespace: marga-system
data:
  config.yaml: |
    server:
      port: 8080
      host: 0.0.0.0
      timeout: 30s
    
    logging:
      level: info
      format: json
    
    metrics:
      enabled: true
      path: /v1/metrics
      datadog:
        enabled: true
        service_name: marga
        environment: production
    
    # ... rest of config

Secret Management

# k8s/secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: marga-secrets
  namespace: marga-system
type: Opaque
data:
  # Base64-encoded placeholders (not real keys); never commit real keys to version control
  openai-api-key: c2stbm90LXJlYWwtYXBpLWtleQ==
  anthropic-api-key: c2stYW50LW5vdC1yZWFsLWFwaS1rZXk=
  marga-api-key: c3VwZXItc2VjcmV0LWFwaS1rZXk=
  dd-api-key: ZGQtYXBpLWtleS1ub3QtcmVhbA==
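
Values under `data:` must be base64-encoded without a trailing newline. `echo value | base64` silently adds one, and the newline then becomes part of the key. The helpers below are a small sketch for safe encoding and decoding; the function names are illustrative. As alternatives, Kubernetes also accepts plain text under `stringData:`, and `kubectl create secret generic ... --dry-run=client -o yaml` can generate the manifest for you.

```shell
# Encode a value for the data: field of a Kubernetes Secret.
# printf '%s' avoids the trailing newline that echo would add;
# tr -d '\n' strips the line wrapping some base64 implementations apply to long output.
b64_secret() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Decode a data: value to double-check what the pod will receive
b64_decode() {
  printf '%s' "$1" | base64 -d
}
```

For example, `b64_secret sk-not-real-api-key` yields the `openai-api-key` placeholder shown above.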

Deployment

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: marga
  namespace: marga-system
  labels:
    app: marga
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  selector:
    matchLabels:
      app: marga
  template:
    metadata:
      labels:
        app: marga
    spec:
      containers:
      - name: marga
        image: ghcr.io/gaurav21/marga:latest
        ports:
        - containerPort: 8080
          name: http
        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: marga-secrets
              key: openai-api-key
        - name: ANTHROPIC_API_KEY
          valueFrom:
            secretKeyRef:
              name: marga-secrets
              key: anthropic-api-key
        - name: MARGA_API_KEY
          valueFrom:
            secretKeyRef:
              name: marga-secrets
              key: marga-api-key
        - name: DD_API_KEY
          valueFrom:
            secretKeyRef:
              name: marga-secrets
              key: dd-api-key
        - name: DD_ENV
          value: production
        - name: CONFIG_FILE
          value: /app/config.yaml
        volumeMounts:
        - name: config
          mountPath: /app/config.yaml
          subPath: config.yaml
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
      volumes:
      - name: config
        configMap:
          name: marga-config
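
The rollout settings above are `replicas: 3`, `maxSurge: 1` and `maxUnavailable: 1`. With these values, a rollout keeps at least 2 pods available and runs at most 4 pods at once, so serving capacity can drop to two thirds mid-rollout. Setting `maxUnavailable: 0` keeps all three replicas serving, but the cluster then needs room for a fourth pod. The arithmetic is sketched below for integer values only. Kubernetes rounds `maxSurge` up and `maxUnavailable` down when they are percentages, and this sketch does not handle that case:

```shell
# Pod-count bounds during a RollingUpdate with absolute (integer) maxSurge and
# maxUnavailable values. Prints "<min available> <max total pods>".
rollout_bounds() {
  replicas=$1 surge=$2 unavailable=$3
  min_available=$((replicas - unavailable))
  max_total=$((replicas + surge))
  echo "$min_available $max_total"
}

# Usage: rollout_bounds 3 1 1
```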

Service and Ingress

# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: marga-service
  namespace: marga-system
spec:
  selector:
    app: marga
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
  type: ClusterIP
 
---
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: marga-ingress
  namespace: marga-system
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx  # assumes the ingress-nginx controller; use your cluster's IngressClass name
  tls:
  - hosts:
    - marga.yourdomain.com
    secretName: marga-tls
  rules:
  - host: marga.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: marga-service
            port:
              number: 80

Deploy to Kubernetes

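# Create the namespace first; the secret below lives in it
kubectl apply -f k8s/namespace.yaml
 
# Create secrets from plain values (kubectl base64-encodes them for you)
kubectl create secret generic marga-secrets \
  --from-literal=openai-api-key="sk-your-real-key" \
  --from-literal=anthropic-api-key="sk-ant-your-real-key" \
  --from-literal=marga-api-key="your-secure-key" \
  --from-literal=dd-api-key="your-dd-key" \
  -n marga-system
 
# Apply the remaining manifests; leave out k8s/secrets.yaml so its
# placeholder values do not overwrite the secret created above
kubectl apply -f k8s/configmap.yaml -f k8s/deployment.yaml \
  -f k8s/service.yaml -f k8s/ingress.yaml
 
# Check deployment
kubectl rollout status deployment/marga -n marga-system
kubectl get pods -n marga-system
kubectl logs -f deployment/marga -n marga-system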

Google Cloud Run

Deploy with gcloud

# Build and push image (run `gcloud auth configure-docker` once so Docker can push to gcr.io)
docker build -t gcr.io/your-project/marga .
docker push gcr.io/your-project/marga
 
# Deploy to Cloud Run. Secrets are exposed as the environment variables MĀRGA reads;
# they must already exist in Secret Manager and be readable by the service account.
gcloud run deploy marga \
  --image gcr.io/your-project/marga \
  --platform managed \
  --region asia-southeast1 \
  --allow-unauthenticated \
  --set-env-vars="DD_ENV=production" \
  --set-secrets="OPENAI_API_KEY=openai-api-key:latest,ANTHROPIC_API_KEY=anthropic-api-key:latest" \
  --memory 1Gi \
  --cpu 1 \
  --concurrency 100 \
  --max-instances 10 \
  --port 8080

Cloud Run YAML

# cloudrun.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: marga
  annotations:
    run.googleapis.com/ingress: all
spec:
  template:
    metadata:
      annotations:
        run.googleapis.com/startup-cpu-boost: "true"
        autoscaling.knative.dev/maxScale: "10"
        run.googleapis.com/execution-environment: gen2
    spec:
      serviceAccountName: marga-service-account
      containers:
      - image: gcr.io/your-project/marga
        ports:
        - containerPort: 8080
        env:
        - name: DD_ENV
          value: production
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              key: latest
              name: openai-api-key
        - name: ANTHROPIC_API_KEY
          valueFrom:
            secretKeyRef:
              key: latest
              name: anthropic-api-key
        resources:
          limits:
            cpu: 1000m
            memory: 1Gi

AWS Deployment

ECS with Fargate

{
  "family": "marga-task",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::account:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::account:role/ecsTaskRole",
  "containerDefinitions": [
    {
      "name": "marga",
      "image": "ghcr.io/gaurav21/marga:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/marga",
          "awslogs-region": "us-west-2",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "secrets": [
        {
          "name": "OPENAI_API_KEY",
          "valueFrom": "arn:aws:secretsmanager:region:account:secret:marga/openai-key"
        },
        {
          "name": "ANTHROPIC_API_KEY",
          "valueFrom": "arn:aws:secretsmanager:region:account:secret:marga/anthropic-key"
        }
      ],
      "environment": [
        {
          "name": "DD_ENV",
          "value": "production"
        }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:8080/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3
      }
    }
  ]
}

Application Load Balancer

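# Create target group (target type "ip" is required for Fargate tasks in awsvpc mode)
TG_ARN=$(aws elbv2 create-target-group \
  --name marga-targets \
  --protocol HTTP \
  --port 8080 \
  --vpc-id vpc-12345678 \
  --target-type ip \
  --health-check-path /health \
  --query 'TargetGroups[0].TargetGroupArn' --output text)
 
# Create load balancer
ALB_ARN=$(aws elbv2 create-load-balancer \
  --name marga-alb \
  --subnets subnet-12345678 subnet-87654321 \
  --security-groups sg-12345678 \
  --query 'LoadBalancers[0].LoadBalancerArn' --output text)
 
# Forward HTTP traffic on port 80 to the target group
aws elbv2 create-listener \
  --load-balancer-arn "$ALB_ARN" \
  --protocol HTTP \
  --port 80 \
  --default-actions Type=forward,TargetGroupArn="$TG_ARN"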

Environment Variables Reference

Variable | Default | Description
--- | --- | ---
OPENAI_API_KEY | - | OpenAI API key
ANTHROPIC_API_KEY | - | Anthropic API key
TOGETHER_API_KEY | - | Together AI API key
MARGA_API_KEY | - | MĀRGA access key
CONFIG_FILE | config.yaml | Config file path
PORT | 8080 | Server port
HOST | 0.0.0.0 | Server host
LOG_LEVEL | info | Log level
DD_API_KEY | - | Datadog API key
DD_ENV | - | Datadog environment
DD_SERVICE | marga | Datadog service name
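
When debugging a deployment, it helps to see the effective value of each variable with secrets masked, so the output is safe to paste into a ticket. The sketch below applies the defaults listed in the table using plain shell parameter expansion. It does not read MĀRGA's config file or reflect how MĀRGA itself resolves settings, and `mask` and `marga_env_summary` are hypothetical helpers:

```shell
# Hypothetical helper: show at most the last 4 characters of a secret.
mask() {
  if [ -z "$1" ]; then
    echo "(unset)"
  elif [ "${#1}" -le 8 ]; then
    echo "****"                       # too short to reveal any characters safely
  else
    echo "****${1#"${1%????}"}"       # keep only the last 4 characters
  fi
}

# Hypothetical helper: print each variable from the table with its documented default.
marga_env_summary() {
  for name in OPENAI_API_KEY ANTHROPIC_API_KEY TOGETHER_API_KEY MARGA_API_KEY DD_API_KEY; do
    eval "value=\${$name:-}"          # indirect lookup that works in POSIX sh
    echo "$name=$(mask "$value")"
  done
  echo "CONFIG_FILE=${CONFIG_FILE:-config.yaml}"
  echo "PORT=${PORT:-8080}"
  echo "HOST=${HOST:-0.0.0.0}"
  echo "LOG_LEVEL=${LOG_LEVEL:-info}"
  echo "DD_ENV=${DD_ENV:-(unset)}"
  echo "DD_SERVICE=${DD_SERVICE:-marga}"
}
```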

Security Considerations

Network Security

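# Firewall rules (iptables example)
# Accept port 8080 only from the private 10.0.0.0/8 range, then drop it from everywhere else.
# -A appends to the INPUT chain, so an earlier rule that already accepts this traffic takes precedence.
iptables -A INPUT -p tcp --dport 8080 -s 10.0.0.0/8 -j ACCEPT
iptables -A INPUT -p tcp --dport 8080 -j DROP
 
# Or use cloud provider security groups / firewall rules
# AWS: Allow port 8080 from the ALB security group only
# GCP: Allow port 8080 only from Google's load balancer and health-check ranges (130.211.0.0/22, 35.191.0.0/16)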

TLS Configuration

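# Server block for TLS termination; it belongs inside the http {} context
# (directly in nginx.conf or in a file under /etc/nginx/conf.d/), not at the top level
server {
    listen 443 ssl http2;
    server_name marga.yourdomain.com;
    
    # Certificate paths match the ./nginx/ssl mount in docker-compose.prod.yml
    ssl_certificate /etc/nginx/ssl/marga.crt;
    ssl_certificate_key /etc/nginx/ssl/marga.key;
    ssl_protocols TLSv1.2 TLSv1.3;
    # TLS 1.2 cipher suites; TLS 1.3 suites are not set by ssl_ciphers
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305;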
    
    location / {
        proxy_pass http://marga:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
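
For a staging environment, you can generate a self-signed certificate with OpenSSL and confirm that the key matches it before reloading nginx. This sketch is for testing only; production should use a CA-issued certificate (for example from Let's Encrypt). The function names are hypothetical. The file names follow the `marga.crt` / `marga.key` names used in the nginx config:

```shell
# Self-signed certificate for staging/testing only.
make_test_cert() {
  out_dir=$1 host=$2
  mkdir -p "$out_dir"
  openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout "$out_dir/marga.key" -out "$out_dir/marga.crt" \
    -days 30 -subj "/CN=$host" 2>/dev/null
}

# The certificate and the private key yield the same PEM public key when they belong together.
cert_matches_key() {
  [ "$(openssl x509 -in "$1" -noout -pubkey)" = "$(openssl pkey -in "$2" -pubout)" ]
}

# Usage, writing into the ./nginx/ssl directory mounted by docker-compose.prod.yml:
# make_test_cert ./nginx/ssl marga.yourdomain.com
# cert_matches_key ./nginx/ssl/marga.crt ./nginx/ssl/marga.key && echo "key matches certificate"
```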

Secrets Management

Kubernetes:

# Use sealed secrets or external secrets operator
kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.18.0/controller.yaml

AWS:

# Use AWS Secrets Manager
aws secretsmanager create-secret \
  --name "marga/openai-key" \
  --secret-string "sk-your-openai-key"

GCP:

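# Use Google Secret Manager (--data-file=- reads the value from stdin, so no key file is left on disk)
printf '%s' "sk-your-openai-key" | gcloud secrets create openai-api-key --data-file=-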

Monitoring and Observability

Prometheus Configuration

# prometheus.yml
global:
  scrape_interval: 15s
 
scrape_configs:
- job_name: 'marga'
  static_configs:
  - targets: ['marga:8080']
  metrics_path: /v1/metrics
  scrape_interval: 30s

Grafana Dashboard

{
  "dashboard": {
    "title": "MĀRGA LLM Router",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(rate(marga_requests_total[5m]))",
            "legendFormat": "Requests/sec"
          }
        ]
      },
      {
        "title": "Response Time",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, sum by (le) (rate(marga_request_duration_seconds_bucket[5m])))",
            "legendFormat": "95th percentile"
          }
        ]
      }
    ]
  }
}

Scaling Guidelines

Horizontal Scaling

Concurrent Users | Recommended Instances | CPU/Memory per Instance
--- | --- | ---
< 100 | 1 | 0.5 CPU, 512MB
100-500 | 2-3 | 1 CPU, 1GB
500-2000 | 3-5 | 2 CPU, 2GB
2000+ | 5+ | 4 CPU, 4GB

Load Balancer Configuration

# HAProxy example (haproxy.cfg)
global
    daemon
 
defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
 
frontend marga_frontend
    bind *:80
    default_backend marga_servers
 
backend marga_servers
    balance roundrobin
    option httpchk GET /health
    server marga1 10.0.1.10:8080 check
    server marga2 10.0.1.11:8080 check
    server marga3 10.0.1.12:8080 check
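
When instances are added or removed, the `server` lines can be regenerated rather than edited by hand. Below is a small templating sketch that produces the same line format as the backend above. `haproxy_servers` is a hypothetical helper:

```shell
# Hypothetical templating helper: print one HAProxy server line per backend address.
haproxy_servers() {
  port=$1
  shift
  i=1
  for addr in "$@"; do
    printf '    server marga%d %s:%s check\n' "$i" "$addr" "$port"
    i=$((i + 1))
  done
}

# Usage:
# haproxy_servers 8080 10.0.1.10 10.0.1.11 10.0.1.12
```

After regenerating the backend, `haproxy -c -f haproxy.cfg` checks the configuration before a reload.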

Troubleshooting

Common Issues

503 Service Unavailable:

  • Check provider API keys and connectivity
  • Verify health check endpoints
  • Check rate limits

High Latency:

  • Monitor provider response times
  • Check network connectivity
  • Scale up instances

Memory Issues:

  • Monitor request sizes
  • Check for memory leaks
  • Increase memory limits

Debug Commands

# Check logs
docker logs marga-prod --tail 100
 
# Test connectivity
curl -I https://api.openai.com/v1/models
curl -I https://api.anthropic.com/v1/messages
 
# Monitor metrics
curl -s http://localhost:8080/v1/metrics | grep marga_
 
# Health check
curl http://localhost:8080/health
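
The metrics check prints raw Prometheus text. A short awk filter can reduce that to a single number, such as the total of `marga_requests_total` across all label sets (the counter used by the Grafana panel). The sketch below assumes label values contain no `}` characters, and `metric_sum` is a hypothetical helper:

```shell
# Sum every sample of one metric in Prometheus text exposition format read from stdin.
metric_sum() {
  awk -v want="$1" '
    /^#/ { next }                                     # skip HELP/TYPE comments
    NF == 0 { next }                                  # skip blank lines
    {
      name = $0
      sub(/[{ ].*/, "", name)                         # metric name ends at "{" or a space
      if (name != want) next
      rest = substr($0, length(name) + 1)
      if (rest ~ /^\{/) sub(/^\{[^}]*\}/, "", rest)   # drop the label set
      split(rest, f, " ")                             # f[1] = value, f[2] = optional timestamp
      sum += f[1]
    }
    END { printf "%.15g\n", sum }
  '
}

# Usage:
# curl -s http://localhost:8080/v1/metrics | metric_sum marga_requests_total
```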

Performance Tuning

Go Runtime Tuning

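# Environment variables for the MĀRGA container; align them with its resource limits
# GOMAXPROCS: match the CPU limit (4 suits a 4 CPU instance from the scaling table)
GOMAXPROCS=4
# GOGC: 100 is the Go default; higher values trade memory for less GC work
GOGC=100
# GOMEMLIMIT: soft limit for the Go GC; keep it below the container memory limit
GOMEMLIMIT=1GiB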
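
`GOMEMLIMIT` is only a soft target for the garbage collector. If it equals the container's hard memory limit, the kernel can OOM-kill the process before the GC reacts. A common rule of thumb, assumed here rather than prescribed by MĀRGA, is to leave about 10% headroom. For the 2Gi limit in the next section, that gives 1843MiB. A sketch of the calculation, with an optional reader for the cgroup v2 limit inside the container (both function names are hypothetical):

```shell
# Rule of thumb (assumption): GOMEMLIMIT = 90% of the container memory limit, in MiB.
gomemlimit_for() {
  limit_mib=$1
  echo "$((limit_mib * 90 / 100))MiB"
}

# On cgroup v2, the container's limit is in /sys/fs/cgroup/memory.max (bytes, or "max").
gomemlimit_from_cgroup() {
  limit=$(cat /sys/fs/cgroup/memory.max 2>/dev/null) || return 1
  if [ "$limit" = "max" ]; then
    return 1  # no memory limit is set for this cgroup
  fi
  gomemlimit_for "$((limit / 1024 / 1024))"
}

# Usage: GOMEMLIMIT=$(gomemlimit_for 2048)
```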

Container Resource Limits

# Kubernetes resource limits
resources:
  requests:
    memory: "1Gi"
    cpu: "500m"
  limits:
    memory: "2Gi"  
    cpu: "1000m"

This covers the major deployment scenarios for MĀRGA. Choose the approach that best fits your infrastructure and scale requirements.