Deployment Guide
This guide covers deploying MĀRGA to production environments, including Docker, Kubernetes, Google Cloud Run, AWS ECS, and traditional server deployments.
Self-hosted vs Managed
| Feature | Self-hosted | Managed (Avyay Cloud) |
|---|---|---|
| Control | Full control over infrastructure | Managed infrastructure |
| Scaling | Manual scaling setup | Auto-scaling included |
| Monitoring | Setup required | Built-in dashboards |
| Updates | Manual updates | Automatic updates |
| Support | Community support | 24/7 enterprise support |
| SLA | Self-managed | 99.9% uptime SLA |
| Cost | Infrastructure + ops time | Usage-based pricing |
Docker Deployment
Single Container
Simple deployment for small-scale production:
# Pull latest image
docker pull ghcr.io/gaurav21/marga:latest
# Create production environment file
cat > .env << EOF
OPENAI_API_KEY=sk-your-production-key
ANTHROPIC_API_KEY=sk-ant-your-production-key
MARGA_API_KEY=$(openssl rand -hex 32)
DD_API_KEY=your-datadog-key
DD_ENV=production
LOG_LEVEL=info
EOF
# Run in production mode
docker run -d \
  --name marga-prod \
  --restart unless-stopped \
  -p 8080:8080 \
  --env-file .env \
  -v /var/log/marga:/app/logs \
  ghcr.io/gaurav21/marga:latest
Docker Compose Production
For multi-service deployments with monitoring:
# docker-compose.prod.yml
version: '3.8'
services:
  marga:
    image: ghcr.io/gaurav21/marga:latest
    restart: unless-stopped
    ports:
      - "8080:8080"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - MARGA_API_KEY=${MARGA_API_KEY}
      - DD_API_KEY=${DD_API_KEY}
      - DD_ENV=production
      - LOG_LEVEL=info
    volumes:
      - ./config/prod-config.yaml:/app/config.yaml:ro
      - marga-logs:/app/logs
    networks:
      - marga-net
    depends_on:
      - prometheus
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--spider", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro
      - nginx-logs:/var/log/nginx
    networks:
      - marga-net
    depends_on:
      - marga
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    networks:
      - marga-net
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
    volumes:
      - grafana-data:/var/lib/grafana
      - ./monitoring/grafana:/etc/grafana/provisioning
    networks:
      - marga-net
volumes:
  marga-logs:
  nginx-logs:
  prometheus-data:
  grafana-data:
networks:
  marga-net:
    driver: bridge
Kubernetes Deployment
Namespace and ConfigMap
# k8s/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: marga-system
---
# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: marga-config
  namespace: marga-system
data:
  config.yaml: |
    server:
      port: 8080
      host: 0.0.0.0
      timeout: 30s
    logging:
      level: info
      format: json
    metrics:
      enabled: true
      path: /v1/metrics
    datadog:
      enabled: true
      service_name: marga
      environment: production
    # ... rest of config
Secret Management
# k8s/secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: marga-secrets
  namespace: marga-system
type: Opaque
data:
  # Base64 encoded values
  openai-api-key: c2stbm90LXJlYWwtYXBpLWtleQ==
  anthropic-api-key: c2stYW50LW5vdC1yZWFsLWFwaS1rZXk=
  marga-api-key: c3VwZXItc2VjcmV0LWFwaS1rZXk=
  dd-api-key: ZGQtYXBpLWtleS1ub3QtcmVhbA==
Deployment
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: marga
  namespace: marga-system
  labels:
    app: marga
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  selector:
    matchLabels:
      app: marga
  template:
    metadata:
      labels:
        app: marga
    spec:
      containers:
        - name: marga
          image: ghcr.io/gaurav21/marga:latest
          ports:
            - containerPort: 8080
              name: http
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: marga-secrets
                  key: openai-api-key
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: marga-secrets
                  key: anthropic-api-key
            - name: MARGA_API_KEY
              valueFrom:
                secretKeyRef:
                  name: marga-secrets
                  key: marga-api-key
            - name: DD_API_KEY
              valueFrom:
                secretKeyRef:
                  name: marga-secrets
                  key: dd-api-key
            - name: DD_ENV
              value: production
            - name: CONFIG_FILE
              value: /app/config.yaml
          volumeMounts:
            - name: config
              mountPath: /app/config.yaml
              subPath: config.yaml
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
      volumes:
        - name: config
          configMap:
            name: marga-config
Service and Ingress
# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: marga-service
  namespace: marga-system
spec:
  selector:
    app: marga
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  type: ClusterIP
---
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: marga-ingress
  namespace: marga-system
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  tls:
    - hosts:
        - marga.yourdomain.com
      secretName: marga-tls
  rules:
    - host: marga.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: marga-service
                port:
                  number: 80
Deploy to Kubernetes
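# Create the namespace first so the secret can be created in it
kubectl apply -f k8s/namespace.yaml
# Create secrets (pass the raw values; kubectl base64-encodes --from-literal input)
kubectl create secret generic marga-secrets \
  --from-literal=openai-api-key="sk-your-real-key" \
  --from-literal=anthropic-api-key="sk-ant-your-real-key" \
  --from-literal=marga-api-key="your-secure-key" \
  --from-literal=dd-api-key="your-dd-key" \
  -n marga-system
# Apply all manifests (leave k8s/secrets.yaml out of the directory, or its placeholder values overwrite the secret created above)
kubectl apply -f k8s/
# Check deployment
kubectl get pods -n marga-system
kubectl logs -f deployment/marga -n marga-system
Google Cloud Run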
Deploy with gcloud
# Build and push image
docker build -t gcr.io/your-project/marga .
docker push gcr.io/your-project/marga
# Deploy to Cloud Run
# Secrets are exposed as the environment variables listed in the Environment Variables Reference
gcloud run deploy marga \
  --image gcr.io/your-project/marga \
  --platform managed \
  --region asia-southeast1 \
  --allow-unauthenticated \
  --set-env-vars="DD_ENV=production" \
  --set-secrets="OPENAI_API_KEY=openai-api-key:latest,ANTHROPIC_API_KEY=anthropic-api-key:latest" \
  --memory 1Gi \
  --cpu 1 \
  --concurrency 100 \
  --max-instances 10 \
  --port 8080
Cloud Run YAML
# cloudrun.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: marga
  annotations:
    run.googleapis.com/ingress: all
spec:
  template:
    metadata:
      annotations:
        # Annotation values must be strings, so the boolean is quoted
        run.googleapis.com/cpu-boost: "true"
        autoscaling.knative.dev/maxScale: "10"
        run.googleapis.com/execution-environment: gen2
    spec:
      serviceAccountName: marga-service-account
      containers:
        - image: gcr.io/your-project/marga
          ports:
            - containerPort: 8080
          env:
            - name: DD_ENV
              value: production
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  key: latest
                  name: openai-api-key
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  key: latest
                  name: anthropic-api-key
          resources:
            limits:
              cpu: 1000m
              memory: 1Gi
AWS Deployment
ECS with Fargate
{
  "family": "marga-task",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::account:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::account:role/ecsTaskRole",
  "containerDefinitions": [
    {
      "name": "marga",
      "image": "ghcr.io/gaurav21/marga:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/marga",
          "awslogs-region": "us-west-2",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "secrets": [
        {
          "name": "OPENAI_API_KEY",
          "valueFrom": "arn:aws:secretsmanager:region:account:secret:openai-key"
        },
        {
          "name": "ANTHROPIC_API_KEY",
          "valueFrom": "arn:aws:secretsmanager:region:account:secret:anthropic-key"
        }
      ],
      "environment": [
        {
          "name": "DD_ENV",
          "value": "production"
        }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:8080/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3
      }
    }
  ]
}
Application Load Balancer
# Create target group
aws elbv2 create-target-group \
  --name marga-targets \
  --protocol HTTP \
  --port 8080 \
  --vpc-id vpc-12345678 \
  --target-type ip \
  --health-check-path /health
# Create load balancer
aws elbv2 create-load-balancer \
  --name marga-alb \
  --subnets subnet-12345678 subnet-87654321 \
  --security-groups sg-12345678
Environment Variables Reference
| Variable | Required | Default | Description |
|---|---|---|---|
| OPENAI_API_KEY | ✅ | - | OpenAI API key |
| ANTHROPIC_API_KEY | ✅ | - | Anthropic API key |
| TOGETHER_API_KEY | ❌ | - | Together AI API key |
| MARGA_API_KEY | ✅ | - | MĀRGA access key |
| CONFIG_FILE | ❌ | config.yaml | Config file path |
| PORT | ❌ | 8080 | Server port |
| HOST | ❌ | 0.0.0.0 | Server host |
| LOG_LEVEL | ❌ | info | Log level |
| DD_API_KEY | ❌ | - | Datadog API key |
| DD_ENV | ❌ | - | Datadog environment |
| DD_SERVICE | ❌ | marga | Datadog service name |
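A pre-flight check can catch a missing required variable before the service starts. The following is a minimal sketch, not part of MĀRGA itself: `check_required_env` and `REQUIRED_VARS` are hypothetical names, and the list simply mirrors the rows marked as required in the table above.

```shell
# Hypothetical pre-flight helper (not shipped with MĀRGA).
# The list mirrors the variables marked required in the table above.
REQUIRED_VARS="OPENAI_API_KEY ANTHROPIC_API_KEY MARGA_API_KEY"

# Prints "missing: NAME" for each unset or empty variable.
# Returns 0 when all are present, 1 otherwise.
check_required_env() {
    missing=0
    for name in $REQUIRED_VARS; do
        # Indirect lookup: read the value of the variable named by $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

One way to use it is to call `check_required_env || exit 1` at the top of a startup or deploy script, so the script stops before launching a container with incomplete configuration.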
Security Considerations
Network Security
# Firewall rules (iptables example)
# Allow only necessary ports
iptables -A INPUT -p tcp --dport 8080 -s 10.0.0.0/8 -j ACCEPT
iptables -A INPUT -p tcp --dport 8080 -j DROP
# Or use cloud provider security groups
# AWS: Allow port 8080 from ALB security group only
# GCP: Allow port 8080 from load balancer tag only
TLS Configuration
# nginx.conf for TLS termination
server {
    listen 443 ssl http2;
    server_name marga.yourdomain.com;
    ssl_certificate /etc/ssl/certs/marga.crt;
    ssl_certificate_key /etc/ssl/private/marga.key;
    ssl_protocols TLSv1.2 TLSv1.3;
    # AES256-GCM suites are paired with SHA384
    ssl_ciphers ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384;
    location / {
        proxy_pass http://marga:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
Secrets Management
Kubernetes:
# Use sealed secrets or external secrets operator
kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.18.0/controller.yaml
AWS:
# Use AWS Secrets Manager
aws secretsmanager create-secret \
  --name "marga/openai-key" \
  --secret-string "sk-your-openai-key"
GCP:
# Use Google Secret Manager
gcloud secrets create openai-api-key --data-file=key.txt
Monitoring and Observability
Prometheus Configuration
# prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'marga'
    static_configs:
      - targets: ['marga:8080']
    metrics_path: /v1/metrics
    scrape_interval: 30s
Grafana Dashboard
{
  "dashboard": {
    "title": "MĀRGA LLM Router",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(marga_requests_total[5m])",
            "legendFormat": "Requests/sec"
          }
        ]
      },
      {
        "title": "Response Time",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(marga_request_duration_seconds_bucket[5m]))",
            "legendFormat": "95th percentile"
          }
        ]
      }
    ]
  }
}
Scaling Guidelines
Horizontal Scaling
| Concurrent Users | Recommended Instances | CPU/Memory per Instance |
|---|---|---|
| < 100 | 1 | 0.5 CPU, 512MB |
| 100-500 | 2-3 | 1 CPU, 1GB |
| 500-2000 | 3-5 | 2 CPU, 2GB |
| 2000+ | 5+ | 4 CPU, 4GB |
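For capacity-planning scripts, the table above can be encoded as a small lookup. This is only a sketch: `recommend_instances` is a hypothetical helper, and because the table's ranges share their endpoints, it assumes that a boundary value (100, 500, or 2000) falls into the higher tier.

```shell
# Hypothetical helper: maps expected concurrent users to the sizing row
# from the table above.
# Assumption: boundary values (100, 500, 2000) belong to the higher tier.
recommend_instances() {
    users=$1
    if [ "$users" -lt 100 ]; then
        echo "instances=1 cpu=0.5 memory=512MB"
    elif [ "$users" -lt 500 ]; then
        echo "instances=2-3 cpu=1 memory=1GB"
    elif [ "$users" -lt 2000 ]; then
        echo "instances=3-5 cpu=2 memory=2GB"
    else
        echo "instances=5+ cpu=4 memory=4GB"
    fi
}
```

For example, `recommend_instances 250` prints the second row of the table. Treat the result as a starting point and adjust it against observed load.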
Load Balancer Configuration
# HAProxy example
global
    daemon
defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
frontend marga_frontend
    bind *:80
    default_backend marga_servers
backend marga_servers
    balance roundrobin
    option httpchk GET /health
    server marga1 10.0.1.10:8080 check
    server marga2 10.0.1.11:8080 check
    server marga3 10.0.1.12:8080 check
Troubleshooting
Common Issues
503 Service Unavailable:
- Check provider API keys and connectivity
- Verify health check endpoints
- Check rate limits
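Verifying the health endpoint can be scripted with a short retry loop. The sketch below assumes `curl` is installed and that the service exposes `/health` as used elsewhere in this guide; `wait_for_health` is a hypothetical helper name, and the attempt count and delay defaults are arbitrary.

```shell
# Polls a health endpoint until it responds successfully or attempts run out.
# Usage: wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_health() {
    url=$1
    attempts=${2:-5}
    delay=${3:-2}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f fails on HTTP status >= 400, -sS stays quiet except for errors,
        # --max-time caps how long each attempt may take.
        if curl -fsS --max-time 5 -o /dev/null "$url"; then
            echo "healthy after $i attempt(s)"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "unhealthy after $attempts attempt(s)" >&2
    return 1
}
```

A typical call would be `wait_for_health http://localhost:8080/health 10 3`. A non-zero exit status means the endpoint never answered successfully within the allotted attempts.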
High Latency:
- Monitor provider response times
- Check network connectivity
- Scale up instances
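To compare provider response times with MĀRGA's own, it helps to measure round-trip time from the host where MĀRGA runs. The sketch below uses curl's `%{time_total}` write-out variable and averages a few samples with `awk`; `avg_latency` is a hypothetical helper, and the sample count default is arbitrary.

```shell
# Prints the mean total request time, in seconds, over N requests to URL.
# Usage: avg_latency URL [N]
avg_latency() {
    url=$1
    count=${2:-5}
    i=0
    while [ "$i" -lt "$count" ]; do
        # %{time_total} is curl's write-out variable for the full transfer time.
        curl -s -o /dev/null --max-time 10 -w '%{time_total}\n' "$url"
        i=$((i + 1))
    done | awk '{ sum += $1; n++ } END { if (n > 0) printf "%.3f\n", sum / n }'
}
```

Running it against both a provider endpoint and the local MĀRGA endpoint gives a rough indication of whether latency originates upstream or in the deployment. The measurement includes DNS and TLS setup on every request, so it overstates steady-state latency on reused connections.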
Memory Issues:
- Monitor request sizes
- Check for memory leaks
- Increase memory limits
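When changing a container's memory limit, the Go runtime's soft limit (`GOMEMLIMIT`, available since Go 1.19) is often set somewhat below it so the garbage collector works harder before the container is OOM-killed. The sketch below derives such a value; `gomemlimit_for` is a hypothetical helper, and the 90% default is an assumption to tune, not a MĀRGA recommendation.

```shell
# Hypothetical helper: derives a GOMEMLIMIT value from a container memory
# limit given in MiB, leaving headroom for memory outside the Go heap.
# Assumption: 90% of the limit unless a percentage is passed explicitly.
# Usage: gomemlimit_for LIMIT_MIB [PERCENT]
gomemlimit_for() {
    limit_mib=$1
    percent=${2:-90}
    # Integer arithmetic; the result is rounded down to whole MiB.
    echo "$((limit_mib * percent / 100))MiB"
}
```

For a 1Gi container limit, `gomemlimit_for 1024` yields a value that can be exported as `GOMEMLIMIT` in the container environment.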
Debug Commands
# Check logs
docker logs marga-prod --tail 100
# Test connectivity
curl -I https://api.openai.com/v1/models
curl -I https://api.anthropic.com/v1/messages
# Monitor metrics
curl http://localhost:8080/v1/metrics | grep marga_
# Health check
curl http://localhost:8080/health
Performance Tuning
Go Runtime Tuning
# Environment variables for better performance
GOMAXPROCS=4
GOGC=100
GOMEMLIMIT=1GiB
Container Resource Limits
# Kubernetes resource limits
resources:
  requests:
    memory: "1Gi"
    cpu: "500m"
  limits:
    memory: "2Gi"
    cpu: "1000m"
This covers the major deployment scenarios for MĀRGA. Choose the approach that best fits your infrastructure and scale requirements.