Deployment Guide - DevOps RAG

Deploy DevOps RAG to production with Docker, Docker Compose, Kubernetes, or Cloud Run. This guide covers persistent storage, scaling, and best practices.

🐳 Docker Deployment

Basic Docker Run

docker run -d \
  --name devops-rag \
  --restart unless-stopped \
  -p 8080:8080 \
  -e OPENAI_API_KEY=your_api_key \
  -v /opt/devops-rag/data:/app/data \
  -v /opt/devops-rag/runbooks:/app/runbooks \
  ghcr.io/gaurav21/devops-rag:latest

With Environment File

Create .env:

OPENAI_API_KEY=sk-your-key-here
DD_API_KEY=your_datadog_api_key
DD_SITE=datadoghq.com
DD_SERVICE=devops-rag
DD_ENV=production
DD_VERSION=1.0.0
DD_TRACE_ENABLED=true

Run with env file:

docker run -d \
  --name devops-rag \
  --restart unless-stopped \
  -p 8080:8080 \
  --env-file .env \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/runbooks:/app/runbooks \
  ghcr.io/gaurav21/devops-rag:latest

🐙 Docker Compose Deployment

docker-compose.yml

version: '3.8'
 
services:
  devops-rag:
    image: ghcr.io/gaurav21/devops-rag:latest
    container_name: devops-rag
    restart: unless-stopped
    ports:
      - "8080:8080"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - DD_API_KEY=${DD_API_KEY}
      - DD_SITE=${DD_SITE:-datadoghq.com}
      - DD_SERVICE=devops-rag
      - DD_ENV=${ENVIRONMENT:-production}
      - DD_VERSION=1.0.0
      - DD_TRACE_ENABLED=true
      - PORT=8080
    volumes:
      # Persistent vector index storage
      - ./data:/app/data
      # Your runbooks directory
      - ./runbooks:/app/runbooks
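      # Optional: custom config. Remove this mount if ./config.py does not exist;
      # otherwise Docker creates an empty directory at that host path instead of a file.
      - ./config.py:/app/config.py:ro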
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
 
  # Optional: Nginx reverse proxy with SSL
  nginx:
    image: nginx:alpine
    container_name: devops-rag-nginx
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./ssl:/etc/nginx/ssl:ro
    depends_on:
      - devops-rag
 
volumes:
  data:
    driver: local

Deploy

# Create directories
mkdir -p data runbooks
 
# Copy your runbooks
cp -r /path/to/your/runbooks/* ./runbooks/
 
# Start services
docker-compose up -d
 
# Check status
docker-compose ps
docker-compose logs -f devops-rag

☸️ Kubernetes Deployment

Namespace and ConfigMap

apiVersion: v1
kind: Namespace
metadata:
  name: devops-rag
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: devops-rag-config
  namespace: devops-rag
data:
  DD_SITE: "datadoghq.com"
  DD_SERVICE: "devops-rag"
  DD_ENV: "production"
  DD_VERSION: "1.0.0"
  DD_TRACE_ENABLED: "true"
  PORT: "8080"

Secret for API Keys

apiVersion: v1
kind: Secret
metadata:
  name: devops-rag-secrets
  namespace: devops-rag
type: Opaque
data:
  openai-api-key: <base64-encoded-openai-key>
  datadog-api-key: <base64-encoded-datadog-key>
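
The values under data: must be base64-encoded. As a minimal sketch of producing them, the helper below encodes a plaintext key; the function name encode_secret_value is hypothetical and the inputs are the placeholder values used elsewhere in this guide.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """Return the base64 form Kubernetes expects under a Secret's `data:` key."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    # Placeholder values only; substitute real keys at deploy time.
    print("openai-api-key:", encode_secret_value("sk-your-key-here"))
    print("datadog-api-key:", encode_secret_value("your_datadog_api_key"))
```

Base64 is an encoding, not encryption, so the resulting manifest should be treated as sensitive and kept out of version control.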

PersistentVolume for Index Storage

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: devops-rag-data
  namespace: devops-rag
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: fast-ssd
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: devops-rag-runbooks
  namespace: devops-rag
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: shared-nfs

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: devops-rag
  namespace: devops-rag
  labels:
    app: devops-rag
spec:
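  # Note: the devops-rag-data claim is ReadWriteOnce, so replicas scheduled onto
  # different nodes cannot all mount it. Use 1 replica or a ReadWriteMany volume.
  replicas: 2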
  selector:
    matchLabels:
      app: devops-rag
  template:
    metadata:
      labels:
        app: devops-rag
      annotations:
        ad.datadoghq.com/devops-rag.logs: '[{"source":"uvicorn","service":"devops-rag"}]'
        ad.datadoghq.com/devops-rag.check_names: '["http_check"]'
        ad.datadoghq.com/devops-rag.init_configs: '[{}]'
        ad.datadoghq.com/devops-rag.instances: '[{"url":"http://%%host%%:8080/health","name":"devops-rag"}]'
    spec:
      containers:
      - name: devops-rag
        image: ghcr.io/gaurav21/devops-rag:latest
        ports:
        - containerPort: 8080
          name: http
        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: devops-rag-secrets
              key: openai-api-key
        - name: DD_API_KEY
          valueFrom:
            secretKeyRef:
              name: devops-rag-secrets
              key: datadog-api-key
        envFrom:
        - configMapRef:
            name: devops-rag-config
        volumeMounts:
        - name: data
          mountPath: /app/data
        - name: runbooks
          mountPath: /app/runbooks
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: devops-rag-data
      - name: runbooks
        persistentVolumeClaim:
          claimName: devops-rag-runbooks

Service and Ingress

apiVersion: v1
kind: Service
metadata:
  name: devops-rag-service
  namespace: devops-rag
spec:
  selector:
    app: devops-rag
  ports:
    - port: 80
      targetPort: 8080
      name: http
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: devops-rag-ingress
  namespace: devops-rag
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-prod
    # ingress-nginx rate limiting: 100 requests per minute per client IP
    nginx.ingress.kubernetes.io/limit-rpm: "100"
spec:
  tls:
  - hosts:
    - devops-rag.yourdomain.com
    secretName: devops-rag-tls
  rules:
  - host: devops-rag.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: devops-rag-service
            port:
              number: 80

Deploy to Kubernetes

# Apply all manifests
kubectl apply -f k8s/
 
# Check deployment
kubectl get pods -n devops-rag
kubectl logs -f deployment/devops-rag -n devops-rag
 
# Port forward for testing
kubectl port-forward -n devops-rag service/devops-rag-service 8080:80

☁️ Google Cloud Run Deployment

Using gcloud CLI

# Deploy directly from GitHub Container Registry
gcloud run deploy devops-rag \
  --image=ghcr.io/gaurav21/devops-rag:latest \
  --platform=managed \
  --region=asia-southeast1 \
  --allow-unauthenticated \
  --set-env-vars="DD_SERVICE=devops-rag,DD_ENV=production" \
  --set-secrets="OPENAI_API_KEY=openai-key:latest,DD_API_KEY=datadog-key:latest" \
  --memory=1Gi \
  --cpu=1 \
  --concurrency=100 \
  --max-instances=10 \
  --timeout=300s \
  --port=8080

Cloud Run YAML

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: devops-rag
  annotations:
    run.googleapis.com/ingress: all
spec:
  template:
    metadata:
      annotations:
        run.googleapis.com/execution-environment: gen2
        autoscaling.knative.dev/maxScale: "10"
    spec:
      containerConcurrency: 100
      # Request timeout in seconds (matches --timeout=300s in the gcloud example);
      # memory and CPU are set under resources.limits below
      timeoutSeconds: 300
      containers:
      - image: ghcr.io/gaurav21/devops-rag:latest
        ports:
        - containerPort: 8080
        env:
        - name: DD_SERVICE
          value: devops-rag
        - name: DD_ENV
          value: production
        - name: DD_VERSION
          value: "1.0.0"
        - name: DD_TRACE_ENABLED
          value: "true"
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              key: "latest"
              name: "openai-key"
        - name: DD_API_KEY
          valueFrom:
            secretKeyRef:
              key: "latest" 
              name: "datadog-key"
        resources:
          limits:
            memory: "1Gi"
            cpu: "1000m"

Deploy with Custom Domain

# Deploy service
gcloud run services replace service.yaml --region=asia-southeast1
 
# Map custom domain
gcloud run domain-mappings create \
  --service=devops-rag \
  --domain=rag.yourdomain.com \
  --region=asia-southeast1

📊 Production Configuration

Environment Variables

Variable       | Required | Default    | Description
OPENAI_API_KEY | Yes      | -          | OpenAI API key for embeddings
DD_API_KEY     | 🔶       | -          | Datadog API key for monitoring
DD_SERVICE     | 🔶       | devops-rag | Service name in Datadog
DD_ENV         | 🔶       | production | Environment tag
PORT           | No       | 8080       | Server port
CHUNK_SIZE     | No       | 512        | Token chunk size
CHUNK_OVERLAP  | No       | 64         | Chunk overlap tokens
TOP_K          | No       | 5          | Default retrieval count
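
To make the defaults in the table above concrete, here is a minimal sketch of how a service could read these variables at startup. The Settings class and load_settings function are hypothetical names for illustration; the guide does not show how DevOps RAG itself parses its environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in the table."""
    openai_api_key: str
    port: int = 8080
    chunk_size: int = 512
    chunk_overlap: int = 64
    top_k: int = 5


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, applying the table's defaults."""
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        # Fail fast: this is the first "common issue" under Container Fails to Start.
        raise RuntimeError("OPENAI_API_KEY is not set")
    return Settings(
        openai_api_key=api_key,
        port=int(env.get("PORT", "8080")),
        chunk_size=int(env.get("CHUNK_SIZE", "512")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "64")),
        top_k=int(env.get("TOP_K", "5")),
    )
```

Passing a plain dict instead of os.environ makes the same function easy to exercise in tests, for example load_settings({"OPENAI_API_KEY": "sk-test", "TOP_K": "3"}).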

Volume Mounts (Critical)

⚠️ Important: The vector index (/app/data/index.json) must be persisted across container restarts. Without persistent storage, the system will need to re-index all runbooks on every restart.

# Required mounts
-v /persistent/path/data:/app/data        # Vector index storage
-v /path/to/runbooks:/app/runbooks        # Your documentation

Resource Requirements

Deployment Type | CPU     | Memory | Storage
Development     | 0.5 CPU | 512Mi  | 1Gi
Production      | 1 CPU   | 1Gi    | 5Gi
High Load       | 2 CPU   | 2Gi    | 10Gi

Health Checks

# Health endpoint
curl http://your-domain/health
 
# Expected response
{
  "status": "ok",
  "index_ready": true,
  "total_chunks": 45,
  "total_sources": 18
}
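
For scripted checks (a deploy gate or a cron probe, for instance), the same endpoint can be polled from Python. This is a sketch that assumes the response shape shown above; is_ready and check_health are hypothetical helper names, and only the standard library is used.

```python
import json
import urllib.request


def is_ready(payload: dict) -> bool:
    """True when the service reports ok AND the vector index is loaded."""
    return payload.get("status") == "ok" and payload.get("index_ready") is True


def check_health(base_url: str, timeout: float = 5.0) -> bool:
    """Fetch <base_url>/health and evaluate the JSON body.

    Raises urllib.error.URLError (or HTTPError) if the service is unreachable
    or returns a non-2xx status.
    """
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return is_ready(json.load(resp))
```

Checking index_ready as well as status distinguishes a container that is up from one that can actually answer queries, which matters right after a restart without a persisted index.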

🔧 Configuration Tuning

Custom Configuration

Create config.py and mount it into the container:

# config.py - Custom RAG configuration
EMBEDDING_MODEL = "text-embedding-3-small"  # or text-embedding-3-large
GENERATION_MODEL = "gpt-4o-mini"            # or gpt-4o  
CHUNK_SIZE = 512
CHUNK_OVERLAP = 64
TOP_K = 5
SIMILARITY_THRESHOLD = 0.7
 
# Datadog configuration  
DATADOG_ENABLED = True
DD_TRACE_SAMPLE_RATE = 0.1

Mount in Docker:

-v $(pwd)/config.py:/app/config.py:ro
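
To illustrate how CHUNK_SIZE and CHUNK_OVERLAP interact, the sketch below splits a token list with a sliding window. It is an assumption about the general technique, not the project's actual chunker: the function name chunk_tokens is hypothetical, and tokens are represented as plain strings.

```python
from typing import List


def chunk_tokens(
    tokens: List[str], chunk_size: int = 512, chunk_overlap: int = 64
) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("CHUNK_SIZE must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("CHUNK_OVERLAP must be >= 0 and smaller than CHUNK_SIZE")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks
```

With the defaults, each window advances by 448 tokens, so a smaller CHUNK_SIZE or a larger CHUNK_OVERLAP produces more chunks (and more embeddings to store) for the same runbook.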

Performance Tuning

For faster queries:

  • Reduce TOP_K from 5 to 3
  • Use text-embedding-3-small instead of large
  • Reduce CHUNK_SIZE to 256

For better accuracy:

  • Increase TOP_K to 8-10
  • Use text-embedding-3-large
  • Reduce SIMILARITY_THRESHOLD to 0.6
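
The trade-off between TOP_K and SIMILARITY_THRESHOLD can be sketched as a filter followed by a cut-off. The code below assumes cosine similarity over embedding vectors and a threshold applied before the top-k cut; the guide does not state which metric or order the service uses, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    top_k: int = 5,
    similarity_threshold: float = 0.7,
) -> List[Tuple[int, float]]:
    """Return up to top_k (chunk_index, score) pairs scoring at or above the threshold."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    kept = [(i, s) for i, s in scored if s >= similarity_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # best matches first
    return kept[:top_k]
```

Under these assumptions, lowering the threshold admits weaker matches and raising TOP_K passes more of them to the generation model, which tends to improve recall at the cost of longer prompts.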

For cost optimization:

  • Use gpt-4o-mini for generation
  • Cache frequent queries (add Redis layer)
  • Implement rate limiting
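
As a rough illustration of the query-caching idea, here is an in-process cache with expiry. It is a stand-in for the Redis layer suggested above, not a Redis client: the TTLCache class, its 300-second default, and the question normalization are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory answer cache keyed by normalized question text."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def _key(question: str) -> str:
        # Lowercase and collapse whitespace so trivially different phrasings share an entry.
        return " ".join(question.lower().split())

    def get(self, question: str) -> Optional[str]:
        key = self._key(question)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, answer = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the stale entry
            return None
        return answer

    def put(self, question: str, answer: str) -> None:
        self._entries[self._key(question)] = (self._clock() + self._ttl, answer)
```

A cache hit skips both the embedding call and the generation call. With more than one replica, a shared store such as Redis is needed for the replicas to benefit from each other's entries.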

🔐 Security Best Practices

API Key Management

# Never put keys in Dockerfiles or images
# Use secrets management:
 
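# Kubernetes (key names must match the secretKeyRef entries in the Deployment)
kubectl create secret generic devops-rag-secrets \
  --namespace devops-rag \
  --from-literal=openai-api-key=sk-your-key \
  --from-literal=datadog-api-key=your_datadog_api_key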
 
# Cloud Run
gcloud secrets create openai-key --data-file=key.txt
 
# Docker Swarm (printf avoids the trailing newline that echo would store in the secret)
printf '%s' "sk-your-key" | docker secret create openai_key -

Network Security

# Limit container access
docker run --security-opt=no-new-privileges \
  --read-only \
  --tmpfs /tmp \
  --user 1001:1001 \
  ghcr.io/gaurav21/devops-rag:latest

Monitoring & Alerting

Set up alerts for:

  • High error rates (>5%)
  • Slow queries (>2s p95)
  • Index corruption (health check fails)
  • High memory usage (>80%)
  • OpenAI API quota exhaustion

🚨 Troubleshooting

Container Fails to Start

# Check logs
docker logs devops-rag
 
# Common issues:
# 1. Missing OPENAI_API_KEY
# 2. Permission denied on volume mounts
# 3. Port already in use

Index Not Persisting

# Verify volume mount
docker inspect devops-rag | grep -A 5 Mounts
 
# Check ownership
ls -la /path/to/data/
 
# Fix permissions
chown -R 1001:1001 /path/to/data/

Poor Query Performance

# Check index size
curl http://localhost:8080/stats
 
# Monitor memory usage
docker stats devops-rag
 
# Check query latency
time curl -X POST localhost:8080/ask \
  -H "Content-Type: application/json" \
  -d '{"question":"test"}'

OpenAI API Issues

# Test API key directly
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
 
# Check rate limits in container logs
docker logs devops-rag 2>&1 | grep -i rate

Next Steps: