Deployment Guide - DevOps RAG
Deploy DevOps RAG in production environments using Docker, docker-compose, Kubernetes, or Cloud Run. This guide covers persistent storage, scaling, and best practices.
🐳 Docker Deployment
Basic Docker Run
docker run -d \
--name devops-rag \
--restart unless-stopped \
-p 8080:8080 \
-e OPENAI_API_KEY=your_api_key \
-v /opt/devops-rag/data:/app/data \
-v /opt/devops-rag/runbooks:/app/runbooks \
ghcr.io/gaurav21/devops-rag:latest
With Environment File
Create .env:
OPENAI_API_KEY=sk-your-key-here
DD_API_KEY=your_datadog_api_key
DD_SITE=datadoghq.com
DD_SERVICE=devops-rag
DD_ENV=production
DD_VERSION=1.0.0
DD_TRACE_ENABLED=true
Run with env file:
docker run -d \
--name devops-rag \
--restart unless-stopped \
-p 8080:8080 \
--env-file .env \
-v $(pwd)/data:/app/data \
-v $(pwd)/runbooks:/app/runbooks \
ghcr.io/gaurav21/devops-rag:latest
🐙 Docker Compose Deployment
docker-compose.yml
version: '3.8'
services:
  devops-rag:
    image: ghcr.io/gaurav21/devops-rag:latest
    container_name: devops-rag
    restart: unless-stopped
    ports:
      - "8080:8080"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - DD_API_KEY=${DD_API_KEY}
      - DD_SITE=${DD_SITE:-datadoghq.com}
      - DD_SERVICE=devops-rag
      - DD_ENV=${ENVIRONMENT:-production}
      - DD_VERSION=1.0.0
      - DD_TRACE_ENABLED=true
      - PORT=8080
    volumes:
      # Persistent vector index storage
      - ./data:/app/data
      # Your runbooks directory
      - ./runbooks:/app/runbooks
      # Optional: custom config
      - ./config.py:/app/config.py:ro
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
  # Optional: Nginx reverse proxy with SSL
  nginx:
    image: nginx:alpine
    container_name: devops-rag-nginx
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./ssl:/etc/nginx/ssl:ro
    depends_on:
      - devops-rag
volumes:
  data:
    driver: local
Deploy
# Create directories
mkdir -p data runbooks
# Copy your runbooks
cp -r /path/to/your/runbooks/* ./runbooks/
# Start services
docker-compose up -d
# Check status
docker-compose ps
docker-compose logs -f devops-rag
☸️ Kubernetes Deployment
Namespace and ConfigMap
apiVersion: v1
kind: Namespace
metadata:
  name: devops-rag
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: devops-rag-config
  namespace: devops-rag
data:
  DD_SITE: "datadoghq.com"
  DD_SERVICE: "devops-rag"
  DD_ENV: "production"
  DD_VERSION: "1.0.0"
  DD_TRACE_ENABLED: "true"
  PORT: "8080"
Secret for API Keys
apiVersion: v1
kind: Secret
metadata:
  name: devops-rag-secrets
  namespace: devops-rag
type: Opaque
data:
  openai-api-key: <base64-encoded-openai-key>
  datadog-api-key: <base64-encoded-datadog-key>
PersistentVolume for Index Storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: devops-rag-data
  namespace: devops-rag
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: fast-ssd
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: devops-rag-runbooks
  namespace: devops-rag
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: shared-nfs
Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: devops-rag
  namespace: devops-rag
  labels:
    app: devops-rag
spec:
  # Note: the devops-rag-data PVC is ReadWriteOnce, so both replicas must be
  # scheduled on the same node unless the storage class supports ReadWriteMany.
  replicas: 2
  selector:
    matchLabels:
      app: devops-rag
  template:
    metadata:
      labels:
        app: devops-rag
      annotations:
        ad.datadoghq.com/devops-rag.logs: '[{"source":"uvicorn","service":"devops-rag"}]'
        ad.datadoghq.com/devops-rag.check_names: '["http_check"]'
        ad.datadoghq.com/devops-rag.init_configs: '[{}]'
        ad.datadoghq.com/devops-rag.instances: '[{"url":"http://%%host%%:8080/health","name":"devops-rag"}]'
    spec:
      containers:
        - name: devops-rag
          image: ghcr.io/gaurav21/devops-rag:latest
          ports:
            - containerPort: 8080
              name: http
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: devops-rag-secrets
                  key: openai-api-key
            - name: DD_API_KEY
              valueFrom:
                secretKeyRef:
                  name: devops-rag-secrets
                  key: datadog-api-key
          envFrom:
            - configMapRef:
                name: devops-rag-config
          volumeMounts:
            - name: data
              mountPath: /app/data
            - name: runbooks
              mountPath: /app/runbooks
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: devops-rag-data
        - name: runbooks
          persistentVolumeClaim:
            claimName: devops-rag-runbooks
Service and Ingress
apiVersion: v1
kind: Service
metadata:
  name: devops-rag-service
  namespace: devops-rag
spec:
  selector:
    app: devops-rag
  ports:
    - port: 80
      targetPort: 8080
      name: http
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: devops-rag-ingress
  namespace: devops-rag
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-prod
    # ingress-nginx rate limit: 100 requests per minute per client IP
    nginx.ingress.kubernetes.io/limit-rpm: "100"
spec:
  tls:
    - hosts:
        - devops-rag.yourdomain.com
      secretName: devops-rag-tls
  rules:
    - host: devops-rag.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: devops-rag-service
                port:
                  number: 80
Deploy to Kubernetes
# Apply all manifests
kubectl apply -f k8s/
# Check deployment
kubectl get pods -n devops-rag
kubectl logs -f deployment/devops-rag -n devops-rag
# Port forward for testing
kubectl port-forward -n devops-rag service/devops-rag-service 8080:80
☁️ Google Cloud Run Deployment
Using gcloud CLI
# Deploy directly from GitHub Container Registry
gcloud run deploy devops-rag \
--image=ghcr.io/gaurav21/devops-rag:latest \
--platform=managed \
--region=asia-southeast1 \
--allow-unauthenticated \
--set-env-vars="DD_SERVICE=devops-rag,DD_ENV=production" \
--set-secrets="OPENAI_API_KEY=openai-key:latest" \
--set-secrets="DD_API_KEY=datadog-key:latest" \
--memory=1Gi \
--cpu=1 \
--concurrency=100 \
--max-instances=10 \
--timeout=300s \
--port=8080
Cloud Run YAML
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: devops-rag
  annotations:
    run.googleapis.com/ingress: all
spec:
  template:
    metadata:
      annotations:
        run.googleapis.com/execution-environment: gen2
        autoscaling.knative.dev/maxScale: "10"
    spec:
      containerConcurrency: 100
      # Request timeout in seconds; memory and CPU are set under resources.limits
      timeoutSeconds: 300
      containers:
        - image: ghcr.io/gaurav21/devops-rag:latest
          ports:
            - containerPort: 8080
          env:
            - name: DD_SERVICE
              value: devops-rag
            - name: DD_ENV
              value: production
            - name: DD_VERSION
              value: "1.0.0"
            - name: DD_TRACE_ENABLED
              value: "true"
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  key: "latest"
                  name: "openai-key"
            - name: DD_API_KEY
              valueFrom:
                secretKeyRef:
                  key: "latest"
                  name: "datadog-key"
          resources:
            limits:
              memory: "1Gi"
              cpu: "1000m"
Deploy with Custom Domain
# Deploy service
gcloud run services replace service.yaml --region=asia-southeast1
# Map custom domain
gcloud run domain-mappings create \
--service=devops-rag \
--domain=rag.yourdomain.com \
--region=asia-southeast1
📊 Production Configuration
Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| OPENAI_API_KEY | ✅ | - | OpenAI API key for embeddings |
| DD_API_KEY | 🔶 | - | Datadog API key for monitoring |
| DD_SERVICE | 🔶 | devops-rag | Service name in Datadog |
| DD_ENV | 🔶 | production | Environment tag |
| PORT | ❌ | 8080 | Server port |
| CHUNK_SIZE | ❌ | 512 | Token chunk size |
| CHUNK_OVERLAP | ❌ | 64 | Chunk overlap tokens |
| TOP_K | ❌ | 5 | Default retrieval count |
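To make the required/optional split and the defaults in the table concrete, here is a minimal Python sketch of how a service could read these variables at startup. The `Settings` class and `load_settings` function are hypothetical names chosen for illustration; the actual application may load its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in the table."""
    openai_api_key: str
    dd_api_key: Optional[str]
    dd_service: str
    dd_env: str
    port: int
    chunk_size: int
    chunk_overlap: int
    top_k: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the documented variables, falling back to the table's defaults."""
    env = os.environ if env is None else env

    # OPENAI_API_KEY is the only variable marked as required.
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required but not set")

    settings = Settings(
        openai_api_key=api_key,
        dd_api_key=env.get("DD_API_KEY") or None,
        dd_service=env.get("DD_SERVICE", "devops-rag"),
        dd_env=env.get("DD_ENV", "production"),
        port=int(env.get("PORT", "8080")),
        chunk_size=int(env.get("CHUNK_SIZE", "512")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "64")),
        top_k=int(env.get("TOP_K", "5")),
    )

    # Illustrative sanity check: an overlap as large as the chunk makes no progress.
    if not 0 <= settings.chunk_overlap < settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be non-negative and smaller than CHUNK_SIZE")
    return settings
```

Failing fast on a missing key at startup gives a clearer error in the container logs than a failed embedding request later on.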
Volume Mounts (Critical)
⚠️ Important: The vector index (/app/data/index.json) must be persisted across container restarts. Without persistent storage, the system will need to re-index all runbooks on every restart.
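A quick way to confirm that persistence is working is to inspect the data directory from inside the container or on the host. The sketch below is an illustration only: `check_index` is a hypothetical helper, and it assumes the index file is plain JSON, which the file name suggests but this guide does not state.

```python
import json
import os
from pathlib import Path
from typing import List


def check_index(data_dir: str = "/app/data") -> List[str]:
    """Return a list of problems found with the persisted index (empty if none)."""
    directory = Path(data_dir)
    if not directory.is_dir():
        return [f"{directory} does not exist; is the volume mounted?"]

    problems = []
    if not os.access(directory, os.W_OK):
        problems.append(f"{directory} is not writable by the current user")

    index_file = directory / "index.json"
    if not index_file.is_file():
        problems.append(f"{index_file} is missing; runbooks will be re-indexed on start")
    else:
        try:
            # Assumption: the index is stored as a JSON document.
            with index_file.open(encoding="utf-8") as handle:
                json.load(handle)
        except (OSError, json.JSONDecodeError) as exc:
            problems.append(f"{index_file} could not be parsed as JSON: {exc}")
    return problems
```

Running it before and after a container restart shows whether the index survived: an empty list in both cases means the mount is doing its job.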
# Required mounts
-v /persistent/path/data:/app/data # Vector index storage
-v /path/to/runbooks:/app/runbooks # Your documentation
Resource Requirements
| Deployment Type | CPU | Memory | Storage |
|---|---|---|---|
| Development | 0.5 CPU | 512Mi | 1Gi |
| Production | 1 CPU | 1Gi | 5Gi |
| High Load | 2 CPU | 2Gi | 10Gi |
Health Checks
# Health endpoint
curl http://your-domain/health
# Expected response
{
"status": "ok",
"index_ready": true,
"total_chunks": 45,
"total_sources": 18
}
🔧 Configuration Tuning
Custom Configuration
Create config.py and mount it into the container:
# config.py - Custom RAG configuration
EMBEDDING_MODEL = "text-embedding-3-small" # or text-embedding-3-large
GENERATION_MODEL = "gpt-4o-mini" # or gpt-4o
CHUNK_SIZE = 512
CHUNK_OVERLAP = 64
TOP_K = 5
SIMILARITY_THRESHOLD = 0.7
# Datadog configuration
DATADOG_ENABLED = True
DD_TRACE_SAMPLE_RATE = 0.1
Mount in Docker:
-v $(pwd)/config.py:/app/config.py:ro
Performance Tuning
For faster queries:
- Reduce TOP_K from 5 to 3
- Use text-embedding-3-small instead of text-embedding-3-large
- Reduce CHUNK_SIZE to 256
For better accuracy:
- Increase TOP_K to 8-10
- Use text-embedding-3-large
- Reduce SIMILARITY_THRESHOLD to 0.6
For cost optimization:
- Use gpt-4o-mini for generation
- Cache frequent queries (add Redis layer)
- Implement rate limiting
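To see why TOP_K and SIMILARITY_THRESHOLD trade speed against accuracy, the following sketch shows one common way such settings are applied during retrieval: score every chunk by cosine similarity, drop anything below the threshold, and keep the best `top_k`. This is an assumption about how the service uses these values, not its confirmed implementation; `cosine_similarity` and `retrieve` are illustrative names.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    top_k: int = 5,
    threshold: float = 0.7,
) -> List[Tuple[float, str]]:
    """Return up to top_k (score, text) pairs whose score meets the threshold."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]
```

Under this model, lowering the threshold or raising `top_k` passes more chunks to the generation model, which tends to improve recall at the cost of a longer, more expensive prompt.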
🔐 Security Best Practices
API Key Management
# Never put keys in Dockerfiles or images
# Use secrets management:
# Kubernetes (key names must match the secretKeyRef entries in the Deployment)
kubectl create secret generic devops-rag-secrets \
  --namespace devops-rag \
  --from-literal=openai-api-key=sk-your-key \
  --from-literal=datadog-api-key=your_datadog_api_key
# Cloud Run
gcloud secrets create openai-key --data-file=key.txt
# Docker Swarm
echo "sk-your-key" | docker secret create openai_key -
Network Security
# Limit container access
docker run --security-opt=no-new-privileges \
--read-only \
--tmpfs /tmp \
--user 1001:1001 \
ghcr.io/gaurav21/devops-rag:latest
Monitoring & Alerting
Set up alerts for:
- High error rates (>5%)
- Slow queries (>2s p95)
- Index corruption (health check fails)
- High memory usage (>80%)
- OpenAI API quota exhaustion
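The index-related alert can be driven directly by the /health endpoint described earlier. The sketch below is a minimal example that uses only the response fields shown in the Health Checks section; the function names and the rule that zero chunks counts as a failure are assumptions made for illustration.

```python
import json
import urllib.request
from typing import Any, Dict, List


def evaluate_health(payload: Dict[str, Any]) -> List[str]:
    """Turn a /health response body into a list of alert messages."""
    alerts = []
    if payload.get("status") != "ok":
        alerts.append(f"status is {payload.get('status')!r}, expected 'ok'")
    if not payload.get("index_ready", False):
        alerts.append("index is not ready")
    # Assumption: an index with no chunks means indexing failed or found no runbooks.
    if payload.get("total_chunks", 0) == 0:
        alerts.append("index contains no chunks")
    return alerts


def fetch_health(base_url: str = "http://localhost:8080", timeout: float = 5.0) -> Dict[str, Any]:
    """Fetch and decode the JSON body of the /health endpoint."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
        return json.load(response)
```

A cron job or sidecar could call `evaluate_health(fetch_health())` and forward any non-empty result to the alerting channel of your choice.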
🚨 Troubleshooting
Container Fails to Start
# Check logs
docker logs devops-rag
# Common issues:
# 1. Missing OPENAI_API_KEY
# 2. Permission denied on volume mounts
# 3. Port already in use
Index Not Persisting
# Verify volume mount
docker inspect devops-rag | grep -A 5 Mounts
# Check ownership
ls -la /path/to/data/
# Fix permissions
chown -R 1001:1001 /path/to/data/
Poor Query Performance
# Check index size
curl http://localhost:8080/stats
# Monitor memory usage
docker stats devops-rag
# Check query latency
time curl -X POST localhost:8080/ask -H "Content-Type: application/json" -d '{"question":"test"}'
OpenAI API Issues
# Test API key directly
curl https://api.openai.com/v1/models \
-H "Authorization: Bearer $OPENAI_API_KEY"
# Check rate limits in container logs
docker logs devops-rag 2>&1 | grep -i rate
Next Steps:
- Configuration Guide - Detailed tuning options
- Monitoring Guide - Datadog dashboards and alerts
- Knowledge Base Guide - Optimize your runbooks