Configuration Guide - DevOps RAG
Complete configuration reference for tuning embedding models, chunk parameters, retrieval settings, and system behavior.
📋 Configuration Overview
DevOps RAG can be configured through environment variables, configuration files, or runtime parameters. This guide covers all available options and their impact on performance and accuracy.
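As a rough illustration of how the three sources could be layered, the sketch below resolves a setting by checking runtime parameters first, then environment variables, then file values, then a built-in default. The precedence order, the `resolve_setting` helper, and the `DEFAULTS` table are assumptions made for this example; the guide does not specify how DevOps RAG merges these sources.

```python
import os

# Hypothetical defaults, mirroring values that appear later in this guide.
DEFAULTS = {"TOP_K": 5, "CHUNK_SIZE": 512, "SIMILARITY_THRESHOLD": 0.7}


def resolve_setting(name, file_config=None, runtime=None, env=None):
    """Return the effective value for `name`.

    Assumed precedence (not confirmed by the guide):
    runtime parameters > environment variables > config file > default.
    """
    env = os.environ if env is None else env
    default = DEFAULTS[name]
    cast = type(default)  # coerce strings such as "3" to the default's type

    if runtime and name in runtime:
        return cast(runtime[name])
    if name in env:
        return cast(env[name])
    if file_config and name in file_config:
        return cast(file_config[name])
    return default
```

For example, `resolve_setting("TOP_K", env={"TOP_K": "3"})` yields the integer 3, while passing `runtime={"TOP_K": 8}` as well would take priority over the environment value.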
🤖 Model Configuration
Embedding Models
| Model | Dimensions | Cost | Use Case |
|---|---|---|---|
| text-embedding-3-small | 1536 | $0.02/1M tokens | Recommended - Best balance |
| text-embedding-3-large | 3072 | $0.13/1M tokens | High accuracy, complex domains |
| text-embedding-ada-002 | 1536 | $0.10/1M tokens | Legacy, not recommended |
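To compare the models on cost, the per-token prices in the table can be turned into an indexing estimate. This is a minimal sketch: the `embedding_cost_usd` helper is hypothetical, and the prices are copied from the table above, so they may not match current pricing.

```python
# Prices in USD per 1M tokens, taken from the table above.
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
}


def embedding_cost_usd(model: str, total_tokens: int) -> float:
    """Estimate the cost of embedding `total_tokens` tokens with `model`."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[model]
```

At these prices, embedding a 10-million-token corpus would cost about $0.20 with text-embedding-3-small and about $1.30 with text-embedding-3-large.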
Configuration:
# config.py
EMBEDDING_MODEL = "text-embedding-3-small"
EMBEDDING_BATCH_SIZE = 100
Environment Variable:
EMBEDDING_MODEL=text-embedding-3-small
Generation Models
| Model | Cost | Latency | Quality | Use Case |
|---|---|---|---|---|
| gpt-4o-mini | $0.15/1M input | ~500ms | High | Default - Fast & accurate |
| gpt-4o | $5.00/1M input | ~1500ms | Highest | Complex reasoning tasks |
| gpt-3.5-turbo | $0.50/1M input | ~300ms | Good | Budget option |
Configuration:
GENERATION_MODEL = "gpt-4o-mini"
MAX_TOKENS = 1000
TEMPERATURE = 0.1
📄 Chunking Strategy
Chunking parameters directly impact retrieval quality. Based on our tuning report, these are the optimized settings.
Chunk Size
| Size | Chunks | Precision | Recall | Use Case |
|---|---|---|---|---|
| 512 | 45 | High | Perfect | Recommended |
| 256 | 83 | Highest | Good | High precision needs |
| 768 | 36 | Medium | Good | Longer context |
| 1024 | 26 | Lower | Good | Minimal chunks |
Configuration:
CHUNK_SIZE = 512 # Tokens per chunk
CHUNK_OVERLAP = 64 # Overlap between chunks
MIN_CHUNK_SIZE = 100 # Minimum viable chunk
Chunk Overlap
| Overlap | Rank-1 Accuracy | Index Size | Use Case |
|---|---|---|---|
| 64 | 100% | +13% | Recommended |
| 0 | 90% | Baseline | Storage constrained |
| 128 | 100% | +20% | Maximum accuracy |
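To make the interaction between chunk size, overlap, and minimum chunk size concrete, here is a minimal fixed-window sketch. It operates on a pre-tokenized list and is not the project's chunker: the `chunk_tokens` function is hypothetical, and merging an undersized trailing fragment into the previous chunk is an assumed interpretation of `MIN_CHUNK_SIZE`.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=64, min_chunk_size=100):
    """Split `tokens` into windows of `chunk_size` that share `overlap` tokens."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")

    step = chunk_size - overlap  # each new window starts this far after the last
    chunks = []
    for start in range(0, len(tokens), step):
        chunk = tokens[start:start + chunk_size]
        if len(chunk) < min_chunk_size and chunks:
            # Assumed policy: fold a too-short tail into the previous chunk,
            # skipping the tokens the two windows already share.
            chunks[-1] = chunks[-1] + chunk[overlap:]
            break
        chunks.append(chunk)
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks
```

With the recommended settings, each window starts 448 tokens after the previous one, so the last 64 tokens of one chunk reappear as the first 64 of the next. That repetition is where the extra index size in the table comes from.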
Chunking Process
# Advanced chunking configuration
CHUNKING_CONFIG = {
    "strategy": "semantic",  # semantic, fixed, sliding
    "chunk_size": 512,
    "overlap": 64,
    "separators": ["\n\n", "\n", ". ", " "],
    "keep_separator": True,
    "respect_word_boundaries": True,
    "min_chunk_size": 100,
    "max_chunk_size": 800
}
🔍 Retrieval Configuration
Top-K Settings
| Top-K | Context Window | Latency | Quality | Cost |
|---|---|---|---|---|
| 3 | ~1,500 tokens | Fast | Good | Low |
| 5 | ~2,500 tokens | Medium | Best | Medium |
| 8 | ~4,000 tokens | Slower | Diminishing | High |
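The sketch below shows one way Top-K and a similarity threshold can work together: score every chunk against the query, discard anything below the threshold, and keep the K best. It is an illustration in plain Python, not the project's retriever; the `retrieve` function and the `(text, vector)` index layout are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, indexed_chunks, top_k=5, similarity_threshold=0.7):
    """Return up to `top_k` (score, text) pairs at or above the threshold.

    `indexed_chunks` is assumed to be an iterable of (text, vector) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed_chunks]
    kept = [pair for pair in scored if pair[0] >= similarity_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # best match first
    return kept[:top_k]
```

Because the threshold is applied before truncation, a query can return fewer than `top_k` chunks, which keeps weak matches out of the generation context.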
Configuration:
# Retrieval settings
TOP_K = 5 # Number of chunks to retrieve
SIMILARITY_THRESHOLD = 0.7 # Minimum cosine similarity
MAX_CONTEXT_LENGTH = 8000 # Max tokens in generation context
RERANK_TOP_K = True # Re-rank results by relevance
Similarity Thresholds
# Quality gates
SIMILARITY_THRESHOLDS = {
    "high_confidence": 0.85,  # Show with high confidence
    "medium_confidence": 0.75,  # Show with caution
    "low_confidence": 0.65,  # Show as fallback
    "min_threshold": 0.5  # Minimum to include
}
Advanced Retrieval
# Hybrid search (if enabled)
HYBRID_SEARCH = {
    "enabled": False,  # Enable BM25 + vector search
    "bm25_weight": 0.3,  # BM25 contribution (0-1)
    "vector_weight": 0.7,  # Vector contribution (0-1)
    "normalize_scores": True
}
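One plausible reading of the hybrid weights above is a weighted sum of BM25 and vector scores, each rescaled to the 0-1 range first so that neither scale dominates. The sketch below illustrates that idea only; `hybrid_scores`, `min_max_normalize`, and the choice of min-max scaling are assumptions rather than the project's documented behavior.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the 0-1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so treat every document as a full match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (score - lo) / (hi - lo) for doc_id, score in scores.items()}


def hybrid_scores(bm25, vector, bm25_weight=0.3, vector_weight=0.7,
                  normalize_scores=True):
    """Combine two {doc_id: score} mappings into one weighted score per document."""
    if normalize_scores:
        bm25, vector = min_max_normalize(bm25), min_max_normalize(vector)
    doc_ids = set(bm25) | set(vector)
    # A document missing from one ranking contributes 0 for that component.
    return {
        doc_id: bm25_weight * bm25.get(doc_id, 0.0) + vector_weight * vector.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
```

Normalizing matters because raw BM25 scores are unbounded while cosine similarities stay within a fixed range; without it, the 0.3/0.7 split would not reflect the intended balance.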
# Metadata filtering
METADATA_FILTERS = {
    "enabled": True,
    "fields": ["source", "category", "severity", "team"],
    "default_filters": {}
}
🚀 Performance Configuration
Server Settings
# FastAPI/Uvicorn configuration
SERVER_CONFIG = {
    "host": "0.0.0.0",
    "port": 8080,
    "workers": 1,  # Single worker for consistency
    "timeout_keep_alive": 30,
    "timeout_graceful_shutdown": 30,
    "max_requests": 1000,
    "max_requests_jitter": 50
}
Caching
# Response caching
CACHE_CONFIG = {
    "enabled": True,
    "backend": "memory",  # memory, redis, file
    "ttl": 3600,  # 1 hour
    "max_size": 1000,  # Max cached responses
    "cache_embeddings": True,  # Cache query embeddings
    "cache_generations": True  # Cache generated responses
}
Concurrency
# Async configuration
ASYNC_CONFIG = {
    "embedding_batch_size": 100,  # Batch embed requests
    "max_concurrent_requests": 10,  # Parallel OpenAI calls
    "request_timeout": 30,  # OpenAI timeout (seconds)
    "retry_attempts": 3,
    "retry_delay": 1.0
}
📊 Monitoring Configuration
Datadog Integration
# Datadog configuration
DATADOG_CONFIG = {
    "enabled": True,
    "service": "devops-rag",
    "env": "production",
    "version": "1.0.0",
    "trace_sample_rate": 0.1,  # 10% sampling
    "log_level": "INFO",
    "custom_metrics": {
        "query_latency": True,
        "similarity_scores": True,
        "token_usage": True,
        "error_rates": True
    }
}
Environment Variables:
# Datadog
DD_API_KEY=your_datadog_api_key
DD_SITE=datadoghq.com
DD_SERVICE=devops-rag
DD_ENV=production
DD_VERSION=1.0.0
DD_TRACE_ENABLED=true
DD_TRACE_SAMPLE_RATE=0.1
DD_LOGS_ENABLED=true
# Custom metrics
DD_RUNTIME_METRICS_ENABLED=true
DD_PROFILING_ENABLED=true
Logging Configuration
# Logging setup
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "default": {
            "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
        },
        "json": {
            # Use standard LogRecord attribute names so the JSON fields are populated
            "format": "%(asctime)s %(levelname)s %(name)s %(message)s",
            "class": "pythonjsonlogger.jsonlogger.JsonFormatter"
        }
    },
    "handlers": {
        "console": {
            "level": "INFO",
            "class": "logging.StreamHandler",
            "formatter": "json"
        },
        "file": {
            "level": "DEBUG",
            "class": "logging.handlers.RotatingFileHandler",
            "filename": "/app/logs/devops-rag.log",
            "maxBytes": 10485760,  # 10 MiB
            "backupCount": 5,
            "formatter": "json"
        }
    },
    "root": {
        "level": "INFO",
        "handlers": ["console", "file"]
    }
}
🔧 Index Configuration
Storage Backend
# Vector index storage
INDEX_CONFIG = {
    "backend": "json",  # json, postgres, pinecone
    "file_path": "/app/data/index.json",
    "auto_save": True,
    "save_interval": 300,  # 5 minutes
    "backup_enabled": True,
    "backup_interval": 3600,  # 1 hour
    "compression": "gzip"
}
Index Building
# Indexing behavior
INDEXING_CONFIG = {
    "auto_index": True,  # Auto-index new files
    "watch_directories": ["/app/runbooks"],
    "file_patterns": ["*.md", "*.txt", "*.rst"],
    "exclude_patterns": [".*", "__pycache__", "*.tmp"],
    "incremental": True,  # Only re-index changed files
    "parallel_processing": True,
    "batch_size": 50
}
🌍 Environment Variables Reference
Required Variables
# OpenAI (Required)
OPENAI_API_KEY=sk-your-openai-api-key-here
# Optional but recommended
DD_API_KEY=your-datadog-api-key
Model Configuration
# Embedding model
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1536
# Generation model
GENERATION_MODEL=gpt-4o-mini
MAX_TOKENS=1000
TEMPERATURE=0.1
Chunking & Retrieval
# Chunking
CHUNK_SIZE=512
CHUNK_OVERLAP=64
MIN_CHUNK_SIZE=100
# Retrieval
TOP_K=5
SIMILARITY_THRESHOLD=0.7
MAX_CONTEXT_LENGTH=8000
Performance
# Server
PORT=8080
WORKERS=1
TIMEOUT=300
# Caching
CACHE_ENABLED=true
CACHE_TTL=3600
Paths
# File paths
RUNBOOKS_DIR=/app/runbooks
DATA_DIR=/app/data
INDEX_FILE=/app/data/index.json
LOG_DIR=/app/logs
📁 Configuration File Examples
Complete config.py
"""
DevOps RAG Configuration
Place this file at /app/config.py in your container
"""
import os
from typing import Dict, Any

# ==================== MODEL CONFIGURATION ====================
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "text-embedding-3-small")
GENERATION_MODEL = os.getenv("GENERATION_MODEL", "gpt-4o-mini")
OPENAI_CONFIG = {
    "api_key": os.getenv("OPENAI_API_KEY"),
    "max_retries": 3,
    "timeout": 30,
    "organization": os.getenv("OPENAI_ORG_ID"),
}

# ==================== CHUNKING CONFIGURATION ====================
CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", "512"))
CHUNK_OVERLAP = int(os.getenv("CHUNK_OVERLAP", "64"))
MIN_CHUNK_SIZE = int(os.getenv("MIN_CHUNK_SIZE", "100"))
CHUNKING_CONFIG = {
    "strategy": "semantic",
    "chunk_size": CHUNK_SIZE,
    "overlap": CHUNK_OVERLAP,
    "separators": ["\n\n", "\n", ". ", " ", ""],
    "keep_separator": True,
    "respect_boundaries": True
}

# ==================== RETRIEVAL CONFIGURATION ====================
TOP_K = int(os.getenv("TOP_K", "5"))
SIMILARITY_THRESHOLD = float(os.getenv("SIMILARITY_THRESHOLD", "0.7"))
MAX_CONTEXT_LENGTH = int(os.getenv("MAX_CONTEXT_LENGTH", "8000"))
RETRIEVAL_CONFIG = {
    "top_k": TOP_K,
    "similarity_threshold": SIMILARITY_THRESHOLD,
    "max_context_length": MAX_CONTEXT_LENGTH,
    "rerank_results": True,
    "diversity_penalty": 0.1
}

# ==================== PERFORMANCE CONFIGURATION ====================
CACHE_CONFIG = {
    "enabled": os.getenv("CACHE_ENABLED", "true").lower() == "true",
    "backend": "memory",
    "ttl": int(os.getenv("CACHE_TTL", "3600")),
    "max_size": 1000
}
ASYNC_CONFIG = {
    "max_concurrent": 10,
    "timeout": 30,
    "batch_size": 100
}

# ==================== STORAGE CONFIGURATION ====================
INDEX_CONFIG = {
    "backend": "json",
    "file_path": os.getenv("INDEX_FILE", "/app/data/index.json"),
    "auto_save": True,
    "backup_enabled": True
}

# ==================== MONITORING CONFIGURATION ====================
DATADOG_CONFIG = {
    "enabled": bool(os.getenv("DD_API_KEY")),
    "service": os.getenv("DD_SERVICE", "devops-rag"),
    "env": os.getenv("DD_ENV", "production"),
    "version": os.getenv("DD_VERSION", "1.0.0"),
    "trace_sample_rate": float(os.getenv("DD_TRACE_SAMPLE_RATE", "0.1"))
}

# ==================== PATHS ====================
RUNBOOKS_DIR = os.getenv("RUNBOOKS_DIR", "/app/runbooks")
DATA_DIR = os.getenv("DATA_DIR", "/app/data")
LOG_DIR = os.getenv("LOG_DIR", "/app/logs")

# ==================== VALIDATION ====================
def validate_config() -> Dict[str, Any]:
    """Validate configuration and return status"""
    issues = []
    if not OPENAI_CONFIG["api_key"]:
        issues.append("OPENAI_API_KEY not set")
    if CHUNK_SIZE < 50 or CHUNK_SIZE > 2000:
        issues.append(f"CHUNK_SIZE {CHUNK_SIZE} out of range (50-2000)")
    if TOP_K < 1 or TOP_K > 20:
        issues.append(f"TOP_K {TOP_K} out of range (1-20)")
    return {
        "valid": len(issues) == 0,
        "issues": issues,
        "config": {
            "embedding_model": EMBEDDING_MODEL,
            "generation_model": GENERATION_MODEL,
            "chunk_size": CHUNK_SIZE,
            "top_k": TOP_K
        }
    }


if __name__ == "__main__":
    result = validate_config()
    print(f"Config valid: {result['valid']}")
    if result['issues']:
        print("Issues:", result['issues'])
Docker Environment File
# .env file for Docker deployment
# ============ REQUIRED ============
OPENAI_API_KEY=sk-your-openai-api-key-here
# ============ DATADOG (OPTIONAL) ============
DD_API_KEY=your-datadog-api-key
DD_SITE=datadoghq.com
DD_SERVICE=devops-rag
DD_ENV=production
DD_VERSION=1.0.0
DD_TRACE_ENABLED=true
# ============ MODEL CONFIGURATION ============
EMBEDDING_MODEL=text-embedding-3-small
GENERATION_MODEL=gpt-4o-mini
MAX_TOKENS=1000
TEMPERATURE=0.1
# ============ CHUNKING ============
CHUNK_SIZE=512
CHUNK_OVERLAP=64
MIN_CHUNK_SIZE=100
# ============ RETRIEVAL ============
TOP_K=5
SIMILARITY_THRESHOLD=0.7
MAX_CONTEXT_LENGTH=8000
# ============ PERFORMANCE ============
PORT=8080
WORKERS=1
TIMEOUT=300
CACHE_ENABLED=true
CACHE_TTL=3600
# ============ PATHS ============
RUNBOOKS_DIR=/app/runbooks
DATA_DIR=/app/data
INDEX_FILE=/app/data/index.json
🔧 Troubleshooting
Configuration Issues
# Validate configuration
docker exec devops-rag python -c "
import config
result = config.validate_config()
print('Valid:', result['valid'])
if result['issues']:
    print('Issues:', result['issues'])
"
# Check environment variables
docker exec devops-rag env | grep -E "(OPENAI|DD_|CHUNK|TOP_K)"
Performance Tuning
Slow Queries (>2s):
- Reduce TOP_K from 5 to 3
- Use text-embedding-3-small
- Enable caching
High Memory Usage:
- Reduce CHUNK_SIZE to 256
- Lower MAX_CONTEXT_LENGTH
- Disable embedding caching
Poor Accuracy:
- Increase TOP_K to 8
- Lower SIMILARITY_THRESHOLD to 0.6
- Use text-embedding-3-large
Index Issues
# Rebuild index with current config
docker exec devops-rag python cli.py ingest --force
# Check index statistics
docker exec devops-rag python cli.py stats
# Validate index integrity
docker exec devops-rag python -c "
import json
with open('/app/data/index.json') as f:
    index = json.load(f)
print(f'Chunks: {len(index.get(\"chunks\", []))}')
print(f'Sources: {len(index.get(\"sources\", []))}')
"
Next Steps:
- Tuning Guide - Performance optimization strategies
- Monitoring Guide - Observability and alerting
- Knowledge Base Guide - Content optimization