Configuration Guide - DevOps RAG

Complete configuration reference for tuning embedding models, chunk parameters, retrieval settings, and system behavior.

📋 Configuration Overview

DevOps RAG can be configured through environment variables, configuration files, or runtime parameters. This guide covers all available options and their impact on performance and accuracy.
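
The guide does not say which source wins when the same option is set in more than one place. The following is a minimal sketch, assuming a common precedence of runtime parameter, then environment variable, then file default. `resolve_setting` and its arguments are hypothetical names used for illustration, not part of DevOps RAG.

```python
import os
from typing import Any, Callable, Mapping, Optional


def resolve_setting(
    name: str,
    file_defaults: Mapping[str, Any],
    runtime_overrides: Optional[Mapping[str, Any]] = None,
    cast: Callable[[str], Any] = str,
    environ: Optional[Mapping[str, str]] = None,
) -> Any:
    """Resolve `name` as: runtime override, then env var, then file default.

    The precedence order is an assumption made for this sketch.
    """
    environ = os.environ if environ is None else environ
    if runtime_overrides and name in runtime_overrides:
        return runtime_overrides[name]
    if name in environ:
        # Environment values are always strings, so cast to the expected type.
        return cast(environ[name])
    return file_defaults[name]


defaults = {"TOP_K": 5, "CHUNK_SIZE": 512}
env = {"TOP_K": "3"}  # stand-in for os.environ

print(resolve_setting("TOP_K", defaults, cast=int, environ=env))                # 3
print(resolve_setting("TOP_K", defaults, {"TOP_K": 8}, cast=int, environ=env))  # 8
print(resolve_setting("CHUNK_SIZE", defaults, cast=int, environ=env))           # 512
```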

🤖 Model Configuration

Embedding Models

Model	Dimensions	Cost	Use Case
`text-embedding-3-small`	1536	$0.02/1M tokens	Recommended - Best balance
`text-embedding-3-large`	3072	$0.13/1M tokens	High accuracy, complex domains
`text-embedding-ada-002`	1536	$0.10/1M tokens	Legacy, not recommended

Configuration:

# config.py
EMBEDDING_MODEL = "text-embedding-3-small"
EMBEDDING_BATCH_SIZE = 100

Environment Variable:

EMBEDDING_MODEL=text-embedding-3-small
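
To compare the embedding models by cost, the per-token prices in the table above can be applied to a corpus size. This is a rough sketch: the prices are copied from the table, while `estimate_embedding_cost` and the 2M-token corpus are hypothetical.

```python
# Prices in USD per 1M tokens, copied from the embedding model table above.
EMBEDDING_PRICE_PER_1M_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
}


def estimate_embedding_cost(total_tokens: int, model: str = "text-embedding-3-small") -> float:
    """Return the estimated USD cost of embedding `total_tokens` tokens once."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return total_tokens / 1_000_000 * EMBEDDING_PRICE_PER_1M_TOKENS[model]


corpus_tokens = 2_000_000  # hypothetical corpus size
for model in EMBEDDING_PRICE_PER_1M_TOKENS:
    print(f"{model}: ${estimate_embedding_cost(corpus_tokens, model):.2f}")
```

Re-indexing after a chunking change embeds the whole corpus again, so the estimate applies per rebuild.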

Generation Models

Model	Cost	Latency	Quality	Use Case
`gpt-4o-mini`	$0.15/1M input	~500ms	High	Default - Fast & accurate
`gpt-4o`	$5.00/1M input	~1500ms	Highest	Complex reasoning tasks
`gpt-3.5-turbo`	$0.50/1M input	~300ms	Good	Budget option

Configuration:

GENERATION_MODEL = "gpt-4o-mini"
MAX_TOKENS = 1000
TEMPERATURE = 0.1

📄 Chunking Strategy

Chunking parameters directly affect retrieval quality. The settings below are the optimized values from our tuning report.

Chunk Size

Size	Chunks	Precision	Recall	Use Case
512	45	High	Perfect	Recommended
256	83	Highest	Good	High precision needs
768	36	Medium	Good	Longer context
1024	26	Lower	Good	Minimal chunks

Configuration:

CHUNK_SIZE = 512            # Tokens per chunk
CHUNK_OVERLAP = 64         # Overlap between chunks
MIN_CHUNK_SIZE = 100       # Minimum viable chunk

Chunk Overlap

Overlap	Rank-1 Accuracy	Index Size	Use Case
64	100%	+13%	Recommended
0	90%	Baseline	Storage constrained
128	100%	+20%	Maximum accuracy
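
The interaction between chunk size and overlap can be illustrated with a fixed sliding window: each new chunk starts `chunk_size - overlap` tokens after the previous one. This is only a sketch of the `sliding` idea on a pre-tokenized list. `sliding_window_chunks` is a hypothetical helper, and merging a too-short tail into the previous chunk is an assumption about how `MIN_CHUNK_SIZE` might be honored, not the documented behavior.

```python
from typing import List, Sequence


def sliding_window_chunks(
    tokens: Sequence[str],
    chunk_size: int = 512,
    overlap: int = 64,
    min_chunk_size: int = 100,
) -> List[List[str]]:
    """Split `tokens` into windows of `chunk_size` that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    stride = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), stride):
        window = list(tokens[start:start + chunk_size])
        if len(window) < min_chunk_size and chunks:
            # Tail is too short to stand alone: append its unseen tokens
            # to the previous chunk instead of dropping them (assumption).
            chunks[-1].extend(tokens[start + overlap:])
            break
        chunks.append(window)
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


tokens = [f"t{i}" for i in range(1000)]
chunks = sliding_window_chunks(tokens)
print([len(c) for c in chunks])          # [512, 512, 104]
print(chunks[0][-64:] == chunks[1][:64])  # True
```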

Chunking Process

# Advanced chunking configuration
CHUNKING_CONFIG = {
    "strategy": "semantic",           # semantic, fixed, sliding
    "chunk_size": 512,
    "overlap": 64,
    "separators": ["\n\n", "\n", ". ", " "],
    "keep_separator": True,
    "respect_word_boundaries": True,
    "min_chunk_size": 100,
    "max_chunk_size": 800
}

🔍 Retrieval Configuration

Top-K Settings

Top-K	Context Window	Latency	Quality	Cost
3	~1,500 tokens	Fast	Good	Low
5	~2,500 tokens	Medium	Best	Medium
8	~4,000 tokens	Slower	Diminishing	High
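
The context window figures in the table are roughly `TOP_K` multiplied by the recommended 512-token chunk size. A small sketch of that arithmetic, assuming every retrieved chunk is full-size (an upper bound); the function names are hypothetical.

```python
def estimated_context_tokens(top_k: int, chunk_size: int = 512) -> int:
    """Upper bound on retrieved context, assuming every chunk is full-size."""
    return top_k * chunk_size


def fits_context(top_k: int, chunk_size: int = 512, max_context_length: int = 8000) -> bool:
    """Check the retrieved context against MAX_CONTEXT_LENGTH."""
    return estimated_context_tokens(top_k, chunk_size) <= max_context_length


for k in (3, 5, 8):
    print(k, estimated_context_tokens(k), fits_context(k))
# 3 1536 True
# 5 2560 True
# 8 4096 True
```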

Configuration:

# Retrieval settings
TOP_K = 5                           # Number of chunks to retrieve
SIMILARITY_THRESHOLD = 0.7          # Minimum cosine similarity
MAX_CONTEXT_LENGTH = 8000          # Max tokens in generation context
RERANK_TOP_K = True                # Re-rank results by relevance
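
How `TOP_K` and `SIMILARITY_THRESHOLD` combine can be sketched as follows: score every chunk by cosine similarity, drop scores below the threshold, then keep the best `TOP_K`. This is a plain-Python illustration with hypothetical function names; the actual retrieval code in DevOps RAG may differ.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    top_k: int = 5,
    similarity_threshold: float = 0.7,
) -> List[Tuple[int, float]]:
    """Return up to `top_k` (chunk_index, score) pairs above the threshold."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    kept = [pair for pair in scored if pair[1] >= similarity_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Toy 2-dimensional vectors; real embeddings have 1536 or 3072 dimensions.
query = [1.0, 0.0]
chunk_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.9, 0.1]]
hits = retrieve(query, chunk_vectors, top_k=2)
print([index for index, _ in hits])  # [0, 3]
```

Because the threshold is applied first, a query can return fewer than `TOP_K` chunks, or none at all.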

Similarity Thresholds

# Quality gates
SIMILARITY_THRESHOLDS = {
    "high_confidence": 0.85,        # Show with high confidence
    "medium_confidence": 0.75,      # Show with caution
    "low_confidence": 0.65,         # Show as fallback
    "min_threshold": 0.5            # Minimum to include
}
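
One way these gates might be applied is to map each similarity score to the highest tier it reaches. The sketch below assumes scores between `min_threshold` and `low_confidence` are still included but carry no confidence tier; `confidence_label` and that behavior are assumptions, not documented API.

```python
from typing import Mapping, Optional

SIMILARITY_THRESHOLDS = {
    "high_confidence": 0.85,
    "medium_confidence": 0.75,
    "low_confidence": 0.65,
    "min_threshold": 0.5,
}


def confidence_label(
    score: float,
    thresholds: Mapping[str, float] = SIMILARITY_THRESHOLDS,
) -> Optional[str]:
    """Return the highest tier `score` reaches, or None if it is excluded."""
    if score < thresholds["min_threshold"]:
        return None  # below the minimum: not included at all
    for label in ("high_confidence", "medium_confidence", "low_confidence"):
        if score >= thresholds[label]:
            return label
    # Included, but below every named confidence tier (assumed behavior).
    return "min_threshold"


for score in (0.9, 0.8, 0.7, 0.55, 0.4):
    print(score, confidence_label(score))
```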

Advanced Retrieval

# Hybrid search (if enabled)
HYBRID_SEARCH = {
    "enabled": False,               # Enable BM25 + vector search
    "bm25_weight": 0.3,            # BM25 contribution (0-1)
    "vector_weight": 0.7,          # Vector contribution (0-1)
    "normalize_scores": True
}
 
# Metadata filtering
METADATA_FILTERS = {
    "enabled": True,
    "fields": ["source", "category", "severity", "team"],
    "default_filters": {}
}
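
The `bm25_weight`, `vector_weight`, and `normalize_scores` options above suggest a weighted sum of two score sets. A minimal sketch of such a fusion, assuming min-max normalization per score set and a score of 0 for documents missing from one side; both choices and the function names are assumptions.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {key: 1.0 for key in scores}
    return {key: (value - low) / (high - low) for key, value in scores.items()}


def hybrid_scores(
    bm25: Dict[str, float],
    vector: Dict[str, float],
    bm25_weight: float = 0.3,
    vector_weight: float = 0.7,
    normalize_scores: bool = True,
) -> Dict[str, float]:
    """Combine BM25 and vector scores per chunk id as a weighted sum."""
    if normalize_scores:
        bm25 = min_max_normalize(bm25)
        vector = min_max_normalize(vector)
    chunk_ids = set(bm25) | set(vector)
    return {
        chunk_id: bm25_weight * bm25.get(chunk_id, 0.0)
        + vector_weight * vector.get(chunk_id, 0.0)
        for chunk_id in chunk_ids
    }


bm25_raw = {"a": 10.0, "b": 0.0}
vector_raw = {"a": 0.2, "b": 0.8, "c": 0.5}
combined = hybrid_scores(bm25_raw, vector_raw)
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)  # ['b', 'c', 'a']
```

Normalizing first matters because raw BM25 scores are unbounded while cosine similarities are not, so an unnormalized sum would be dominated by BM25.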

🚀 Performance Configuration

Server Settings

# FastAPI/Uvicorn configuration
SERVER_CONFIG = {
    "host": "0.0.0.0",
    "port": 8080,
    "workers": 1,                   # Single worker for consistency
    "timeout_keep_alive": 30,
    "timeout_graceful_shutdown": 30,
    "max_requests": 1000,
    "max_requests_jitter": 50
}

Caching

# Response caching
CACHE_CONFIG = {
    "enabled": True,
    "backend": "memory",            # memory, redis, file
    "ttl": 3600,                   # 1 hour
    "max_size": 1000,              # Max cached responses
    "cache_embeddings": True,       # Cache query embeddings
    "cache_generations": True       # Cache generated responses
}
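
For the `memory` backend, `ttl` and `max_size` could behave as in the sketch below: entries expire after `ttl` seconds, and the oldest entry is evicted once `max_size` is reached. `TTLCache` is a hypothetical class written for illustration; the eviction policy of the real cache is not documented here.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional, Tuple


class TTLCache:
    """In-memory cache with per-entry expiry and oldest-first eviction."""

    def __init__(
        self,
        ttl: float = 3600,
        max_size: int = 1000,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl
        self._max_size = max_size
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items: "OrderedDict[Hashable, Tuple[float, Any]]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: Hashable, value: Any) -> None:
        if key in self._items:
            del self._items[key]  # re-insert so the key counts as newest
        elif len(self._items) >= self._max_size:
            self._items.popitem(last=False)  # evict the oldest insertion
        self._items[key] = (self._clock() + self._ttl, value)

    def __len__(self) -> int:
        return len(self._items)


cache = TTLCache(ttl=3600, max_size=1000)
cache.set("how do I restart nginx?", "cached answer")
print(cache.get("how do I restart nginx?"))  # cached answer
```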

Concurrency

# Async configuration
ASYNC_CONFIG = {
    "embedding_batch_size": 100,    # Batch embed requests
    "max_concurrent_requests": 10,  # Parallel OpenAI calls
    "request_timeout": 30,          # OpenAI timeout
    "retry_attempts": 3,
    "retry_delay": 1.0
}
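
The concurrency settings can be read as a semaphore that caps parallel calls, plus a timeout and retry loop around each call. A sketch using only `asyncio`, assuming `retry_attempts` counts total attempts and `retry_delay` is a fixed pause in seconds; `call_with_retry` and `run_bounded` are hypothetical helpers.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence


async def call_with_retry(
    make_call: Callable[[], Awaitable[Any]],
    retry_attempts: int = 3,
    retry_delay: float = 1.0,
    request_timeout: float = 30.0,
) -> Any:
    """Await `make_call()` with a timeout, retrying on any exception."""
    if retry_attempts < 1:
        raise ValueError("retry_attempts must be at least 1")
    for attempt in range(1, retry_attempts + 1):
        try:
            return await asyncio.wait_for(make_call(), timeout=request_timeout)
        except Exception:
            if attempt == retry_attempts:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(retry_delay)
    raise AssertionError("unreachable")


async def run_bounded(
    calls: Sequence[Callable[[], Awaitable[Any]]],
    max_concurrent_requests: int = 10,
    **retry_kwargs: Any,
) -> List[Any]:
    """Run all calls, never more than `max_concurrent_requests` at once."""
    semaphore = asyncio.Semaphore(max_concurrent_requests)

    async def guarded(make_call: Callable[[], Awaitable[Any]]) -> Any:
        async with semaphore:
            return await call_with_retry(make_call, **retry_kwargs)

    results = await asyncio.gather(*(guarded(call) for call in calls))
    return list(results)


async def fake_embedding_call() -> str:
    # Stand-in for a real API request.
    await asyncio.sleep(0)
    return "ok"


print(asyncio.run(run_bounded([fake_embedding_call] * 3)))  # ['ok', 'ok', 'ok']
```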

📊 Monitoring Configuration

Datadog Integration

# Datadog configuration
DATADOG_CONFIG = {
    "enabled": True,
    "service": "devops-rag",
    "env": "production",
    "version": "1.0.0",
    "trace_sample_rate": 0.1,       # 10% sampling
    "log_level": "INFO",
    "custom_metrics": {
        "query_latency": True,
        "similarity_scores": True,
        "token_usage": True,
        "error_rates": True
    }
}

Environment Variables:

# Datadog
DD_API_KEY=your_datadog_api_key
DD_SITE=datadoghq.com
DD_SERVICE=devops-rag
DD_ENV=production
DD_VERSION=1.0.0
DD_TRACE_ENABLED=true
DD_TRACE_SAMPLE_RATE=0.1
DD_LOGS_ENABLED=true
 
# Runtime metrics and profiling
DD_RUNTIME_METRICS_ENABLED=true
DD_PROFILING_ENABLED=true

Logging Configuration

# Logging setup
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "default": {
            "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
        },
        "json": {
            "format": "%(asctime)s %(levelname)s %(name)s %(message)s",
            "class": "pythonjsonlogger.jsonlogger.JsonFormatter"
        }
    },
    "handlers": {
        "console": {
            "level": "INFO",
            "class": "logging.StreamHandler",
            "formatter": "json"
        },
        "file": {
            "level": "DEBUG", 
            "class": "logging.handlers.RotatingFileHandler",
            "filename": "/app/logs/devops-rag.log",
            "maxBytes": 10485760,
            "backupCount": 5,
            "formatter": "json"
        }
    },
    "root": {
        "level": "INFO",
        "handlers": ["console", "file"]
    }
}
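
A dictionary in this shape is normally applied with the standard library's `logging.config.dictConfig`. The sketch below uses a reduced, console-only variant so it runs without `python-json-logger` installed and without `/app/logs` existing. How DevOps RAG itself loads `LOGGING_CONFIG` is not stated, so treat the wiring and the logger name as assumptions.

```python
import logging
import logging.config

# Reduced variant of LOGGING_CONFIG: console handler and plain-text formatter only.
CONSOLE_ONLY_LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "default": {
            "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
        }
    },
    "handlers": {
        "console": {
            "level": "INFO",
            "class": "logging.StreamHandler",
            "formatter": "default",
        }
    },
    "root": {"level": "INFO", "handlers": ["console"]},
}

logging.config.dictConfig(CONSOLE_ONLY_LOGGING)

logger = logging.getLogger("devops_rag.config")  # hypothetical logger name
logger.info("logging configured")
logger.debug("not emitted: root level is INFO")
```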

🔧 Index Configuration

Storage Backend

# Vector index storage
INDEX_CONFIG = {
    "backend": "json",              # json, postgres, pinecone
    "file_path": "/app/data/index.json",
    "auto_save": True,
    "save_interval": 300,           # 5 minutes
    "backup_enabled": True,
    "backup_interval": 3600,        # 1 hour
    "compression": "gzip"
}
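
For the `json` backend with `"compression": "gzip"`, saving and loading might look like the sketch below. It is an assumption that compression applies to the index file itself rather than only to backups. `save_index` and `load_index` are hypothetical helpers; the loader checks the gzip magic bytes so it also reads an uncompressed file.

```python
import gzip
import json
import os
import tempfile
from typing import Any, Dict, Optional


def save_index(index: Dict[str, Any], file_path: str, compression: Optional[str] = "gzip") -> None:
    """Write the index to a temp file, then atomically replace the target."""
    tmp_path = file_path + ".tmp"
    if compression == "gzip":
        with gzip.open(tmp_path, "wt", encoding="utf-8") as f:
            json.dump(index, f)
    else:
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(index, f)
    os.replace(tmp_path, file_path)  # avoids a half-written index on crash


def load_index(file_path: str) -> Dict[str, Any]:
    """Load the index, detecting gzip by its two-byte magic number."""
    with open(file_path, "rb") as f:
        is_gzip = f.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    with opener(file_path, "rt", encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "index.json")
    example = {"chunks": [{"id": 1, "text": "restart nginx"}], "sources": ["nginx.md"]}
    save_index(example, path)
    print(load_index(path) == example)  # True
```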

Index Building

# Indexing behavior
INDEXING_CONFIG = {
    "auto_index": True,             # Auto-index new files
    "watch_directories": ["/app/runbooks"],
    "file_patterns": ["*.md", "*.txt", "*.rst"],
    "exclude_patterns": [".*", "__pycache__", "*.tmp"],
    "incremental": True,            # Only re-index changed files
    "parallel_processing": True,
    "batch_size": 50
}
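
The `file_patterns` and `exclude_patterns` lists read like shell-style globs. A sketch of how a file could be selected, assuming exclude patterns are matched against every path component and include patterns against the file name only; `should_index` is a hypothetical helper.

```python
import fnmatch
from pathlib import PurePosixPath
from typing import Sequence

FILE_PATTERNS = ["*.md", "*.txt", "*.rst"]
EXCLUDE_PATTERNS = [".*", "__pycache__", "*.tmp"]


def should_index(
    relative_path: str,
    file_patterns: Sequence[str] = FILE_PATTERNS,
    exclude_patterns: Sequence[str] = EXCLUDE_PATTERNS,
) -> bool:
    """Decide whether a path (relative to a watched directory) gets indexed."""
    parts = PurePosixPath(relative_path).parts
    if not parts:
        return False
    # Any excluded directory or file name along the path rules the file out.
    for part in parts:
        if any(fnmatch.fnmatchcase(part, pattern) for pattern in exclude_patterns):
            return False
    file_name = parts[-1]
    return any(fnmatch.fnmatchcase(file_name, pattern) for pattern in file_patterns)


for candidate in ("db/restart.md", ".git/notes.md", "draft.tmp", "diagram.png"):
    print(candidate, should_index(candidate))
# db/restart.md True
# .git/notes.md False
# draft.tmp False
# diagram.png False
```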

🌍 Environment Variables Reference

Required Variables

# OpenAI (Required)
OPENAI_API_KEY=sk-your-openai-api-key-here
 
# Optional but recommended
DD_API_KEY=your-datadog-api-key

Model Configuration

# Embedding model
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1536
 
# Generation model  
GENERATION_MODEL=gpt-4o-mini
MAX_TOKENS=1000
TEMPERATURE=0.1

Chunking & Retrieval

# Chunking
CHUNK_SIZE=512
CHUNK_OVERLAP=64
MIN_CHUNK_SIZE=100
 
# Retrieval
TOP_K=5
SIMILARITY_THRESHOLD=0.7
MAX_CONTEXT_LENGTH=8000

Performance

# Server
PORT=8080
WORKERS=1
TIMEOUT=300
 
# Caching
CACHE_ENABLED=true
CACHE_TTL=3600

Paths

# File paths
RUNBOOKS_DIR=/app/runbooks
DATA_DIR=/app/data
INDEX_FILE=/app/data/index.json
LOG_DIR=/app/logs

📁 Configuration File Examples

Complete config.py

"""
DevOps RAG Configuration
Place this file at /app/config.py in your container
"""
 
import os
from typing import Dict, Any
 
# ==================== MODEL CONFIGURATION ====================
 
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "text-embedding-3-small")
GENERATION_MODEL = os.getenv("GENERATION_MODEL", "gpt-4o-mini")
 
OPENAI_CONFIG = {
    "api_key": os.getenv("OPENAI_API_KEY"),
    "max_retries": 3,
    "timeout": 30,
    "organization": os.getenv("OPENAI_ORG_ID"),
}
 
# ==================== CHUNKING CONFIGURATION ====================
 
CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", "512"))
CHUNK_OVERLAP = int(os.getenv("CHUNK_OVERLAP", "64"))
MIN_CHUNK_SIZE = int(os.getenv("MIN_CHUNK_SIZE", "100"))
 
CHUNKING_CONFIG = {
    "strategy": "semantic",
    "chunk_size": CHUNK_SIZE,
    "overlap": CHUNK_OVERLAP,
    "separators": ["\n\n", "\n", ". ", " ", ""],
    "keep_separator": True,
    "respect_boundaries": True
}
 
# ==================== RETRIEVAL CONFIGURATION ====================
 
TOP_K = int(os.getenv("TOP_K", "5"))
SIMILARITY_THRESHOLD = float(os.getenv("SIMILARITY_THRESHOLD", "0.7"))
MAX_CONTEXT_LENGTH = int(os.getenv("MAX_CONTEXT_LENGTH", "8000"))
 
RETRIEVAL_CONFIG = {
    "top_k": TOP_K,
    "similarity_threshold": SIMILARITY_THRESHOLD,
    "max_context_length": MAX_CONTEXT_LENGTH,
    "rerank_results": True,
    "diversity_penalty": 0.1
}
 
# ==================== PERFORMANCE CONFIGURATION ====================
 
CACHE_CONFIG = {
    "enabled": os.getenv("CACHE_ENABLED", "true").lower() == "true",
    "backend": "memory",
    "ttl": int(os.getenv("CACHE_TTL", "3600")),
    "max_size": 1000
}
 
ASYNC_CONFIG = {
    "max_concurrent": 10,
    "timeout": 30,
    "batch_size": 100
}
 
# ==================== STORAGE CONFIGURATION ====================
 
INDEX_CONFIG = {
    "backend": "json",
    "file_path": os.getenv("INDEX_FILE", "/app/data/index.json"),
    "auto_save": True,
    "backup_enabled": True
}
 
# ==================== MONITORING CONFIGURATION ====================
 
DATADOG_CONFIG = {
    "enabled": bool(os.getenv("DD_API_KEY")),
    "service": os.getenv("DD_SERVICE", "devops-rag"),
    "env": os.getenv("DD_ENV", "production"),
    "version": os.getenv("DD_VERSION", "1.0.0"),
    "trace_sample_rate": float(os.getenv("DD_TRACE_SAMPLE_RATE", "0.1"))
}
 
# ==================== PATHS ====================
 
RUNBOOKS_DIR = os.getenv("RUNBOOKS_DIR", "/app/runbooks")
DATA_DIR = os.getenv("DATA_DIR", "/app/data")
LOG_DIR = os.getenv("LOG_DIR", "/app/logs")
 
# ==================== VALIDATION ====================
 
def validate_config() -> Dict[str, Any]:
    """Validate configuration and return status"""
    issues = []
    
    if not OPENAI_CONFIG["api_key"]:
        issues.append("OPENAI_API_KEY not set")
    
    if CHUNK_SIZE < 50 or CHUNK_SIZE > 2000:
        issues.append(f"CHUNK_SIZE {CHUNK_SIZE} out of range (50-2000)")
        
    if TOP_K < 1 or TOP_K > 20:
        issues.append(f"TOP_K {TOP_K} out of range (1-20)")
    
    return {
        "valid": len(issues) == 0,
        "issues": issues,
        "config": {
            "embedding_model": EMBEDDING_MODEL,
            "generation_model": GENERATION_MODEL,
            "chunk_size": CHUNK_SIZE,
            "top_k": TOP_K
        }
    }
 
if __name__ == "__main__":
    result = validate_config()
    print(f"Config valid: {result['valid']}")
    if result['issues']:
        print("Issues:", result['issues'])
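
`validate_config()` above checks only the API key, `CHUNK_SIZE`, and `TOP_K`. A few cross-field checks could be added in the same style. The sketch below is a standalone illustration with hypothetical rules, such as requiring the overlap to be smaller than the chunk size; it is not part of the shipped `config.py`.

```python
from typing import List


def extra_config_issues(
    chunk_size: int = 512,
    chunk_overlap: int = 64,
    min_chunk_size: int = 100,
    top_k: int = 5,
    similarity_threshold: float = 0.7,
    max_context_length: int = 8000,
) -> List[str]:
    """Return human-readable problems found by cross-field checks."""
    issues: List[str] = []
    if chunk_overlap < 0 or chunk_overlap >= chunk_size:
        issues.append(
            f"CHUNK_OVERLAP {chunk_overlap} must be >= 0 and < CHUNK_SIZE {chunk_size}"
        )
    if min_chunk_size > chunk_size:
        issues.append(
            f"MIN_CHUNK_SIZE {min_chunk_size} exceeds CHUNK_SIZE {chunk_size}"
        )
    if not 0.0 <= similarity_threshold <= 1.0:
        issues.append(
            f"SIMILARITY_THRESHOLD {similarity_threshold} out of range (0-1)"
        )
    if top_k * chunk_size > max_context_length:
        issues.append(
            f"TOP_K x CHUNK_SIZE ({top_k * chunk_size}) exceeds "
            f"MAX_CONTEXT_LENGTH {max_context_length}"
        )
    return issues


print(extra_config_issues())                   # []
print(extra_config_issues(chunk_overlap=512))  # one issue about CHUNK_OVERLAP
```

The returned strings can be appended to the `issues` list that `validate_config()` already builds.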

Docker Environment File

# .env file for Docker deployment
 
# ============ REQUIRED ============
OPENAI_API_KEY=sk-your-openai-api-key-here
 
# ============ DATADOG (OPTIONAL) ============
DD_API_KEY=your-datadog-api-key
DD_SITE=datadoghq.com
DD_SERVICE=devops-rag
DD_ENV=production
DD_VERSION=1.0.0
DD_TRACE_ENABLED=true
 
# ============ MODEL CONFIGURATION ============
EMBEDDING_MODEL=text-embedding-3-small
GENERATION_MODEL=gpt-4o-mini
MAX_TOKENS=1000
TEMPERATURE=0.1
 
# ============ CHUNKING ============
CHUNK_SIZE=512
CHUNK_OVERLAP=64
MIN_CHUNK_SIZE=100
 
# ============ RETRIEVAL ============
TOP_K=5
SIMILARITY_THRESHOLD=0.7
MAX_CONTEXT_LENGTH=8000
 
# ============ PERFORMANCE ============
PORT=8080
WORKERS=1
TIMEOUT=300
CACHE_ENABLED=true
CACHE_TTL=3600
 
# ============ PATHS ============
RUNBOOKS_DIR=/app/runbooks
DATA_DIR=/app/data
INDEX_FILE=/app/data/index.json

🔧 Troubleshooting

Configuration Issues

# Validate configuration
docker exec devops-rag python -c "
import config
result = config.validate_config()
print('Valid:', result['valid'])
if result['issues']:
    print('Issues:', result['issues'])
"
 
# Check environment variables
docker exec devops-rag env | grep -E "(OPENAI|DD_|CHUNK|TOP_K)"

Performance Tuning

Slow Queries (>2s):

Reduce TOP_K from 5 to 3
Use text-embedding-3-small
Enable caching

High Memory Usage:

Reduce CHUNK_SIZE to 256
Lower MAX_CONTEXT_LENGTH
Disable embedding caching

Poor Accuracy:

Increase TOP_K to 8
Lower SIMILARITY_THRESHOLD to 0.6
Use text-embedding-3-large

Index Issues

# Rebuild index with current config
docker exec devops-rag python cli.py ingest --force
 
# Check index statistics
docker exec devops-rag python cli.py stats
 
# Validate index integrity
docker exec devops-rag python -c "
import json
with open('/app/data/index.json') as f:
    index = json.load(f)
    print(f'Chunks: {len(index.get(\"chunks\", []))}')
    print(f'Sources: {len(index.get(\"sources\", []))}')
"

Next Steps:

Tuning Guide - Performance optimization strategies
Monitoring Guide - Observability and alerting
Knowledge Base Guide - Content optimization
