Configuration Guide - DevOps RAG

Complete configuration reference for tuning embedding models, chunk parameters, retrieval settings, and system behavior.

📋 Configuration Overview

DevOps RAG can be configured through environment variables, configuration files, or runtime parameters. This guide covers all available options and their impact on performance and accuracy.


🤖 Model Configuration

Embedding Models

| Model | Dimensions | Cost | Use Case |
| --- | --- | --- | --- |
| text-embedding-3-small | 1536 | $0.02/1M tokens | Recommended - Best balance |
| text-embedding-3-large | 3072 | $0.13/1M tokens | High accuracy, complex domains |
| text-embedding-ada-002 | 1536 | $0.10/1M tokens | Legacy, not recommended |

Configuration:

# config.py
EMBEDDING_MODEL = "text-embedding-3-small"
EMBEDDING_BATCH_SIZE = 100

Environment Variable:

EMBEDDING_MODEL=text-embedding-3-small
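
To compare the embedding models on cost before switching, the per-token prices in the table can be turned into a quick estimate. This is a minimal sketch: the function name and the sample corpus size (45 chunks of 512 tokens) are illustrative assumptions, not part of DevOps RAG.

```python
# Prices in USD per 1M tokens, copied from the embedding model table above.
EMBEDDING_PRICE_PER_1M = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
}


def estimate_embedding_cost(total_tokens: int, model: str) -> float:
    """Return the estimated USD cost of embedding `total_tokens` tokens."""
    return total_tokens / 1_000_000 * EMBEDDING_PRICE_PER_1M[model]


# Hypothetical corpus: 45 chunks of at most 512 tokens each.
corpus_tokens = 45 * 512
for model in EMBEDDING_PRICE_PER_1M:
    print(f"{model}: ${estimate_embedding_cost(corpus_tokens, model):.6f}")
```

The estimate covers a single indexing pass only; query embeddings and re-indexing add to it.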

Generation Models

| Model | Cost | Latency | Quality | Use Case |
| --- | --- | --- | --- | --- |
| gpt-4o-mini | $0.15/1M input | ~500ms | High | Default - Fast & accurate |
| gpt-4o | $5.00/1M input | ~1500ms | Highest | Complex reasoning tasks |
| gpt-3.5-turbo | $0.50/1M input | ~300ms | Good | Budget option |

Configuration:

GENERATION_MODEL = "gpt-4o-mini"
MAX_TOKENS = 1000
TEMPERATURE = 0.1

📄 Chunking Strategy

Chunking parameters directly affect retrieval quality. The values below are the settings our tuning report found to work best.

Chunk Size

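| Size (tokens) | Chunks | Precision | Recall | Use Case |
| --- | --- | --- | --- | --- |
| 512 | 45 | High | Perfect | Recommended |
| 256 | 83 | Highest | Good | High precision needs |
| 768 | 36 | Medium | Good | Longer context |
| 1024 | 26 | Lower | Good | Minimal chunks |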

Configuration:

CHUNK_SIZE = 512            # Tokens per chunk
CHUNK_OVERLAP = 64         # Overlap between chunks
MIN_CHUNK_SIZE = 100       # Minimum viable chunk

Chunk Overlap

| Overlap (tokens) | Rank-1 Accuracy | Index Size | Use Case |
| --- | --- | --- | --- |
| 64 | 100% | +13% | Recommended |
| 0 | 90% | Baseline | Storage constrained |
| 128 | 100% | +20% | Maximum accuracy |
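
To make the interaction between chunk size and overlap concrete, here is a minimal sliding-window sketch. It is not the DevOps RAG chunker: tokens are represented as a plain list of strings, and merging a short tail into the previous chunk is an assumed interpretation of MIN_CHUNK_SIZE.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 512, overlap: int = 64,
                 min_chunk_size: int = 100) -> List[List[str]]:
    """Split tokens into fixed-size windows that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    stride = chunk_size - overlap  # how far each window advances
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), stride):
        window = tokens[start:start + chunk_size]
        if chunks and len(window) < min_chunk_size:
            # Assumption: a tail shorter than min_chunk_size is merged into the
            # previous chunk, which already holds tokens up to start + overlap.
            chunks[-1] = chunks[-1] + tokens[start + overlap:]
            break
        chunks.append(window)
        if start + chunk_size >= len(tokens):
            break  # this window reached the end of the document
    return chunks
```

With the recommended settings each window advances 448 tokens (512 minus 64), so the last 64 tokens of one chunk are repeated at the start of the next.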

Chunking Process

# Advanced chunking configuration
CHUNKING_CONFIG = {
    "strategy": "semantic",           # semantic, fixed, sliding
    "chunk_size": 512,
    "overlap": 64,
    "separators": ["\n\n", "\n", ". ", " "],
    "keep_separator": True,
    "respect_word_boundaries": True,
    "min_chunk_size": 100,
    "max_chunk_size": 800
}
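
The separators list suggests a coarse-to-fine split: paragraphs first, then lines, sentences, and words. The sketch below shows that idea as a plain recursive separator splitter. It is an illustration under assumptions, not the "semantic" strategy itself: it measures length in characters rather than tokens, and the function name is hypothetical.

```python
from typing import List


def split_recursive(text: str, separators: List[str], max_len: int) -> List[str]:
    """Split on the coarsest separator first; recurse only for oversized pieces."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut every max_len characters.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, finer = separators[0], separators[1:]
    if sep not in text:
        return split_recursive(text, finer, max_len)
    parts = text.split(sep)
    # keep_separator=True: reattach the separator to the piece it terminated.
    pieces = [p + sep for p in parts[:-1]] + [parts[-1]]
    out: List[str] = []
    buf = ""
    for piece in pieces:
        if len(buf) + len(piece) <= max_len:
            buf += piece  # pack small neighbours into one chunk
            continue
        if buf.strip():
            out.append(buf)
        buf = ""
        if len(piece) <= max_len:
            buf = piece
        else:
            out.extend(split_recursive(piece, finer, max_len))
    if buf.strip():
        out.append(buf)
    return out


text = "Restart the service.\n\nCheck the logs. Then page the on-call engineer.\n\nDone."
chunks = split_recursive(text, ["\n\n", "\n", ". ", " "], max_len=40)
```

Whitespace-only fragments are dropped, so the chunks preserve every word of the input but not necessarily every blank line.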

🔍 Retrieval Configuration

Top-K Settings

| Top-K | Context Window | Latency | Quality | Cost |
| --- | --- | --- | --- | --- |
| 3 | ~1,500 tokens | Fast | Good | Low |
| 5 | ~2,500 tokens | Medium | Best | Medium |
| 8 | ~4,000 tokens | Slower | Diminishing | High |

Configuration:

# Retrieval settings
TOP_K = 5                           # Number of chunks to retrieve
SIMILARITY_THRESHOLD = 0.7          # Minimum cosine similarity
MAX_CONTEXT_LENGTH = 8000          # Max tokens in generation context
RERANK_TOP_K = True                # Re-rank results by relevance
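
As a rough illustration of how TOP_K and SIMILARITY_THRESHOLD combine, the sketch below scores every chunk by cosine similarity, drops anything under the threshold, and keeps the best K. The function names are hypothetical, and applying the threshold before the top-K cut is an assumption about ordering.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float], chunk_vecs: Sequence[Sequence[float]],
             top_k: int = 5, threshold: float = 0.7) -> List[Tuple[int, float]]:
    """Return (chunk_index, score) pairs above the threshold, best first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    kept = [(i, s) for i, s in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]
```

A query can therefore return fewer than TOP_K chunks, or none at all, when few chunks clear the threshold.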

Similarity Thresholds

# Quality gates
SIMILARITY_THRESHOLDS = {
    "high_confidence": 0.85,        # Show with high confidence
    "medium_confidence": 0.75,      # Show with caution
    "low_confidence": 0.65,         # Show as fallback
    "min_threshold": 0.5            # Minimum to include
}
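
One way to read these quality gates is as descending tiers. The sketch below maps a similarity score to a tier name; the tier labels and the handling of scores between min_threshold and low_confidence (included, but without a confidence label) are assumptions.

```python
from typing import Dict

SIMILARITY_THRESHOLDS: Dict[str, float] = {
    "high_confidence": 0.85,
    "medium_confidence": 0.75,
    "low_confidence": 0.65,
    "min_threshold": 0.5,
}


def confidence_tier(score: float,
                    thresholds: Dict[str, float] = SIMILARITY_THRESHOLDS) -> str:
    """Map a similarity score to the highest tier whose threshold it meets."""
    if score >= thresholds["high_confidence"]:
        return "high"
    if score >= thresholds["medium_confidence"]:
        return "medium"
    if score >= thresholds["low_confidence"]:
        return "low"
    if score >= thresholds["min_threshold"]:
        return "unlabeled"  # assumption: included, but below any confidence tier
    return "excluded"
```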

Advanced Retrieval

# Hybrid search (if enabled)
HYBRID_SEARCH = {
    "enabled": False,               # Enable BM25 + vector search
    "bm25_weight": 0.3,            # BM25 contribution (0-1)
    "vector_weight": 0.7,          # Vector contribution (0-1)
    "normalize_scores": True
}
 
# Metadata filtering
METADATA_FILTERS = {
    "enabled": True,
    "fields": ["source", "category", "severity", "team"],
    "default_filters": {}
}
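
The hybrid search weights can be pictured as a weighted sum of two score sets. The following sketch assumes min-max normalization for normalize_scores and treats a chunk missing from one result set as scoring 0 there; both choices, and the function names, are illustrative assumptions rather than the documented behavior.

```python
from typing import Dict


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to the 0-1 range; identical scores all map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def fuse_scores(bm25: Dict[str, float], vector: Dict[str, float],
                bm25_weight: float = 0.3, vector_weight: float = 0.7,
                normalize_scores: bool = True) -> Dict[str, float]:
    """Combine BM25 and vector scores per chunk id as a weighted sum."""
    if normalize_scores:
        bm25, vector = min_max(bm25), min_max(vector)
    chunk_ids = set(bm25) | set(vector)
    return {
        cid: bm25_weight * bm25.get(cid, 0.0) + vector_weight * vector.get(cid, 0.0)
        for cid in chunk_ids
    }
```

Normalizing first matters because raw BM25 scores are unbounded while cosine similarities are not, so without it the weights would not mean what they appear to mean.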

🚀 Performance Configuration

Server Settings

# FastAPI/Uvicorn configuration
SERVER_CONFIG = {
    "host": "0.0.0.0",
    "port": 8080,
    "workers": 1,                   # Single worker for consistency
    "timeout_keep_alive": 30,
    "timeout_graceful_shutdown": 30,
    "max_requests": 1000,
    "max_requests_jitter": 50
}

Caching

# Response caching
CACHE_CONFIG = {
    "enabled": True,
    "backend": "memory",            # memory, redis, file
    "ttl": 3600,                   # 1 hour
    "max_size": 1000,              # Max cached responses
    "cache_embeddings": True,       # Cache query embeddings
    "cache_generations": True       # Cache generated responses
}
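
For the "memory" backend, ttl and max_size together imply a bounded cache whose entries expire. The class below is a minimal sketch of that behavior, assuming least-recently-used eviction once max_size is reached; the class name and the injectable clock are hypothetical and exist only to keep the example testable.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class TTLCache:
    """In-memory cache with per-entry expiry and LRU eviction."""

    def __init__(self, ttl: float = 3600, max_size: int = 1000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.max_size = max_size
        self._clock = clock
        self._items: "OrderedDict[Hashable, tuple]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._items[key] = (self._clock() + self.ttl, value)
        self._items.move_to_end(key)
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict the least recently used entry
```

A cache like this lives inside one process, which fits the single-worker server setting; multiple workers would each hold a separate copy.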

Concurrency

# Async configuration
ASYNC_CONFIG = {
    "embedding_batch_size": 100,    # Batch embed requests
    "max_concurrent_requests": 10,  # Parallel OpenAI calls
    "request_timeout": 30,          # OpenAI timeout
    "retry_attempts": 3,
    "retry_delay": 1.0
}
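
These settings describe a common pattern: cap the number of in-flight requests and retry failures after a fixed delay. The asyncio sketch below illustrates it under two assumptions: retry_attempts counts total attempts (not retries after the first), and the delay is constant rather than exponential. The helper names are hypothetical, and any coroutine factory can stand in for the OpenAI call.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")


async def call_with_retry(make_call: Callable[[], Awaitable[T]],
                          retry_attempts: int = 3, retry_delay: float = 1.0,
                          request_timeout: float = 30.0) -> T:
    """Await make_call(), retrying on errors or timeouts with a fixed delay."""
    if retry_attempts < 1:
        raise ValueError("retry_attempts must be at least 1")
    for attempt in range(1, retry_attempts + 1):
        try:
            return await asyncio.wait_for(make_call(), timeout=request_timeout)
        except Exception:
            if attempt == retry_attempts:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(retry_delay)
    raise RuntimeError("unreachable")


async def run_bounded(calls: Sequence[Callable[[], Awaitable[T]]],
                      max_concurrent_requests: int = 10,
                      **retry_kwargs: float) -> List[T]:
    """Run all calls, never more than max_concurrent_requests at once."""
    semaphore = asyncio.Semaphore(max_concurrent_requests)

    async def guarded(make_call: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:
            return await call_with_retry(make_call, **retry_kwargs)

    return await asyncio.gather(*(guarded(call) for call in calls))
```

Results come back in the same order as the input calls, because asyncio.gather preserves ordering.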

📊 Monitoring Configuration

Datadog Integration

# Datadog configuration
DATADOG_CONFIG = {
    "enabled": True,
    "service": "devops-rag",
    "env": "production",
    "version": "1.0.0",
    "trace_sample_rate": 0.1,       # 10% sampling
    "log_level": "INFO",
    "custom_metrics": {
        "query_latency": True,
        "similarity_scores": True,
        "token_usage": True,
        "error_rates": True
    }
}

Environment Variables:

# Datadog
DD_API_KEY=your_datadog_api_key
DD_SITE=datadoghq.com
DD_SERVICE=devops-rag
DD_ENV=production
DD_VERSION=1.0.0
DD_TRACE_ENABLED=true
DD_TRACE_SAMPLE_RATE=0.1
DD_LOGS_ENABLED=true
 
# Custom metrics
DD_RUNTIME_METRICS_ENABLED=true
DD_PROFILING_ENABLED=true

Logging Configuration

# Logging setup
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "default": {
            "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
        },
        "json": {
            "format": "%(asctime)s %(levelname)s %(name)s %(message)s",
            "class": "pythonjsonlogger.jsonlogger.JsonFormatter"
        }
    },
    "handlers": {
        "console": {
            "level": "INFO",
            "class": "logging.StreamHandler",
            "formatter": "json"
        },
        "file": {
            "level": "DEBUG", 
            "class": "logging.handlers.RotatingFileHandler",
            "filename": "/app/logs/devops-rag.log",
            "maxBytes": 10485760,
            "backupCount": 5,
            "formatter": "json"
        }
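    },
    "root": {
        # The root level gates every handler: with INFO here, DEBUG records
        # are dropped before they reach the DEBUG-level file handler.
        "level": "INFO",
        "handlers": ["console", "file"]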
    }
}

🔧 Index Configuration

Storage Backend

# Vector index storage
INDEX_CONFIG = {
    "backend": "json",              # json, postgres, pinecone
    "file_path": "/app/data/index.json",
    "auto_save": True,
    "save_interval": 300,           # 5 minutes
    "backup_enabled": True,
    "backup_interval": 3600,        # 1 hour
    "compression": "gzip"
}

Index Building

# Indexing behavior
INDEXING_CONFIG = {
    "auto_index": True,             # Auto-index new files
    "watch_directories": ["/app/runbooks"],
    "file_patterns": ["*.md", "*.txt", "*.rst"],
    "exclude_patterns": [".*", "__pycache__", "*.tmp"],
    "incremental": True,            # Only re-index changed files
    "parallel_processing": True,
    "batch_size": 50
}
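
To show how file_patterns and exclude_patterns might interact, the sketch below applies shell-style matching with the standard fnmatch module. It assumes exclude patterns are checked against every path component (so a hidden directory excludes everything beneath it) and that excludes take priority over includes; the function name is hypothetical.

```python
import fnmatch
from pathlib import PurePosixPath
from typing import Sequence

FILE_PATTERNS = ["*.md", "*.txt", "*.rst"]
EXCLUDE_PATTERNS = [".*", "__pycache__", "*.tmp"]


def should_index(path: str,
                 file_patterns: Sequence[str] = FILE_PATTERNS,
                 exclude_patterns: Sequence[str] = EXCLUDE_PATTERNS) -> bool:
    """Return True if a path relative to a watched directory should be indexed."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    # Assumption: an excluded directory name excludes everything below it.
    for part in parts:
        if any(fnmatch.fnmatchcase(part, pattern) for pattern in exclude_patterns):
            return False
    return any(fnmatch.fnmatchcase(parts[-1], pattern) for pattern in file_patterns)
```

For example, should_index("db/failover.md") is accepted, while ".git/notes.md" and "scripts/run.py" are skipped.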

🌍 Environment Variables Reference

Required Variables

# OpenAI (Required)
OPENAI_API_KEY=sk-your-openai-api-key-here
 
# Optional but recommended
DD_API_KEY=your-datadog-api-key

Model Configuration

# Embedding model
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1536
 
# Generation model  
GENERATION_MODEL=gpt-4o-mini
MAX_TOKENS=1000
TEMPERATURE=0.1

Chunking & Retrieval

# Chunking
CHUNK_SIZE=512
CHUNK_OVERLAP=64
MIN_CHUNK_SIZE=100
 
# Retrieval
TOP_K=5
SIMILARITY_THRESHOLD=0.7
MAX_CONTEXT_LENGTH=8000

Performance

# Server
PORT=8080
WORKERS=1
TIMEOUT=300
 
# Caching
CACHE_ENABLED=true
CACHE_TTL=3600

Paths

# File paths
RUNBOOKS_DIR=/app/runbooks
DATA_DIR=/app/data
INDEX_FILE=/app/data/index.json
LOG_DIR=/app/logs

📁 Configuration File Examples

Complete config.py

"""
DevOps RAG Configuration
Place this file at /app/config.py in your container
"""
 
import os
from typing import Dict, Any
 
# ==================== MODEL CONFIGURATION ====================
 
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "text-embedding-3-small")
GENERATION_MODEL = os.getenv("GENERATION_MODEL", "gpt-4o-mini")
 
OPENAI_CONFIG = {
    "api_key": os.getenv("OPENAI_API_KEY"),
    "max_retries": 3,
    "timeout": 30,
    "organization": os.getenv("OPENAI_ORG_ID"),
}
 
# ==================== CHUNKING CONFIGURATION ====================
 
CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", "512"))
CHUNK_OVERLAP = int(os.getenv("CHUNK_OVERLAP", "64"))
MIN_CHUNK_SIZE = int(os.getenv("MIN_CHUNK_SIZE", "100"))
 
CHUNKING_CONFIG = {
    "strategy": "semantic",
    "chunk_size": CHUNK_SIZE,
    "overlap": CHUNK_OVERLAP,
    "separators": ["\n\n", "\n", ". ", " ", ""],
    "keep_separator": True,
    "respect_word_boundaries": True,
    "min_chunk_size": MIN_CHUNK_SIZE
}
 
# ==================== RETRIEVAL CONFIGURATION ====================
 
TOP_K = int(os.getenv("TOP_K", "5"))
SIMILARITY_THRESHOLD = float(os.getenv("SIMILARITY_THRESHOLD", "0.7"))
MAX_CONTEXT_LENGTH = int(os.getenv("MAX_CONTEXT_LENGTH", "8000"))
 
RETRIEVAL_CONFIG = {
    "top_k": TOP_K,
    "similarity_threshold": SIMILARITY_THRESHOLD,
    "max_context_length": MAX_CONTEXT_LENGTH,
    "rerank_results": True,
    "diversity_penalty": 0.1
}
 
# ==================== PERFORMANCE CONFIGURATION ====================
 
CACHE_CONFIG = {
    "enabled": os.getenv("CACHE_ENABLED", "true").lower() == "true",
    "backend": "memory",
    "ttl": int(os.getenv("CACHE_TTL", "3600")),
    "max_size": 1000
}
 
ASYNC_CONFIG = {
    "max_concurrent_requests": 10,
    "request_timeout": 30,
    "embedding_batch_size": 100
}
 
# ==================== STORAGE CONFIGURATION ====================
 
INDEX_CONFIG = {
    "backend": "json",
    "file_path": os.getenv("INDEX_FILE", "/app/data/index.json"),
    "auto_save": True,
    "backup_enabled": True
}
 
# ==================== MONITORING CONFIGURATION ====================
 
DATADOG_CONFIG = {
    "enabled": bool(os.getenv("DD_API_KEY")),
    "service": os.getenv("DD_SERVICE", "devops-rag"),
    "env": os.getenv("DD_ENV", "production"),
    "version": os.getenv("DD_VERSION", "1.0.0"),
    "trace_sample_rate": float(os.getenv("DD_TRACE_SAMPLE_RATE", "0.1"))
}
 
# ==================== PATHS ====================
 
RUNBOOKS_DIR = os.getenv("RUNBOOKS_DIR", "/app/runbooks")
DATA_DIR = os.getenv("DATA_DIR", "/app/data")
LOG_DIR = os.getenv("LOG_DIR", "/app/logs")
 
# ==================== VALIDATION ====================
 
def validate_config() -> Dict[str, Any]:
    """Validate configuration and return status"""
    issues = []
    
    if not OPENAI_CONFIG["api_key"]:
        issues.append("OPENAI_API_KEY not set")
    
    if CHUNK_SIZE < 50 or CHUNK_SIZE > 2000:
        issues.append(f"CHUNK_SIZE {CHUNK_SIZE} out of range (50-2000)")
        
    if TOP_K < 1 or TOP_K > 20:
        issues.append(f"TOP_K {TOP_K} out of range (1-20)")
    
    return {
        "valid": len(issues) == 0,
        "issues": issues,
        "config": {
            "embedding_model": EMBEDDING_MODEL,
            "generation_model": GENERATION_MODEL,
            "chunk_size": CHUNK_SIZE,
            "top_k": TOP_K
        }
    }
 
if __name__ == "__main__":
    result = validate_config()
    print(f"Config valid: {result['valid']}")
    if result['issues']:
        print("Issues:", result['issues'])
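
validate_config() checks each value in isolation. A few cross-field checks could catch combinations that are individually valid but inconsistent. The helper below is a hypothetical addition, not part of the shipped config.py. The limits are assumptions: overlap smaller than chunk size, threshold on a 0-1 scale, and a worst case of TOP_K full-size chunks fitting in MAX_CONTEXT_LENGTH.

```python
from typing import List


def cross_field_issues(chunk_size: int = 512, chunk_overlap: int = 64,
                       min_chunk_size: int = 100, top_k: int = 5,
                       similarity_threshold: float = 0.7,
                       max_context_length: int = 8000) -> List[str]:
    """Return a list of problems that only appear when settings are combined."""
    issues: List[str] = []
    if not 0 <= chunk_overlap < chunk_size:
        issues.append(
            f"CHUNK_OVERLAP {chunk_overlap} must be >= 0 and < CHUNK_SIZE {chunk_size}")
    if not 0 < min_chunk_size <= chunk_size:
        issues.append(
            f"MIN_CHUNK_SIZE {min_chunk_size} must be between 1 and CHUNK_SIZE {chunk_size}")
    if not 0.0 <= similarity_threshold <= 1.0:
        issues.append(
            f"SIMILARITY_THRESHOLD {similarity_threshold} out of range (0-1)")
    # Worst case: every retrieved chunk is a full CHUNK_SIZE tokens long.
    if top_k * chunk_size > max_context_length:
        issues.append(
            f"TOP_K x CHUNK_SIZE = {top_k * chunk_size} exceeds "
            f"MAX_CONTEXT_LENGTH {max_context_length}")
    return issues
```

With the documented defaults the worst-case context is 5 x 512 = 2,560 tokens, well under the 8,000-token limit, so the function returns an empty list.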

Docker Environment File

# .env file for Docker deployment
 
# ============ REQUIRED ============
OPENAI_API_KEY=sk-your-openai-api-key-here
 
# ============ DATADOG (OPTIONAL) ============
DD_API_KEY=your-datadog-api-key
DD_SITE=datadoghq.com
DD_SERVICE=devops-rag
DD_ENV=production
DD_VERSION=1.0.0
DD_TRACE_ENABLED=true
 
# ============ MODEL CONFIGURATION ============
EMBEDDING_MODEL=text-embedding-3-small
GENERATION_MODEL=gpt-4o-mini
MAX_TOKENS=1000
TEMPERATURE=0.1
 
# ============ CHUNKING ============
CHUNK_SIZE=512
CHUNK_OVERLAP=64
MIN_CHUNK_SIZE=100
 
# ============ RETRIEVAL ============
TOP_K=5
SIMILARITY_THRESHOLD=0.7
MAX_CONTEXT_LENGTH=8000
 
# ============ PERFORMANCE ============
PORT=8080
WORKERS=1
TIMEOUT=300
CACHE_ENABLED=true
CACHE_TTL=3600
 
# ============ PATHS ============
RUNBOOKS_DIR=/app/runbooks
DATA_DIR=/app/data
INDEX_FILE=/app/data/index.json

🔧 Troubleshooting

Configuration Issues

# Validate configuration
docker exec devops-rag python -c "
import config
result = config.validate_config()
print('Valid:', result['valid'])
if result['issues']:
    print('Issues:', result['issues'])
"
 
# Check environment variables
docker exec devops-rag env | grep -E "(OPENAI|DD_|CHUNK|TOP_K)"

Performance Tuning

Slow Queries (>2s):

  • Reduce TOP_K from 5 to 3
  • Use text-embedding-3-small
  • Enable caching

High Memory Usage:

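  • Reduce CHUNK_SIZE to 256 to shrink the generation context (the chunk count rises from 45 to 83 in the chunk size table, so the index itself grows)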
  • Lower MAX_CONTEXT_LENGTH
  • Disable embedding caching

Poor Accuracy:

  • Increase TOP_K to 8
  • Lower SIMILARITY_THRESHOLD to 0.6
  • Use text-embedding-3-large

Index Issues

# Rebuild index with current config
docker exec devops-rag python cli.py ingest --force
 
# Check index statistics
docker exec devops-rag python cli.py stats
 
# Validate index integrity
docker exec devops-rag python -c "
import json
with open('/app/data/index.json') as f:
    index = json.load(f)
    print(f'Chunks: {len(index.get(\"chunks\", []))}')
    print(f'Sources: {len(index.get(\"sources\", []))}')
"

Next Steps: