Configuration Guide

Comprehensive guide for configuring RAKṢĀ scanner settings, custom rules, and runtime behavior.

Environment Variables

Core Settings

| Variable | Default | Description |
|----------|---------|-------------|
| PORT | 8080 | HTTP server port |
| SCAN_UPLOAD_DIR | /tmp/raksha-scans | Temporary scan directory |
| MAX_UPLOAD_MB | 50 | Maximum upload file size (MB) |
| WORKERS | 1 | Number of uvicorn workers |

Scanner Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| ENABLE_SEMGREP | auto | Enable Semgrep scanner (auto, true, false) |
| ENABLE_BANDIT | auto | Enable Bandit scanner (auto, true, false) |
| ENABLE_PATTERNS | true | Enable built-in pattern scanner |
| SCAN_TIMEOUT | 300 | Maximum scan duration (seconds) |

Datadog APM

| Variable | Default | Description |
|----------|---------|-------------|
| DD_API_KEY | - | Datadog API key for APM |
| DD_SERVICE | avyay-raksha | Service name in Datadog |
| DD_ENV | production | Environment tag |
| DD_VERSION | 1.0.0 | Version tag |
| DD_AGENT_HOST | localhost | Datadog Agent hostname |

Advanced Settings

| Variable | Default | Description |
|----------|---------|-------------|
| LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARN, ERROR) |
| CORS_ORIGINS | ["*"] | Allowed CORS origins (JSON array) |
| RESULT_TTL_HOURS | 24 | How long to keep scan results (hours) |
| GITHUB_TOKEN | - | GitHub token for private repository access |
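
To make the variable formats above concrete, here is a minimal sketch of how a few of them could be parsed with their documented defaults. The `load_settings` and `parse_tristate` helpers are hypothetical and not part of RAKṢĀ. How `auto` is resolved is also an assumption: here it means "enable the scanner if its executable is on PATH".

```python
import json
import shutil


def parse_tristate(value, executable):
    """Resolve 'auto' / 'true' / 'false' to a boolean.

    Assumption: 'auto' enables the scanner only if its executable is installed.
    """
    value = value.strip().lower()
    if value == "auto":
        return shutil.which(executable) is not None
    if value in ("true", "false"):
        return value == "true"
    raise ValueError(f"expected auto, true, or false, got {value!r}")


def load_settings(env):
    """Build a settings dict from an environment mapping, using the documented defaults."""
    return {
        "port": int(env.get("PORT", "8080")),
        "max_upload_bytes": int(env.get("MAX_UPLOAD_MB", "50")) * 1024 * 1024,
        "scan_timeout": int(env.get("SCAN_TIMEOUT", "300")),
        "enable_semgrep": parse_tristate(env.get("ENABLE_SEMGREP", "auto"), "semgrep"),
        # CORS_ORIGINS is a JSON array, so it is decoded rather than split on commas
        "cors_origins": json.loads(env.get("CORS_ORIGINS", '["*"]')),
    }


settings = load_settings({"PORT": "9000", "ENABLE_SEMGREP": "false"})
print(settings["port"], settings["enable_semgrep"], settings["cors_origins"])
```

In a real process you would pass `os.environ` instead of a literal dictionary. Taking the mapping as an argument keeps the parsing easy to test.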

Configuration File

For settings that go beyond simple environment variables, create /app/config.yaml:

# RAKṢĀ Configuration
scanner:
  engines:
    patterns: 
      enabled: true
      severity_threshold: "medium"
    semgrep:
      enabled: true
      config: "auto"  # or path to custom config
      timeout: 120
    bandit:
      enabled: true
      config_file: "/app/rules/bandit.yaml"
      severity_threshold: "low"
  
  file_filters:
    max_size_mb: 10
    include_patterns:
      - "*.py"
      - "*.js" 
      - "*.ts"
      - "*.java"
      - "*.go"
      - "*.php"
      - "*.rb"
      - "*.cpp"
      - "*.c"
    exclude_patterns:
      - "*/node_modules/*"
      - "*/venv/*"
      - "*/vendor/*"
      - "*.min.js"
      - "*.test.*"
      - "*/tests/*"
    
  custom_rules:
    directory: "/app/rules"
    auto_reload: true
 
server:
  max_concurrent_scans: 10
  result_cache_size: 1000
  temp_cleanup_interval: 3600  # seconds
 
logging:
  level: "INFO"
  format: "json"
  datadog: true
 
security:
  rate_limit:
    requests_per_minute: 60
    burst_size: 10
  upload:
    allowed_extensions: [".zip", ".tar", ".tar.gz", ".tgz"]
    virus_scanning: false  # Enable with ClamAV

Mount the file into the container to load it:

docker run -v /path/to/config.yaml:/app/config.yaml ghcr.io/gaurav21/raksha:latest
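
The include_patterns and exclude_patterns lists under file_filters are glob patterns. The sketch below shows one plausible way they could be evaluated, assuming shell-style matching as in Python's `fnmatch` and assuming excludes take precedence over includes. Both are assumptions about the scanner, and `should_scan` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# A subset of the patterns from the configuration file above
INCLUDE = ["*.py", "*.js", "*.ts"]
EXCLUDE = ["*/node_modules/*", "*/venv/*", "*.min.js", "*.test.*", "*/tests/*"]


def should_scan(path, include=INCLUDE, exclude=EXCLUDE):
    """Return True if path matches an include pattern and no exclude pattern."""
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(path, pattern) for pattern in include)


for candidate in ["src/app.py", "web/node_modules/lib/index.js",
                  "static/app.min.js", "node_modules/lib/index.js"]:
    print(candidate, should_scan(candidate))
```

Under these assumed semantics, a pattern such as `*/node_modules/*` needs a leading path component, so a top-level `node_modules/` directory given as a relative path is not excluded. If that matters for your repositories, test your patterns against real paths before relying on them.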

Custom Security Rules

Pattern-Based Rules

Create custom YAML rule files in /app/rules/:

/app/rules/custom-patterns.yaml

# Custom Security Patterns for RAKṢĀ
patterns:
  - id: "hardcoded-aws-keys"
    title: "Hardcoded AWS Credentials"
    description: "AWS access keys or secret keys found in source code"
    severity: "critical"
    pattern: "(AKIA[0-9A-Z]{16}|aws_secret_access_key\\s*=\\s*['\"][0-9a-zA-Z/+=]{40}['\"])"
    confidence: "high"
    cwe: "CWE-798"
    languages: ["*"]
    remediation: |
      Move AWS credentials to environment variables or AWS IAM roles:
      - Use AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables
      - For EC2/Lambda, use IAM roles instead of hardcoded keys
      - Consider AWS Secrets Manager for secure credential storage
    references:
      - "https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html"
 
  - id: "sql-injection-concat"
    title: "SQL Injection via String Concatenation"
    description: "SQL query constructed using string concatenation"
    severity: "high"
    pattern: "(query|sql)\\s*\\+?=\\s*['\"].+['\"]\\s*\\+\\s*(\\w+|\\$\\{\\w+\\})"
    confidence: "medium"
    cwe: "CWE-89"
    languages: ["java", "javascript", "python", "php"]
    remediation: |
      Use parameterized queries instead of string concatenation:
      - Java: PreparedStatement with placeholders
      - Python: Use SQLAlchemy or parameterized queries
      - PHP: PDO prepared statements
      - JavaScript: Use query builders like Knex.js
 
  - id: "weak-crypto-md5"
    title: "Weak Cryptographic Hash (MD5)"
    description: "MD5 hash function usage detected"
    severity: "medium"
    pattern: "(md5|MD5)\\s*\\("
    confidence: "medium"
    cwe: "CWE-327"
    languages: ["*"]
    remediation: |
      Replace MD5 with stronger hash functions:
      - Use SHA-256 or SHA-3 for data integrity
      - Use bcrypt, scrypt, or Argon2 for password hashing
      - Consider HMAC for message authentication
 
  - id: "debug-mode-enabled"
    title: "Debug Mode Enabled in Production"
    description: "Application debug mode appears to be enabled"
    severity: "low"
    pattern: "(DEBUG\\s*=\\s*[Tt]rue|debug\\s*:\\s*true|--debug|development\\s*mode)"
    confidence: "low"
    cwe: "CWE-489"
    languages: ["*"]
    remediation: |
      Disable debug mode in production:
      - Set DEBUG=False in Django settings
      - Use NODE_ENV=production for Node.js
      - Remove --debug flags from production commands
 
  - id: "jwt-weak-secret"
    title: "Weak JWT Secret"
    description: "JWT signed with weak or default secret"
    severity: "high"
    pattern: "(jwt|JWT).*(secret|key)\\s*[=:]\\s*['\"]?(secret|key|password|123|abc)['\"]?"
    confidence: "medium"
    cwe: "CWE-798"
    languages: ["*"]
    remediation: |
      Use strong, unique JWT secrets:
      - Generate cryptographically secure random secrets (256+ bits)
      - Store secrets in environment variables or secure vaults
      - Rotate secrets regularly
      - Consider using RS256 (public key crypto) instead of HS256
 
file_types:
  python: [".py"]
  javascript: [".js", ".ts", ".jsx", ".tsx"]
  java: [".java"]
  php: [".php", ".phtml"]
  go: [".go"]
  ruby: [".rb"]
  cpp: [".cpp", ".cc", ".cxx", ".hpp", ".h"]
  csharp: [".cs"]
  all: ["*"]
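
Before deploying a rule file, it can help to try the regular expressions against sample code. The following is a minimal sketch that assumes the pattern scanner applies each regex line by line with semantics close to Python's `re` module, which this guide does not confirm. `find_matches` is a hypothetical helper. The regexes are written as they read once the YAML double-quoted escapes (`\\s`, `\"`) are decoded.

```python
import re

# Two patterns from custom-patterns.yaml, after YAML unescaping
RULES = {
    "hardcoded-aws-keys": r"""(AKIA[0-9A-Z]{16}|aws_secret_access_key\s*=\s*['"][0-9a-zA-Z/+=]{40}['"])""",
    "weak-crypto-md5": r"(md5|MD5)\s*\(",
}


def find_matches(rule_id, source):
    """Return (line_number, line) pairs where the rule's regex matches."""
    regex = re.compile(RULES[rule_id])
    return [
        (number, line)
        for number, line in enumerate(source.splitlines(), start=1)
        if regex.search(line)
    ]


sample = '''import hashlib
key = "AKIAIOSFODNN7EXAMPLE"
digest = hashlib.md5(data).hexdigest()
safe = hashlib.sha256(data).hexdigest()
'''

print(find_matches("hardcoded-aws-keys", sample))
print(find_matches("weak-crypto-md5", sample))
```

The sample key is the placeholder access key ID used in AWS documentation, not a real credential.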

Organization-Specific Rules

# /app/rules/company-security.yaml
patterns:
  - id: "company-api-key-leak"
    title: "Company API Key Exposed"
    description: "Internal API key pattern detected"
    severity: "critical"
    pattern: "(COMPANY_API_[A-Z0-9_]{20,}|cp_[a-f0-9]{32})"
    confidence: "high"
    cwe: "CWE-798"
    
  - id: "deprecated-crypto-lib"
    title: "Deprecated Cryptography Library"
    description: "Usage of deprecated crypto library detected"
    severity: "medium"  
    pattern: "(import\\s+pycrypto|from\\s+Crypto|require\\(['\"]crypto['\"])"
    confidence: "medium"
    cwe: "CWE-327"
    remediation: |
      Replace deprecated crypto libraries:
      - Python: Use cryptography library instead of pycrypto
      - Node.js: Use built-in crypto module carefully
      - Consider higher-level libraries like libsodium
 
  - id: "admin-interface-exposed"
    title: "Admin Interface Potentially Exposed"
    description: "Admin URLs or interfaces found in client-side code"
    severity: "medium"
    pattern: "(/admin|/administrator|/wp-admin|\\.admin\\.|admin_panel)"
    confidence: "low"
    languages: ["javascript", "html"]
    remediation: |
      Secure admin interfaces:
      - Use separate subdomains for admin interfaces
      - Implement IP whitelisting
      - Add additional authentication layers
      - Avoid exposing admin URLs in client-side code
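
Because a malformed rule can fail silently or break scanner startup, a small pre-deployment check is worth considering. This sketch validates rule dictionaries after they have been loaded from YAML. The set of required fields is an assumption based on the fields every example above shares, and `validate_rule` is a hypothetical helper rather than a RAKṢĀ API.

```python
import re

REQUIRED_FIELDS = ("id", "title", "severity", "pattern")  # assumed minimum
SEVERITIES = ("critical", "high", "medium", "low")        # levels used in this guide


def validate_rule(rule):
    """Return a list of problems; an empty list means the rule looks usable."""
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS if field not in rule]
    if "severity" in rule and rule["severity"] not in SEVERITIES:
        problems.append(f"unknown severity: {rule['severity']}")
    if "pattern" in rule:
        try:
            re.compile(rule["pattern"])
        except re.error as exc:
            problems.append(f"invalid regex: {exc}")
    return problems


good = {
    "id": "company-api-key-leak",
    "title": "Company API Key Exposed",
    "severity": "critical",
    "pattern": r"(COMPANY_API_[A-Z0-9_]{20,}|cp_[a-f0-9]{32})",
}
bad = {"id": "broken", "severity": "urgent", "pattern": "(unclosed"}

print(validate_rule(good))
print(validate_rule(bad))
```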

Semgrep Custom Rules

Create /app/rules/semgrep-custom.yaml:

rules:
  - id: company-specific-sql-injection
    pattern-either:
      - pattern: |
          $QUERY = "..." + $VAR + "..."
          $DB.execute($QUERY)
      - pattern: |
          $DB.query(f"...{$VAR}...")
    message: "SQL injection vulnerability via string formatting"
    severity: ERROR
    languages: [python]
    metadata:
      cwe: "CWE-89"
      owasp: "A03:2021 - Injection"
      confidence: MEDIUM
 
  - id: hardcoded-encryption-key
    patterns:
      - pattern-inside: |
          class $CLASS:
              ...
      - pattern: $KEY = "..."
      - pattern-not: $KEY = ""
      - metavariable-pattern:
          metavariable: $KEY
          patterns:
            - pattern-regex: ".*(key|secret|password|token).*"
    message: "Hardcoded encryption key detected"
    severity: ERROR
    languages: [python]  # the class pattern above is Python syntax
 
  - id: unsafe-redirect
    pattern-either:
      - pattern: redirect($URL)
      - pattern: window.location = $URL
      - pattern: res.redirect($URL)
    message: "Potential open redirect vulnerability"
    severity: WARNING
    languages: [python, javascript]
    metadata:
      cwe: "CWE-601"

Scanner-Specific Configuration

Semgrep Configuration

Create /app/rules/semgrep.yml:

# Semgrep configuration
rules:
  - security
  - owasp-top-ten
  - cwe-top-25
  
exclude:
  - "*.test.*"
  - "test_*"
  - "tests/"
  - "node_modules/"
  - ".git/"
 
max_target_bytes: 1000000  # 1MB per file
timeout: 30  # seconds per rule
 
# Custom rule paths
include:
  - /app/rules/semgrep-custom.yaml

Bandit Configuration

Create /app/rules/bandit.yaml:

# Bandit configuration for Python security scanning
tests: 
  # Run only the tests listed here; any test not listed is not executed
  - B101  # assert_used
  - B102  # exec_used
  - B103  # set_bad_file_permissions
  - B104  # hardcoded_bind_all_interfaces
  - B105  # hardcoded_password_string
  - B106  # hardcoded_password_funcarg
  - B107  # hardcoded_password_default
  - B108  # hardcoded_tmp_directory
  - B110  # try_except_pass
  - B112  # try_except_continue
  - B201  # flask_debug_true
  - B301  # pickle
  - B302  # marshal
  - B303  # md5
  - B304  # ciphers
  - B305  # cipher_modes
  - B306  # mktemp_q
  - B307  # eval
  - B308  # mark_safe
  - B309  # httpsconnection
  - B310  # urllib_urlopen
  - B311  # random
  - B312  # telnetlib
  - B313  # xml_bad_cElementTree
  - B314  # xml_bad_ElementTree
  - B315  # xml_bad_expatreader
  - B316  # xml_bad_expatbuilder
  - B317  # xml_bad_sax
  - B318  # xml_bad_minidom
  - B319  # xml_bad_pulldom
  - B320  # xml_bad_etree
  - B321  # ftplib
  - B322  # input
  - B323  # unverified_context
  - B324  # hashlib_new_insecure_functions
  - B325  # tempnam
  - B401  # import_telnetlib
  - B402  # import_ftplib
  - B403  # import_pickle
  - B404  # import_subprocess
  - B405  # import_xml_etree
  - B406  # import_xml_sax
  - B407  # import_xml_expat
  - B408  # import_xml_minidom
  - B409  # import_xml_pulldom
  - B410  # import_lxml
  - B411  # import_xmlrpclib
  - B412  # import_httpoxy
  - B413  # import_pycrypto
  - B501  # request_with_no_cert_validation
  - B502  # ssl_with_bad_version
  - B503  # ssl_with_bad_defaults
  - B504  # ssl_with_no_version
  - B505  # weak_cryptographic_key
  - B506  # yaml_load
  - B507  # ssh_no_host_key_verification
  - B601  # paramiko_calls
  - B602  # subprocess_popen_with_shell_equals_true
  - B603  # subprocess_without_shell_equals_true
  - B604  # any_other_function_with_shell_equals_true
  - B605  # start_process_with_a_shell
  - B606  # start_process_with_no_shell
  - B607  # start_process_with_partial_path
  - B608  # hardcoded_sql_expressions
  - B609  # linux_commands_wildcard_injection
  - B610  # django_extra_used
  - B611  # django_rawsql_used
 
# Bandit's skips list takes test IDs (for example B101), not paths,
# so all path exclusions go under exclude_dirs
exclude_dirs:
  - '/tests'
  - '/test'
  - '/.venv'
  - '/venv'
  - '/env'
  - '/.git'
  - '/node_modules'
  - '/__pycache__'
  - '/.pytest_cache'
  - '*/tests/*'
  - '*/test_*.py'
  - '*_test.py'
  - '*/conftest.py'
 
# Severity levels
severity:
  LOW: 0
  MEDIUM: 1  
  HIGH: 2
  CRITICAL: 3
 
# Report format
formatter: json

Runtime Configuration

Docker Environment

# Development configuration
docker run -d \
  --name raksha-dev \
  -p 8430:8080 \
  -e LOG_LEVEL=DEBUG \
  -e ENABLE_SEMGREP=true \
  -e ENABLE_BANDIT=true \
  -e MAX_UPLOAD_MB=20 \
  -v "$(pwd)/rules:/app/rules:ro" \
  -v "$(pwd)/config.yaml:/app/config.yaml:ro" \
  ghcr.io/gaurav21/raksha:latest
 
# Production configuration
docker run -d \
  --name raksha-prod \
  -p 80:8080 \
  --restart=unless-stopped \
  -e LOG_LEVEL=INFO \
  -e SCAN_TIMEOUT=600 \
  -e RESULT_TTL_HOURS=48 \
  -e DD_API_KEY="${DD_API_KEY}" \
  -e DD_ENV=production \
  -v /opt/raksha/rules:/app/rules:ro \
  -v /var/tmp/raksha:/tmp/raksha-scans \
  ghcr.io/gaurav21/raksha:latest

Kubernetes ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: raksha-config
data:
  config.yaml: |
    scanner:
      engines:
        patterns:
          enabled: true
          severity_threshold: "medium"
        semgrep:
          enabled: true
          timeout: 120
        bandit:
          enabled: true
          severity_threshold: "low"
      file_filters:
        max_size_mb: 10
        exclude_patterns:
          - "*/node_modules/*"
          - "*/venv/*"
          - "*/vendor/*"
    server:
      max_concurrent_scans: 5
      result_cache_size: 500
    logging:
      level: "INFO"
      format: "json"
---
apiVersion: v1
kind: ConfigMap  
metadata:
  name: raksha-rules
data:
  custom-patterns.yaml: |
    patterns:
      - id: "org-api-key"
        title: "Organization API Key"
        severity: "critical"
        pattern: "ORG_API_[A-Z0-9]{32}"
        # ... rest of rule

Mount in deployment:

spec:
  containers:
  - name: raksha
    volumeMounts:
    - name: config
      mountPath: /app/config.yaml
      subPath: config.yaml
    - name: rules
      mountPath: /app/rules
  volumes:
  - name: config
    configMap:
      name: raksha-config
  - name: rules
    configMap:
      name: raksha-rules

Severity Mapping

Custom Severity Levels

# /app/config/severity-mapping.yaml
severity_mapping:
  patterns:
    # Override default severities
    "hardcoded-password": "critical"
    "sql-injection": "critical"
    "xss-vulnerability": "high"
    "weak-crypto": "medium"
    "code-quality": "low"
    
  cwe_mapping:
    # Map CWE codes to severities
    "CWE-89": "critical"   # SQL Injection
    "CWE-79": "high"       # XSS
    "CWE-798": "critical"  # Hardcoded Credentials
    "CWE-327": "medium"    # Weak Crypto
    "CWE-22": "high"       # Path Traversal
    
  scanner_weights:
    # Weight findings by scanner confidence
    semgrep: 1.0
    bandit: 0.9
    patterns: 0.8
    
thresholds:
  fail_build:
    critical: 1    # Fail if any critical issues
    high: 3        # Fail if 3+ high issues
    total: 10      # Fail if 10+ total issues
    
  notification:
    critical: 1    # Alert on any critical
    high: 1        # Alert on any high
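
The fail_build thresholds can be read as "fail when the count for a level reaches its limit". Here is a minimal sketch of that decision, assuming each limit is inclusive, as the comments in the YAML suggest. `should_fail_build` is a hypothetical helper, not the scanner's actual implementation.

```python
from collections import Counter

# Mirrors thresholds.fail_build from severity-mapping.yaml
FAIL_BUILD = {"critical": 1, "high": 3, "total": 10}


def should_fail_build(severities, thresholds=FAIL_BUILD):
    """Given a list of finding severities, return (fail, reasons)."""
    counts = Counter(severities)
    reasons = []
    for level in ("critical", "high"):
        limit = thresholds.get(level)
        if limit is not None and counts[level] >= limit:
            reasons.append(f"{counts[level]} {level} finding(s), limit {limit}")
    total_limit = thresholds.get("total")
    if total_limit is not None and len(severities) >= total_limit:
        reasons.append(f"{len(severities)} total finding(s), limit {total_limit}")
    return bool(reasons), reasons


print(should_fail_build(["high", "high", "medium"]))
print(should_fail_build(["critical", "low"]))
```

Returning the reasons alongside the boolean makes it straightforward to print why a build was rejected.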

Performance Tuning

Scan Optimization

# /app/config/performance.yaml
scanning:
  parallel_files: 4          # Files to scan concurrently
  max_file_size_mb: 50       # Skip files larger than this
  timeout_per_file: 30       # Seconds
  memory_limit_mb: 2048      # Scanner memory limit
  
  file_type_limits:
    javascript: 100          # Max JS files to scan
    python: 200              # Max Python files
    "*": 500                 # Max total files
    
caching:
  enable_file_hash_cache: true
  cache_duration_hours: 24
  max_cache_size_mb: 1024
  
threading:
  scanner_workers: 2         # Parallel scanner processes
  io_workers: 4              # File I/O workers
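
To illustrate what enable_file_hash_cache and cache_duration_hours might do, the sketch below caches scan results by content hash so that unchanged files are not rescanned until the entry expires. Keying the cache on SHA-256 of the file content is an assumption, and `FileHashCache` is a hypothetical class. Size limits (max_cache_size_mb) are left out for brevity.

```python
import hashlib
import time


class FileHashCache:
    """Cache scan results keyed by the SHA-256 of the file content."""

    def __init__(self, ttl_hours=24, clock=time.time):
        self._ttl_seconds = ttl_hours * 3600
        self._clock = clock
        self._entries = {}  # digest -> (stored_at, result)

    def get_or_scan(self, content, scan):
        """Return a cached result for content, or call scan(content) and cache it."""
        digest = hashlib.sha256(content).hexdigest()
        now = self._clock()
        entry = self._entries.get(digest)
        if entry is not None and now - entry[0] < self._ttl_seconds:
            return entry[1]
        result = scan(content)
        self._entries[digest] = (now, result)
        return result


calls = []


def fake_scan(content):
    """Stand-in scanner that records each invocation."""
    calls.append(content)
    return {"findings": 0, "size": len(content)}


cache = FileHashCache(ttl_hours=24)
cache.get_or_scan(b"print('hello')", fake_scan)
cache.get_or_scan(b"print('hello')", fake_scan)  # served from cache
print(len(calls))
```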

Resource Limits

# Docker Compose with resource limits
version: '3.8'
services:
  raksha:
    image: ghcr.io/gaurav21/raksha:latest
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
        reservations:
          cpus: '0.5'
          memory: 1G
    environment:
      - SCAN_TIMEOUT=300
      - MAX_CONCURRENT_SCANS=3
    ulimits:
      nofile: 4096
      nproc: 2048

Monitoring Configuration

Logging Configuration

# /app/config/logging.yaml
logging:
  version: 1
  disable_existing_loggers: false
  
  formatters:
    json:
      class: pythonjsonlogger.jsonlogger.JsonFormatter
      format: '%(asctime)s %(name)s %(levelname)s %(message)s'
    
    detailed:
      format: '%(asctime)s - %(name)s - %(levelname)s - %(filename)s:%(lineno)d - %(message)s'
  
  handlers:
    console:
      class: logging.StreamHandler
      level: INFO
      formatter: json
      stream: ext://sys.stdout
    
    file:
      class: logging.handlers.RotatingFileHandler
      level: DEBUG
      formatter: detailed
      filename: /var/log/raksha/app.log
      maxBytes: 10485760  # 10MB
      backupCount: 3
      
    datadog:
      class: datadog.DogStatsdLogHandler
      level: WARNING
      
  loggers:
    raksha:
      level: DEBUG
      handlers: [console, file, datadog]
      propagate: false
      
    uvicorn:
      level: INFO
      handlers: [console]
      propagate: false
      
  root:
    level: INFO
    handlers: [console]

Metrics Configuration

# Custom metrics for monitoring
from datadog import statsd


# perform_scan() stands in for the application's scan entry point.
# The decorator reports raksha.scan.duration on success and on failure,
# so no manual timing is needed inside the function.
@statsd.timed('raksha.scan.duration')
def run_scan(directory):
    try:
        result = perform_scan(directory)
    except Exception:
        statsd.increment('raksha.scan.error')
        raise
    # Record success metrics only after the scan has completed
    statsd.increment('raksha.scan.success')
    statsd.histogram('raksha.scan.files', result.total_files)
    statsd.histogram('raksha.scan.findings', result.total_findings)
    return result

Next: CI/CD Integration Guide for automating security scanning in your development pipeline.