DevOps RAG API Reference

Complete REST API documentation for the DevOps RAG system. All examples use the live Cloud Run deployment at https://devops-rag-449012790678.asia-southeast1.run.app.

Base URL

  • Production: https://devops-rag-449012790678.asia-southeast1.run.app
  • Local: http://localhost:8080

Authentication

No authentication is required in the current version. The API is stateless and uses the server's own OpenAI API key for embeddings and answer generation.


Endpoints

GET /health

Health check endpoint with index status.

Response Schema

{
  "status": "ok" | "error",
  "index_ready": boolean,
  "total_chunks": integer,
  "total_sources": integer
}

Example

curl https://devops-rag-449012790678.asia-southeast1.run.app/health

Response:

{
  "status": "ok",
  "index_ready": true,
  "total_chunks": 45,
  "total_sources": 18
}
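A deployment script might wait for the index to load before routing traffic to a new revision. The sketch below polls /health using only the Python standard library. The helper names (is_ready, wait_until_ready), the 60-second timeout, and the 2-second polling interval are illustrative assumptions, not part of the API.

```python
import json
import time
import urllib.request

BASE_URL = "https://devops-rag-449012790678.asia-southeast1.run.app"


def is_ready(health: dict) -> bool:
    """True when the service reports ok and the vector index is loaded."""
    return health.get("status") == "ok" and bool(health.get("index_ready"))


def wait_until_ready(base_url: str = BASE_URL, timeout_s: float = 60.0,
                     interval_s: float = 2.0) -> dict:
    """Poll GET /health until is_ready() holds or the timeout expires (hypothetical helper)."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=10) as resp:
                health = json.load(resp)
            if is_ready(health):
                return health
        except OSError:
            # URLError/HTTPError subclass OSError; expected while the instance starts
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError("index did not become ready in time")
        time.sleep(interval_s)
```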

POST /ask

Main query endpoint. Retrieves relevant context from runbooks and generates an answer with citations.

Request Schema

{
  "question": string,        // Required: Your question
  "top_k": integer,         // Optional: Number of chunks to retrieve (default: 5)
  "verbose": boolean        // Optional: Include debug info (default: false)
}
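Client-side checks can catch obvious mistakes before a network round trip. A minimal sketch that serializes an /ask body; build_ask_payload is a hypothetical helper, and the rule that top_k must be a positive integer is an assumption, since the server's accepted range is not documented here. The server still performs its own validation.

```python
import json


def build_ask_payload(question: str, top_k: int = 5, verbose: bool = False) -> str:
    """Serialize an /ask request body after basic client-side type checks."""
    if not isinstance(question, str) or not question.strip():
        raise ValueError("question must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(top_k, bool) or not isinstance(top_k, int) or top_k < 1:
        raise ValueError("top_k must be a positive integer")
    body = {"question": question, "top_k": top_k, "verbose": bool(verbose)}
    return json.dumps(body)
```

Omitting top_k and verbose is equivalent to sending the defaults (5 and false); this sketch always sends them explicitly.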

Response Schema

{
  "answer": string,                    // Generated answer with markdown
  "sources": [string],                // List of source document names
  "citations": [                      // Detailed citation info
    {
      "source": string,               // Source document name
      "relevance": float,             // Cosine similarity score (0-1)
      "excerpt": string,              // Relevant text excerpt
      "chunk_id": string              // Unique chunk identifier
    }
  ],
  "context_chunks": integer,          // Number of chunks used
  "top_score": float,                // Highest relevance score
  "avg_score": float,                // Average relevance score
  "latency_ms": float                // Query processing time in milliseconds
}
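For typed access to the fields above, a client can map the JSON onto dataclasses. This is a sketch: the class and function names are hypothetical, and only the documented fields are read, so any extra keys in the response are ignored.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Citation:
    source: str
    relevance: float
    excerpt: str
    chunk_id: str


@dataclass
class AskResponse:
    answer: str
    sources: list[str]
    citations: list[Citation]
    context_chunks: int
    top_score: float
    avg_score: float
    latency_ms: float


def parse_ask_response(data: dict[str, Any]) -> AskResponse:
    """Convert a decoded /ask response into typed objects.

    Raises KeyError if a documented field is missing.
    """
    citations = [
        Citation(
            source=c["source"],
            relevance=float(c["relevance"]),
            excerpt=c["excerpt"],
            chunk_id=c["chunk_id"],
        )
        for c in data["citations"]
    ]
    return AskResponse(
        answer=data["answer"],
        sources=list(data["sources"]),
        citations=citations,
        context_chunks=int(data["context_chunks"]),
        top_score=float(data["top_score"]),
        avg_score=float(data["avg_score"]),
        latency_ms=float(data["latency_ms"]),
    )
```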

Examples

Basic Query:

curl -X POST https://devops-rag-449012790678.asia-southeast1.run.app/ask \
  -H "Content-Type: application/json" \
  -d '{
    "question": "How do I fix CrashLoopBackOff in Kubernetes?"
  }'

Advanced Query with Options:

curl -X POST https://devops-rag-449012790678.asia-southeast1.run.app/ask \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What are the best practices for database backup?",
    "top_k": 3,
    "verbose": true
  }'

Response Example (abridged):

{
  "answer": "## Database Backup Best Practices\n\n1. **Automated Scheduled Backups**:\n   - Set up cron jobs for regular backups\n   - Use database-specific tools (pg_dump for PostgreSQL, mysqldump for MySQL)\n   - Store backups in multiple locations\n\n2. **Test Backup Integrity**:\n   ```bash\n   # Test PostgreSQL backup\n   pg_restore --list backup.sql\n   ```\n\n3. **Retention Policy**:\n   - Keep daily backups for 30 days\n   - Keep weekly backups for 3 months\n   - Keep monthly backups for 1 year",
  "sources": ["05-database-operations.md"],
  "citations": [
    {
      "source": "05-database-operations.md",
      "relevance": 0.8756,
      "excerpt": "Database backup is critical for disaster recovery. Always test your backups by restoring to a test environment...",
      "chunk_id": "05-database-operations.md:2"
    },
    {
      "source": "05-database-operations.md", 
      "relevance": 0.8234,
      "excerpt": "Backup retention should follow the 3-2-1 rule: 3 copies of data, 2 different media types, 1 offsite...",
      "chunk_id": "05-database-operations.md:4"
    }
  ],
  "context_chunks": 3,
  "top_score": 0.8756,
  "avg_score": 0.8495,
  "latency_ms": 1247.8
}
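Several citations can come from the same runbook (both citations above cite 05-database-operations.md), so a client that shows one score per document needs to collapse them. A minimal sketch that keeps the highest relevance per source; the helper name is hypothetical.

```python
def best_score_per_source(citations: list[dict]) -> dict[str, float]:
    """Map each source document to its highest citation relevance, best first."""
    best: dict[str, float] = {}
    for c in citations:
        src, score = c["source"], c["relevance"]
        if score > best.get(src, float("-inf")):
            best[src] = score
    # dicts preserve insertion order, so rebuild sorted by score descending
    return dict(sorted(best.items(), key=lambda kv: kv[1], reverse=True))
```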

GET /stats

Returns vector index statistics and metadata.

Response Schema

{
  "total_chunks": integer,           // Total indexed chunks
  "total_sources": integer,         // Number of source documents
  "sources": [string],              // List of all source files
  "embedding_model": string,        // Current embedding model
  "chunk_config": {                 // Chunking configuration
    "chunk_size": integer,
    "chunk_overlap": integer
  }
}

Example

curl https://devops-rag-449012790678.asia-southeast1.run.app/stats

Response (abridged; the sources list shows only the first 5 of the 18 files):

{
  "total_chunks": 45,
  "total_sources": 18,
  "sources": [
    "01-kubernetes-troubleshooting.md",
    "02-deployment-rollback.md",
    "03-linux-server-maintenance.md",
    "04-monitoring-alerting.md",
    "05-database-operations.md"
  ],
  "embedding_model": "text-embedding-3-small",
  "chunk_config": {
    "chunk_size": 512,
    "chunk_overlap": 64
  }
}
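The chunk_config values describe overlapping windows: with chunk_size 512 and chunk_overlap 64, each chunk starts 448 units after the previous one, so consecutive chunks share 64 units. The sketch below illustrates that arithmetic on a generic sequence. It is an illustration under assumptions, not the service's ingestion code: whether the units are tokens or characters, and whether runbooks are also split on headings, is not documented here.

```python
def sliding_windows(items: list, chunk_size: int, chunk_overlap: int) -> list[list]:
    """Split items into windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    if not items:
        return []
    step = chunk_size - chunk_overlap  # 512 - 64 = 448 for the config above
    windows = []
    start = 0
    while True:
        windows.append(items[start:start + chunk_size])
        if start + chunk_size >= len(items):
            break  # this window already reaches the end of the sequence
        start += step
    return windows
```

With the production settings, a 1,000-unit document would yield three chunks starting at offsets 0, 448, and 896.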

GET /sources

Lists all ingested runbook sources.

Response Schema

{
  "sources": [string],              // Array of source document names
  "total": integer                  // Total count
}

Example

curl https://devops-rag-449012790678.asia-southeast1.run.app/sources

Response:

{
  "sources": [
    "01-kubernetes-troubleshooting.md",
    "02-deployment-rollback.md", 
    "03-linux-server-maintenance.md",
    "04-monitoring-alerting.md",
    "05-database-operations.md",
    "06-datadog-apm-troubleshooting.md",
    "07-aws-incident-response.md",
    "08-docker-container-ops.md",
    "09-terraform-infrastructure.md",
    "10-networking-dns-troubleshooting.md",
    "11-kubernetes-networking.md",
    "12-cicd-pipeline.md",
    "13-prometheus-grafana.md",
    "14-kubernetes-rbac-security.md",
    "15-datadog-infrastructure-monitoring.md",
    "16-elasticsearch-logging.md",
    "17-aws-eks-operations.md",
    "18-incident-management.md"
  ],
  "total": 18
}
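A client can use this list to check which runbooks cover a topic before asking a question. A minimal sketch; find_sources is a hypothetical helper, and it also checks that total matches the length of the sources array.

```python
def find_sources(payload: dict, keyword: str) -> list[str]:
    """Return runbook names from a GET /sources response that contain keyword."""
    sources = payload["sources"]
    if len(sources) != payload["total"]:
        raise ValueError("'total' does not match the number of sources returned")
    kw = keyword.lower()
    return [name for name in sources if kw in name.lower()]
```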

GET /graph

Returns knowledge graph visualization data showing relationships between runbooks and concepts.

Response Schema

{
  "nodes": [
    {
      "id": string,                 // Unique node identifier
      "label": string,              // Display name
      "type": "document" | "concept",
      "size": integer              // Relative importance
    }
  ],
  "edges": [
    {
      "source": string,            // Source node ID
      "target": string,            // Target node ID  
      "weight": float              // Relationship strength (0-1)
    }
  ]
}

Example

curl https://devops-rag-449012790678.asia-southeast1.run.app/graph
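A client can turn the nodes and edges arrays into an adjacency list to answer questions such as which concepts are most strongly tied to a runbook. A sketch under assumptions: edges are treated as undirected, because the schema does not say whether source and target imply a direction, and the helper name is hypothetical.

```python
from collections import defaultdict


def neighbors_by_weight(graph: dict, node_id: str, min_weight: float = 0.0) -> list[tuple[str, float]]:
    """List (neighbor_label, weight) pairs for node_id, strongest first."""
    labels = {n["id"]: n["label"] for n in graph["nodes"]}
    adj: defaultdict[str, list[tuple[str, float]]] = defaultdict(list)
    for e in graph["edges"]:
        if e["weight"] >= min_weight:
            # record the edge in both directions (undirected assumption)
            adj[e["source"]].append((e["target"], e["weight"]))
            adj[e["target"]].append((e["source"], e["weight"]))
    # fall back to the raw ID if an edge points at an unknown node
    pairs = [(labels.get(other, other), w) for other, w in adj.get(node_id, [])]
    return sorted(pairs, key=lambda p: p[1], reverse=True)
```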

GET /clusters

Returns topic clusters discovered from the knowledge base.

Response Schema

{
  "clusters": [
    {
      "id": integer,
      "name": string,              // Cluster theme/topic
      "documents": [string],       // Documents in this cluster
      "keywords": [string],        // Key terms
      "coherence_score": float     // Cluster quality (0-1)
    }
  ]
}

Example

curl https://devops-rag-449012790678.asia-southeast1.run.app/clusters
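To see which topic clusters a given runbook belongs to, filter the clusters by membership and, optionally, by coherence_score. A minimal sketch; the helper name and the min_coherence parameter are hypothetical.

```python
def clusters_for_document(payload: dict, document: str, min_coherence: float = 0.0) -> list[str]:
    """Return names of clusters containing document, most coherent first."""
    hits = [
        c for c in payload["clusters"]
        if document in c["documents"] and c["coherence_score"] >= min_coherence
    ]
    hits.sort(key=lambda c: c["coherence_score"], reverse=True)
    return [c["name"] for c in hits]
```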

GET /patterns

Returns common operational patterns detected across runbooks.

Response Schema

{
  "patterns": [
    {
      "name": string,              // Pattern name
      "frequency": integer,        // How often it appears
      "documents": [string],       // Where it appears
      "description": string        // What it represents
    }
  ]
}

Example

curl https://devops-rag-449012790678.asia-southeast1.run.app/patterns
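To surface the most common operational patterns, sort by frequency. A minimal sketch with a hypothetical helper name; ties keep the order in which the server returned them, because Python's sort is stable.

```python
def top_patterns(payload: dict, n: int = 3) -> list[tuple[str, int, int]]:
    """Return the n most frequent patterns as (name, frequency, document_count)."""
    ranked = sorted(payload["patterns"], key=lambda p: p["frequency"], reverse=True)
    return [(p["name"], p["frequency"], len(p["documents"])) for p in ranked[:n]]
```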

Error Responses

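All endpoints return errors as a JSON object with a detail field:

{
  "detail": string | [object]      // Error description, or a list of validation errors
}

For most errors detail is a string. For request validation errors it is a list of objects, each describing one invalid field with its type, location (loc), and message (msg), as in the example error below.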

Common HTTP Status Codes

  • 200: Success
  • 400: Bad Request
  • 422: Validation Error (malformed JSON, missing required fields, invalid parameter types)
  • 500: Internal Server Error (OpenAI API issues, index corruption)

Example Error

curl -X POST https://devops-rag-449012790678.asia-southeast1.run.app/ask \
  -H "Content-Type: application/json" \
  -d '{}'

Response (422):

{
  "detail": [
    {
      "type": "missing",
      "loc": ["body", "question"],
      "msg": "Field required"
    }
  ]
}
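Because detail can be either a string or a list of validation-error objects (as in the example above), clients should handle both shapes. A minimal sketch; the helper name and the fallback messages are hypothetical.

```python
def error_messages(body: dict) -> list[str]:
    """Flatten an error response body into human-readable messages."""
    detail = body.get("detail")
    if isinstance(detail, str):
        return [detail]
    if isinstance(detail, list):
        msgs = []
        for item in detail:
            # drop the leading "body" segment so the location reads like a field path
            loc = [str(part) for part in item.get("loc", []) if part != "body"]
            where = ".".join(loc) or "request"
            msgs.append(f"{where}: {item.get('msg', 'invalid value')}")
        return msgs
    return ["unknown error"]
```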

Rate Limits

The API enforces no explicit rate limits, but throughput is bounded by:

  • OpenAI API rate limits (vary by plan)
  • Cloud Run request concurrency (80 concurrent requests per instance by default, configurable up to 1000)
  • Query latency (about 500 ms on average)
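Since the API does not enforce its own limits, throttling and transient failures surface from the platform and from OpenAI, and a client can retry them with exponential backoff. This sketch assumes 429 and 5xx responses are worth retrying and uses full jitter (sleeping a random fraction of each delay) to spread retries out. The function names, retry count, and delays are illustrative choices, not documented behavior.

```python
import json
import random
import time
import urllib.error
import urllib.request

RETRYABLE = {429, 500, 502, 503, 504}  # assumption: these codes are transient


def backoff_delays(retries: int, base_s: float = 0.5, cap_s: float = 8.0) -> list[float]:
    """Exponential backoff schedule before jitter: base, 2*base, 4*base, ... capped."""
    return [min(cap_s, base_s * 2 ** i) for i in range(retries)]


def ask_with_retry(url: str, question: str, retries: int = 4) -> dict:
    """POST a question, retrying transient failures; makes up to retries + 1 attempts."""
    data = json.dumps({"question": question}).encode("utf-8")
    for delay in backoff_delays(retries) + [None]:  # None marks the final attempt
        req = urllib.request.Request(
            url, data=data, headers={"Content-Type": "application/json"}, method="POST"
        )
        try:
            with urllib.request.urlopen(req, timeout=60) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:  # must precede URLError (its subclass)
            if err.code not in RETRYABLE or delay is None:
                raise
        except urllib.error.URLError:
            if delay is None:
                raise
        time.sleep(random.uniform(0, delay))  # full jitter
    raise RuntimeError("unreachable: the final attempt returns or raises")
```

For example, ask_with_retry("https://devops-rag-449012790678.asia-southeast1.run.app/ask", "How do I rotate logs?") makes up to five attempts before raising the last error.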

OpenAPI Specification

Interactive API documentation is available at:


Client Examples

Python

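import requests

BASE_URL = "https://devops-rag-449012790678.asia-southeast1.run.app"


def ask_devops_rag(question, top_k=5):
    """POST a question to /ask and return the decoded JSON response."""
    response = requests.post(
        f"{BASE_URL}/ask",
        json={"question": question, "top_k": top_k},
        timeout=60,  # generation can take a few seconds; never wait forever
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of treating errors as answers
    return response.json()


# Usage
result = ask_devops_rag("How do I scale a Kubernetes deployment?")
print(result["answer"])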

JavaScript

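// Runs in browsers and Node 18+ (built-in fetch); top-level await requires an ES module.
async function askDevOpsRAG(question, topK = 5) {
  const response = await fetch(
    'https://devops-rag-449012790678.asia-southeast1.run.app/ask',
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ question, top_k: topK })
    }
  );
  // fetch only rejects on network failure, so check the HTTP status explicitly
  if (!response.ok) {
    throw new Error(`DevOps RAG request failed: ${response.status} ${await response.text()}`);
  }
  return response.json();
}

// Usage
const result = await askDevOpsRAG('How do I troubleshoot high CPU usage?');
console.log(result.answer);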

cURL

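#!/bin/bash
# Requires curl and jq.
ask_rag() {
  # Build the JSON body with jq so quotes and backslashes in the question are escaped
  jq -n --arg q "$1" '{question: $q}' |
    curl -s -X POST https://devops-rag-449012790678.asia-southeast1.run.app/ask \
      -H "Content-Type: application/json" \
      -d @- | jq -r '.answer'
}

# Usage
ask_rag "How do I check disk usage on Linux?"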