MĀRGA (मार्ग) Enterprise LLM Router

मार्ग (MĀRGA) - Sanskrit for “path” or “route”
Part of the अव्यय (Avyay) AI Platform

MĀRGA is an enterprise-grade LLM (Large Language Model) router that provides intelligent routing, failover, and load balancing across multiple AI providers. It offers an OpenAI-compatible API while seamlessly integrating with OpenAI, Anthropic, Ollama, and other LLM providers.

🏗️ Architecture

                           ┌─────────────────┐
                           │   Client Apps   │
                           └─────────────────┘

                              OpenAI-compatible API

┌─────────────────────────────────────────────────────────────┐
│                     MĀRGA Router                            │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐ │
│  │   Router    │  │  Metrics    │  │   Health Checks     │ │
│  │   Engine    │  │ Collector   │  │                     │ │
│  └─────────────┘  └─────────────┘  └─────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

            ┌──────────────────────────────────────────────────┐
            │                                                  │
            ▼                        ▼                        ▼
   ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
   │     OpenAI      │    │    Anthropic    │    │     Ollama      │
   │   (GPT-4, etc)  │    │ (Claude, etc)   │    │ (Local Models)  │
   └─────────────────┘    └─────────────────┘    └─────────────────┘

✨ Key Features

🔄 Intelligent Routing

  • Multi-Provider Support: OpenAI, Anthropic, Ollama, Together AI
  • Model Mapping: Route gpt-4 → gpt-4o, claude-3 → claude-3-5-sonnet
  • Priority-Based Selection: Configure provider preference order
  • Load Balancing: Round-robin, weighted, least-connections
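
The routing features above can be pictured with a small sketch. This is not MĀRGA's implementation; the alias table, the provider table (including the "openai-backup" entry), and all function names are assumptions made up for illustration. It shows model mapping, priority-ordered candidate selection, and a round-robin rotation.

```python
from itertools import cycle

# Hypothetical alias table, mirroring the mapping examples in the feature list.
MODEL_ALIASES = {"gpt-4": "gpt-4o", "claude-3": "claude-3-5-sonnet"}

# Hypothetical provider table: a lower priority number means "preferred".
PROVIDERS = [
    {"name": "openai-backup", "priority": 2, "models": {"gpt-4o"}},
    {"name": "openai", "priority": 1, "models": {"gpt-4o"}},
    {"name": "anthropic", "priority": 1, "models": {"claude-3-5-sonnet"}},
]


def resolve_model(requested):
    """Map a requested model name to its configured target, if any."""
    return MODEL_ALIASES.get(requested, requested)


def candidates(requested):
    """Return provider names that serve the model, most preferred first."""
    model = resolve_model(requested)
    matching = [p for p in PROVIDERS if model in p["models"]]
    return [p["name"] for p in sorted(matching, key=lambda p: p["priority"])]


def round_robin(names):
    """Yield provider names in rotation, one per request."""
    return cycle(names)
```

Priority selection picks the first name returned by candidates(); round-robin instead calls next() on the rotation once per request. Weighted and least-connections balancing would need per-provider state that this sketch leaves out.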

🛡️ Enterprise-Grade Reliability

  • Automatic Failover: Seamless provider switching on failures
  • Health Monitoring: Continuous provider health checks
  • Rate Limiting: Global and per-client request throttling
  • Circuit Breaker: Prevent cascading failures
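
As a rough illustration of how failover and a circuit breaker can work together, here is a minimal sketch. The class, its thresholds, and the call_with_failover helper are hypothetical and are not taken from MĀRGA's code; each provider is modelled as a plain callable.

```python
import time


class CircuitBreaker:
    """Opens after repeated failures, then allows a trial call after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: permit a trial call once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def call_with_failover(providers, breakers, request):
    """Try providers in order; `providers` is a list of (name, callable) pairs."""
    errors = {}
    for name, send in providers:
        breaker = breakers[name]
        if not breaker.allow():
            errors[name] = "circuit open"
            continue
        try:
            result = send(request)
        except Exception as exc:  # any provider error triggers failover
            breaker.record_failure()
            errors[name] = str(exc)
            continue
        breaker.record_success()
        return name, result
    raise RuntimeError(f"all providers failed: {errors}")
```

Once a provider's breaker is open, later requests skip it without waiting for a timeout, which is what keeps one failing provider from slowing down every request.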

📊 Advanced Monitoring

  • Prometheus Metrics: Request latency, token usage, error rates
  • Datadog Integration: Full observability and alerting
  • Provider Analytics: Performance comparison and cost optimization
  • Request Tracing: End-to-end request tracking

🔐 Security & Compliance

  • API Key Authentication: Secure endpoint access
  • CORS Support: Cross-origin resource sharing
  • Request Validation: Input sanitization and validation
  • Audit Logging: Complete request/response logging

💰 Cost Optimization

  • Smart Routing: Route to the cheapest available provider
  • Usage Analytics: Track costs per model and provider
  • Budget Controls: Set spending limits and alerts
  • Token Optimization: Minimize unnecessary token usage
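
One way to think about cost-based routing and budget controls is sketched below. The price table uses invented placeholder numbers, not real provider pricing, and the provider names and the Budget class are hypothetical.

```python
# Hypothetical prices in USD per 1,000 tokens; placeholder values only.
PRICE_PER_1K = {"provider-a": 0.010, "provider-b": 0.003, "local": 0.0}


def cheapest_available(healthy, prices=PRICE_PER_1K):
    """Pick the lowest-priced provider among those currently healthy."""
    options = [name for name in healthy if name in prices]
    if not options:
        raise LookupError("no healthy provider with a known price")
    return min(options, key=lambda name: prices[name])


class Budget:
    """Tracks spend and refuses charges that would exceed the limit."""

    def __init__(self, limit_usd):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, provider, tokens, prices=PRICE_PER_1K):
        cost = tokens / 1000 * prices[provider]
        if self.spent_usd + cost > self.limit_usd:
            raise RuntimeError("budget exceeded")
        self.spent_usd += cost
        return cost
```

A real router would likely weigh price against latency and quality rather than price alone; this sketch only shows the cheapest-first rule named in the list.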

🚀 Quick Start

# Pull the image
docker pull ghcr.io/gaurav21/marga:latest
 
# Run with minimal config
docker run -p 8080:8080 \
  -e OPENAI_API_KEY=your-key-here \
  -e ANTHROPIC_API_KEY=your-key-here \
  ghcr.io/gaurav21/marga:latest

Docker Compose

# Clone and setup
git clone https://github.com/gaurav21/avyay-marga
cd avyay-marga
cp .env.example .env
 
# Edit .env with your API keys
vi .env
 
# Start all services
docker-compose up -d

Test the API

curl -X POST https://marga-449012790678.asia-southeast1.run.app/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello, MĀRGA!"}],
    "max_tokens": 100
  }'
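
The same request can be made from Python using only the standard library. This is a sketch: the helper names are made up here, and the response shape is assumed to follow the OpenAI chat completions format, which the source implies but does not spell out.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, content, model="gpt-4", max_tokens=100):
    """Build a POST request equivalent to the curl call above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example against a local instance started with `docker run -p 8080:8080 ...`:
# reply = send(build_chat_request("http://localhost:8080", "your-api-key", "Hello, MĀRGA!"))
# print(reply["choices"][0]["message"]["content"])  # assumes OpenAI-style response
```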

🌍 Use Cases

1. Multi-Provider Failover

Target 99.9% uptime by automatically switching between OpenAI, Anthropic, and local models when a provider experiences an outage.

2. Cost Optimization

Route requests to the most cost-effective provider based on your usage patterns and budget constraints.

3. A/B Testing

Compare model performance by routing traffic between different providers and analyzing response quality.

4. Data Compliance

Keep sensitive data local by routing to on-premises Ollama models while using cloud providers for general requests.
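
A minimal sketch of such a rule follows, assuming the caller can tag a request as sensitive and that a simple pattern check (here, anything that looks like an email address) serves as a fallback. The "sensitive" field, the provider labels, and the function name are hypothetical; real compliance rules would be considerably stricter.

```python
import re

# Very rough email detector, used only to illustrate a content-based rule.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def choose_target(request, local="ollama", cloud="cloud"):
    """Send tagged or email-bearing requests to the local provider."""
    if request.get("sensitive"):
        return local
    if EMAIL_PATTERN.search(request.get("content", "")):
        return local
    return cloud
```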

🏢 Enterprise Features

  • High Availability: Multi-region deployment with automatic failover
  • Scalability: Horizontal scaling with load balancer integration
  • Security: Enterprise SSO, audit logging, compliance reporting
  • Support: 24/7 technical support and SLA guarantees
  • Custom Integration: API customization and provider development

🔧 Requirements

  • Runtime: Go 1.21+ or Docker
  • Memory: 512MB minimum, 1GB recommended
  • CPU: 1 core minimum, 2+ cores recommended
  • Storage: 1GB for logs and metrics
  • Network: Outbound HTTPS access to provider APIs

📈 Performance

  • Latency: < 50ms routing overhead
  • Throughput: 10,000+ requests/minute per instance
  • Availability: 99.9% uptime SLA
  • Scaling: Linear scaling up to 100 instances
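
To put these figures in more familiar units, the short calculation below converts them. It only restates the numbers listed above; the 30-day month and the function names are assumptions of this sketch.

```python
def downtime_budget_minutes(availability, days=30):
    """Minutes of allowed downtime over `days` at the given availability."""
    return (1 - availability) * days * 24 * 60


def requests_per_second(per_minute):
    """Convert a per-minute request rate to a per-second rate."""
    return per_minute / 60


def fleet_throughput_per_minute(per_instance, instances):
    """Aggregate throughput, assuming perfectly linear scaling."""
    return per_instance * instances


print(round(downtime_budget_minutes(0.999), 1))   # about 43.2 minutes per 30 days
print(round(requests_per_second(10_000), 1))      # about 166.7 requests per second
print(fleet_throughput_per_minute(10_000, 100))   # 1000000 requests per minute
```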

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

📄 License

Licensed under the MIT License. See LICENSE for details.

Made with ❤️ by the Avyay (अव्यय) team