MĀRGA (मार्ग) Enterprise LLM Router

मार्ग (MĀRGA) - Sanskrit for “path” or “route”
Part of the अव्यय (Avyay) AI Platform

MĀRGA is an enterprise-grade LLM (Large Language Model) router that provides intelligent routing, failover, and load balancing across multiple AI providers. It exposes an OpenAI-compatible API and forwards requests to OpenAI, Anthropic, Ollama, and other LLM providers.

🏗️ Architecture

                           ┌─────────────────┐
                           │   Client Apps   │
                           └─────────────────┘
                                     │
                              OpenAI-compatible API
                                     │
┌─────────────────────────────────────────────────────────────┐
│                     MĀRGA Router                            │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐ │
│  │   Router    │  │  Metrics    │  │   Health Checks     │ │
│  │   Engine    │  │ Collector   │  │                     │ │
│  └─────────────┘  └─────────────┘  └─────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
                                     │
            ┌──────────────────────────────────────────────────┐
            │                                                  │
            ▼                        ▼                        ▼
   ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
   │     OpenAI      │    │    Anthropic    │    │     Ollama      │
   │   (GPT-4, etc)  │    │ (Claude, etc)   │    │ (Local Models)  │
   └─────────────────┘    └─────────────────┘    └─────────────────┘

✨ Key Features

🔄 Intelligent Routing

Multi-Provider Support: OpenAI, Anthropic, Ollama, Together AI
Model Mapping: Route gpt-4 → gpt-4o, claude-3 → claude-3-5-sonnet
Priority-Based Selection: Configure provider preference order
Load Balancing: Round-robin, weighted, least-connections
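
The routing features above can be illustrated with a minimal sketch. It is not MĀRGA's implementation: the `Provider` class, the `MODEL_ALIASES` table, and the function names are hypothetical, and only the two alias examples come from the list above.

```python
from dataclasses import dataclass
from itertools import cycle

# Hypothetical alias table mirroring the mapping examples above.
MODEL_ALIASES = {"gpt-4": "gpt-4o", "claude-3": "claude-3-5-sonnet"}


@dataclass(frozen=True)
class Provider:
    name: str
    priority: int        # lower value = preferred
    weight: int = 1      # share of traffic in weighted round-robin
    healthy: bool = True


def resolve_model(requested: str) -> str:
    """Map a requested model name to its target, or pass it through."""
    return MODEL_ALIASES.get(requested, requested)


def pick_by_priority(providers):
    """Return the healthy provider with the lowest priority value."""
    candidates = [p for p in providers if p.healthy]
    if not candidates:
        raise RuntimeError("no healthy provider available")
    return min(candidates, key=lambda p: p.priority)


def weighted_round_robin(providers):
    """Cycle over provider names, each appearing `weight` times per round."""
    expanded = [p.name for p in providers for _ in range(p.weight)]
    return cycle(expanded)
```

Priority selection always prefers one provider while it is healthy, whereas weighted round-robin spreads traffic in proportion to the weights. A least-connections strategy would additionally need a live count of in-flight requests per provider, which this sketch leaves out.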

🛡️ Enterprise-Grade Reliability

Automatic Failover: Seamless provider switching on failures
Health Monitoring: Continuous provider health checks
Rate Limiting: Global and per-client request throttling
Circuit Breaker: Prevent cascading failures
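
As a rough illustration of the circuit-breaker idea, the sketch below stops sending traffic to a provider after repeated failures and allows a trial request once a cool-down has passed. The class name, thresholds, and timeout are assumptions chosen for the example, not MĀRGA's actual settings.

```python
import time


class CircuitBreaker:
    """Minimal per-provider breaker: closed -> open -> trial request."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so the breaker can be tested
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # Once the cool-down has elapsed, let a trial request through.
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self):
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # Open (or re-open) the circuit and restart the cool-down.
            self.opened_at = self.clock()
```

A router would keep one breaker per provider, skip providers whose breaker rejects the request, and report each call's outcome back to the breaker.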

📊 Advanced Monitoring

Prometheus Metrics: Request latency, token usage, error rates
Datadog Integration: Full observability and alerting
Provider Analytics: Performance comparison and cost optimization
Request Tracing: End-to-end request tracking
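
To make the metrics above concrete, here is a small sketch that counts requests per provider and renders them in the Prometheus text exposition format. The metric name `marga_requests_total` and its labels are invented for illustration; the names MĀRGA actually exports may differ.

```python
from collections import Counter


class RequestMetrics:
    """In-memory counters for requests and tokens per provider."""

    def __init__(self):
        self.requests = Counter()   # (provider, status) -> request count
        self.tokens = Counter()     # provider -> total tokens used

    def observe(self, provider, status, tokens):
        self.requests[(provider, status)] += 1
        self.tokens[provider] += tokens

    def error_rate(self, provider):
        """Fraction of this provider's requests that ended in 'error'."""
        total = sum(n for (p, _), n in self.requests.items() if p == provider)
        errors = self.requests[(provider, "error")]
        return errors / total if total else 0.0

    def render(self):
        """Render the request counter in Prometheus text format."""
        lines = ["# TYPE marga_requests_total counter"]
        for (provider, status), n in sorted(self.requests.items()):
            lines.append(
                f'marga_requests_total{{provider="{provider}",status="{status}"}} {n}'
            )
        return "\n".join(lines) + "\n"
```

In a real deployment an established client library would normally handle registration and exposition; the sketch only shows what kind of data sits behind the metrics.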

🔐 Security & Compliance

API Key Authentication: Secure endpoint access
CORS Support: Cross-origin resource sharing
Request Validation: Input sanitization and validation
Audit Logging: Complete request/response logging

💰 Cost Optimization

Smart Routing: Route to the cheapest available provider
Usage Analytics: Track costs per model and provider
Budget Controls: Set spending limits and alerts
Token Optimization: Minimize unnecessary token usage
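
One way to picture cost-aware routing with a budget limit is the sketch below. The prices are placeholder values, not real provider pricing, and the function names are hypothetical.

```python
# Placeholder prices in USD per 1,000 tokens (illustrative only).
PRICE_PER_1K_TOKENS = {"openai": 0.010, "anthropic": 0.008, "ollama": 0.0}


def estimate_cost(provider, tokens):
    """Estimated cost in USD of sending `tokens` tokens to `provider`."""
    return PRICE_PER_1K_TOKENS[provider] * tokens / 1000


def cheapest_provider(available, tokens, budget_remaining):
    """Pick the cheapest available provider that fits the remaining budget."""
    affordable = [
        p for p in available if estimate_cost(p, tokens) <= budget_remaining
    ]
    if not affordable:
        raise RuntimeError("request exceeds remaining budget")
    return min(affordable, key=lambda p: estimate_cost(p, tokens))
```

A production router would likely price input and output tokens separately and weigh cost against latency and quality; this sketch considers cost alone.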

🚀 Quick Start

Docker (Recommended)

# Pull the image
docker pull ghcr.io/gaurav21/marga:latest
 
# Run with minimal config
docker run -p 8080:8080 \
  -e OPENAI_API_KEY=your-key-here \
  -e ANTHROPIC_API_KEY=your-key-here \
  ghcr.io/gaurav21/marga:latest

Docker Compose

# Clone and setup
git clone https://github.com/gaurav21/avyay-marga
cd avyay-marga
cp .env.example .env
 
# Edit .env with your API keys
vi .env
 
# Start all services
docker-compose up -d

Test the API

curl -X POST https://marga-449012790678.asia-southeast1.run.app/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello, MĀRGA!"}],
    "max_tokens": 100
  }'
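
The same request can be built from Python using only the standard library. This is a sketch under two assumptions: that a locally started container is reachable at `http://localhost:8080` (the port published in the Docker command above), and that `your-api-key` is replaced with a valid key. The helper names are hypothetical.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, prompt, model="gpt-4", max_tokens=100):
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_chat_request("http://localhost:8080", "your-api-key", "Hello, MĀRGA!"))` returns the decoded JSON. If the response follows the OpenAI chat-completions shape, the reply text should be under `["choices"][0]["message"]["content"]`.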

📚 Documentation

Quick Start Guide - Get up and running in 5 minutes
API Reference - Complete endpoint documentation
Deployment Guide - Production deployment options
Configuration Reference - All config options explained
Use Cases - Real-world implementation examples
Monitoring Guide - Observability and alerting setup

🌍 Use Cases

1. Multi-Provider Failover

Target 99.9% uptime by automatically switching between OpenAI, Anthropic, and local models when a provider has an outage.
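
The failover behavior can be sketched as a loop that tries providers in order and returns the first successful response. `ProviderError` and `call_with_failover` are hypothetical names, and the provider callables stand in for real API clients.

```python
class ProviderError(Exception):
    """Raised when a provider call fails (timeout, 5xx response, etc.)."""


def call_with_failover(providers, request):
    """Try each (name, call) pair in order and return the first success.

    Returns a (provider_name, response) tuple. Raises ProviderError with
    the collected error messages if every provider fails.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(request)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))
```

Only errors that signal a provider-side failure should trigger the switch. A malformed request would fail everywhere, so retrying it on another provider only adds latency.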

2. Cost Optimization

Route requests to the most cost-effective provider based on your usage patterns and budget constraints.

3. A/B Testing

Compare model performance by routing traffic between different providers and analyzing response quality.
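
A common way to split traffic for such a comparison is to hash a stable request or user identifier into a bucket, so the same identifier always lands on the same variant. The sketch below assumes that approach; how MĀRGA actually splits traffic is not described here, and the function name is hypothetical.

```python
import hashlib


def assign_variant(request_id, variants):
    """Deterministically map an id to a provider.

    `variants` is a list of (provider_name, percent) pairs whose
    percentages must sum to 100.
    """
    if sum(pct for _, pct in variants) != 100:
        raise ValueError("percentages must sum to 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100   # value in 0..99
    threshold = 0
    for name, pct in variants:
        threshold += pct
        if bucket < threshold:
            return name
    raise AssertionError("unreachable: bucket is always below 100")
```

For example, `assign_variant("user-42", [("openai", 90), ("anthropic", 10)])` sends roughly one in ten identifiers to the second provider while keeping each identifier's assignment stable across requests.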

4. Data Compliance

Keep sensitive data local by routing to on-premises Ollama models while using cloud providers for general requests.
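
A simple form of this policy inspects the request content and sends anything that looks sensitive to the local provider. The patterns and the `choose_target` function below are illustrative assumptions; a real deployment would need a classification policy agreed with its compliance team.

```python
import re

# Hypothetical patterns for data that must stay on-premises.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN-like number
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),     # email address
]


def choose_target(messages, force_local=False):
    """Return 'ollama' for sensitive content, otherwise 'cloud'."""
    text = " ".join(m["content"] for m in messages)
    if force_local or any(p.search(text) for p in SENSITIVE_PATTERNS):
        return "ollama"
    return "cloud"
```

The `force_local` flag lets a caller pin a request to the on-premises model regardless of content, which is safer than relying on pattern matching alone.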

🏢 Enterprise Features

High Availability: Multi-region deployment with automatic failover
Scalability: Horizontal scaling with load balancer integration
Security: Enterprise SSO, audit logging, compliance reporting
Support: 24/7 technical support and SLA guarantees
Custom Integration: API customization and provider development

🔧 Requirements

Runtime: Go 1.21+ or Docker
Memory: 512 MB minimum, 1 GB recommended
CPU: 1 core minimum, 2+ cores recommended
Storage: 1 GB for logs and metrics
Network: Outbound HTTPS access to provider APIs

📈 Performance

Latency: < 50 ms routing overhead
Throughput: 10,000+ requests/minute per instance
Availability: 99.9% uptime SLA
Scaling: Linear scaling up to 100 instances
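
The figures above translate into the following back-of-the-envelope numbers. The calculation treats 10,000 requests/minute as a lower bound and assumes perfectly linear scaling, which real deployments may not achieve.

```python
REQUESTS_PER_MINUTE = 10_000   # stated per-instance throughput (lower bound)
MAX_INSTANCES = 100            # stated limit of linear scaling
UPTIME = 0.999                 # 99.9% availability target

# Per-instance throughput in requests per second: about 166.7.
per_second = REQUESTS_PER_MINUTE / 60

# Fleet throughput at 100 instances: 1,000,000 requests per minute.
fleet_per_minute = REQUESTS_PER_MINUTE * MAX_INSTANCES

# Downtime allowed by 99.9% availability over 30 days: about 43.2 minutes.
downtime_minutes_30d = (1 - UPTIME) * 30 * 24 * 60
```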

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

📄 License

Licensed under the MIT License. See LICENSE for details.

🆘 Support

Documentation: docs.avyay.ai/marga
Issues: GitHub Issues
Discussions: GitHub Discussions
Email: support@avyay.ai

Made with ❤️ by the Avyay (अव्यय) team
