MĀRGA (मार्ग) Enterprise LLM Router
मार्ग (MĀRGA) - Sanskrit for “path” or “route”
Part of the अव्यय (Avyay) AI Platform
MĀRGA is an enterprise-grade LLM (Large Language Model) router that provides intelligent routing, failover, and load balancing across multiple AI providers. It offers an OpenAI-compatible API while seamlessly integrating with OpenAI, Anthropic, Ollama, and other LLM providers.
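Because the API is OpenAI-compatible, a client can talk to MĀRGA with nothing more than an HTTP library. The following is a minimal sketch, not an official client: it assumes the router is reachable at `http://localhost:8080` (the port published in the Docker quick start) and reuses the `/v1/chat/completions` path and Bearer header from the curl example in this README. The helper names `build_chat_request` and `send` are hypothetical.

```python
import json
import urllib.request

# Assumption: MĀRGA is running locally on the port published in the quick start.
BASE_URL = "http://localhost:8080"


def build_chat_request(api_key: str, model: str, prompt: str,
                       base_url: str = BASE_URL,
                       max_tokens: int = 100) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_chat_request("your-api-key", "gpt-4", "Hello, MĀRGA!"))` against a running instance should return the decoded JSON response.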
🏗️ Architecture
                      ┌─────────────────┐
                      │   Client Apps   │
                      └─────────────────┘
                               │
                     OpenAI-compatible API
                               │
┌─────────────────────────────────────────────────────────────┐
│                        MĀRGA Router                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   Router    │  │   Metrics   │  │    Health Checks    │  │
│  │   Engine    │  │  Collector  │  │                     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         │                     │                     │
         ▼                     ▼                     ▼
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│     OpenAI      │   │    Anthropic    │   │     Ollama      │
│  (GPT-4, etc)   │   │  (Claude, etc)  │   │ (Local Models)  │
└─────────────────┘   └─────────────────┘   └─────────────────┘
✨ Key Features
🔄 Intelligent Routing
- Multi-Provider Support: OpenAI, Anthropic, Ollama, Together AI
- Model Mapping: Route gpt-4 → gpt-4o, claude-3 → claude-3-5-sonnet
- Priority-Based Selection: Configure provider preference order
- Load Balancing: Round-robin, weighted, least-connections
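The routing features above can be illustrated with a small sketch. This is not MĀRGA's router engine; it is a hypothetical Python model of the idea, in which a requested model name is first resolved through an alias table (using the two mappings listed above), then the provider with the best priority is chosen, with simple round-robin among providers that share that priority. The names `MODEL_ALIASES`, `Provider`, and `PriorityRouter` are assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical alias table mirroring the mappings listed above.
MODEL_ALIASES = {"gpt-4": "gpt-4o", "claude-3": "claude-3-5-sonnet"}


@dataclass(frozen=True)
class Provider:
    name: str
    models: frozenset
    priority: int  # lower number means more preferred


def resolve_model(requested: str) -> str:
    """Map a requested model name to its configured target, if one exists."""
    return MODEL_ALIASES.get(requested, requested)


class PriorityRouter:
    """Pick the best-priority provider serving a model; round-robin on ties."""

    def __init__(self, providers):
        self._providers = list(providers)
        self._counter = count()  # shared counter used for round-robin

    def select(self, requested_model: str) -> Provider:
        model = resolve_model(requested_model)
        candidates = [p for p in self._providers if model in p.models]
        if not candidates:
            raise LookupError(f"no provider serves {model!r}")
        best = min(p.priority for p in candidates)
        tied = [p for p in candidates if p.priority == best]
        return tied[next(self._counter) % len(tied)]
```

A weighted or least-connections strategy would replace only the last line of `select`; the alias resolution and priority filter stay the same.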
🛡️ Enterprise-Grade Reliability
- Automatic Failover: Seamless provider switching on failures
- Health Monitoring: Continuous provider health checks
- Rate Limiting: Global and per-client request throttling
- Circuit Breaker: Prevent cascading failures
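How failover and a circuit breaker interact is easier to see in code. The sketch below is a generic illustration of the pattern, not MĀRGA's implementation: each provider gets a breaker that opens after a number of consecutive failures and allows a trial request after a cool-down, and requests fall through to the next provider in order. The class and function names, the thresholds, and the injectable `clock` are all assumptions.

```python
import time


class CircuitBreaker:
    """Open after `max_failures` consecutive errors; retry after `reset_after` seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def allow(self) -> bool:
        if self._opened_at is None:
            return True
        # Half-open: let a trial request through once the cool-down has passed.
        return self._clock() - self._opened_at >= self.reset_after

    def record_success(self):
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self.max_failures:
            self._opened_at = self._clock()


def call_with_failover(providers, breakers, request):
    """Try (name, send) pairs in order, skipping providers whose breaker is open."""
    errors = []
    for name, send in providers:
        breaker = breakers[name]
        if not breaker.allow():
            continue
        try:
            result = send(request)
        except Exception as exc:  # a real router would catch narrower errors
            breaker.record_failure()
            errors.append((name, exc))
            continue
        breaker.record_success()
        return name, result
    raise RuntimeError(f"all providers failed or are unavailable: {errors}")
```

Skipping an open breaker instead of retrying it is what keeps a failing provider from adding latency to every request, which is the cascading-failure case the feature list refers to.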
📊 Advanced Monitoring
- Prometheus Metrics: Request latency, token usage, error rates
- Datadog Integration: Full observability and alerting
- Provider Analytics: Performance comparison and cost optimization
- Request Tracing: End-to-end request tracking
🔐 Security & Compliance
- API Key Authentication: Secure endpoint access
- CORS Support: Cross-origin resource sharing
- Request Validation: Input sanitization and validation
- Audit Logging: Complete request/response logging
💰 Cost Optimization
- Smart Routing: Route to cheapest available provider
- Usage Analytics: Track costs per model and provider
- Budget Controls: Set spending limits and alerts
- Token Optimization: Minimize unnecessary token usage
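As a rough illustration of smart routing combined with budget controls, the sketch below picks the cheapest healthy provider and refuses a request that would exceed a spending limit. It is a simplified model under stated assumptions: the prices are placeholders rather than real provider rates, cost is estimated from a single per-1k-token price, and `ProviderCost`, `CostRouter`, and `BudgetExceeded` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ProviderCost:
    name: str
    usd_per_1k_tokens: float  # placeholder price, not a real quote
    healthy: bool = True


class BudgetExceeded(Exception):
    """Raised when a request would push spending past the configured limit."""


class CostRouter:
    def __init__(self, providers, budget_usd: float):
        self.providers = list(providers)
        self.budget_usd = budget_usd
        self.spent_usd = 0.0
        self.spend_by_provider = {}  # usage analytics: cost per provider

    def cheapest(self) -> ProviderCost:
        healthy = [p for p in self.providers if p.healthy]
        if not healthy:
            raise LookupError("no healthy provider available")
        return min(healthy, key=lambda p: p.usd_per_1k_tokens)

    def charge(self, estimated_tokens: int) -> ProviderCost:
        """Pick the cheapest provider and record the estimated cost."""
        provider = self.cheapest()
        cost = provider.usd_per_1k_tokens * estimated_tokens / 1000
        if self.spent_usd + cost > self.budget_usd:
            raise BudgetExceeded(
                f"{cost:.4f} USD would exceed the {self.budget_usd:.2f} USD budget"
            )
        self.spent_usd += cost
        previous = self.spend_by_provider.get(provider.name, 0.0)
        self.spend_by_provider[provider.name] = previous + cost
        return provider
```

Tracking spend per provider alongside the total gives the per-provider cost breakdown that the usage analytics bullet describes.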
🚀 Quick Start
Docker (Recommended)
# Pull the image
docker pull ghcr.io/gaurav21/marga:latest
# Run with minimal config
docker run -p 8080:8080 \
-e OPENAI_API_KEY=your-key-here \
-e ANTHROPIC_API_KEY=your-key-here \
ghcr.io/gaurav21/marga:latest
Docker Compose
# Clone and setup
git clone https://github.com/gaurav21/avyay-marga
cd avyay-marga
cp .env.example .env
# Edit .env with your API keys
vi .env
# Start all services
docker-compose up -d
Test the API
curl -X POST https://marga-449012790678.asia-southeast1.run.app/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello, MĀRGA!"}],
"max_tokens": 100
}'
📚 Documentation
- Quick Start Guide - Get up and running in 5 minutes
- API Reference - Complete endpoint documentation
- Deployment Guide - Production deployment options
- Configuration Reference - All config options explained
- Use Cases - Real-world implementation examples
- Monitoring Guide - Observability and alerting setup
🌍 Use Cases
1. Multi-Provider Failover
Stay available through provider outages by automatically switching between OpenAI, Anthropic, and local models; the target is 99.9% uptime.
2. Cost Optimization
Route requests to the most cost-effective provider based on your usage patterns and budget constraints.
3. A/B Testing
Compare model performance by routing traffic between different providers and analyzing response quality.
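One common way to split traffic for such a comparison is deterministic hashing, so the same request or user ID always lands on the same provider. The sketch below shows that technique under stated assumptions; it is not taken from MĀRGA's code, and the function name `assign_variant` and the weight format are hypothetical.

```python
import hashlib


def assign_variant(request_id: str, variants: dict) -> str:
    """Deterministically map an ID to a variant, in proportion to its weight."""
    total = sum(variants.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a point in [0, total).
    point = int.from_bytes(digest[:8], "big") / 2**64 * total
    cumulative = 0.0
    for name, weight in variants.items():
        cumulative += weight
        if point < cumulative:
            return name
    return name  # guards against floating-point edge cases
```

With weights such as `{"openai": 90, "anthropic": 10}`, roughly one request in ten goes to the second provider, and replaying the same ID reproduces the same assignment.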
4. Data Compliance
Keep sensitive data local by routing to on-premises Ollama models while using cloud providers for general requests.
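A minimal sketch of this kind of policy, assuming sensitivity is detected with simple regular expressions: requests that match any pattern go to the local provider, everything else goes to the cloud. The patterns, the provider labels, and the function names are illustrative assumptions; a real deployment would need a proper data-classification policy.

```python
import re

# Illustrative patterns only; not a complete definition of sensitive data.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),      # US SSN-like number
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),   # email address
]


def is_sensitive(messages) -> bool:
    """Return True if any message content matches a sensitive pattern."""
    return any(
        pattern.search(message.get("content", ""))
        for message in messages
        for pattern in SENSITIVE_PATTERNS
    )


def choose_provider(messages, local="ollama", cloud="openai") -> str:
    """Route sensitive conversations to the local provider, others to the cloud."""
    return local if is_sensitive(messages) else cloud
```

The check runs over every message in the conversation, so a sensitive value anywhere in the history keeps the whole request on the local provider.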
🏢 Enterprise Features
- High Availability: Multi-region deployment with automatic failover
- Scalability: Horizontal scaling with load balancer integration
- Security: Enterprise SSO, audit logging, compliance reporting
- Support: 24/7 technical support and SLA guarantees
- Custom Integration: API customization and provider development
🔧 Requirements
- Runtime: Go 1.21+ or Docker
- Memory: 512MB minimum, 1GB recommended
- CPU: 1 core minimum, 2+ cores recommended
- Storage: 1GB for logs and metrics
- Network: Outbound HTTPS access to provider APIs
📈 Performance
- Latency: < 50ms routing overhead
- Throughput: 10,000+ requests/minute per instance
- Availability: 99.9% uptime SLA
- Scaling: Linear scaling up to 100 instances
🤝 Contributing
We welcome contributions! Please see our Contributing Guide for details.
📄 License
Licensed under the MIT License. See LICENSE for details.
🆘 Support
- Documentation: docs.avyay.ai/marga
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: support@avyay.ai
Made with ❤️ by the Avyay (अव्यय) team