adaptive-classifier: Cut your LLM costs with smart query routing (32.4% cost savings demonstrated)
Hey Folks! I'm excited to share a new open-source library that can help optimize your LLM deployment costs. The adaptive-classifier library learns to route queries between your models based on complexity, continuously improving through real-world usage. We tested it on the arena-hard-auto dataset, routing between a high-cost and low-cost model (2x cost difference). The results were impressive: 32.4% cost savings with adaptation enabled Same overall success rate (22%) as baseline System automatically learned from 110 new examples during evaluation Successfully routed 80.4% of queries to the cheaper model Perfect for setups where you're running multiple LLama models (like Llama-3.1-70B alongside Llama-3.1-8B) and want to optimize costs without sacrificing capability. The library integrates easily with any transformer-based models and includes built-in state persistence. Check out the repo for implementation details and benchmarks. Would love to hear your experiences if you try it out! Repo - https://github.com/codelion/adaptive-classifier
Hey Folks! I'm excited to share a new open-source library that can help optimize your LLM deployment costs. The adaptive-classifier library learns to route queries between your models based on complexity, continuously improving through real-world usage.
We tested it on the arena-hard-auto dataset, routing between a high-cost and low-cost model (2x cost difference). The results were impressive:
- 32.4% cost savings with adaptation enabled
- Same overall success rate (22%) as baseline
- System automatically learned from 110 new examples during evaluation
- Successfully routed 80.4% of queries to the cheaper model
Perfect for setups where you're running multiple LLama models (like Llama-3.1-70B alongside Llama-3.1-8B) and want to optimize costs without sacrificing capability. The library integrates easily with any transformer-based models and includes built-in state persistence.
Check out the repo for implementation details and benchmarks. Would love to hear your experiences if you try it out!
What's Your Reaction?