Just tried LitServe from the good folks at @LightningAI!
Between llama.cpp and vLLM there's a small gap: some models aren't easily deployable with either!
That's where LitServe comes in!
LitServe is a high-throughput serving engine for AI models built on FastAPI.
Yes, built on FastAPI. That's where the advantage and the issue lie.
It's extremely flexible and supports multi-modality and a variety of models out of the box.
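A minimal serving sketch looks like this (following the LitAPI pattern from the repo's README; the squaring "model" and the port are just placeholders, not a real model):

import litserve as ls

class SimpleLitAPI(ls.LitAPI):
    def setup(self, device):
        # Load the model once per worker (placeholder "model" here)
        self.model = lambda x: x ** 2

    def decode_request(self, request):
        # Pull the input out of the JSON payload
        return request["input"]

    def predict(self, x):
        # Run inference
        return self.model(x)

    def encode_response(self, output):
        # Shape the JSON response
        return {"output": output}

if __name__ == "__main__":
    server = ls.LitServer(SimpleLitAPI(), accelerator="auto")
    server.run(port=8000)

You define how requests are decoded, predicted on, and encoded; LitServe handles the FastAPI plumbing, batching, and workers.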
But in my testing, it lags far behind vLLM in speed.
Also, no OpenAI API-compatible endpoint is available as of now.
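So for now you talk to it through its own FastAPI route rather than an OpenAI-style /v1/chat/completions one. Roughly like this (assuming the default /predict route and the sketch above):

import requests

# Call the server's own /predict route (custom schema, not OpenAI's)
resp = requests.post("http://127.0.0.1:8000/predict", json={"input": 4})
print(resp.json())  # -> {"output": 16}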
Still, as we move toward multi-modal models and agents, it's a solid starting point. It just has to get faster...
GitHub: https://github.com/Lightning-AI/LitServe