TGI Multi-LoRA: Deploy Once, Serve 30 models
Are you tired of the complexity and expense of managing multiple AI models? What if you could deploy once and serve 30 models? In today's ML world, organizations looking to leverage the value of their data will likely end up fine-tuning, building a multitude of models, each one highly specialized for a specific task. But how can you keep up with the hassle and cost of deploying a separate endpoint for each use case? The answer is Multi-LoRA serving.
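
To make the idea concrete before we dive in, here is a minimal sketch of what this looks like with TGI. It assumes a TGI version with multi-LoRA support (2.1.1 or later); the two Predibase adapter IDs are placeholders used for illustration, so substitute your own fine-tuned adapters.

```bash
# Deploy once: a single base model plus a comma-separated list of LoRA adapters.
# Assumes TGI >= 2.1.1; the adapter IDs below are illustrative examples.
model=mistralai/Mistral-7B-v0.1
volume=$PWD/data

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:2.1.1 \
    --model-id $model \
    --lora-adapters=predibase/customer_support,predibase/magicoder

# Serve many: each request picks its adapter via the adapter_id parameter.
curl 127.0.0.1:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "Hello who are you?", "parameters": {"max_new_tokens": 40, "adapter_id": "predibase/customer_support"}}'
```

One container, one copy of the base weights, and many lightweight adapters selected per request.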

Motivation
For an organization, building a multitude of models via fine-tuning makes sense for several reasons.

Performance - There is compelling evidence that smaller, specialized models outperform their larger, general-purpose counterparts on the tasks they were trained for. Predibase [5] showed that you can get better performance than GPT-4 using task-specific LoRAs with a base like mistralai/Mistral-7B-v0.1.

Adaptability - Models like Mistral or Llama are extremely versatile. You can pick one of them as your base model and build many specialized models, even when the downstream tasks are very different. Also, note that you aren't locked in: you can easily swap the base out and fine-tune another base model on your data (more on this later).

Independence - For each task that your organization cares about, different teams can work on different fine-tunes, allowing for independence in data preparation, configurations, evaluation criteria, and cadence of model updates.

Privacy - Specialized models offer flexibility with training-data segregation and access restrictions for different users based on data privacy requirements. Additionally, in cases where running models locally is important, a small model can be made highly capable for a specific task while remaining small enough to run on-device.