Question about LightEval 🤗:

I've been searching for an LLM evaluation suite that can, out of the box, compare the outputs of a model without any enhancements vs. the same model with better prompt engineering, vs. the same model with RAG, vs. the same model with fine-tuning.

I unfortunately have not found a tool that fits my exact description, but of course I ran into LightEval.

A huge pain point of building large-scale projects that use LLMs is that, prior to building an MVP, it is difficult to evaluate whether better prompt engineering, RAG, fine-tuning, or some combination of them is needed to get satisfactory LLM output for the project's use case.

Time and resources are then wasted R&D'ing exactly which LLM enhancements are needed.

I believe an out-of-the-box solution to compare models w/ or w/out the aforementioned LLM enhancements could help teams of any size better decide what LLM enhancements are needed prior to building.
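
To make the idea concrete, here is a rough sketch of the kind of comparison I have in mind. It is not LightEval code: every name in it (`compare_variants`, `exact_match`, the stand-in `baseline` and `with_rag` functions) is hypothetical, and the "models" are plain Python functions standing in for real LLM calls. The point is only that each variant runs on the same dataset and is scored with the same metric.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration; none of these come from LightEval.
Variant = Callable[[str], str]   # maps a question to a model answer
Example = Tuple[str, str]        # (question, reference answer)


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the answers match after trimming and lowercasing, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def compare_variants(
    variants: Dict[str, Variant],
    dataset: List[Example],
    metric: Callable[[str, str], float] = exact_match,
) -> Dict[str, float]:
    """Run every variant on the same dataset and return its mean score."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    results: Dict[str, float] = {}
    for name, generate in variants.items():
        scores = [metric(generate(question), reference)
                  for question, reference in dataset]
        results[name] = sum(scores) / len(scores)
    return results


# Stand-in "models": a real setup would call an LLM here instead.
def baseline(question: str) -> str:
    return "unknown"


def with_rag(question: str) -> str:
    # Assumption: a tiny lookup table plays the role of a retrieval step.
    knowledge = {"capital of france?": "Paris"}
    return knowledge.get(question.lower(), "unknown")


dataset = [("Capital of France?", "paris"), ("Capital of Peru?", "Lima")]
scores = compare_variants({"baseline": baseline, "rag": with_rag}, dataset)
print(scores)
```

In a real tool the variants would be the base model, the prompt-engineered model, the RAG pipeline, and the fine-tuned model, and the metric would be whatever fits the use case.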

I wanted to know if the LightEval team or Hugging Face in general is thinking about such a tool.