• Golden-Retriever enhances Retrieval-Augmented Generation (RAG) for industrial knowledge bases. It addresses challenges with domain-specific jargon and context interpretation.
• Results: Golden-Retriever improves the total score of Meta-Llama-3-70B by 79.2% over the vanilla LLM and 40.7% over standard RAG. Average improvement across three LLMs: 57.3% over the vanilla LLM, 35.0% over RAG.
• Introduces reflection-based question augmentation before document retrieval: identifies jargon, clarifies its meaning based on context, and augments the question accordingly.
• Offline process: OCR extracts text from various document formats; LLMs summarize and contextualize the text to enhance the document database.
• Online process: an LLM identifies jargon and context in the user query, queries a jargon dictionary for accurate definitions, and augments the original question with clear context and resolved ambiguities.
• Jargon identification uses an LLM instead of exact string matching, so it adapts to new terms and misspellings. Outputs a structured list of identified terms.
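The jargon-identification step can be sketched as prompting an LLM for a structured term list and parsing it. This is a minimal sketch: `call_llm` is a hypothetical stub standing in for a real model endpoint, and the prompt wording and JSON schema are assumptions, not the paper's exact prompt.

```python
import json

def call_llm(prompt: str) -> str:
    # Hypothetical stub; a real deployment would call an actual LLM API.
    # The canned reply illustrates the expected structured output format.
    return '{"jargon": ["MACS", "RTOS"]}'

# Doubled braces escape the literal JSON example inside str.format.
JARGON_PROMPT = (
    "Identify all domain-specific jargon terms in the question below, "
    "including new terms and likely misspellings of known terms.\n"
    'Respond with JSON: {{"jargon": ["term1", "term2"]}}\n\n'
    "Question: {question}"
)

def identify_jargon(question: str) -> list[str]:
    """Ask the LLM for jargon terms and parse the structured reply."""
    raw = call_llm(JARGON_PROMPT.format(question=question))
    return json.loads(raw)["jargon"]
```

Using the LLM rather than exact string matching is what lets the step tolerate spelling variants; the structured JSON output keeps downstream parsing trivial.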
• Context identification uses pre-specified context names and descriptions. The LLM identifies the context using few-shot examples with Chain-of-Thought prompting.
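Assembling that prompt might look like the following sketch. The context names, descriptions, and the single few-shot example are illustrative placeholders, not the paper's actual prompt content.

```python
# Pre-specified context names and descriptions (placeholders; a real
# deployment would list its own document domains).
CONTEXTS = {
    "firmware": "Questions about embedded firmware and device drivers.",
    "safety": "Questions about workplace safety procedures.",
}

# One illustrative few-shot example with a Chain-of-Thought rationale.
FEW_SHOT = (
    "Question: How do I flash the bootloader?\n"
    "Reasoning: Flashing a bootloader concerns embedded devices.\n"
    "Context: firmware\n"
)

def build_context_prompt(question: str) -> str:
    """Compose a few-shot CoT prompt for picking the question's context."""
    listing = "\n".join(f"- {name}: {desc}" for name, desc in CONTEXTS.items())
    return (
        "Pick the most likely context for the question. "
        "Think step by step, then answer with one context name.\n\n"
        f"Available contexts:\n{listing}\n\n"
        f"{FEW_SHOT}\n"
        f"Question: {question}\nReasoning:"
    )
```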
• The jargon dictionary is queried using SQL, retrieving extended definitions, descriptions, and notes for the identified terms.
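The dictionary lookup reduces to a parameterized `SELECT`. The sketch below uses an in-memory SQLite table; the schema (term, definition, note) is an assumption based on the fields described above.

```python
import sqlite3

# Minimal in-memory jargon dictionary (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jargon (term TEXT PRIMARY KEY, definition TEXT, note TEXT)"
)
conn.execute(
    "INSERT INTO jargon VALUES (?, ?, ?)",
    ("RTOS", "Real-Time Operating System", "Used in firmware docs"),
)

def lookup(terms: list[str]) -> dict[str, tuple[str, str]]:
    """Return {term: (definition, note)} for terms found in the dictionary."""
    if not terms:
        return {}
    placeholders = ",".join("?" for _ in terms)  # parameterized IN clause
    rows = conn.execute(
        f"SELECT term, definition, note FROM jargon WHERE term IN ({placeholders})",
        terms,
    )
    return {t: (d, n) for t, d, n in rows}
```

Terms absent from the dictionary simply do not appear in the result, which is what the fallback step checks for.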
• The augmented question integrates the original query, context information, and detailed jargon definitions. It explicitly states the context and clarifies ambiguous terms.
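The augmentation step is essentially template assembly. The template below is an assumed layout, not the paper's exact format; the point is that context and definitions are stated explicitly before the original question.

```python
def augment_question(question: str, context: str, definitions: dict[str, str]) -> str:
    """Combine the original query, identified context, and jargon
    definitions into one retrieval-friendly augmented question."""
    defs = "\n".join(f"- {term}: {definition}" for term, definition in definitions.items())
    return (
        f"Context: {context}\n"
        f"Jargon definitions:\n{defs}\n"
        f"Question: {question}"
    )
```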
• Fallback mechanism for unidentified jargon: synthesizes a response indicating the missing information and instructs the user to check spelling or contact the knowledge base manager.
• Evaluation: a question-answering experiment using multiple-choice questions from new-hire training documents, covering six domains with 9-10 questions each. Compared against vanilla LLM and standard RAG baselines.