2025

Author(s) / Contributors

Title

Introducing the VERITAS Evaluation Framework: Valid Elo Ratings for Integrated Technologies & AI Systems

Place / Publisher

ProQuest Dissertations & Theses

Year of publication

2025

Source

ProQuest Dissertations & Theses A&I

Descriptions / Notes

Evaluating generative AI systems, such as retrieval-augmented generation (RAG), is difficult due to the breadth of use cases and the lack of adequate benchmarks. Typically, evaluation of an end-to-end system is separated from the evaluation of its component parts. This separation obscures how adjustments made to the components translate to changes in user preferences for the end-to-end system. This thesis presents a new evaluation framework to address this: Valid Elo Ratings for Integrated Technologies & AI Systems (VERITAS). We also develop an AI system from first principles for the use case of single-document Form 10-K Q&A, called B.E.A.R., or Blue-chip Edgar Augmented Retrieval. As part of VERITAS, evaluation datasets are created for a stratified sample of companies in the S&P 500 Index. We leverage these to generate Elo ratings for perfect system implementations of retrieval and response generation. We then evaluate where alternative implementations of B.E.A.R. fall within these theoretical limits.

Language: English
Identifiers: ISBN: 9798381696141
Title ID: cdi_proquest_journals_2925792958
