Details

Author(s) / Contributors
Title
Introducing the VERITAS Evaluation Framework: Valid Elo Ratings for Integrated Technologies & AI Systems
Place / Publisher
ProQuest Dissertations & Theses
Year of publication
2025
Link to full text
Source
ProQuest Dissertations & Theses A&I
Descriptions / Notes
  • Evaluating generative AI systems, such as retrieval-augmented generation (RAG), is difficult due to the breadth of use cases and the lack of adequate benchmarks. Typically, evaluation of an end-to-end system is separated from the evaluation of its component parts. This separation obscures how adjustments made to the components translate to changes in user preferences for the end-to-end system. This thesis presents a new evaluation framework to address this gap: Valid Elo Ratings for Integrated Technologies & AI Systems (VERITAS). We also develop an AI system from first principles for the use case of single-document Form 10-K Q&A, called B.E.A.R. (Blue-chip Edgar Augmented Retrieval). As part of VERITAS, evaluation datasets are created for a stratified sample of companies in the S&P 500 Index. We leverage these datasets to generate Elo ratings for perfect system implementations of retrieval and response generation. We then evaluate where alternative implementations of B.E.A.R. fall within these theoretical limits.
Language
English
Identifiers
ISBN: 9798381696141
Title ID: cdi_proquest_journals_2925792958