Proceedings of the National Academy of Sciences - PNAS, 2024-04, Vol.121 (18), p.e2307304121

2024

Author(s) / Contributors

Title

AI model disgorgement: Methods and choices

Place / Publisher

United States

Source

MEDLINE

Description / Notes

Over the past few years, machine learning models have significantly increased in size and complexity, especially in the area of generative AI such as large language models. These models require massive amounts of data and compute capacity to train, to the extent that concerns over the training data (such as protected or private content) cannot be practically addressed by retraining the model "from scratch" with the questionable data removed or altered. Furthermore, despite significant efforts and controls dedicated to ensuring that training corpora are properly curated and composed, the sheer volume required makes it infeasible to manually inspect each datum comprising a training corpus. One potential approach to training corpus data defects is model disgorgement, by which we broadly mean the elimination or reduction of not only any improperly used data, but also the effects of improperly used data on any component of an ML model. Model disgorgement techniques can be used to address a wide range of issues, such as reducing bias or toxicity, increasing fidelity, and ensuring responsible use of intellectual property. In this paper, we survey the landscape of model disgorgement methods and introduce a taxonomy of disgorgement techniques that are applicable to modern ML systems. In particular, we investigate the various meanings of "removing the effects" of data on the trained model in a way that does not require retraining from scratch.

Language: English
Identifiers: eISSN: 1091-6490
DOI: 10.1073/pnas.2307304121
Title ID: cdi_pubmed_primary_38640257

Empfehlungen zum selben Thema automatisch vorgeschlagen von bX