Automatically countering imbalance and its empirical relationship to cost

Data mining and knowledge discovery, 2008-10, Vol.17 (2), p.225-252

2008

Details

Author(s) / Contributors

Title

Automatically countering imbalance and its empirical relationship to cost

Is part of

Data mining and knowledge discovery, 2008-10, Vol.17 (2), p.225-252

Place / Publisher

Boston: Springer US

Year of publication

2008

Source

Alma/SFX Local Collection

Descriptions/Notes

Learning from imbalanced data sets presents a convoluted problem from both the modeling and cost standpoints. In particular, when a class is of great interest but occurs relatively rarely, as in cases of fraud, instances of disease, and regions of interest in large-scale simulations, there is a correspondingly high cost for the misclassification of rare events. Under such circumstances, the data set is often re-sampled to generate models with high minority class accuracy. However, sampling methods face a common but important criticism: how can the proper amount and type of sampling be discovered automatically? To address this problem, we propose a wrapper paradigm that discovers the amount of re-sampling for a data set by optimizing evaluation functions such as the f-measure, Area Under the ROC Curve (AUROC), cost, cost-curves, and the cost-dependent f-measure. Our analysis of the wrapper is twofold. First, we report the interaction between different evaluation and wrapper optimization functions. Second, we present a set of results in a cost-sensitive environment, including scenarios of unknown or changing cost matrices. We also compared the performance of the wrapper approach with that of cost-sensitive learning methods (MetaCost and the Cost-Sensitive Classifiers) and found the wrapper to outperform the cost-sensitive classifiers in a cost-sensitive environment. Lastly, we obtained the lowest cost per test example of any result we are aware of for the KDD-99 Cup intrusion detection data set.
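
The wrapper idea in the abstract (try several re-sampling amounts, score each with an evaluation function, keep the best) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: it only undersamples the majority class, it uses a single validation split, the base learner is a plain k-nearest-neighbour vote on synthetic 1-D data, and the names `undersample`, `knn_predict`, and `wrapper_search` are hypothetical.

```python
import random

def f_measure(y_true, y_pred, positive=1):
    """F1 score for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def undersample(X, y, keep_fraction, rng, majority=0):
    """Keep every minority example and a random fraction of the majority class."""
    kept = [(x, label) for x, label in zip(X, y)
            if label != majority or rng.random() < keep_fraction]
    return [x for x, _ in kept], [label for _, label in kept]

def knn_predict(X_train, y_train, X_test, k=5):
    """k-nearest-neighbour majority vote on 1-D features (stand-in base learner)."""
    preds = []
    for x in X_test:
        nearest = sorted(zip(X_train, y_train),
                         key=lambda pair: abs(pair[0] - x))[:k]
        votes = sum(label for _, label in nearest)  # labels are 0 or 1
        preds.append(1 if 2 * votes > len(nearest) else 0)
    return preds

def wrapper_search(X_train, y_train, X_val, y_val, fractions, seed=0):
    """Score each candidate undersampling level on validation data; return the best."""
    scores = {}
    for frac in fractions:
        rng = random.Random(seed)  # same seed so candidates are comparable
        X_s, y_s = undersample(X_train, y_train, frac, rng)
        scores[frac] = f_measure(y_val, knn_predict(X_s, y_s, X_val))
    best = max(scores, key=scores.get)
    return best, scores

# Synthetic imbalanced data (assumption): 300 majority vs. 30 minority points.
data_rng = random.Random(42)

def make_data(n_major, n_minor):
    X = ([data_rng.gauss(0.0, 1.0) for _ in range(n_major)]
         + [data_rng.gauss(1.5, 1.0) for _ in range(n_minor)])
    y = [0] * n_major + [1] * n_minor
    return X, y

X_train, y_train = make_data(300, 30)
X_val, y_val = make_data(300, 30)
best_fraction, scores = wrapper_search(
    X_train, y_train, X_val, y_val, fractions=[1.0, 0.5, 0.25, 0.1])
print("validation f-measure per kept majority fraction:", scores)
print("selected fraction:", best_fraction)
```

Swapping `f_measure` for another evaluation function, for example a cost computed from a cost matrix, changes what the search optimizes without changing the search itself, which is the interaction the abstract says the paper analyzes.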

Language: English
Identifiers: ISSN: 1384-5810
eISSN: 1573-756X
DOI: 10.1007/s10618-008-0087-0
Title ID: cdi_proquest_journals_230104515

Format: –
Keywords: Artificial Intelligence, Chemistry and Earth Sciences, Computer Science, Costs, Data Mining and Knowledge Discovery, Datasets, Fraud, Fraud prevention, Information Storage and Retrieval, Physics, Sampling techniques, Statistics for Engineering
