Details

Author(s) / Contributors
Title
Synthesizing entity matching rules by examples
Is part of
  • Proceedings of the VLDB Endowment, 2017-10, Vol.11 (2), p.189-202
Year of publication
2017
Source
ACM Digital Library
Descriptions/Notes
  • Entity matching (EM) is a critical part of data integration. We study how to synthesize entity matching rules from positive-negative matching examples. The core of our solution is program synthesis, a powerful tool to automatically generate rules (or programs) that satisfy a given high-level specification, via a predefined grammar. This grammar describes a General Boolean Formula (GBF) that can include arbitrary attribute matching predicates combined by conjunctions (∧), disjunctions (∨) and negations (¬), and is expressive enough to model EM problems, from capturing arbitrary attribute combinations to handling missing attribute values. The rules in the form of GBF are more concise than traditional EM rules represented in Disjunctive Normal Form (DNF). Consequently, they are more interpretable than decision trees and other machine learning algorithms that output deep trees with many branches. We present a new synthesis algorithm that, given only positive-negative examples as input, synthesizes EM rules that are effective over the entire dataset. Extensive experiments show that we outperform other interpretable rules (e.g., decision trees with low depth) in effectiveness, and are comparable with non-interpretable tools (e.g., decision trees with high depth, gradient-boosting trees, random forests and SVM).
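To make the rule form described in the abstract concrete, here is a minimal Python sketch of how a General Boolean Formula over attribute-matching predicates could be represented and evaluated on a pair of records. It does not reproduce the paper's synthesis algorithm or its grammar. The class names (`Pred`, `Not`, `And`, `Or`), the similarity functions, the thresholds, the sample records, and the choice to treat a predicate over a missing value as false are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Optional, Union

Record = dict  # attribute name -> string value, or None when missing


def ratio(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (illustrative choice)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def exact(a: str, b: str) -> float:
    """1.0 when the two values are identical, else 0.0."""
    return 1.0 if a == b else 0.0


@dataclass(frozen=True)
class Pred:
    """Attribute-matching predicate: sim(r[attr], s[attr]) >= threshold."""
    attr: str
    sim: Callable[[str, str], float]
    threshold: float


@dataclass(frozen=True)
class Not:
    child: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


Formula = Union[Pred, Not, And, Or]


def evaluate(f: Formula, r: Record, s: Record) -> bool:
    """Return True when the rule f declares records r and s a match."""
    if isinstance(f, Pred):
        a: Optional[str] = r.get(f.attr)
        b: Optional[str] = s.get(f.attr)
        if a is None or b is None:
            # Assumption: a predicate over a missing value does not hold.
            return False
        return f.sim(a, b) >= f.threshold
    if isinstance(f, Not):
        return not evaluate(f.child, r, s)
    if isinstance(f, And):
        return evaluate(f.left, r, s) and evaluate(f.right, r, s)
    if isinstance(f, Or):
        return evaluate(f.left, r, s) or evaluate(f.right, r, s)
    raise TypeError(f"unsupported formula node: {f!r}")


# Hypothetical rule: names are similar AND (phones equal OR cities equal).
rule = And(
    Pred("name", ratio, 0.8),
    Or(Pred("phone", exact, 1.0), Pred("city", exact, 1.0)),
)

a = {"name": "Jon Smith", "phone": "555-1234", "city": "Boston"}
b = {"name": "John Smith", "phone": None, "city": "Boston"}
c = {"name": "Jane Doe", "phone": "555-1234", "city": "Boston"}

print(evaluate(rule, a, b))
print(evaluate(rule, a, c))
```

Because the disjunction is nested inside the conjunction, the name predicate is written once. An equivalent DNF rule would repeat it in each disjunct, which is the kind of conciseness the abstract attributes to GBF. The `city` branch also lets the rule still decide when `phone` is missing from one record.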
Language
English
Identifiers
ISSN: 2150-8097
eISSN: 2150-8097
DOI: 10.14778/3149193.3149199
Title ID: cdi_crossref_primary_10_14778_3149193_3149199
Format
