Similarity query support in big data management systems

Information systems (Oxford), 2020-02, Vol.88, p.101455, Article 101455

2020

Details

Author(s) / Contributors

Title

Similarity query support in big data management systems

Is part of

Information systems (Oxford), 2020-02, Vol.88, p.101455, Article 101455

Place / Publisher

Oxford: Elsevier Ltd

Year of publication

2020

Source

Alma/SFX Local Collection

Descriptions / Notes

Similarity query processing is becoming increasingly important in many applications such as data cleaning, record linkage, Web search, and document analytics. In this paper we study how to provide end-to-end similarity query support natively in a parallel database system. We discuss how to express a similarity predicate in its query language, how to build indexes, how to answer similarity queries (selections and joins) efficiently in the runtime engine, possibly using indexes, and how to optimize similarity queries. One particular challenge is how to incorporate existing similarity join algorithms, which often require a series of steps to achieve high efficiency, including collecting token frequencies, finding matching record id pairs, and reassembling result records based on id pairs. We present a novel approach that uses existing runtime operators to implement such complex join algorithms without reinventing the wheel; doing so positions the system to automatically benefit from future improvements to those operators. The approach includes a technique to transform a similarity join plan into an efficient operator-based physical plan during query optimization by using a template expressed largely in the system’s user-level query language; this technique greatly simplifies the specification of such a transformation rule. We use Apache AsterixDB, a parallel Big Data management system, to illustrate and validate our techniques. We conduct an experimental study using several large, real datasets on a parallel computing cluster to assess the similarity query support. We also include experiments involving three other parallel systems and report the efficacy and performance results.
• Extends the existing query language of a parallel DBMS to support similarity queries.
• Uses existing operators in the system to implement state-of-the-art techniques.
• Presents a novel framework called "AQL+" to optimize similarity queries.
• Includes empirical similarity query experiments using several large, real datasets.
• Compares the approach with three other parallel systems to show its relative efficacy.
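
The three join steps named in the abstract (collecting token frequencies, finding matching record id pairs, and reassembling result records) can be illustrated with a minimal single-machine sketch. This is not the paper's AsterixDB implementation or its AQL+ template. It assumes whitespace tokenization, Jaccard similarity, and prefix filtering as the candidate-generation technique, and the function names `tokenize` and `similarity_join` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lowercase, split on whitespace, keep distinct tokens."""
    return set(text.lower().split())


def similarity_join(records, threshold):
    """Self-join `records` (dict: id -> text) on Jaccard similarity >= threshold.

    Assumes 0 < threshold <= 1. Returns a list of (text_a, text_b) pairs.
    """
    tokens = {rid: tokenize(text) for rid, text in records.items()}

    # Step 1: collect token frequencies, then order each record's tokens
    # from rarest to most frequent (ties broken alphabetically).
    freq = Counter(tok for toks in tokens.values() for tok in toks)
    ordered = {rid: sorted(toks, key=lambda t: (freq[t], t))
               for rid, toks in tokens.items()}

    # Step 2: find matching record id pairs. Prefix filtering: two records can
    # only reach the threshold if their rare-token prefixes share a token.
    index = {}     # token -> ids of earlier records with that token in their prefix
    pairs = set()
    for rid in sorted(ordered):
        toks = ordered[rid]
        # Small epsilon guards against floating-point error in the product.
        prefix_len = len(toks) - math.ceil(threshold * len(toks) - 1e-9) + 1
        candidates = set()
        for tok in toks[:prefix_len]:
            candidates.update(index.get(tok, ()))
            index.setdefault(tok, []).append(rid)
        # Verify each candidate with the exact Jaccard similarity.
        for other in candidates:
            a, b = tokens[rid], tokens[other]
            if len(a & b) / len(a | b) >= threshold:
                pairs.add((other, rid))

    # Step 3: reassemble result records from the matching id pairs.
    return [(records[a], records[b]) for a, b in sorted(pairs)]
```

Calling `similarity_join` with a dictionary of record ids to strings and a threshold such as 0.5 returns the pairs of record texts whose token sets are at least that similar. In a parallel system each step would be a distributed operator rather than an in-memory loop.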

Language: English
Identifiers: ISSN: 0306-4379
eISSN: 1873-6076
DOI: 10.1016/j.is.2019.101455
Title ID: cdi_proquest_journals_2333949439

Subjects: Algorithms, Big Data, Data base management systems, Data management, Information systems, Operators, Optimization, Parallel database, Performance indices, Queries, Query languages, Query processing, Run time (computers), Similarity, Similarity query
