Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine

Companion of the 2024 International Conference on Management of Data, 2024, p.5-17

2024

Details

Author(s) / Contributors

Title

Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine

Part of

Companion of the 2024 International Conference on Management of Data, 2024, p.5-17

Place / Publisher

New York, NY, USA: ACM

Year of publication

2024

Source

ACM Digital Library

Descriptions / Notes

Apache Arrow DataFusion is a fast, embeddable, and extensible query engine written in Rust that uses Apache Arrow as its memory model. In this paper we describe the technologies on which it is built and how it fits into long-term database implementation trends. We then enumerate its features, optimizations, architecture, and extension APIs to illustrate the breadth of requirements of modern OLAP engines, as well as the interfaces needed by systems built with them. Finally, using a series of experimental comparisons to DuckDB, we demonstrate that open standards and extensible design do not preclude state-of-the-art performance. While the individual techniques used in DataFusion have been described many times before, DataFusion differs from other industrial-strength engines by providing competitive performance and an open architecture that can be customized using more than 10 major extension APIs. This flexibility has led to its use in many commercial and open source databases, machine learning pipelines, and other data-intensive systems. We anticipate that the accessibility and versatility of DataFusion, along with its competitive performance, will further the proliferation of high-performance custom data infrastructure assembled from modular components and tailored to specific needs.

Language: English
Identifiers: ISBN: 9798400704222
DOI: 10.1145/3626246.3653368
Title ID: cdi_acm_books_10_1145_3626246_3653368_brief

Format: –
Subjects: Information systems -- Data management systems -- Database design and models -- Relational database model; Information systems -- Data management systems -- Database management system engines; Information systems -- Data management systems -- Database management system engines -- Database query processing; Information systems -- Data management systems -- Database management system engines -- DBMS engine architectures; Information systems -- Data management systems -- Database management system engines -- Online analytical processing engines; Software and its engineering -- Software organization and properties -- Extra-functional properties -- Software performance; Software and its engineering -- Software organization and properties -- Extra-functional properties -- Software usability; Software and its engineering -- Software organization and properties -- Software system structures -- Abstraction, modeling and modularity
