Proceedings of the 2018 International Conference on Management of Data, 2018, p.841-855

Details

Author(s) / Contributors
Title
Lightweight Cardinality Estimation in LSM-based Systems
Is part of
  • Proceedings of the 2018 International Conference on Management of Data, 2018, p.841-855
Place / Publisher
New York, NY, USA: ACM
Year of publication
2018
Source
ACM Digital Library Complete
Descriptions/Notes
  • Data sources such as social media, mobile apps, and IoT sensors generate billions of records each day. Keeping up with this influx of data while providing useful analytics to users is a major challenge for today's data-intensive systems. A popular solution that allows such systems to handle rapidly incoming data is to rely on log-structured merge (LSM) storage models. LSM-based systems provide a tunable trade-off between ingesting vast amounts of data at a high rate and running efficient analytical queries on top of that data. For queries, it is well known that query processing performance largely depends on the ability to generate efficient execution plans. Previous research showed that OLAP query workloads rely on having small, yet precise, statistical summaries of the underlying data, which can drive cost-based query optimization. In this paper we address the problem of computing data statistics for workloads with rapid data ingestion and propose a lightweight statistics-collection framework that exploits the properties of LSM storage. Our approach is designed to piggyback on the events (flush and merge) of the LSM lifecycle. This allows us to easily create initial statistics and then keep them in sync with rapidly changing data while minimizing the overhead to the existing system. We have implemented and adapted well-known algorithms to produce various types of statistical synopses, including equi-width histograms, equi-height histograms, and wavelets. We performed an in-depth empirical evaluation that considers both the cardinality estimation accuracy and the runtime overheads of collecting and using statistics. The experiments were conducted by prototyping our approach on top of Apache AsterixDB, an open source Big Data management system that has an entirely LSM-based storage backend.
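The idea of collecting statistics during flush and merge events can be illustrated with a minimal sketch. This is not the paper's implementation and not AsterixDB code: the names `EquiWidthHistogram` and `ToyLSM`, the integer key domain, and the insert-only workload (no deletes or updates) are all assumptions made for illustration. Each flushed component gets its own equi-width histogram, built in the same pass that writes the component; a merge rebuilds one histogram for the merged component; a range cardinality estimate sums the per-component estimates.

```python
class EquiWidthHistogram:
    """Equi-width histogram over the key domain [lo, hi). Hypothetical helper."""

    def __init__(self, lo, hi, buckets):
        self.lo, self.hi, self.buckets = lo, hi, buckets
        self.width = (hi - lo) / buckets
        self.counts = [0] * buckets

    def add(self, key):
        # Clamp so that keys near the upper bound land in the last bucket.
        idx = min(int((key - self.lo) / self.width), self.buckets - 1)
        self.counts[idx] += 1

    def estimate_range(self, a, b):
        """Estimate the number of keys in [a, b), assuming keys are
        uniformly distributed inside each bucket."""
        total = 0.0
        for i, count in enumerate(self.counts):
            b_lo = self.lo + i * self.width
            b_hi = b_lo + self.width
            overlap = max(0.0, min(b, b_hi) - max(a, b_lo))
            total += count * overlap / self.width
        return total


class ToyLSM:
    """Insert-only toy LSM store (an assumption; real systems also handle
    deletes and updates). Statistics piggyback on flush and merge."""

    def __init__(self, lo, hi, buckets, memtable_limit):
        self.lo, self.hi, self.buckets = lo, hi, buckets
        self.memtable_limit = memtable_limit
        self.memtable = []
        self.components = []  # list of (sorted_keys, histogram) pairs

    def _build(self, keys):
        # One pass over the keys being written also fills the synopsis.
        hist = EquiWidthHistogram(self.lo, self.hi, self.buckets)
        for k in keys:
            hist.add(k)
        return (keys, hist)

    def insert(self, key):
        self.memtable.append(key)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Flush event: write the in-memory component and create its statistics.
        self.components.append(self._build(sorted(self.memtable)))
        self.memtable = []

    def merge(self):
        # Merge event: combine all disk components and rebuild one synopsis.
        merged = sorted(k for keys, _ in self.components for k in keys)
        self.components = [self._build(merged)]

    def estimate_range(self, a, b):
        # Only flushed components are covered; the memtable is ignored here.
        return sum(h.estimate_range(a, b) for _, h in self.components)


lsm = ToyLSM(lo=0, hi=100, buckets=10, memtable_limit=4)
for key in range(100):
    lsm.insert(key)
before_merge = lsm.estimate_range(0, 50)
lsm.merge()
after_merge = lsm.estimate_range(0, 50)
```

In this sketch the estimate is the same before and after the merge, because the merged histogram summarizes exactly the records that the per-component histograms summarized. Equi-height histograms or wavelets would replace `EquiWidthHistogram` behind the same `add` and `estimate_range` interface.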
Language
English
Identifiers
ISBN: 1450347037, 9781450347037
DOI: 10.1145/3183713.3183761
Title ID: cdi_acm_books_10_1145_3183713_3183761_brief
