Big data, 2020-02, Vol.8 (1), p.38-61
2020

Details

Author(s) / Contributors
Title
SOOM: Sort-Based Optimizer for Big Data Multi-Query
Place / Publisher
United States
Source
MEDLINE
Description / Notes
  • Sorting is a common operation in many applications. It consumes resources and therefore adds computation overhead. In Big Data multi-query workloads, the shared sort operations are fairly large and incur high-cost I/O, whether the sorts are explicit or implicit. In particular, a Big Data multi-query that includes aggregation and sort operations takes a long time to execute because similar tasks reshuffle the same data multiple times. Exploiting the data-sharing and sort-sharing opportunities among similar tasks therefore makes it possible to reuse earlier results when optimizing a multi-query. To exploit shared data, our previous work, the Multi-Query Optimization Using Tuple Size and Histogram (MOTH) system, considered the granularity of data-sharing opportunities among multiple queries. However, it did not account for the time overhead of redundant in-network data movement (i.e., the shuffling time needed to transfer intermediate data for sort operations). We therefore extend MOTH to SOOM (Sort-Based Optimizer over MOTH), which exploits sort-sharing opportunities, covering both the explicit sorts of sort queries and the implicit sorts of aggregation queries. SOOM adds two modules, a query explorer and a sort exploiter, which build on the existing MOTH system to optimize multiple aggregation and sort queries. The experimental evaluation shows that, on Hadoop-like infrastructures, SOOM reduces query execution time by 45% and 30% compared with the naive and the state-of-the-art techniques, respectively, while reducing intermediate data size by up to 67% and 61% on average, respectively.
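The sort-sharing idea described in the abstract can be illustrated with a minimal single-machine sketch. It assumes a hypothetical `shared_sort_multi_query` function and a simple dict-based query description, neither of which comes from the paper: queries that sort or group on the same key share one sort, so an explicit sort query and the implicit sort behind an aggregation query cost one sort instead of two. The sketch ignores the distributed shuffle, MOTH's tuple-size and histogram statistics, and the query explorer and sort exploiter modules, so it shows only the general principle, not how SOOM implements it.

```python
from itertools import groupby
from operator import itemgetter


def shared_sort_multi_query(rows, queries):
    """Answer sort and sum-aggregation queries, sorting once per distinct key.

    Hypothetical illustration of sort sharing, not the SOOM implementation.
    rows:    list of dict records.
    queries: list of dicts with "name", "kind" ("sort" or "sum"), "key",
             and, for "sum", the numeric "field" to aggregate.
    Returns (results keyed by query name, number of sorts performed).
    """
    # Group queries by the key they sort or group on.
    queries_by_key = {}
    for q in queries:
        queries_by_key.setdefault(q["key"], []).append(q)

    results = {}
    sorts_performed = 0
    for key, group in queries_by_key.items():
        # One sort per distinct key, shared by every query in the group.
        sorted_rows = sorted(rows, key=itemgetter(key))
        sorts_performed += 1
        for q in group:
            if q["kind"] == "sort":
                # Explicit sort (ORDER BY key): reuse the sorted copy directly.
                results[q["name"]] = sorted_rows
            elif q["kind"] == "sum":
                # Implicit sort (GROUP BY key): equal keys are now adjacent,
                # so groupby aggregates each run in a single pass.
                results[q["name"]] = [
                    (k, sum(r[q["field"]] for r in run))
                    for k, run in groupby(sorted_rows, key=itemgetter(key))
                ]
            else:
                raise ValueError(f"unsupported query kind: {q['kind']}")
    return results, sorts_performed


if __name__ == "__main__":
    rows = [
        {"region": "eu", "product": "b", "sales": 5},
        {"region": "us", "product": "a", "sales": 3},
        {"region": "eu", "product": "a", "sales": 2},
    ]
    queries = [
        {"name": "q1", "kind": "sort", "key": "region"},
        {"name": "q2", "kind": "sum", "key": "region", "field": "sales"},
        {"name": "q3", "kind": "sum", "key": "product", "field": "sales"},
    ]
    results, n_sorts = shared_sort_multi_query(rows, queries)
    print(n_sorts)        # 2 (one sort per query would need 3)
    print(results["q2"])  # [('eu', 7), ('us', 3)]
    print(results["q3"])  # [('a', 5), ('b', 5)]
```

With the sample data, the three queries need two sorts rather than three, because `q1` and `q2` both use `region`. On a Hadoop-like cluster, the corresponding saving would come mainly from not shuffling the same intermediate data a second time, which this in-memory sketch does not model.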
Language
English
Identifiers
ISSN: 2167-6461
eISSN: 2167-647X
DOI: 10.1089/big.2019.0023
Title ID: cdi_pubmed_primary_31999479
