Details

Author(s) / Contributors
Title
Jaql: a scripting language for large scale semistructured data analysis
Is part of
  • Proceedings of the VLDB Endowment, 2011-08, Vol.4 (12), p.1272-1283
Year of publication
2011
Source
ACM
Descriptions/Notes
  • This paper describes Jaql, a declarative scripting language for analyzing large semistructured datasets in parallel using Hadoop's MapReduce framework. Jaql is currently used in IBM's InfoSphere BigInsights [5] and Cognos Consumer Insight [9] products. Jaql's design features are: (1) a flexible data model, (2) reusability, (3) varying levels of abstraction, and (4) scalability. Jaql's data model is inspired by JSON and can be used to represent datasets that vary from flat, relational tables to collections of semistructured documents. A Jaql script can start without any schema and evolve over time from a partial to a rigid schema. Reusability is provided through the use of higher-order functions and by packaging related functions into modules. Most Jaql scripts work at a high level of abstraction for concise specification of logical operations (e.g., join), but Jaql's notion of physical transparency also provides a lower level of abstraction if necessary. This allows users to pin down the evaluation plan of a script for greater control or even add new operators. The Jaql compiler automatically rewrites Jaql scripts so they can run in parallel on Hadoop. In addition to describing Jaql's design, we present the results of scale-up experiments on Hadoop running Jaql scripts for intranet data analysis and log processing.
Language
English
Identifiers
ISSN: 2150-8097
eISSN: 2150-8097
DOI: 10.14778/3402755.3402761
Title ID: cdi_crossref_primary_10_14778_3402755_3402761