International Conference on Management of Data: Proceedings of the 2007 ACM SIGMOD international conference on Management of data; 11-14 June 2007, 2007, p.199-210
2007

Details

Author(s) / Contributors
Title
On synopses for distinct-value estimation under multiset operations
Is part of
  • International Conference on Management of Data: Proceedings of the 2007 ACM SIGMOD international conference on Management of data; 11-14 June 2007, 2007, p.199-210
Place / Publisher
ACM
Year of publication
2007
Link to full text
Source
ACM Digital Library
Descriptions/Notes
  • The task of estimating the number of distinct values (DVs) in a large dataset arises in a wide variety of settings in computer science and elsewhere. We provide DV estimation techniques that are designed for use within a flexible and scalable "synopsis warehouse" architecture. In this setting, incoming data is split into partitions and a synopsis is created for each partition; each synopsis can then be used to quickly estimate the number of DVs in its corresponding partition. By combining and extending a number of results in the literature, we obtain both appropriate synopses and novel DV estimators to use in conjunction with these synopses. Our synopses can be created in parallel, and can then be easily combined to yield synopses and DV estimates for arbitrary unions, intersections or differences of partitions. Our synopses can also handle deletions of individual partition elements. We use the theory of order statistics to show that our DV estimators are unbiased, and to establish moment formulas and sharp error bounds. Based on a novel limit theorem, we can exploit results due to Cohen in order to select synopsis sizes when initially designing the warehouse. Experiments and theory indicate that our synopses and estimators lead to lower computational costs and more accurate DV estimates than previous approaches.
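The abstract does not spell out the synopsis structure, so the following is only a minimal sketch of the general idea, assuming a k-minimum-values style synopsis: each partition keeps the k smallest hash values of its elements, two synopses are merged for a union by keeping the k smallest values of their combined set, and the number of DVs is estimated as (k - 1) / U_(k), where U_(k) is the k-th smallest hash value. The function names, the SHA-256-based hash and the choice of k are illustrative assumptions, not taken from the record; intersections, differences and deletions are not covered here.

```python
import hashlib
import heapq


def hash01(value):
    """Map a value to a pseudo-uniform float in [0, 1) (hash choice is an assumption)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def build_synopsis(values, k):
    """Return the k smallest distinct hash values of one partition, in ascending order."""
    return heapq.nsmallest(k, {hash01(v) for v in values})


def union_synopsis(a, b, k):
    """Combine two synopses (built with the same k) into one for the union of their partitions."""
    return heapq.nsmallest(k, set(a) | set(b))


def estimate_dv(synopsis, k):
    """Estimate the number of distinct values from a synopsis (assumes k >= 2)."""
    if len(synopsis) < k:
        # Fewer than k distinct hashes exist, so the synopsis holds all of them
        return len(synopsis)
    # Estimator (k - 1) / U_(k), with U_(k) the k-th smallest hash value
    return (k - 1) / synopsis[k - 1]


if __name__ == "__main__":
    k = 256
    part_a = build_synopsis(range(0, 6000), k)      # partition A
    part_b = build_synopsis(range(4000, 10000), k)  # partition B, overlapping A
    merged = union_synopsis(part_a, part_b, k)
    print("estimated DVs in A:", estimate_dv(part_a, k))
    print("estimated DVs in A union B:", estimate_dv(merged, k))
```

Because each synopsis depends only on its own partition, the synopses can be built independently (for example in parallel) and merged afterwards; a larger k would be expected to reduce the estimation error at the cost of more storage.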
Language
English
Identifiers
ISBN: 9781595936868, 1595936866
DOI: 10.1145/1247480.1247504
Titel-ID: cdi_proquest_miscellaneous_31399378
