Information processing letters, 2023-08, Vol.182, p.106382, Article 106382
2023

Details

Author(s) / Contributors
Title
Exact PPS sampling with bounded sample size
Is part of
  • Information processing letters, 2023-08, Vol.182, p.106382, Article 106382
Place / Publisher
Elsevier B.V.
Year of publication
2023
Link to full text
Source
ScienceDirect Journals (5 years ago - present)
Descriptions/Notes
  • Probability proportional to size (PPS) sampling schemes with a target sample size aim to produce a sample comprising a specified number n of items while ensuring that each item in the population appears in the sample with a probability proportional to its specified “weight” (also called its “size”). These two objectives, however, cannot always be achieved simultaneously. Existing PPS schemes prioritize control of the sample size, violating the PPS property if necessary. We provide a new PPS scheme, called EB-PPS, that allows a different trade-off: EB-PPS enforces the PPS property at all times while ensuring that the sample size never exceeds the target value n. The sample size is exactly equal to n if possible, and otherwise has maximal expected value and minimal variance. Thus we bound the sample size, thereby avoiding storage overflows and helping to control the time required for analytics over the sample, while allowing the user complete control over the sample contents. In the context of training classifiers at scale under imbalanced loss functions, we show that such control yields superior classifiers. The method is both simple to implement and efficient, being a one-pass streaming algorithm with an amortized processing time of O(1) per item, which makes it computationally preferable even in cases where both EB-PPS and prior algorithms can ensure the PPS property and a target sample size simultaneously.
  • New sampling scheme having item probabilities always proportional to their weights.
  • User sets maximum sample size and sample is always as large and stable as possible.
  • Algorithm can handle streaming data with amortized constant processing time per item.
  • Improves accuracy of downstream classification tasks compared to prior approaches.
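The abstract states the trade-off but not the algorithm, so the following Python sketch only illustrates the trade-off and is not the paper's one-pass EB-PPS method. It rests on two assumptions that do not come from the record. First, the inclusion probabilities are taken to be rho * w_i with rho = min(n / W, 1 / w_max), where W is the total weight; this keeps every probability at most 1 and the expected sample size at most n. Second, classical offline systematic sampling is swapped in to draw the sample, which gives a sample size equal to the floor or ceiling of the sum of the probabilities. The function names are hypothetical.

```python
import math
import random


def bounded_pps_probabilities(weights, n):
    """Inclusion probabilities proportional to the weights.

    Assumed scaling (not taken from the catalog record):
    rho = min(n / W, 1 / w_max), so no probability exceeds 1
    and the probabilities sum to at most n.
    """
    if n <= 0 or not weights or min(weights) <= 0:
        raise ValueError("need n > 0 and strictly positive weights")
    total = sum(weights)
    rho = min(n / total, 1.0 / max(weights))
    return [rho * w for w in weights]


def systematic_sample(probs, rng=random):
    """Classical systematic sampling over given inclusion probabilities.

    Lays the probabilities end to end on a line and selects item i when
    one of the points u, u + 1, u + 2, ... falls inside its segment.
    Each item is selected with probability probs[i], and the sample size
    is the floor or the ceiling of sum(probs).
    """
    u = rng.random()  # random offset in [0, 1)
    sample = []
    cum = 0.0
    for i, p in enumerate(probs):
        lo, hi = cum, cum + p
        # Number of points u + k (k = 0, 1, ...) inside [lo, hi).
        if math.ceil(hi - u) - math.ceil(lo - u) >= 1:
            sample.append(i)
        cum = hi
    return sample


# Plain PPS with n = 2 would need probability 2 * 10 / 12 > 1 for the last
# item, so the target size and the PPS property cannot both hold here.
weights = [1, 1, 10]
probs = bounded_pps_probabilities(weights, n=2)
sample = systematic_sample(probs, random.Random(42))
print(probs, sample)
```

With the weights `[1, 1, 10]` and `n = 2`, the scaled probabilities stay proportional to the weights, the heaviest item is always included, and the sample holds one or two items but never more than `n`.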
Language
English
Identifiers
ISSN: 0020-0190
eISSN: 1872-6119
DOI: 10.1016/j.ipl.2023.106382
Title ID: cdi_crossref_primary_10_1016_j_ipl_2023_106382
