Exploratory Data Analysis with Database-backed Dataframes: A Case Study on Airbnb Data
Is part of
2021 IEEE International Conference on Big Data (Big Data), 2021, pp. 3119-3129
Place / Publisher
IEEE
Year of publication
2021
Source
IEEE/IET Electronic Library (IEL)
Descriptions/Notes
Choosing between the various scalable dataframe libraries can be an overwhelming task for data scientists, but the choice is critical because each framework deploys a different optimization technique that can affect overall performance. Comparing the frameworks on a set of analytical tasks in isolation might not fully represent the unique characteristics of big data analyses. This paper describes a case study of applying PolyFrame, a database-backed dataframe library, to an end-to-end exploratory data analysis of Airbnb data. PolyFrame is a scalable data analytics library that provides a Pandas-like dataframe interface on top of a variety of database systems. The familiarity of its interface lets data scientists interact with large collections of data through a scale-independent data analysis experience, without needing significant knowledge of databases or distributed systems. Throughout this case study, we also highlight the scalability benefits and limitations of database-backed dataframes via a performance comparison with Pandas dataframes for each stage of the analysis.