Charismatic Document Clustering Through Novel K-Means Non-negative Matrix Factorization (KNMF) Algorithm Using Key Phrase Extraction

International journal of parallel programming, 2020-06, Vol.48 (3), p.496-514

Details

Author(s) / Contributors

Place / Publisher

New York: Springer US

Year of publication

2020

Source

Springer Journals

Descriptions / Notes

A challenging aspect of Big Data is storing data and retrieving the required data through search engines.

Problem Defined: Many organizations need quick and efficient retrieval of useful information. The basic idea is to arrange an organization's computer files into individual folders organized in a hierarchy. Sorting these files into folders manually requires knowing each file's contents and name, so that related files can be grouped together as a set.

Problem Statement: Manual grouping of files has its own complications, for example when the files are numerous and their contents cannot be inferred from their names. Document clustering performed by data-processing machines is therefore strongly needed.

Existing System: A few researchers have proposed dynamic algorithms and comprehensive comparisons of existing algorithms, but these have been restricted to organizations and colleges. Recently updated Non-negative Matrix Factorization (NMF) update rules have renewed interest in document clustering, because they have shown better results than Latent Semantic Indexing (LSI) with Singular Value Decomposition (SVD).

Proposed System: A new model called Novel K-means Non-Negative Matrix Factorization (KNMF) is implemented using the revised NMF rules and applied to document clustering. The Newsgroup20 data set is used for the experiments. In the preprocessing step that feeds KNMF, common clutter/stop words are removed using keywords produced by a Key Phrase Extraction algorithm, and a newly proposed Iterated Lovins stemmer is applied. Compared with the Porter and Lovins stemmers, the Iterated Lovins stemmer achieves 5% more reduction. 60% of the document terms are reduced to their roots; the remaining terms are already root words. Finally, these processes are combined into a pipeline named "Progressive Text mining radical", developed by running the K-Means algorithm from the Apache Mahout project in parallel, which is used to analyze the performance of the MapReduce framework in Hadoop.
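The abstract does not spell out the KNMF update rules. A minimal sketch, assuming that KNMF factorizes a document-term matrix V into W times H with the Lee-Seung multiplicative NMF updates and then runs K-means on the row-normalized document factors W, could look like the following. The function names `nmf`, `kmeans`, and `knmf_cluster`, the farthest-point seeding, and the toy count matrix are hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def matmul(a, b):
    """Plain-Python product of a (n x p) and b (p x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, rank, iterations=300, seed=0, eps=1e-9):
    """Approximate V (documents x terms) by W (documents x rank) @ H (rank x terms)
    using Lee-Seung multiplicative updates for the Frobenius-norm objective."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H); eps avoids division by zero.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h


def normalize_rows(rows):
    """Scale each row to unit L2 length so K-means compares topic mixtures, not document length."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        out.append([x / norm for x in row])
    return out


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Lloyd's K-means with deterministic farthest-point seeding (an assumed seeding strategy)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each document goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def knmf_cluster(v, k, seed=0):
    """Hypothetical KNMF pipeline: NMF to k latent topics, then K-means on the document factors."""
    w, _ = nmf(v, rank=k, seed=seed)
    return kmeans(normalize_rows(w), k)


if __name__ == "__main__":
    # Toy term counts for four documents over six (already stemmed) terms; hypothetical data.
    docs = [[3, 2, 1, 0, 0, 0], [2, 3, 1, 0, 0, 0], [0, 0, 0, 3, 2, 1], [0, 0, 0, 2, 3, 1]]
    print(knmf_cluster(docs, k=2))
```

Running K-means on the normalized W, rather than simply taking the largest entry of each row, is one way to read the "K-means" part of KNMF; the abstract does not confirm which variant the paper uses. In a Hadoop deployment such as the Mahout setup the abstract mentions, the assignment step maps naturally onto mappers and the centroid update onto reducers; this sketch runs both in a single process.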

Language: English
Identifiers: ISSN: 0885-7458
eISSN: 1573-7640
DOI: 10.1007/s10766-018-0591-9
Title ID: cdi_crossref_primary_10_1007_s10766_018_0591_9
