
Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, p.60-70

Author(s) / Contributors

Title

Enhancing LSTM-based Word Segmentation Using Unlabeled Data

Is part of

Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, p.60-70

Place / Publisher

Cham: Springer International Publishing

Source

Alma/SFX Local Collection

Descriptions/Notes

Word segmentation is widely treated as a sequence labeling problem. The traditional approach uses machine learning methods such as conditional random fields with hand-crafted features. Recently, deep learning approaches have achieved state-of-the-art performance on the word segmentation task, with LSTM networks being a popular choice. This paper presents a method for introducing numerical statistics-based features, counted on unlabeled data, into LSTM networks and analyzes how they enhance the performance of a word segmentation model. We add pre-trained character-bigram embeddings, pointwise mutual information, accessor variety, and punctuation variety to our model and compare their performance on several datasets, including three datasets from the CoNLL-2017 shared task and three simplified Chinese datasets. We achieve state-of-the-art performance on two of them and comparable results on the rest.
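This record does not give the formulas behind the statistics-based features, so the following is only a minimal sketch of two of them under their common textbook definitions: pointwise mutual information of an adjacent character pair, log(p(x, y) / (p(x) p(y))), and accessor variety of a string, taken here as the smaller of its distinct left-neighbor and right-neighbor counts. The function names `bigram_pmi` and `accessor_variety`, the Latin-letter toy corpus, and the choice to ignore sentence boundaries are assumptions for illustration, not the paper's implementation.

```python
import math
from collections import Counter, defaultdict


def bigram_pmi(corpus):
    """Return PMI for every adjacent character pair in unsegmented lines.

    Hypothetical helper: probabilities are plain relative frequencies
    over the unlabeled corpus, with no smoothing.
    """
    unigrams, bigrams = Counter(), Counter()
    for line in corpus:
        unigrams.update(line)                 # count single characters
        bigrams.update(zip(line, line[1:]))   # count adjacent pairs
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    return {
        (x, y): math.log(
            (count / n_bi) / ((unigrams[x] / n_uni) * (unigrams[y] / n_uni))
        )
        for (x, y), count in bigrams.items()
    }


def accessor_variety(corpus, n=2):
    """Return min(distinct left neighbors, distinct right neighbors)
    for every length-n substring that has at least one neighbor.

    Assumption: line boundaries are ignored rather than counted
    as neighbors.
    """
    left, right = defaultdict(set), defaultdict(set)
    for line in corpus:
        for i in range(len(line) - n + 1):
            s = line[i:i + n]
            if i > 0:
                left[s].add(line[i - 1])
            if i + n < len(line):
                right[s].add(line[i + n])
    return {s: min(len(left[s]), len(right[s])) for s in set(left) | set(right)}


# Toy unlabeled corpus (hypothetical data).
corpus = ["abab", "abc"]
pmi = bigram_pmi(corpus)
av = accessor_variety(corpus, n=2)
```

In a model of the kind the abstract describes, values like these would typically be discretized or scaled and attached to each character position as extra inputs alongside the embeddings; how the paper does this is not stated in this record.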

Language: English
Identifiers: ISBN: 3319690043, 9783319690049
ISSN: 0302-9743
eISSN: 1611-3349
DOI: 10.1007/978-3-319-69005-6_6
Title ID: cdi_springer_books_10_1007_978_3_319_69005_6_6

Format: –
Keywords: Neural network, Statistics-based features, Unlabeled data, Word segmentation
