Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, p.60-70

Details

Author(s) / Contributors
Title
Enhancing LSTM-based Word Segmentation Using Unlabeled Data
Is part of
  • Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, p.60-70
Place / Publisher
Cham: Springer International Publishing
Source
Alma/SFX Local Collection
Descriptions/Notes
  • Word segmentation is commonly treated as a sequence labeling problem. The traditional approach is a machine learning method such as a conditional random field with hand-crafted features. Recently, deep learning approaches have achieved state-of-the-art performance on word segmentation, with LSTM networks a popular choice. This paper presents a method for introducing numerical, statistics-based features computed from unlabeled data into LSTM networks and analyzes how they enhance the performance of the word segmentation model. We add pre-trained character-bigram embeddings, pointwise mutual information, accessor variety, and punctuation variety to our model and compare their performance on several datasets, including three datasets from the CoNLL-2017 shared task and three simplified Chinese datasets. We achieve state-of-the-art performance on two of them and comparable results on the rest.
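The abstract names pointwise mutual information and accessor variety as statistics counted on unlabeled data. This record does not include the paper's own feature extraction, so the following is only a minimal sketch of the standard definitions of the two statistics. The function names `bigram_pmi` and `accessor_variety`, the `max_len` parameter, and the single `^`/`$` boundary markers are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def bigram_pmi(corpus):
    """PMI of each adjacent character pair, estimated from raw sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    pmi = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        # High PMI suggests the two characters tend to belong to one word.
        pmi[a + b] = math.log2(p_ab / (p_a * p_b))
    return pmi


def accessor_variety(corpus, max_len=2):
    """min(distinct left neighbours, distinct right neighbours) per substring.

    Assumption: each sentence boundary counts as one shared marker
    ('^' or '$'), which is a simplification.
    """
    left = defaultdict(set)
    right = defaultdict(set)
    for sentence in corpus:
        padded = "^" + sentence + "$"
        for n in range(1, max_len + 1):
            for i in range(1, len(padded) - n):
                s = padded[i:i + n]
                left[s].add(padded[i - 1])
                right[s].add(padded[i + n])
    return {s: min(len(left[s]), len(right[s])) for s in left}


if __name__ == "__main__":
    toy_corpus = ["abab", "abc"]  # stand-in for unlabeled text
    print(bigram_pmi(toy_corpus))
    print(accessor_variety(toy_corpus))
```

In a segmentation model, such per-bigram values would typically be discretized or concatenated with the character embeddings before the LSTM layer; how the paper does this is not stated in this record.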
Language
English
Identifiers
ISBN: 3319690043, 9783319690049
ISSN: 0302-9743
eISSN: 1611-3349
DOI: 10.1007/978-3-319-69005-6_6
Title ID: cdi_springer_books_10_1007_978_3_319_69005_6_6
