
Details

Author(s) / Contributors
Title
Improving BERT-Based Text Classification With Auxiliary Sentence and Domain Knowledge
Is part of
  • IEEE access, 2019, Vol.7, p.176600-176612
Place / Publisher
Piscataway: IEEE
Year of publication
2019
Source
EZB Electronic Journals Library
Descriptions / Notes
  • The general language model BERT, pre-trained on the cross-domain corpora BookCorpus and Wikipedia, achieves excellent performance on a range of natural language processing tasks by being fine-tuned on downstream tasks. However, it still lacks the task-specific and domain-related knowledge needed to improve its performance further, and more detailed analyses of fine-tuning strategies are necessary. To address these problems, a BERT-based text classification model, BERT4TC, is proposed that constructs an auxiliary sentence to turn the classification task into a binary sentence-pair task, aiming to mitigate the limited-training-data and task-awareness problems. The architecture and implementation details of BERT4TC are presented, together with a post-training approach that addresses BERT's domain challenge. Finally, extensive experiments are conducted on seven widely studied public datasets to analyze fine-tuning strategies from the perspectives of learning rate, sequence length, and hidden-state vector selection. BERT4TC models with different auxiliary sentences and post-training objectives are then compared and analyzed in depth. The experimental results show that BERT4TC with a suitable auxiliary sentence significantly outperforms both typical feature-based methods and fine-tuning methods, achieving new state-of-the-art performance on the multi-class classification datasets. On the binary sentiment classification datasets, BERT4TC post-trained with a suitable domain-related corpus also achieves better results than the original BERT model.
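
The auxiliary-sentence construction described in the abstract can be sketched in a few lines. The snippet below is not the authors' BERT4TC implementation; it only illustrates, using the Hugging Face transformers API, how a single-sentence classification example might be recast as binary sentence-pair inputs, one pair per candidate label. The label set, the auxiliary-sentence wording, and the model checkpoint are assumptions made for the example.

# Hypothetical sketch of the auxiliary-sentence idea: a k-class example
# becomes k binary sentence-pair examples, one per candidate label.
# This is not the authors' BERT4TC code.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # binary: does the pair match or not?

labels = ["sports", "politics", "technology"]            # assumed label set
text = "The team secured the championship last night."   # assumed input

# One auxiliary sentence per candidate label (wording is an assumption).
first = [text] * len(labels)
second = [f"The topic of this text is {label}." for label in labels]

enc = tokenizer(first, second, padding=True, truncation=True,
                max_length=128,  # sequence length is one of the fine-tuning knobs studied
                return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits             # shape: (len(labels), 2)

# Choose the label whose auxiliary sentence receives the highest "match" score.
match_scores = logits.softmax(dim=-1)[:, 1]
print(labels[int(match_scores.argmax())])

Note that this formulation trades one forward pass per example for one pass per candidate label at inference time, which is part of why the choice of auxiliary sentence matters.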
Language
English
Identifiers
ISSN: 2169-3536
eISSN: 2169-3536
DOI: 10.1109/ACCESS.2019.2953990
Title ID: cdi_doaj_primary_oai_doaj_org_article_588736ef393b4db4a59bc381fe893489
