A Hierarchical BERT Structure for Native Speaker Writing Detection
Is Part of
2022 China Automation Congress (CAC), 2022, pp. 3705-3710
Place / Publisher
IEEE
Year of Publication
2022
Source
IEEE Electronic Library Online
Descriptions/Notes
Native speaker detection has traditionally focused on speech data. However, the writing style of native speakers also differs from that of non-native speakers. Therefore, for the first time, we perform native speaker detection on text data, so that non-native speakers can better learn the writing style of native speakers. Native speaker writing detection is relatively difficult due to the long sequences and complex semantics of the writing, so we use BERT-based methods. However, the computational cost of BERT's self-attention mechanism grows quadratically with sequence length, which limits the length of the text input. Consequently, in this paper, we present a hierarchical BERT model to solve this problem. Our model first cuts the long text into segments and obtains segment representation vectors from BERT. Then, we extract the temporal and interactional information between segments to form a text-level representation vector for writing detection. We conducted experiments on a native speaker writing detection dataset that we constructed ourselves. The results demonstrate that our model can accurately recognize native speakers' writing. In addition, we have successfully applied it to various long-text classification tasks and achieved improvements over the baseline models. We also show the importance of both temporal and interaction information for the text-level representation.
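The abstract describes the pipeline but not the implementation, so the following is a minimal sketch of a hierarchical BERT classifier under stated assumptions: Hugging Face `transformers` with `bert-base-uncased` as the segment encoder, an LSTM over segment vectors for the temporal information, a single multi-head self-attention layer over segment vectors for the interaction information, and concatenation plus a linear head as the fusion step. The segment length, pooling choices, and head are illustrative, not the authors' exact architecture.

```python
# Minimal sketch of the hierarchical structure described in the abstract.
# Assumptions (not from the paper): bert-base-uncased as segment encoder,
# an LSTM for temporal information, one self-attention layer for
# interaction information, mean pooling, and a binary classification head.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class HierarchicalBertClassifier(nn.Module):
    def __init__(self, num_classes=2, hidden=768):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # Temporal information: an LSTM reads the segment vectors in order.
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        # Interaction information: self-attention lets every segment
        # attend to every other segment.
        self.attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        # input_ids / attention_mask: (num_segments, seg_len) for ONE document.
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        seg_vecs = out.pooler_output.unsqueeze(0)   # (1, num_segments, hidden)
        temporal, _ = self.lstm(seg_vecs)           # (1, num_segments, hidden)
        interact, _ = self.attn(seg_vecs, seg_vecs, seg_vecs)
        # Fuse both views into one text-level representation vector.
        text_vec = torch.cat([temporal.mean(dim=1), interact.mean(dim=1)], dim=-1)
        return self.classifier(text_vec)            # (1, num_classes)


def segment(text, tokenizer, seg_len=128):
    """Cut a long text into fixed-length token segments for the encoder."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    chunks = [ids[i:i + seg_len] for i in range(0, len(ids), seg_len)]
    enc = tokenizer.pad(
        [{"input_ids": c} for c in chunks], padding=True, return_tensors="pt"
    )
    return enc["input_ids"], enc["attention_mask"]


tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = HierarchicalBertClassifier()
ids, mask = segment("A long essay ... " * 200, tokenizer)
logits = model(ids, mask)  # text-level prediction: native vs. non-native
```

How the two views are fused is not specified in the abstract; concatenation is simply the most straightforward choice for a sketch, and a gating or weighting mechanism would be an equally plausible reading.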