Malicious Variant Text Alignment Algorithm Based on Improved Soundshape Codes
Is part of
2023 3rd International Conference on Electronic Information Engineering and Computer Science (EIECS), 2023, p.790-793
Place / Publisher
IEEE
Year of publication
2023
Source
IEEE Electronic Library (IEL)
Descriptions/Notes
To improve the extraction of illegal information from various online platforms, this paper proposes an alignment algorithm for malicious variant text of unequal length, based on an improved soundshape code combined with the Needleman-Wunsch algorithm. The algorithm first uses the improved soundshape code to transform Chinese characters into a special coding sequence that integrates soundshape, morphological, and semantic features. It then applies the Needleman-Wunsch sequence alignment algorithm with a character similarity function, following the principle of maximizing the sum of the character similarities at each position in the sequence, to align the original text with the deformed text through correspondence and padding. Experiments were conducted on a constructed dataset containing malicious variants, and the results show that the algorithm improves alignment quality by reducing character padding and matching the texts more accurately.
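The alignment step described in the abstract can be illustrated with a minimal Python sketch of Needleman-Wunsch driven by a character similarity function. This is not the paper's implementation: the improved soundshape code is not specified in this record, so `toy_similarity` and its `TOY_VARIANTS` table are hypothetical stand-ins with made-up scores, and the gap penalty of -0.5 and the padding symbol `_` are likewise assumptions.

```python
from typing import Callable, Tuple

PAD = "_"  # assumed padding symbol; the record does not specify one

# Hypothetical stand-in for a soundshape-code similarity: a tiny table of
# look-alike / sound-alike character pairs with made-up scores.
TOY_VARIANTS = {frozenset(("微", "薇")): 0.8}


def toy_similarity(x: str, y: str) -> float:
    """Return 1.0 for identical characters, a table score for known variants, else 0.0."""
    if x == y:
        return 1.0
    return TOY_VARIANTS.get(frozenset((x, y)), 0.0)


def align(a: str, b: str, sim: Callable[[str, str], float],
          gap: float = -0.5) -> Tuple[str, str, float]:
    """Globally align a and b, maximizing summed similarity minus gap costs."""
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]

    # First column and row: one string is aligned entirely against padding.
    for i in range(1, n + 1):
        score[i][0], move[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        score[0][j], move[0][j] = j * gap, "left"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = [
                (score[i - 1][j - 1] + sim(a[i - 1], b[j - 1]), "diag"),
                (score[i - 1][j] + gap, "up"),
                (score[i][j - 1] + gap, "left"),
            ]
            # max() keeps the first maximum, so ties prefer pairing characters.
            score[i][j], move[i][j] = max(candidates, key=lambda c: c[0])

    # Trace back from the bottom-right cell using the stored moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif step == "up":
            out_a.append(a[i - 1]); out_b.append(PAD)
            i -= 1
        else:
            out_a.append(PAD); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_original, aligned_variant, total = align("加微信", "加薇x信", toy_similarity)
print(aligned_original, aligned_variant)
```

With these toy settings, the original text "加微信" is padded to "加微_信" so that it lines up with the variant "加薇x信": the look-alike pair 微/薇 is matched, and the inserted "x" is paired with a padding symbol. A real similarity function derived from soundshape codes would replace `toy_similarity`; the dynamic programming itself would stay the same.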