2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 2023, Vol.2023, p.1-4

Details

Author(s) / Contributors
Title
Kmer-Node2Vec: a Fast and Efficient Method for Kmer Embedding from the Kmer Co-occurrence Graph, with Applications to DNA Sequences
Is part of
  • 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 2023, Vol.2023, p.1-4
Place / Publisher
United States: IEEE
Year of publication
2023
Source
MEDLINE
Descriptions/Notes
  • Learning low-dimensional continuous vector representations for the short k-mers obtained by splitting long DNA sequences is key to DNA sequence modeling and is useful in many bioinformatics tasks, such as DNA sequence retrieval and classification. DNA2Vec is the most widely used method for DNA sequence embedding. However, it scales poorly to large data sets because training k-mer embeddings takes an extremely long time. In this paper, we propose a novel, efficient graph-based k-mer embedding method, named Kmer-Node2Vec, to address this problem. Our method converts a large DNA corpus into a single k-mer co-occurrence graph and extracts k-mer relations from the graph via random walks, so that high-quality k-mer embeddings can be learned quickly. Extensive experiments show that our method trains 29 times faster than DNA2Vec on a 4 GB data set and is on par with DNA2Vec in task-specific accuracy for sequence retrieval and classification.
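
The pipeline outlined in the abstract (corpus to k-mer co-occurrence graph, then random walks over that graph) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how co-occurrence is defined or how walks are sampled, so the sketch assumes that two k-mers co-occur when one directly follows the other in a stride-1 sliding window, and that walks follow outgoing edges with probability proportional to edge weight. The function names `split_kmers`, `build_cooccurrence_graph`, and `random_walks` are hypothetical.

```python
import random
from collections import defaultdict


def split_kmers(sequence, k):
    """Return the overlapping k-mers of a sequence (sliding window, stride 1)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_cooccurrence_graph(sequences, k):
    """Build a directed, weighted graph: node -> {successor: count}.

    Assumption: "co-occurrence" means that one k-mer is immediately
    followed by another in the sliding window. The paper may define it
    differently.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for sequence in sequences:
        kmers = split_kmers(sequence, k)
        for current, following in zip(kmers, kmers[1:]):
            graph[current][following] += 1
    return {node: dict(successors) for node, successors in graph.items()}


def random_walks(graph, walk_length, walks_per_node, seed=0):
    """Sample weighted random walks starting from every node with out-edges.

    Assumption: plain first-order walks weighted by edge count, without
    any return or in-out bias parameters.
    """
    rng = random.Random(seed)
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                successors = graph.get(walk[-1])
                if not successors:
                    break  # dead end: the last k-mer has no outgoing edge
                nodes = list(successors)
                weights = [successors[node] for node in nodes]
                walk.append(rng.choices(nodes, weights=weights, k=1)[0])
            walks.append(walk)
    return walks


# Toy corpus, for illustration only.
corpus = ["ACGTACGT", "CGTACG"]
graph = build_cooccurrence_graph(corpus, k=3)
walks = random_walks(graph, walk_length=5, walks_per_node=2)
```

In a full pipeline the sampled walks would be treated like sentences and passed to a skip-gram style embedding model to obtain one vector per k-mer. That step is omitted here because the abstract does not describe it.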
Language
English
Identifiers
eISSN: 2694-0604
DOI: 10.1109/EMBC40787.2023.10341090
Title ID: cdi_pubmed_primary_38083774
