Details

Author(s) / Contributors
Title
Pyfastx: a robust Python package for fast random access to sequences from plain and gzipped FASTA/Q files
Is part of
  • Briefings in Bioinformatics, 2021-07, Vol. 22 (4)
Place / Publisher
England: Oxford University Press
Year of publication
2021
Source
EBSCOhost Business Source Ultimate
Descriptions / Notes
  • Abstract: FASTA and FASTQ are the most widely used biological data formats and have become the de facto standard for exchanging sequence data between bioinformatics tools. With the avalanche of next-generation sequencing data, the amount of sequence data deposited and accessed in FASTA/Q formats is increasing dramatically. However, existing tools retrieve subsequences at random very inefficiently because they must load the entire index into memory. In addition, most existing tools cannot build an index for large FASTA/Q files because of limited memory. Furthermore, these tools do not support random access to sequences in gzip-compressed FASTA/Q files, a format that most public databases use to save storage. In this study, we developed pyfastx, a versatile Python package with commonly used command-line tools, to overcome these limitations. Compared with other tools, pyfastx achieved the highest performance in index building and random access to sequences, particularly for large FASTA/Q files containing hundreds of millions of sequences. A key advantage of pyfastx over other tools is that it can efficiently extract subsequences at random directly from gzip-compressed FASTA/Q files without decompressing them first. Pyfastx can be installed from PyPI (https://pypi.org/project/pyfastx), and the source code is freely available at https://github.com/lmdu/pyfastx.
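
The idea of index-based random access mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not pyfastx's implementation and does not use its API. It assumes a plain (uncompressed) FASTA file in which every sequence line except the last has the same length, and it stores per-sequence byte offsets in the way a samtools-style `.fai` index does. The function names `build_index` and `fetch` are hypothetical and chosen only for this illustration.

```python
import os
import tempfile


def build_index(path):
    """Scan a plain FASTA file once and record the layout of each sequence.

    Each entry is [byte offset of first base, total bases,
    bases per line, bytes per line (including the newline)].
    """
    index = {}
    name = None
    with open(path, "rb") as handle:
        while True:
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                # Sequence name is the first word after '>'.
                name = line[1:].split()[0].decode()
                index[name] = [handle.tell(), 0, 0, 0]
            elif name is not None:
                bases = len(line.rstrip(b"\r\n"))
                entry = index[name]
                if entry[2] == 0:
                    # The first sequence line defines the line layout.
                    entry[2], entry[3] = bases, len(line)
                entry[1] += bases
    return index


def fetch(path, index, name, start, end):
    """Return bases [start, end) (0-based) of one sequence by seeking."""
    offset, length, line_bases, line_bytes = index[name]
    start, end = max(0, start), min(end, length)
    if start >= end:
        return ""
    # Convert base coordinates to byte positions, accounting for newlines.
    first = offset + (start // line_bases) * line_bytes + start % line_bases
    last = offset + ((end - 1) // line_bases) * line_bytes + (end - 1) % line_bases
    with open(path, "rb") as handle:
        handle.seek(first)
        chunk = handle.read(last - first + 1)
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()


# Small demonstration on a temporary file with made-up sequences.
fd, demo_path = tempfile.mkstemp(suffix=".fa")
with os.fdopen(fd, "wb") as out:
    out.write(b">chr1 demo\nACGTACGTAC\nGTACGTACGT\nACGT\n>chr2\nTTTTGGGGCC\nAA\n")

demo_index = build_index(demo_path)
print(fetch(demo_path, demo_index, "chr1", 8, 14))  # prints ACGTAC
os.remove(demo_path)
```

Only the bytes covering the requested range are read, so the cost of a lookup does not depend on file size. Doing the same for gzip-compressed input needs an additional index into the compressed stream, which this sketch does not attempt.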
Language
English
Identifiers
ISSN: 1467-5463
eISSN: 1477-4054
DOI: 10.1093/bib/bbaa368
Title ID: cdi_proquest_miscellaneous_2471537418