RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data

Bioinformatics 28(1): 125-126

With the wide application of next-generation sequencing (NGS) techniques, fast tools for protein similarity search that scale well to large query datasets and large databases are highly desirable. In previous work, we developed RAPSearch, an algorithm that achieved a ~20-90-fold speedup relative to BLAST while retaining similar sensitivity for short protein fragments derived from NGS data. RAPSearch, however, needs a substantial amount of memory to identify alignment seeds because it relies on a suffix array data structure. Here we present RAPSearch2, a new memory-efficient implementation of the RAPSearch algorithm that uses a collision-free hash table to index a similarity search database. This optimized data structure speeds up the similarity search by a further 2-3 times. We also implemented multi-threading in RAPSearch2, and the multi-threaded modes achieve significant acceleration (e.g. 3.5x in 4-thread mode). RAPSearch2 requires up to 2 GB of memory when running in single-thread mode, or up to 3.5 GB when running in 4-thread mode. RAPSearch2 is implemented in C++, and the source code is freely available for download at the RAPSearch2 website.
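The idea of indexing a database with a collision-free hash table can be illustrated with a minimal sketch. This is not RAPSearch2's actual code: the seed length `K`, the use of the full 20-letter amino acid alphabet, and the function names `kmer_slot`, `build_index` and `find_seeds` are all assumptions made for illustration. Each k-mer is read as a base-20 number, so every distinct k-mer gets its own table slot and no collision handling is needed.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
CODE = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
K = 3  # illustrative seed length (assumption, not RAPSearch2's setting)


def kmer_slot(kmer):
    """Map a k-mer to a unique integer by reading it as a base-20 number."""
    slot = 0
    for aa in kmer:
        slot = slot * len(AMINO_ACIDS) + CODE[aa]
    return slot


def build_index(database):
    """Build a table in which table[slot] lists every (sequence_id, offset)
    where the k-mer belonging to that slot occurs in the database."""
    table = [[] for _ in range(len(AMINO_ACIDS) ** K)]
    for seq_id, seq in enumerate(database):
        for offset in range(len(seq) - K + 1):
            kmer = seq[offset:offset + K]
            # Skip k-mers containing non-standard residues such as 'X'.
            if all(aa in CODE for aa in kmer):
                table[kmer_slot(kmer)].append((seq_id, offset))
    return table


def find_seeds(table, query):
    """Return (query_offset, sequence_id, db_offset) for each exact k-mer match."""
    hits = []
    for q_off in range(len(query) - K + 1):
        kmer = query[q_off:q_off + K]
        if all(aa in CODE for aa in kmer):
            for seq_id, db_off in table[kmer_slot(kmer)]:
                hits.append((q_off, seq_id, db_off))
    return hits
```

For example, `find_seeds(build_index(["MKVLAAGIV", "GGMKVTT"]), "AMKVL")` looks up each query k-mer with a single array access. The table size grows as 20^K, which is why a direct-addressed layout like this is only practical for short seeds or a reduced alphabet.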

Accession: 055333338

PMID: 22039206

DOI: 10.1093/bioinformatics/btr595
