HTTPS and S3 links to genomic index files freely available in the AWS cloud
Centrifuge is a very rapid and memory-efficient system for the classification of DNA sequences from microbial samples.
Collection | Date | Size | HTTPS URL | S3 URL |
---|---|---|---|---|
RefSeq: bacteria, archaea, viral, human (compressed) | December, 2016 | 5.4 GB | .tar.gz | .tar.gz |
RefSeq: bacteria, archaea, viral, human | December, 2016 | 7.9 GB | .tar.gz | .tar.gz |
RefSeq: bacteria, archaea (compressed) | April, 2018 | 6.2 GB | .tar.gz | .tar.gz |
NCBI: nucleotide non-redundant sequences | March, 2018 | 64 GB | .tar.gz | .tar.gz |
Centrifuge is the work of Daehwan Kim, Li Song, Florian Breitwieser, Chanhee Park, and Steven Salzberg, among others. Please see the Centrifuge website for more information on the software, its authors, and how to cite it.
A team from Lawrence Livermore National Laboratory (LLNL) has constructed a Centrifuge database spanning all of the BLAST nt sequences. This is described in a recent manuscript. The database can be downloaded as a 7-Zip archive split into 71 parts. You will need to have the 7-Zip software (i.e., the 7z command) installed. Altogether, the compressed parts occupy 284 GB. This command will download all of them:
curl "https://genome-idx.s3.amazonaws.com/centrifuge/llnl/nt_wntr23/nt_wntr23_filt.cf.7z.[001-071]" -O
Then decompress them by pointing 7z at the first part; with all parts in the same directory, it reads the remaining ones automatically:
7z x nt_wntr23_filt.cf.7z.001
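Because the download is large, it may be more convenient to fetch the parts one at a time so that a failed transfer can be retried without starting over. The following is a minimal bash sketch of that idea, not an official script: the function names `part_urls` and `download_parts` are hypothetical, and the `--fail` and `--retry 3` settings are assumptions rather than recommendations from the index authors. Only the base URL, file name, and part range come from the command above.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and retry settings are illustrative assumptions.
base="https://genome-idx.s3.amazonaws.com/centrifuge/llnl/nt_wntr23"
name="nt_wntr23_filt.cf.7z"

# Print one URL per archive part, 001 through 071 (zero-padded to 3 digits).
part_urls() {
  local i
  for i in $(seq 1 71); do
    printf '%s/%s.%03d\n' "$base" "$name" "$i"
  done
}

# Download each part into the current directory.
# --fail makes curl exit non-zero on HTTP errors; --retry 3 retries
# transient failures. Stops at the first part that cannot be fetched.
download_parts() {
  local url
  for url in $(part_urls); do
    curl --fail --retry 3 -O "$url" || return 1
  done
}
```

Sourcing this file only defines the two functions. Calling `download_parts` starts the transfer, and once it succeeds the `7z x nt_wntr23_filt.cf.7z.001` command above can be run in the same directory.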
This index was constructed by Jose Manuel Martí, Car Reen Kok, James B. Thissen, Nisha J. Mulakken, Aram Avila-Herrera, Crystal J. Jaing, Jonathan E. Allen, and Nicholas A. Be at LLNL.