Index zone

HTTPS and S3 links to genomic index files freely available in the AWS cloud

Centrifuge indexes

Centrifuge is a very rapid and memory-efficient system for the classification of DNA sequences from microbial samples.

Collection                                             Date            Size    HTTPS URL  S3 URL
Refseq: bacteria, archaea, viral, human (compressed)   December, 2016  5.4 GB  .tar.gz    .tar.gz
Refseq: bacteria, archaea, viral, human                December, 2016  7.9 GB  .tar.gz    .tar.gz
Refseq: bacteria, archaea (compressed)                 April, 2018     6.2 GB  .tar.gz    .tar.gz
NCBI: nucleotide non-redundant sequences               March, 2018     64 GB   .tar.gz    .tar.gz

Centrifuge is the work of Daehwan Kim, Li Song, Florian Breitwieser, Chanhee Park, and Steven Salzberg, among others. Please see the Centrifuge website for more information on the software, its authors, and how to cite it.

nt Database from Lawrence Livermore National Laboratory

A team from Lawrence Livermore National Laboratory (LLNL) has constructed a Centrifuge database spanning all of the BLAST nt sequences. This is described in a recent manuscript. The database can be downloaded as a collection of 71 7-Zip archive parts. You will need the 7-Zip software (that is, the 7z command) installed. Altogether, the compressed archives occupy 284 GB. This command downloads all of the parts:

curl https://genome-idx.s3.amazonaws.com/centrifuge/llnl/nt_wntr23/nt_wntr23_filt.cf.7z.[001-071] -O

Once all parts have been downloaded, decompress them by running 7z on the first part:

7z x nt_wntr23_filt.cf.7z.001
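With 71 parts totalling 284 GB, an interrupted transfer is easy to miss, so it may be worth confirming that every part is present before extracting. The following is a minimal sketch, not part of the official instructions: the helper names expected_parts and missing_parts are hypothetical, and the check only tests that each file exists and is non-empty. It does not verify checksums or detect a truncated part.

```shell
# Prefix shared by every archive part (taken from the download URL above).
ARCHIVE_PREFIX="nt_wntr23_filt.cf.7z"
# Number of parts, from the [001-071] range in the curl command.
PART_COUNT=71

# Hypothetical helper: print the expected part file names, one per line
# (.001 through .071).
expected_parts() {
  i=1
  while [ "$i" -le "$PART_COUNT" ]; do
    printf '%s.%03d\n' "$ARCHIVE_PREFIX" "$i"
    i=$((i + 1))
  done
}

# Hypothetical helper: print the name of every expected part that is absent
# or empty in directory $1 (defaults to the current directory).
missing_parts() {
  dir="${1:-.}"
  expected_parts | while read -r part; do
    [ -s "$dir/$part" ] || printf '%s\n' "$part"
  done
}

# Example use: extract only when nothing is reported missing.
# [ -z "$(missing_parts .)" ] && 7z x "$ARCHIVE_PREFIX.001"
```

If missing_parts prints any names, those parts can be downloaded again individually from the same URL pattern before running the extraction.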

This index was constructed by Jose Manuel Martí, Car Reen Kok, James B. Thissen, Nisha J. Mulakken, Aram Avila-Herrera, Crystal J. Jaing, Jonathan E. Allen, and Nicholas A. Be at LLNL.