Index zone

HTTPS and S3 links to genomic index files freely available in the AWS cloud

Kraken 2 & Bracken indexes

Kraken 2 is a fast, memory-efficient tool for taxonomic assignment of metagenomic sequencing reads. Bracken is a related tool that additionally estimates relative abundances of species or genera. See the Kraken 2 manual for more information about the individual libraries and their relationship to public repositories like RefSeq.

Kraken 2 / Bracken RefSeq indexes

Starting in fall 2020, we began creating indexes for more combinations of RefSeq databases. All packages contain a Kraken 2 database along with Bracken databases built for 50, 75, 100, 150, 200, 250, and 300-mers. In some cases we used the --max-db-size option to cap the size of the database produced. This makes the index smaller at the expense of some sensitivity and accuracy. In all cases we used the defaults for k-mer length, minimizer length, and minimizer spacing.
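Because each package ships Bracken databases for a fixed set of read lengths, a pipeline usually has to map its actual read length onto one of them. The following is a minimal sketch of that choice, not part of Kraken 2 or Bracken. The helper names are hypothetical, the "closest length, ties go to the shorter one" rule is an assumption for illustration, and the `database<N>mers.kmer_distrib` file name is an assumption that should be checked against the contents of the extracted archive.

```python
# Read lengths for which the RefSeq packages described above include Bracken databases.
AVAILABLE_READ_LENGTHS = (50, 75, 100, 150, 200, 250, 300)


def closest_read_length(read_len, available=AVAILABLE_READ_LENGTHS):
    """Return the available Bracken read length closest to read_len.

    Ties are broken in favor of the shorter length (an assumption made
    for this sketch, not a documented Bracken rule).
    """
    if read_len <= 0:
        raise ValueError("read length must be positive")
    return min(available, key=lambda k: (abs(k - read_len), k))


def kmer_distrib_filename(read_len):
    """Build the assumed Bracken k-mer distribution file name for read_len."""
    return f"database{closest_read_length(read_len)}mers.kmer_distrib"


if __name__ == "__main__":
    for n in (36, 101, 151, 260):
        print(n, "->", kmer_distrib_filename(n))
```

For example, 151 bp reads map to the 150-mer database under this rule.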

Links in the “Inspect” column are to files containing the output of running kraken2-inspect on the index, giving a quick way of checking what genomes & taxa are represented.
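To check programmatically which taxa an index covers, the inspect file can be parsed as a table. The sketch below assumes the file follows the Kraken 2 report layout: six tab-separated columns (percentage, minimizers in the clade, minimizers assigned directly, rank code, NCBI taxonomy ID, name indented by two spaces per level), with optional header lines starting with `#`. The function names and the sample lines are invented for illustration.

```python
from typing import Iterable, List, NamedTuple


class InspectRow(NamedTuple):
    percent: float
    clade_minimizers: int
    direct_minimizers: int
    rank: str
    taxid: int
    name: str
    depth: int  # nesting level inferred from the name's indentation


def parse_inspect(lines: Iterable[str]) -> List[InspectRow]:
    """Parse kraken2-inspect style lines, skipping headers and malformed rows."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 6:
            continue  # not the assumed six-column layout
        pct, clade, direct, rank, taxid, raw_name = fields
        name = raw_name.lstrip(" ")
        depth = (len(raw_name) - len(name)) // 2
        rows.append(InspectRow(float(pct), int(clade), int(direct),
                               rank, int(taxid), name, depth))
    return rows


def species_names(rows: Iterable[InspectRow]) -> List[str]:
    """Return the names of all rows whose rank code is exactly 'S' (species)."""
    return [r.name for r in rows if r.rank == "S"]


if __name__ == "__main__":
    # Invented sample lines, not taken from a real inspect file.
    sample = [
        "# Database options: nucleotide db\n",
        "100.00\t1000\t0\tR\t1\troot\n",
        " 60.00\t600\t10\tD\t10239\t  Viruses\n",
        "  5.00\t50\t50\tS\t11320\t    Influenza A virus\n",
    ]
    print(species_names(parse_inspect(sample)))
```

In practice the lines would come from an opened .txt file downloaded from the “Inspect” column rather than from an inline list.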

| Collection | Contains | Date | Archive size (GB) | Index size (GB) | HTTPS URL | S3 URL | Inspect |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Viral | viral | 5/17/2021 | 0.4 | 0.5 | .tar.gz | .tar.gz | .txt |
| MinusB | archaea, viral, plasmid, human¹, UniVec_Core | 5/17/2021 | 5.2 | 7.5 | .tar.gz | .tar.gz | .txt |
| Standard | archaea, bacteria, viral, plasmid, human¹, UniVec_Core | 5/17/2021 | 38.6 | 50.1 | .tar.gz | .tar.gz | .txt |
| Standard-8 | Standard with DB capped at 8 GB | 5/17/2021 | 5.5 | 7.5 | .tar.gz | .tar.gz | .txt |
| Standard-16 | Standard with DB capped at 16 GB | 5/17/2021 | 11.2 | 14.9 | .tar.gz | .tar.gz | .txt |
| PlusPF | Standard plus protozoa & fungi | 5/17/2021 | 41.0 | 53.2 | .tar.gz | .tar.gz | .txt |
| PlusPF-8 | PlusPF with DB capped at 8 GB | 5/17/2021 | 5.5 | 7.5 | .tar.gz | .tar.gz | .txt |
| PlusPF-16 | PlusPF with DB capped at 16 GB | 5/17/2021 | 11.2 | 14.9 | .tar.gz | .tar.gz | .txt |
| PlusPFP² | Standard plus protozoa, fungi & plant (not in this release²; see archive) | N/A | N/A | N/A | N/A | N/A | N/A |
| PlusPFP-8 | PlusPFP with DB capped at 8 GB | 5/17/2021 | 5.2 | 7.5 | .tar.gz | .tar.gz | .txt |
| PlusPFP-16 | PlusPFP with DB capped at 16 GB | 5/17/2021 | 10.6 | 14.9 | .tar.gz | .tar.gz | .txt |
| EuPathDB48³ | Eukaryotic pathogen genomes with contaminants removed | 11/13/2020 | 26.4 | 34.1 | .tar.gz | .tar.gz | .txt |

  1. Human libraries are created with the --no-mask argument
  2. The PlusPFP database has become too large for the server we currently use to build indexes; we will try to include it in future releases but it is omitted from the 5/17/2021 release
  3. Index is built using sequences from the EuPathDB project, with contamination removed using the method of Lu & Salzberg
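The “Index size” column is a useful guide when choosing a collection, since Kraken 2 by default loads the database into memory. The sketch below lists which collections from the 5/17/2021 table would fit on a machine with a given amount of RAM. The sizes are copied from the table above; the function name and the 1 GB headroom are assumptions for illustration, and actual memory needs depend on the run.

```python
# Index sizes in GB, copied from the table above (PlusPFP is omitted from this release).
INDEX_SIZE_GB = {
    "Viral": 0.5,
    "MinusB": 7.5,
    "Standard": 50.1,
    "Standard-8": 7.5,
    "Standard-16": 14.9,
    "PlusPF": 53.2,
    "PlusPF-8": 7.5,
    "PlusPF-16": 14.9,
    "PlusPFP-8": 7.5,
    "PlusPFP-16": 14.9,
    "EuPathDB48": 34.1,
}


def collections_that_fit(ram_gb, headroom_gb=1.0):
    """Return collection names whose index fits in ram_gb, largest first.

    headroom_gb is an assumed allowance for everything else the process
    needs; it is not a figure from the Kraken 2 documentation.
    """
    budget = ram_gb - headroom_gb
    fits = [(size, name) for name, size in INDEX_SIZE_GB.items() if size <= budget]
    # Sort by descending size, then alphabetically for a stable order.
    return [name for size, name in sorted(fits, key=lambda t: (-t[0], t[1]))]


if __name__ == "__main__":
    for ram in (8, 16, 64):
        print(f"{ram} GB RAM:", ", ".join(collections_that_fit(ram)) or "none")
```

With these assumptions, a 16 GB machine is limited to the capped “-8” and “-16” variants plus Viral and MinusB, which is the use case the --max-db-size option is meant to serve.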

Kraken 2 / Bracken 16S rRNA indexes

All packages contain a Kraken 2 database along with Bracken databases built for 100, 150, and 200-mers.

| Collection | Size (MB) | HTTPS URL | S3 URL |
| --- | --- | --- | --- |
| Greengenes 13.5 | 73.2 | .tar.gz | .tar.gz |
| RDP 11.5 | 168 | .tar.gz | .tar.gz |
| Silva 132 | 117 | .tar.gz | .tar.gz |
| Silva 138 | 112 | .tar.gz | .tar.gz |
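The 16S archives are small enough to be a convenient way to try a download end to end. The following is a minimal sketch of fetching one of the .tar.gz archives over HTTPS and unpacking it while it streams, so the archive never has to be stored separately. The function name is hypothetical and the URL in the example is a placeholder, not a real link; copy the actual address from the HTTPS URL column.

```python
import os
import tarfile
import urllib.request


def fetch_and_extract(url, dest_dir):
    """Stream a .tar.gz from url and extract it into dest_dir.

    Returns the list of extracted member names. Members that would land
    outside dest_dir are rejected.
    """
    os.makedirs(dest_dir, exist_ok=True)
    dest_root = os.path.realpath(dest_dir)
    # Use the safer "data" extraction filter where this Python version has it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    names = []
    with urllib.request.urlopen(url) as resp:
        # "r|gz" reads the gzip stream sequentially, without seeking.
        with tarfile.open(fileobj=resp, mode="r|gz") as tar:
            for member in tar:
                target = os.path.realpath(os.path.join(dest_root, member.name))
                if os.path.commonpath([dest_root, target]) != dest_root:
                    raise ValueError(f"unsafe path in archive: {member.name}")
                tar.extract(member, dest_root, **extra)
                names.append(member.name)
    return names


if __name__ == "__main__":
    # Placeholder URL (an assumption); replace it with a link from the table above.
    example_url = "https://example.org/path/to/16S_index.tar.gz"
    print(fetch_and_extract(example_url, "k2_16S_db"))
```

The same function would work for the larger RefSeq archives, although for tens of GB a resumable downloader or the S3 URL is likely a better fit.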

Older Kraken 2 / Bracken RefSeq indexes

| Collection | Contains | Date | Archive size (GB) | Index size (GB) | HTTPS URL | S3 URL | Inspect |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 12/2/2020 | | | | | | | |
| Viral | viral | 12/2/2020 | 0.4 | 0.4 | .tar.gz | .tar.gz | .txt |
| MinusB | archaea, viral, plasmid, human¹, UniVec_Core | 12/2/2020 | 5.1 | 7.4 | .tar.gz | .tar.gz | .txt |
| Standard | archaea, bacteria, viral, plasmid, human¹, UniVec_Core | 12/2/2020 | 36.0 | 46.8 | .tar.gz | .tar.gz | .txt |
| Standard-8 | Standard with DB capped at 8 GB | 12/2/2020 | 5.5 | 7.5 | .tar.gz | .tar.gz | .txt |
| Standard-16 | Standard with DB capped at 16 GB | 12/2/2020 | 11.2 | 14.9 | .tar.gz | .tar.gz | .txt |
| PlusPF | Standard plus protozoa & fungi (fixed from 12/2/20 version⁴) | 1/27/2021 | 38.4 | 49.8 | .tar.gz | .tar.gz | .txt |
| PlusPF-8 | PlusPF with DB capped at 8 GB (fixed from 12/2/20 version⁴) | 1/27/2021 | 5.5 | 7.5 | .tar.gz | .tar.gz | .txt |
| PlusPF-16 | PlusPF with DB capped at 16 GB (fixed from 12/2/20 version⁴) | 1/27/2021 | 11.2 | 14.9 | .tar.gz | .tar.gz | .txt |
| PlusPFP | Standard plus protozoa, fungi & plant (fixed from 12/2/20 version⁴) | 1/27/2021 | 71.8 | 96.3 | .tar.gz | .tar.gz | .txt |
| PlusPFP-8 | PlusPFP with DB capped at 8 GB (fixed from 12/2/20 version⁴) | 1/27/2021 | 5.2 | 7.5 | .tar.gz | .tar.gz | .txt |
| PlusPFP-16 | PlusPFP with DB capped at 16 GB (fixed from 12/2/20 version⁴) | 1/27/2021 | 10.7 | 14.9 | .tar.gz | .tar.gz | .txt |
| 9/19/2020 | | | | | | | |
| MinusB | archaea, viral, plasmid, human¹, UniVec_Core | 9/19/2020 | 5.0 | 7.3 | .tar.gz | .tar.gz | .txt |
| Standard | archaea, bacteria, viral, plasmid, human¹, UniVec_Core | 9/19/2020 | 36.0 | 47.0 | .tar.gz | .tar.gz | .txt |
| Standard-8 | Standard with DB capped at 8 GB | 9/19/2020 | 5.5 | 7.4 | .tar.gz | .tar.gz | .txt |
| Standard-16 | Standard with DB capped at 16 GB | 9/19/2020 | 11.2 | 14.9 | .tar.gz | .tar.gz | .txt |
| PlusPF | Standard plus protozoa & fungi | 9/19/2020 | 37.0 | 48.0 | .tar.gz | .tar.gz | .txt |
| PlusPF-8 | PlusPF with DB capped at 8 GB | 9/19/2020 | 5.5 | 7.4 | .tar.gz | .tar.gz | .txt |
| PlusPF-16 | PlusPF with DB capped at 16 GB | 9/19/2020 | 11.2 | 14.9 | .tar.gz | .tar.gz | .txt |
| PlusPFP | Standard plus protozoa, fungi & plant | 9/19/2020 | 66.5 | 90.0 | .tar.gz | .tar.gz | .txt |
| PlusPFP-8 | PlusPFP with DB capped at 8 GB | 9/19/2020 | 5.3 | 7.4 | .tar.gz | .tar.gz | .txt |
| PlusPFP-16 | PlusPFP with DB capped at 16 GB | 9/19/2020 | 10.7 | 14.9 | .tar.gz | .tar.gz | .txt |

  4. The PlusPF database (including PlusPF-8 and PlusPF-16), as well as the PlusPFP database (including PlusPFP-8 and PlusPFP-16), posted on 12/2/2020 mistakenly omitted genomes from RefSeq “fungi”. We posted the fixed databases on 1/27 and 1/28/2021.

Older “Minikraken” indexes

The following table points to the “Minikraken” indexes we created initially. All packages contain a Kraken 2 database along with Bracken databases built for 100, 150, and 200-mers. Some also contain Bracken databases for 50, 75 and 250-mers.

| Collection | Contains | Date | Archive size (GB) | Index size (GB) | HTTPS URL | S3 URL |
| --- | --- | --- | --- | --- | --- | --- |
| Minikraken v1 | RefSeq: bacteria, archaea, viral | 3/2020 | 5.6 | 8 | .tar.gz | .tar.gz |
| Minikraken v2 | RefSeq: bacteria, archaea, viral, human* | 3/2020 | 5.5 | 8 | .tar.gz | .tar.gz |

Kraken, Kraken 2, Bracken and KrakenUniq are the work of Derrick Wood, Steven Salzberg, Jennifer Lu, Florian Breitwieser, Daniel Baker, Martin Steinegger and Ben Langmead among others. Please see the Kraken, Kraken 2, KrakenUniq and Bracken websites for more information on the software, authors, and how to cite the work.