Index zone

HTTPS and S3 links to genomic index files freely available in the AWS cloud

Kraken 2, KrakenUniq and Bracken indexes

Kraken 2 is a fast and memory-efficient tool for taxonomic assignment of metagenomic sequencing reads. Bracken is a related tool that additionally estimates relative abundances of species or genera. See the Kraken 2 manual for more information about the individual libraries and their relationship to public repositories like RefSeq. See also the Kraken protocol for advice on how to use it.

Kraken 2 / Bracken RefSeq indexes

Latest: September 2024 update

All packages contain a Kraken 2 database along with Bracken databases built for 50, 75, 100, 150, 200, 250 and 300-mers. In some cases (i.e. for collections with “-8” or “-16” in the name) we used the --max-db-size option to cap the size of the database produced. This makes the index smaller at the expense of some sensitivity and accuracy. In all cases we use the defaults for k-mer length, minimizer length, and minimizer spacing.
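Since each package ships Bracken databases for a fixed set of read lengths, a script usually has to map the actual read length of a sample onto one of them. The following is a minimal sketch of one possible rule: pick the largest available length that does not exceed the read length. That rule, and the function name, are assumptions made for illustration; they are not taken from this page or from the Bracken documentation.

```python
# Read lengths for which the packages on this page include Bracken databases.
BRACKEN_READ_LENGTHS = (50, 75, 100, 150, 200, 250, 300)


def pick_bracken_read_length(read_length, available=BRACKEN_READ_LENGTHS):
    """Return the Bracken database length to use for a given read length.

    Assumed heuristic: the largest available length that is <= read_length,
    falling back to the smallest available length for very short reads.
    """
    candidates = [length for length in available if length <= read_length]
    return max(candidates) if candidates else min(available)
```

For example, 151 bp reads would map to the 150-mer Bracken database under this rule.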

Starting in September 2024, we switched our nt index to the new “core_nt” database. It is larger than the RefSeq databases, and we update it when we can; so far, this has been less frequent than our regular quarterly updates. We are working on updating core_nt more often.

Starting in December 2024, we added a new index for the latest GTDB release, which spans bacteria and archaea.

Links in the Inspect column are to files containing the output of running kraken2-inspect on the index, giving a quick way of checking what taxa are represented. Similarly, links in the Library column are to library_report.tsv files that give a way to check what sequences were included. The library_report.tsv file lists the sequence IDs from the library FASTA file as well as the URL they came from.
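To check whether a particular taxon is represented, an Inspect file can be scanned with a few lines of code. The sketch below assumes the file follows the Kraken 2 report-style layout: comment lines starting with "#", then six tab-separated columns ending in rank code, taxonomy ID, and an indented name. Verify this against a downloaded file before relying on it; the function name is hypothetical.

```python
def parse_inspect(lines):
    """Map taxonomy ID -> (rank code, name) from kraken2-inspect output.

    Assumes six tab-separated columns per data line, with the rank code,
    taxonomy ID and (indented) name in the last three columns.
    """
    taxa = {}
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a data line in the assumed layout
        rank, taxid, name = fields[3], int(fields[4]), fields[5].strip()
        taxa[taxid] = (rank, name)
    return taxa
```

With a downloaded inspect.txt, `parse_inspect(open("inspect.txt"))` would return a dictionary that can be queried by taxonomy ID.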

Collection Contains Date Archive size (GB) Index size (GB) HTTPS URL Inspect Library MD5
Viral RefSeq viral 9/4/2024 0.5 0.6 .tar.gz .txt .tsv .md5
MinusB RefSeq archaea, viral, plasmid, human¹, UniVec_Core 9/4/2024 7.3 10.3 .tar.gz .txt .tsv .md5
Standard RefSeq archaea, bacteria, viral, plasmid, human¹, UniVec_Core 9/4/2024 62 80 .tar.gz .txt .tsv .md5
Standard-8 Standard with DB capped at 8 GB 9/4/2024 5.5 7.5 .tar.gz .txt .tsv .md5
Standard-16 Standard with DB capped at 16 GB 9/4/2024 11 15 .tar.gz .txt .tsv .md5
PlusPF Standard plus RefSeq protozoa & fungi 9/4/2024 66 86 .tar.gz .txt .tsv .md5
PlusPF-8 PlusPF with DB capped at 8 GB 9/4/2024 5.5 7.5 .tar.gz .txt .tsv .md5
PlusPF-16 PlusPF with DB capped at 16 GB 9/4/2024 11 15 .tar.gz .txt .tsv .md5
PlusPFP Standard plus RefSeq protozoa, fungi & plant 9/4/2024 138 188 .tar.gz .txt .tsv .md5
PlusPFP-8 PlusPFP with DB capped at 8 GB 9/4/2024 5.2 7.5 .tar.gz .txt .tsv .md5
PlusPFP-16 PlusPFP with DB capped at 16 GB 9/4/2024 10 15 .tar.gz .txt .tsv .md5
EuPathDB46² Eukaryotic pathogen genomes with contaminants removed 4/18/2023 8.4 11 .tar.gz .txt N/A N/A
core_nt Database Very large collection, inclusive of GenBank, RefSeq, TPA and PDB 9/4/2024 181.8 233.3 .tar.gz .txt .tsv .md5
GTDB v220 (genomic_files_reps) Bacterial and archaeal 12/13/2024 387 497 .tar.gz .txt .tsv .md5
  1. Human libraries are created with the --no-mask argument
  2. Index is built using sequences from the EuPathDB project, with contamination removed using the method of Lu & Salzberg

Each index includes an inspect.txt file giving the output of the kraken2-inspect command.

As of March 2023, each index includes a ktaxonomy.tsv file giving the output of the make_ktaxonomy.py script from KrakenTools.

As of October 2023, each index also includes a library_report.tsv file giving information about the sequences included in the library.

As of June 2024, each index also includes a .md5 file that lists checksums for files in the archive.
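After unpacking an archive, the checksums can be verified with a short script. This is a sketch only: it assumes the .md5 file uses the common `md5sum` layout (one `<hex digest>  <file name>` pair per line), which should be confirmed against an actual download. The function names are illustrative.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_listing(text):
    """Parse md5sum-style text into {file name: lowercase hex digest}."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        checksum, name = line.split(None, 1)
        # md5sum marks binary-mode entries with a leading "*" on the name.
        expected[name.lstrip("*").strip()] = checksum.lower()
    return expected


def find_md5_mismatches(directory, listing_path):
    """Return names of listed files that are missing or fail the check."""
    expected = parse_md5_listing(Path(listing_path).read_text())
    mismatches = []
    for name, checksum in expected.items():
        target = Path(directory) / name
        if not target.is_file() or md5_of_file(target) != checksum:
            mismatches.append(name)
    return mismatches
```

An empty list from `find_md5_mismatches` means every listed file was found and matched its checksum.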

As of September 2024, each index archive also includes names.dmp and nodes.dmp files, which preserve information about the taxonomy used.

Corresponding S3 URLs can be obtained by removing https://genome-idx.s3.amazonaws.com from the beginning of the URLs linked to above and replacing with s3://genome-idx.
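The prefix substitution described above can be sketched as a small helper. The two prefixes come from this page; the function name and the example path in the usage note are hypothetical.

```python
HTTPS_PREFIX = "https://genome-idx.s3.amazonaws.com"
S3_PREFIX = "s3://genome-idx"


def https_to_s3(url):
    """Convert an HTTPS link from this page into the matching S3 URL."""
    if not url.startswith(HTTPS_PREFIX + "/"):
        raise ValueError(f"not a genome-idx HTTPS URL: {url}")
    # Swap the HTTPS prefix for the S3 one and keep the object path as-is.
    return S3_PREFIX + url[len(HTTPS_PREFIX):]
```

For instance, a placeholder link such as `https://genome-idx.s3.amazonaws.com/kraken/example.tar.gz` would become `s3://genome-idx/kraken/example.tar.gz`.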

Kraken 2 / Bracken 16S rRNA indexes

All packages contain a Kraken 2 database along with Bracken databases built for 100mers, 150mers, and 200mers.

Collection Size (MB) HTTPS URL
Greengenes 13.5 73.2 .tar.gz
RDP 11.5 168 .tar.gz
Silva 132 117 .tar.gz
Silva 138 112 .tar.gz

Corresponding S3 URLs can be obtained by removing https://genome-idx.s3.amazonaws.com from the beginning of the URLs linked to above and replacing with s3://genome-idx.

KrakenUniq indexes

Download both the .tar.gz and database.kdb files to the same directory, then expand the .tar.gz file to obtain the full set of files needed for KrakenUniq and Bracken.
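The two-file layout can be checked and unpacked with a short script. The sketch below assumes both downloads already sit in one directory; the archive file name is passed in by the caller, and the function name is illustrative.

```python
import tarfile
from pathlib import Path


def unpack_krakenuniq_index(directory, archive_name, kdb_name="database.kdb"):
    """Expand the .tar.gz next to database.kdb and list the directory."""
    directory = Path(directory)
    archive = directory / archive_name
    kdb = directory / kdb_name
    # Both downloads must be present in the same directory.
    missing = [path.name for path in (archive, kdb) if not path.is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {directory}: {', '.join(missing)}")
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Safer extraction on Python versions that support filters.
            tar.extractall(directory, filter="data")
        else:
            tar.extractall(directory)
    return sorted(path.name for path in directory.iterdir())
```

The returned list gives a quick view of which files ended up alongside database.kdb after extraction.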

Collection Contains Date Index size (GB) HTTPS URL
Standard archaea, bacteria, viral, human, UniVec_Core 6/16/2022 377 .kdb .tar.gz
MicrobialDB archaea, bacteria, viral, human, UniVec_Core, Eukaryotic pathogen genomes (EuPathDB54) with contaminants removed 8/8/2023 535 .kdb .tar.gz

Older Kraken 2 / Bracken RefSeq indexes

Collection Contains Date Archive size (GB) Index size (GB) HTTPS URL Inspect Library MD5
June, 2024                
Viral RefSeq viral 6/5/2024 0.5 0.6 .tar.gz .txt .tsv .md5
MinusB RefSeq archaea, viral, plasmid, human¹, UniVec_Core 6/5/2024 7.1 10.2 .tar.gz .txt .tsv .md5
Standard RefSeq archaea, bacteria, viral, plasmid, human¹, UniVec_Core 6/5/2024 60 78 .tar.gz .txt .tsv .md5
Standard-8 Standard with DB capped at 8 GB 6/5/2024 5.5 7.5 .tar.gz .txt .tsv .md5
Standard-16 Standard with DB capped at 16 GB 6/5/2024 11 15 .tar.gz .txt .tsv .md5
PlusPF Standard plus RefSeq protozoa & fungi 6/5/2024 64 83 .tar.gz .txt .tsv .md5
PlusPF-8 PlusPF with DB capped at 8 GB 6/5/2024 5.5 7.5 .tar.gz .txt .tsv .md5
PlusPF-16 PlusPF with DB capped at 16 GB 6/5/2024 11 15 .tar.gz .txt .tsv .md5
PlusPFP Standard plus RefSeq protozoa, fungi & plant 6/5/2024 135 182 .tar.gz .txt .tsv .md5
PlusPFP-8 PlusPFP with DB capped at 8 GB 6/5/2024 5.1 7.5 .tar.gz .txt .tsv .md5
PlusPFP-16 PlusPFP with DB capped at 16 GB 6/5/2024 11 15 .tar.gz .txt .tsv .md5
May, 2024                
nt Database Very large collection, inclusive of GenBank, RefSeq, TPA and PDB 5/30/2024 684 889 .tar.gz .txt .tsv .md5
January, 2024                
Viral RefSeq viral 1/12/2024 0.5 0.6 .tar.gz .txt .tsv
MinusB RefSeq archaea, viral, plasmid, human¹, UniVec_Core 1/12/2024 6.7 9.7 .tar.gz .txt .tsv
Standard RefSeq archaea, bacteria, viral, plasmid, human¹, UniVec_Core 1/12/2024 55 72 .tar.gz .txt .tsv
Standard-8 Standard with DB capped at 8 GB 1/12/2024 5.5 7.5 .tar.gz .txt .tsv
Standard-16 Standard with DB capped at 16 GB 1/12/2024 11 15 .tar.gz .txt .tsv
PlusPF Standard plus RefSeq protozoa & fungi 1/12/2024 59 77 .tar.gz .txt .tsv
PlusPF-8 PlusPF with DB capped at 8 GB 1/12/2024 5.5 7.5 .tar.gz .txt .tsv
PlusPF-16 PlusPF with DB capped at 16 GB 1/12/2024 11 15 .tar.gz .txt .tsv
PlusPFP Standard plus RefSeq protozoa, fungi & plant 1/12/2024 126 171 .tar.gz .txt .tsv
PlusPFP-8 PlusPFP with DB capped at 8 GB 1/12/2024 5.1 7.5 .tar.gz .txt .tsv
PlusPFP-16 PlusPFP with DB capped at 16 GB 1/12/2024 11 15 .tar.gz .txt .tsv
November, 2023                
nt Database Very large collection, inclusive of GenBank, RefSeq, TPA and PDB 11/29/2023 550 710 .tar.gz .txt .tsv  
October, 2023                
Viral RefSeq viral 10/9/2023 0.5 0.6 .tar.gz .txt .tsv
MinusB RefSeq archaea, viral, plasmid, human¹, UniVec_Core 10/9/2023 6.6 9.5 .tar.gz .txt .tsv
Standard RefSeq archaea, bacteria, viral, plasmid, human¹, UniVec_Core 10/9/2023 53 70 .tar.gz .txt .tsv
Standard-8 Standard with DB capped at 8 GB 10/9/2023 5.5 7.5 .tar.gz .txt .tsv
Standard-16 Standard with DB capped at 16 GB 10/9/2023 11 15 .tar.gz .txt .tsv
PlusPF Standard plus RefSeq protozoa & fungi 10/9/2023 57 74 .tar.gz .txt .tsv
PlusPF-8 PlusPF with DB capped at 8 GB 10/9/2023 5.5 7.5 .tar.gz .txt .tsv
PlusPF-16 PlusPF with DB capped at 16 GB 10/9/2023 11 15 .tar.gz .txt .tsv
PlusPFP (ISSUE: see below) Standard plus RefSeq protozoa, fungi & plant 10/9/2023 124 148 .tar.gz .txt .tsv
PlusPFP-8 PlusPFP with DB capped at 8 GB 10/9/2023 5.1 7.5 .tar.gz .txt .tsv
PlusPFP-16 PlusPFP with DB capped at 16 GB 10/9/2023 11 15 .tar.gz .txt .tsv
June, 2023                
Viral RefSeq viral 6/5/2023 0.5 0.6 .tar.gz .txt
MinusB RefSeq archaea, viral, plasmid, human¹, UniVec_Core 6/5/2023 6.5 9.4 .tar.gz .txt
Standard RefSeq archaea, bacteria, viral, plasmid, human¹, UniVec_Core 6/5/2023 51 67 .tar.gz .txt
Standard-8 Standard with DB capped at 8 GB 6/5/2023 5.5 7.5 .tar.gz .txt
Standard-16 Standard with DB capped at 16 GB 6/5/2023 11 15 .tar.gz .txt
PlusPF Standard plus RefSeq protozoa & fungi 6/5/2023 55 71 .tar.gz .txt
PlusPF-8 PlusPF with DB capped at 8 GB 6/5/2023 5.5 7.5 .tar.gz .txt
PlusPF-16 PlusPF with DB capped at 16 GB 6/5/2023 11 15 .tar.gz .txt
PlusPFP (ISSUE: see below) Standard plus RefSeq protozoa, fungi & plant 6/5/2023 108 148 .tar.gz .txt
PlusPFP-8 PlusPFP with DB capped at 8 GB 6/5/2023 5.1 7.5 .tar.gz .txt
PlusPFP-16 PlusPFP with DB capped at 16 GB 6/5/2023 10 15 .tar.gz .txt
May, 2023                
nt Database (ISSUE: see below) Very large collection, inclusive of GenBank, RefSeq, TPA and PDB 5/2/2023 360 480 .tar.gz .txt    
March, 2023                
Viral RefSeq viral 3/14/2023 0.4 0.5 .tar.gz .txt
MinusB RefSeq archaea, viral, plasmid, human¹, UniVec_Core 3/14/2023 6.4 9.0* .tar.gz .txt
Standard RefSeq archaea, bacteria, viral, plasmid, human¹, UniVec_Core 3/14/2023 49 64 .tar.gz .txt
Standard-8 Standard with DB capped at 8 GB 3/14/2023 5.5 7.5 .tar.gz .txt
Standard-16 Standard with DB capped at 16 GB 3/14/2023 11 15 .tar.gz .txt
PlusPF Standard plus RefSeq protozoa & fungi 3/14/2023 53 69 .tar.gz .txt
PlusPF-8 PlusPF with DB capped at 8 GB 3/14/2023 5.5 7.5 .tar.gz .txt
PlusPF-16 PlusPF with DB capped at 16 GB 3/14/2023 11 15 .tar.gz .txt
PlusPFP (ISSUE: see below) Standard plus RefSeq protozoa, fungi & plant 3/14/2023 106 144 .tar.gz .txt
PlusPFP-8 PlusPFP with DB capped at 8 GB 3/14/2023 5.1 7.5 .tar.gz .txt
PlusPFP-16 PlusPFP with DB capped at 16 GB 3/14/2023 11 15 .tar.gz .txt
December, 2022                
Viral viral 12/9/2022 0.4 0.5 .tar.gz .txt    
MinusB archaea, viral, plasmid, human¹, UniVec_Core 12/9/2022 6.1 8.7 .tar.gz .txt
Standard archaea, bacteria, viral, plasmid, human¹, UniVec_Core 12/9/2022 48 62 .tar.gz .txt
Standard-8 Standard with DB capped at 8 GB 12/9/2022 5.5 7.5 .tar.gz .txt    
Standard-16 Standard with DB capped at 16 GB 12/9/2022 11 15 .tar.gz .txt    
PlusPF Standard plus protozoa & fungi 12/9/2022 51 66 .tar.gz .txt    
PlusPF-8 PlusPF with DB capped at 8 GB 12/9/2022 5.5 7.5 .tar.gz .txt    
PlusPF-16 PlusPF with DB capped at 16 GB 12/9/2022 11 15 .tar.gz .txt    
PlusPFP (ISSUE: see below) Standard plus protozoa, fungi & plant 12/9/2022 104 142 .tar.gz .txt    
PlusPFP-8 PlusPFP with DB capped at 8 GB 12/9/2022 5.1 7.5 .tar.gz .txt    
PlusPFP-16 PlusPFP with DB capped at 16 GB 12/9/2022 11 15 .tar.gz .txt    
September, 2022                
Viral viral 9/8/2022 0.4 0.5 .tar.gz .txt    
MinusB³ archaea, viral, plasmid, human¹, UniVec_Core 9/26/2022 5.9 8.5 .tar.gz .txt
Standard³ archaea, bacteria, viral, plasmid, human¹, UniVec_Core 9/26/2022 46 60 .tar.gz .txt
Standard-8³ Standard with DB capped at 8 GB 9/26/2022 5.5 7.5 .tar.gz .txt
Standard-16³ Standard with DB capped at 16 GB 9/26/2022 11 15 .tar.gz .txt
PlusPF Standard plus protozoa & fungi 9/8/2022 49 64 .tar.gz .txt    
PlusPF-8 PlusPF with DB capped at 8 GB 9/8/2022 5.5 7.5 .tar.gz .txt    
PlusPF-16 PlusPF with DB capped at 16 GB 9/8/2022 11 15 .tar.gz .txt    
PlusPFP (ISSUE: see below) Standard plus protozoa, fungi & plant 9/8/2022 99 129 .tar.gz .txt    
PlusPFP-8 PlusPFP with DB capped at 8 GB 9/8/2022 5.1 7.5 .tar.gz .txt    
PlusPFP-16 PlusPFP with DB capped at 16 GB 9/8/2022 11 15 .tar.gz .txt    
June, 2022                
Viral viral 6/7/2022 0.4 0.5 .tar.gz .txt    
MinusB archaea, viral, plasmid, human¹, UniVec_Core 6/7/2022 5.8 8.2 .tar.gz .txt
Standard archaea, bacteria, viral, plasmid, human¹, UniVec_Core 6/7/2022 44 58 .tar.gz .txt
Standard-8 Standard with DB capped at 8 GB 6/7/2022 5.5 7.5 .tar.gz .txt    
Standard-16 Standard with DB capped at 16 GB 6/7/2022 12 15 .tar.gz .txt    
PlusPF Standard plus protozoa & fungi 6/7/2022 47 61 .tar.gz .txt    
PlusPF-8 PlusPF with DB capped at 8 GB 6/7/2022 5.2 7.5 .tar.gz .txt    
PlusPF-16 PlusPF with DB capped at 16 GB 6/7/2022 12 15 .tar.gz .txt    
PlusPFP (ISSUE: see below) Standard plus protozoa, fungi & plant 6/7/2022 55 129 .tar.gz .txt    
PlusPFP-8 PlusPFP with DB capped at 8 GB 6/7/2022 5.5 7.5 .tar.gz .txt    
PlusPFP-16 PlusPFP with DB capped at 16 GB 6/7/2022 11 15 .tar.gz .txt    
May, 2021                
Viral viral 5/17/2021 0.4 0.5 .tar.gz .txt    
MinusB archaea, viral, plasmid, human¹, UniVec_Core 5/17/2021 5.2 7.5 .tar.gz .txt
Standard archaea, bacteria, viral, plasmid, human¹, UniVec_Core 5/17/2021 38.6 50.1 .tar.gz .txt
Standard-8 Standard with DB capped at 8 GB 5/17/2021 5.5 7.5 .tar.gz .txt    
Standard-16 Standard with DB capped at 16 GB 5/17/2021 11.2 14.9 .tar.gz .txt    
PlusPF Standard plus protozoa & fungi 5/17/2021 41.0 53.2 .tar.gz .txt    
PlusPF-8 PlusPF with DB capped at 8 GB 5/17/2021 5.5 7.5 .tar.gz .txt    
PlusPF-16 PlusPF with DB capped at 16 GB 5/17/2021 11.2 14.9 .tar.gz .txt    
PlusPFP Standard plus protozoa, fungi & plant (not in this release²; see archive) N/A N/A N/A N/A N/A
PlusPFP-8 PlusPFP with DB capped at 8 GB 5/17/2021 5.2 7.5 .tar.gz .txt    
PlusPFP-16 PlusPFP with DB capped at 16 GB 5/17/2021 10.6 14.9 .tar.gz .txt    
January, 2021                
Viral viral 12/2/2020 0.4 0.4 .tar.gz .txt    
MinusB archaea, viral, plasmid, human¹, UniVec_Core 12/2/2020 5.1 7.4 .tar.gz .txt
Standard archaea, bacteria, viral, plasmid, human¹, UniVec_Core 12/2/2020 36.0 46.8 .tar.gz .txt
Standard-8 Standard with DB capped at 8 GB 12/2/2020 5.5 7.5 .tar.gz .txt    
Standard-16 Standard with DB capped at 16 GB 12/2/2020 11.2 14.9 .tar.gz .txt    
PlusPF Standard plus protozoa & fungi (fixed from 12/2/20 version³) 1/27/2021 38.4 49.8 .tar.gz .txt
PlusPF-8 PlusPF with DB capped at 8 GB (fixed from 12/2/20 version³) 1/27/2021 5.5 7.5 .tar.gz .txt
PlusPF-16 PlusPF with DB capped at 16 GB (fixed from 12/2/20 version³) 1/27/2021 11.2 14.9 .tar.gz .txt
PlusPFP Standard plus protozoa, fungi & plant (fixed from 12/2/20 version³) 1/27/2021 71.8 96.3 .tar.gz .txt
PlusPFP-8 PlusPFP with DB capped at 8 GB (fixed from 12/2/20 version³) 1/27/2021 5.2 7.5 .tar.gz .txt
PlusPFP-16 PlusPFP with DB capped at 16 GB (fixed from 12/2/20 version³) 1/27/2021 10.7 14.9 .tar.gz .txt
September, 2020                
MinusB archaea, viral, plasmid, human¹, UniVec_Core 9/19/2020 5.0 7.3 .tar.gz .txt
Standard archaea, bacteria, viral, plasmid, human¹, UniVec_Core 9/19/2020 36.0 47.0 .tar.gz .txt
Standard-8 Standard with DB capped at 8 GB 9/19/2020 5.5 7.4 .tar.gz .txt    
Standard-16 Standard with DB capped at 16 GB 9/19/2020 11.2 14.9 .tar.gz .txt    
PlusPF Standard plus protozoa & fungi 9/19/2020 37.0 48.0 .tar.gz .txt    
PlusPF-8 PlusPF with DB capped at 8 GB 9/19/2020 5.5 7.4 .tar.gz .txt    
PlusPF-16 PlusPF with DB capped at 16 GB 9/19/2020 11.2 14.9 .tar.gz .txt    
PlusPFP Standard plus protozoa, fungi & plant 9/19/2020 66.5 90.0 .tar.gz .txt    
PlusPFP-8 PlusPFP with DB capped at 8 GB 9/19/2020 5.3 7.4 .tar.gz .txt    
PlusPFP-16 PlusPFP with DB capped at 16 GB 9/19/2020 10.7 14.9 .tar.gz .txt    
  1. The PlusPF database (including PlusPF-8 and PlusPF-16), as well as the PlusPFP database (including PlusPFP-8 and PlusPFP-16), posted on 12/2/2020 mistakenly omitted genomes from RefSeq “fungi”. We posted the fixed databases on 1/27 and 1/28/2021.
  2. The full PlusPFP index releases from June 2022 through October 2023, as well as all the nt index releases, were distributed with truncated Bracken database files. The truncated files were generally tens or hundreds of KB in size, significantly smaller than their non-truncated versions. Apologies for this error, which went unnoticed for a while. This was resolved as of the January 2024 release of PlusPFP and will be resolved in the upcoming release of nt.
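Anyone holding an affected download can look for suspiciously small Bracken files with a quick size check. This is a heuristic sketch, not an official test: it assumes the Bracken files follow the `database<N>mers.kmer_distrib` naming pattern, and the 1 MB default threshold is an arbitrary cutoff chosen for illustration. Small indexes may legitimately have small Bracken files, so treat a hit as a prompt to investigate.

```python
from pathlib import Path


def suspicious_bracken_files(index_dir, min_bytes=1_000_000):
    """List (file name, size) for Bracken files smaller than min_bytes.

    Assumes Bracken databases are named database<N>mers.kmer_distrib;
    min_bytes is an illustrative threshold, not an official one.
    """
    flagged = []
    for path in sorted(Path(index_dir).glob("database*mers.kmer_distrib")):
        size = path.stat().st_size
        if size < min_bytes:
            flagged.append((path.name, size))
    return flagged
```

An empty result only means no file fell below the chosen threshold; it does not prove the files are complete.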

Older “Minikraken” indexes

The following table points to the “Minikraken” indexes we created initially. All packages contain a Kraken 2 database along with Bracken databases built for 100, 150, and 200-mers. Some also contain Bracken databases for 50, 75 and 250-mers.

Collection Contains Date Archive size (GB) Index size (GB) HTTPS URL
Minikraken v1 RefSeq: bacteria, archaea, viral 3/2020 5.6 8 .tar.gz
Minikraken v2 RefSeq: bacteria, archaea, viral, human* 3/2020 5.5 8 .tar.gz

Kraken, Kraken 2, Bracken and KrakenUniq are the work of Derrick Wood, Steven Salzberg, Jennifer Lu, Florian Breitwieser, Christopher Pockrandt, Aleksey Zimin, Daniel Baker, Martin Steinegger and Ben Langmead among others. Please see the Kraken, Kraken 2, KrakenUniq and Bracken websites for more information on the software, authors, and how to cite the work.