HTTPS and S3 links to genomic index files freely available in the AWS cloud
Kraken 2 is a fast and memory-efficient tool for taxonomic assignment of metagenomics sequencing reads. Bracken is a related tool that additionally estimates relative abundances of species or genera. See the Kraken 2 manual for more information about the individual libraries and their relationship to public repositories like RefSeq. See also the Kraken protocol for advice on how to use it.
Starting Fall 2020, we began creating indexes for more combinations of RefSeq databases.
All packages contain a Kraken 2 database along with Bracken databases built for 50, 75, 100, 150, 200, 250 and 300-mers.
In some cases (i.e. for collections with “-8” or “-16” in the name) we used the `--max-db-size` option to cap the size of the database produced.
This makes the index smaller at the expense of some sensitivity and accuracy.
In all cases we use the defaults for k-mer length, minimizer length, and minimizer spacing.
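Because each package ships Bracken databases for a fixed set of read lengths, a script has to choose one of them for a given run. The following is a minimal sketch, not a recommendation from the tool authors. It assumes that the Bracken files follow the `database<N>mers.kmer_distrib` naming convention and that picking the closest available length is acceptable for your data. The function names are hypothetical.

```python
# Read lengths for which Bracken databases are listed as included above.
BRACKEN_READ_LENGTHS = (50, 75, 100, 150, 200, 250, 300)


def pick_bracken_read_length(read_length, available=BRACKEN_READ_LENGTHS):
    """Return the available Bracken read length closest to read_length.

    Ties are resolved in favour of the shorter length (an arbitrary choice).
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return min(available, key=lambda n: (abs(n - read_length), n))


def bracken_distrib_filename(read_length):
    # Assumed naming convention; confirm against the unpacked archive.
    chosen = pick_bracken_read_length(read_length)
    return f"database{chosen}mers.kmer_distrib"
```

For example, reads of length 151 would map to the 150-mer Bracken database under this rule.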
Starting May, 2023, we began including an index built over the entire nt Database, inclusive of GenBank, RefSeq, TPA and PDB. This is very large! (Hundreds of gigabytes.) We update it when we can; so far, this has been less frequent than our regular quarterly updates. The most recent was on May 30, 2024. We are working on improving the frequency of the nt updates.
Links in the Inspect column are to files containing the output of running `kraken2-inspect` on the index, giving a quick way of checking what taxa are represented. Similarly, links in the Library column are to `library_report.tsv` files that give a way to check what sequences were included. The `library_report.tsv` file lists the sequence IDs from the library FASTA file as well as the URL they came from.
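To check programmatically which taxa an index represents, the Inspect file can be scanned line by line. The sketch below assumes the file uses the six-column, tab-separated report layout (percentage, clade count, taxon count, rank code, taxonomy ID, indented name) preceded by `#` comment lines. That layout is an assumption here and should be confirmed against a downloaded file; the function name is hypothetical.

```python
def taxa_at_rank(lines, rank_code="S"):
    """Yield (taxid, name) for report rows whose rank code matches rank_code."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a report row in the assumed layout
        rank, taxid, name = fields[3], fields[4], fields[5].strip()
        if rank == rank_code:
            yield int(taxid), name
```

Typical use would be `with open("inspect.txt") as handle: species = list(taxa_at_rank(handle))`, where `inspect.txt` is a locally saved copy of one of the linked files.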
Collection | Contains | Date | Archive size (GB) | Index size (GB) | HTTPS URL | Inspect | Library | MD5 |
---|---|---|---|---|---|---|---|---|
Viral | RefSeq viral | 6/5/2024 | 0.5 | 0.6 | .tar.gz | .txt | .tsv | .md5 |
MinusB | RefSeq archaea, viral, plasmid, human¹, UniVec_Core | 6/5/2024 | 7.1 | 10.2 | .tar.gz | .txt | .tsv | .md5 |
Standard | RefSeq archaea, bacteria, viral, plasmid, human¹, UniVec_Core | 6/5/2024 | 60 | 78 | .tar.gz | .txt | .tsv | .md5 |
Standard-8 | Standard with DB capped at 8 GB | 6/5/2024 | 5.5 | 7.5 | .tar.gz | .txt | .tsv | .md5 |
Standard-16 | Standard with DB capped at 16 GB | 6/5/2024 | 11 | 15 | .tar.gz | .txt | .tsv | .md5 |
PlusPF | Standard plus RefSeq protozoa & fungi | 6/5/2024 | 64 | 83 | .tar.gz | .txt | .tsv | .md5 |
PlusPF-8 | PlusPF with DB capped at 8 GB | 6/5/2024 | 5.5 | 7.5 | .tar.gz | .txt | .tsv | .md5 |
PlusPF-16 | PlusPF with DB capped at 16 GB | 6/5/2024 | 11 | 15 | .tar.gz | .txt | .tsv | .md5 |
PlusPFP | Standard plus RefSeq protozoa, fungi & plant | 6/5/2024 | 135 | 182 | .tar.gz | .txt | .tsv | .md5 |
PlusPFP-8 | PlusPFP with DB capped at 8 GB | 6/5/2024 | 5.1 | 7.5 | .tar.gz | .txt | .tsv | .md5 |
PlusPFP-16 | PlusPFP with DB capped at 16 GB | 6/5/2024 | 11 | 15 | .tar.gz | .txt | .tsv | .md5 |
EuPathDB462 | Eukaryotic pathogen genomes with contaminants removed | 4/18/2023 | 8.4 | 11 | .tar.gz | .txt | N/A | N/A |
nt Database | Very large collection, inclusive of GenBank, RefSeq, TPA and PDB | 5/30/2024 | 684 | 889 | .tar.gz | .txt | .tsv | .md5 |
`--no-mask` argument
Each index includes an `inspect.txt` file giving the output of the `kraken2-inspect` command.
As of March 2023, each index includes a `ktaxonomy.tsv` file giving the output of the `make_ktaxonomy.py` script from KrakenTools.
As of October 2023, each index also includes a `library_report.tsv` file giving information about the sequences included in the library.
As of June 2024, each index also includes a `.md5` file that lists checksums for files in the archive.
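The checksum listing can be used to confirm that an archive unpacked cleanly. The following sketch assumes the `.md5` file uses the common `md5sum` layout, one `<hex digest>  <file name>` pair per line with names relative to the unpacked directory. That layout is an assumption and should be checked against a real file; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5_listing(listing_path, base_dir):
    """Return the names of files that are missing or whose checksum differs."""
    failures = []
    for line in Path(listing_path).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*").strip()  # md5sum prefixes '*' in binary mode
        target = Path(base_dir) / name
        if not target.is_file() or md5_of_file(target) != expected.lower():
            failures.append(name)
    return failures
```

An empty return value means every listed file was found and matched its recorded checksum.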
Corresponding S3 URLs can be obtained by removing `https://genome-idx.s3.amazonaws.com` from the beginning of the URLs linked to above and replacing with `s3://genome-idx`.
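The prefix swap described above can be scripted. A minimal sketch follows; the function name and the example path are hypothetical, and only the two prefixes come from this page.

```python
HTTPS_PREFIX = "https://genome-idx.s3.amazonaws.com"
S3_PREFIX = "s3://genome-idx"


def https_to_s3(url):
    """Rewrite a genome-idx HTTPS URL as the corresponding S3 URL."""
    if not url.startswith(HTTPS_PREFIX + "/"):
        raise ValueError(f"not a genome-idx HTTPS URL: {url}")
    # Keep everything after the HTTPS prefix, including the leading slash.
    return S3_PREFIX + url[len(HTTPS_PREFIX):]


# Placeholder path for illustration only, not a real archive name.
example = https_to_s3("https://genome-idx.s3.amazonaws.com/kraken/example_index.tar.gz")
```

Here `example` holds `s3://genome-idx/kraken/example_index.tar.gz`, which could then be passed to an S3 client of your choice.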
All packages contain a Kraken 2 database along with Bracken databases built for 100mers, 150mers, and 200mers.
Collection | Size (MB) | HTTPS URL |
---|---|---|
Greengenes 13.5 | 73.2 | .tar.gz |
RDP 11.5 | 168 | .tar.gz |
Silva 132 | 117 | .tar.gz |
Silva 138 | 112 | .tar.gz |
Corresponding S3 URLs can be obtained by removing `https://genome-idx.s3.amazonaws.com` from the beginning of the URLs linked to above and replacing with `s3://genome-idx`.
Download both the `.tar.gz` and `database.kdb` files to the same directory, then expand the `.tar.gz` file to obtain the full set of files needed for KrakenUniq and Bracken.
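The unpacking step can be sketched in Python as follows. The function name is hypothetical, and the sketch makes no assumption about which files the archive contains; it only checks that both downloads sit in the same directory and expands the archive there.

```python
import tarfile
from pathlib import Path


def unpack_krakenuniq(archive_path, kdb_path):
    """Expand the .tar.gz next to database.kdb and list the directory."""
    archive, kdb = Path(archive_path), Path(kdb_path)
    if not kdb.is_file():
        raise FileNotFoundError(f"missing database file: {kdb}")
    if archive.parent.resolve() != kdb.parent.resolve():
        raise ValueError("archive and database.kdb must be in the same directory")
    # Only unpack archives from a source you trust.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(archive.parent)
    return sorted(p.name for p in archive.parent.iterdir())
```

The returned listing makes it easy to confirm that `database.kdb` and the extracted files ended up side by side.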
Collection | Contains | Date | Index size (GB) | HTTPS URL |
---|---|---|---|---|
Standard | archaea, bacteria, viral, human, UniVec_Core | 6/16/2022 | 377 | .kdb .tar.gz |
MicrobialDB | archaea, bacteria, viral, human, UniVec_Core, Eukaryotic pathogen genomes (EuPathDB54) with contaminants removed | 8/8/2023 | 535 | .kdb .tar.gz |
Collection | Contains | Date | Archive size (GB) | Index size (GB) | HTTPS URL | Inspect | Library |
---|---|---|---|---|---|---|---|
January, 2024 | |||||||
Viral | RefSeq viral | 1/12/2024 | 0.5 | 0.6 | .tar.gz | .txt | .tsv |
MinusB | RefSeq archaea, viral, plasmid, human¹, UniVec_Core | 1/12/2024 | 6.7 | 9.7 | .tar.gz | .txt | .tsv |
Standard | RefSeq archaea, bacteria, viral, plasmid, human¹, UniVec_Core | 1/12/2024 | 55 | 72 | .tar.gz | .txt | .tsv |
Standard-8 | Standard with DB capped at 8 GB | 1/12/2024 | 5.5 | 7.5 | .tar.gz | .txt | .tsv |
Standard-16 | Standard with DB capped at 16 GB | 1/12/2024 | 11 | 15 | .tar.gz | .txt | .tsv |
PlusPF | Standard plus RefSeq protozoa & fungi | 1/12/2024 | 59 | 77 | .tar.gz | .txt | .tsv |
PlusPF-8 | PlusPF with DB capped at 8 GB | 1/12/2024 | 5.5 | 7.5 | .tar.gz | .txt | .tsv |
PlusPF-16 | PlusPF with DB capped at 16 GB | 1/12/2024 | 11 | 15 | .tar.gz | .txt | .tsv |
PlusPFP | Standard plus RefSeq protozoa, fungi & plant | 1/12/2024 | 126 | 171 | .tar.gz | .txt | .tsv |
PlusPFP-8 | PlusPFP with DB capped at 8 GB | 1/12/2024 | 5.1 | 7.5 | .tar.gz | .txt | .tsv |
PlusPFP-16 | PlusPFP with DB capped at 16 GB | 1/12/2024 | 11 | 15 | .tar.gz | .txt | .tsv |
November, 2023 | |||||||
nt Database | Very large collection, inclusive of GenBank, RefSeq, TPA and PDB | 11/29/2023 | 550 | 710 | .tar.gz | .txt | .tsv |
October, 2023 | |||||||
Viral | RefSeq viral | 10/9/2023 | 0.5 | 0.6 | .tar.gz | .txt | .tsv |
MinusB | RefSeq archaea, viral, plasmid, human¹, UniVec_Core | 10/9/2023 | 6.6 | 9.5 | .tar.gz | .txt | .tsv |
Standard | RefSeq archaea, bacteria, viral, plasmid, human¹, UniVec_Core | 10/9/2023 | 53 | 70 | .tar.gz | .txt | .tsv |
Standard-8 | Standard with DB capped at 8 GB | 10/9/2023 | 5.5 | 7.5 | .tar.gz | .txt | .tsv |
Standard-16 | Standard with DB capped at 16 GB | 10/9/2023 | 11 | 15 | .tar.gz | .txt | .tsv |
PlusPF | Standard plus RefSeq protozoa & fungi | 10/9/2023 | 57 | 74 | .tar.gz | .txt | .tsv |
PlusPF-8 | PlusPF with DB capped at 8 GB | 10/9/2023 | 5.5 | 7.5 | .tar.gz | .txt | .tsv |
PlusPF-16 | PlusPF with DB capped at 16 GB | 10/9/2023 | 11 | 15 | .tar.gz | .txt | .tsv |
PlusPFP (ISSUE: see below) | Standard plus RefSeq protozoa, fungi & plant | 10/9/2023 | 124 | 148 | .tar.gz | .txt | .tsv |
PlusPFP-8 | PlusPFP with DB capped at 8 GB | 10/9/2023 | 5.1 | 7.5 | .tar.gz | .txt | .tsv |
PlusPFP-16 | PlusPFP with DB capped at 16 GB | 10/9/2023 | 11 | 15 | .tar.gz | .txt | .tsv |
June, 2023 | |||||||
Viral | RefSeq viral | 6/5/2023 | 0.5 | 0.6 | .tar.gz | .txt | |
MinusB | RefSeq archaea, viral, plasmid, human¹, UniVec_Core | 6/5/2023 | 6.5 | 9.4 | .tar.gz | .txt | |
Standard | RefSeq archaea, bacteria, viral, plasmid, human¹, UniVec_Core | 6/5/2023 | 51 | 67 | .tar.gz | .txt | |
Standard-8 | Standard with DB capped at 8 GB | 6/5/2023 | 5.5 | 7.5 | .tar.gz | .txt | |
Standard-16 | Standard with DB capped at 16 GB | 6/5/2023 | 11 | 15 | .tar.gz | .txt | |
PlusPF | Standard plus RefSeq protozoa & fungi | 6/5/2023 | 55 | 71 | .tar.gz | .txt | |
PlusPF-8 | PlusPF with DB capped at 8 GB | 6/5/2023 | 5.5 | 7.5 | .tar.gz | .txt | |
PlusPF-16 | PlusPF with DB capped at 16 GB | 6/5/2023 | 11 | 15 | .tar.gz | .txt | |
PlusPFP (ISSUE: see below) | Standard plus RefSeq protozoa, fungi & plant | 6/5/2023 | 108 | 148 | .tar.gz | .txt | |
PlusPFP-8 | PlusPFP with DB capped at 8 GB | 6/5/2023 | 5.1 | 7.5 | .tar.gz | .txt | |
PlusPFP-16 | PlusPFP with DB capped at 16 GB | 6/5/2023 | 10 | 15 | .tar.gz | .txt | |
May, 2023 | |||||||
nt Database (ISSUE: see below) | Very large collection, inclusive of GenBank, RefSeq, TPA and PDB | 5/2/2023 | 360 | 480 | .tar.gz | .txt | |
March, 2023 | |||||||
Viral | RefSeq viral | 3/14/2023 | 0.4 | 0.5 | .tar.gz | .txt | |
MinusB | RefSeq archaea, viral, plasmid, human¹, UniVec_Core | 3/14/2023 | 6.4 | 9.0* | .tar.gz | .txt | |
Standard | RefSeq archaea, bacteria, viral, plasmid, human¹, UniVec_Core | 3/14/2023 | 49 | 64 | .tar.gz | .txt | |
Standard-8 | Standard with DB capped at 8 GB | 3/14/2023 | 5.5 | 7.5 | .tar.gz | .txt | |
Standard-16 | Standard with DB capped at 16 GB | 3/14/2023 | 11 | 15 | .tar.gz | .txt | |
PlusPF | Standard plus RefSeq protozoa & fungi | 3/14/2023 | 53 | 69 | .tar.gz | .txt | |
PlusPF-8 | PlusPF with DB capped at 8 GB | 3/14/2023 | 5.5 | 7.5 | .tar.gz | .txt | |
PlusPF-16 | PlusPF with DB capped at 16 GB | 3/14/2023 | 11 | 15 | .tar.gz | .txt | |
PlusPFP (ISSUE: see below) | Standard plus RefSeq protozoa, fungi & plant | 3/14/2023 | 106 | 144 | .tar.gz | .txt | |
PlusPFP-8 | PlusPFP with DB capped at 8 GB | 3/14/2023 | 5.1 | 7.5 | .tar.gz | .txt | |
PlusPFP-16 | PlusPFP with DB capped at 16 GB | 3/14/2023 | 11 | 15 | .tar.gz | .txt | |
December, 2022 | |||||||
Viral | viral | 12/9/2022 | 0.4 | 0.5 | .tar.gz | .txt | |
MinusB | archaea, viral, plasmid, human¹, UniVec_Core | 12/9/2022 | 6.1 | 8.7 | .tar.gz | .txt | |
Standard | archaea, bacteria, viral, plasmid, human¹, UniVec_Core | 12/9/2022 | 48 | 62 | .tar.gz | .txt | |
Standard-8 | Standard with DB capped at 8 GB | 12/9/2022 | 5.5 | 7.5 | .tar.gz | .txt | |
Standard-16 | Standard with DB capped at 16 GB | 12/9/2022 | 11 | 15 | .tar.gz | .txt | |
PlusPF | Standard plus protozoa & fungi | 12/9/2022 | 51 | 66 | .tar.gz | .txt | |
PlusPF-8 | PlusPF with DB capped at 8 GB | 12/9/2022 | 5.5 | 7.5 | .tar.gz | .txt | |
PlusPF-16 | PlusPF with DB capped at 16 GB | 12/9/2022 | 11 | 15 | .tar.gz | .txt | |
PlusPFP (ISSUE: see below) | Standard plus protozoa, fungi & plant | 12/9/2022 | 104 | 142 | .tar.gz | .txt | |
PlusPFP-8 | PlusPFP with DB capped at 8 GB | 12/9/2022 | 5.1 | 7.5 | .tar.gz | .txt | |
PlusPFP-16 | PlusPFP with DB capped at 16 GB | 12/9/2022 | 11 | 15 | .tar.gz | .txt | |
September, 2022 | |||||||
Viral | viral | 9/8/2022 | 0.4 | 0.5 | .tar.gz | .txt | |
MinusB³ | archaea, viral, plasmid, human¹, UniVec_Core | 9/26/2022 | 5.9 | 8.5 | .tar.gz | .txt | |
Standard³ | archaea, bacteria, viral, plasmid, human¹, UniVec_Core | 9/26/2022 | 46 | 60 | .tar.gz | .txt | |
Standard-8³ | Standard with DB capped at 8 GB | 9/26/2022 | 5.5 | 7.5 | .tar.gz | .txt | |
Standard-16³ | Standard with DB capped at 16 GB | 9/26/2022 | 11 | 15 | .tar.gz | .txt | |
PlusPF | Standard plus protozoa & fungi | 9/8/2022 | 49 | 64 | .tar.gz | .txt | |
PlusPF-8 | PlusPF with DB capped at 8 GB | 9/8/2022 | 5.5 | 7.5 | .tar.gz | .txt | |
PlusPF-16 | PlusPF with DB capped at 16 GB | 9/8/2022 | 11 | 15 | .tar.gz | .txt | |
PlusPFP (ISSUE: see below) | Standard plus protozoa, fungi & plant | 9/8/2022 | 99 | 129 | .tar.gz | .txt | |
PlusPFP-8 | PlusPFP with DB capped at 8 GB | 9/8/2022 | 5.1 | 7.5 | .tar.gz | .txt | |
PlusPFP-16 | PlusPFP with DB capped at 16 GB | 9/8/2022 | 11 | 15 | .tar.gz | .txt | |
June, 2022 | |||||||
Viral | viral | 6/7/2022 | 0.4 | 0.5 | .tar.gz | .txt | |
MinusB | archaea, viral, plasmid, human¹, UniVec_Core | 6/7/2022 | 5.8 | 8.2 | .tar.gz | .txt | |
Standard | archaea, bacteria, viral, plasmid, human¹, UniVec_Core | 6/7/2022 | 44 | 58 | .tar.gz | .txt | |
Standard-8 | Standard with DB capped at 8 GB | 6/7/2022 | 5.5 | 7.5 | .tar.gz | .txt | |
Standard-16 | Standard with DB capped at 16 GB | 6/7/2022 | 12 | 15 | .tar.gz | .txt | |
PlusPF | Standard plus protozoa & fungi | 6/7/2022 | 47 | 61 | .tar.gz | .txt | |
PlusPF-8 | PlusPF with DB capped at 8 GB | 6/7/2022 | 5.2 | 7.5 | .tar.gz | .txt | |
PlusPF-16 | PlusPF with DB capped at 16 GB | 6/7/2022 | 12 | 15 | .tar.gz | .txt | |
PlusPFP (ISSUE: see below) | Standard plus protozoa, fungi & plant | 6/7/2022 | 55 | 129 | .tar.gz | .txt | |
PlusPFP-8 | PlusPFP with DB capped at 8 GB | 6/7/2022 | 5.5 | 7.5 | .tar.gz | .txt | |
PlusPFP-16 | PlusPFP with DB capped at 16 GB | 6/7/2022 | 11 | 15 | .tar.gz | .txt | |
May, 2021 | |||||||
Viral | viral | 5/17/2021 | 0.4 | 0.5 | .tar.gz | .txt | |
MinusB | archaea, viral, plasmid, human¹, UniVec_Core | 5/17/2021 | 5.2 | 7.5 | .tar.gz | .txt | |
Standard | archaea, bacteria, viral, plasmid, human¹, UniVec_Core | 5/17/2021 | 38.6 | 50.1 | .tar.gz | .txt | |
Standard-8 | Standard with DB capped at 8 GB | 5/17/2021 | 5.5 | 7.5 | .tar.gz | .txt | |
Standard-16 | Standard with DB capped at 16 GB | 5/17/2021 | 11.2 | 14.9 | .tar.gz | .txt | |
PlusPF | Standard plus protozoa & fungi | 5/17/2021 | 41.0 | 53.2 | .tar.gz | .txt | |
PlusPF-8 | PlusPF with DB capped at 8 GB | 5/17/2021 | 5.5 | 7.5 | .tar.gz | .txt | |
PlusPF-16 | PlusPF with DB capped at 16 GB | 5/17/2021 | 11.2 | 14.9 | .tar.gz | .txt | |
PlusPFP | Standard plus protozoa, fungi & plant (not in this release²; see archive) | N/A | N/A | N/A | N/A | N/A | |
PlusPFP-8 | PlusPFP with DB capped at 8 GB | 5/17/2021 | 5.2 | 7.5 | .tar.gz | .txt | |
PlusPFP-16 | PlusPFP with DB capped at 16 GB | 5/17/2021 | 10.6 | 14.9 | .tar.gz | .txt | |
January, 2021 | |||||||
Viral | viral | 12/2/2020 | 0.4 | 0.4 | .tar.gz | .txt | |
MinusB | archaea, viral, plasmid, human¹, UniVec_Core | 12/2/2020 | 5.1 | 7.4 | .tar.gz | .txt | |
Standard | archaea, bacteria, viral, plasmid, human¹, UniVec_Core | 12/2/2020 | 36.0 | 46.8 | .tar.gz | .txt | |
Standard-8 | Standard with DB capped at 8 GB | 12/2/2020 | 5.5 | 7.5 | .tar.gz | .txt | |
Standard-16 | Standard with DB capped at 16 GB | 12/2/2020 | 11.2 | 14.9 | .tar.gz | .txt | |
PlusPF | Standard plus protozoa & fungi (fixed from 12/2/20 version³) | 1/27/2021 | 38.4 | 49.8 | .tar.gz | .txt | |
PlusPF-8 | PlusPF with DB capped at 8 GB (fixed from 12/2/20 version³) | 1/27/2021 | 5.5 | 7.5 | .tar.gz | .txt | |
PlusPF-16 | PlusPF with DB capped at 16 GB (fixed from 12/2/20 version³) | 1/27/2021 | 11.2 | 14.9 | .tar.gz | .txt | |
PlusPFP | Standard plus protozoa, fungi & plant (fixed from 12/2/20 version³) | 1/27/2021 | 71.8 | 96.3 | .tar.gz | .txt | |
PlusPFP-8 | PlusPFP with DB capped at 8 GB (fixed from 12/2/20 version³) | 1/27/2021 | 5.2 | 7.5 | .tar.gz | .txt | |
PlusPFP-16 | PlusPFP with DB capped at 16 GB (fixed from 12/2/20 version³) | 1/27/2021 | 10.7 | 14.9 | .tar.gz | .txt | |
September, 2020 | |||||||
MinusB | archaea, viral, plasmid, human¹, UniVec_Core | 9/19/2020 | 5.0 | 7.3 | .tar.gz | .txt | |
Standard | archaea, bacteria, viral, plasmid, human¹, UniVec_Core | 9/19/2020 | 36.0 | 47.0 | .tar.gz | .txt | |
Standard-8 | Standard with DB capped at 8 GB | 9/19/2020 | 5.5 | 7.4 | .tar.gz | .txt | |
Standard-16 | Standard with DB capped at 16 GB | 9/19/2020 | 11.2 | 14.9 | .tar.gz | .txt | |
PlusPF | Standard plus protozoa & fungi | 9/19/2020 | 37.0 | 48.0 | .tar.gz | .txt | |
PlusPF-8 | PlusPF with DB capped at 8 GB | 9/19/2020 | 5.5 | 7.4 | .tar.gz | .txt | |
PlusPF-16 | PlusPF with DB capped at 16 GB | 9/19/2020 | 11.2 | 14.9 | .tar.gz | .txt | |
PlusPFP | Standard plus protozoa, fungi & plant | 9/19/2020 | 66.5 | 90.0 | .tar.gz | .txt | |
PlusPFP-8 | PlusPFP with DB capped at 8 GB | 9/19/2020 | 5.3 | 7.4 | .tar.gz | .txt | |
PlusPFP-16 | PlusPFP with DB capped at 16 GB | 9/19/2020 | 10.7 | 14.9 | .tar.gz | .txt |
The following table points to the “Minikraken” indexes we created initially. All packages contain a Kraken 2 database along with Bracken databases built for 100, 150, and 200-mers. Some also contain Bracken databases for 50, 75 and 250-mers.
Collection | Contains | Date | Archive size (GB) | Index size (GB) | HTTPS URL |
---|---|---|---|---|---|
Minikraken v1 | RefSeq: bacteria, archaea, viral | 3/2020 | 5.6 | 8 | .tar.gz |
Minikraken v2 | RefSeq: bacteria, archaea, viral, human* | 3/2020 | 5.5 | 8 | .tar.gz |
Kraken, Kraken 2, Bracken and KrakenUniq are the work of Derrick Wood, Steven Salzberg, Jennifer Lu, Florian Breitwieser, Christopher Pockrandt, Aleksey Zimin, Daniel Baker, Martin Steinegger and Ben Langmead among others. Please see the Kraken, Kraken 2, KrakenUniq and Bracken websites for more information on the software, authors, and how to cite the work.