Databases
Note
The databases listed here require Metabuli v1.2.0 or later. For older versions, please refer to the Old Database page.
Pre-built databases are provided for common use cases. All databases can be downloaded from here.
Summary
Note
Please refer to the Metabuli-Bracken page for instructions on how to use Bracken with Metabuli's databases.
| Database Name | Taxonomy | Size (GB) | Bracken | Contents | Link |
|---|---|---|---|---|---|
| gtdb226 | GTDB | 378 | - | GTDB R226 genomes | Download |
| refseq_standard | NCBI | 111 | Yes | RefSeq archaea, bacteria, virus, plasmid, protozoa, fungi, and human | Download |
| hrgm2 | GTDB | 85 | Yes | Human Reference Gut Microbiome v2 (HRGM2) | Download |
| hrom | GTDB | 42 | Yes | Human Reference Oral Microbiome (HROM) | Download |
gtdb226
- Citation: GTDB R226 (Parks et al., 2026)
- Species representative genomes with CheckM2 completeness > 90% and contamination < 5%.
- Includes 90,791 species out of 143,614 species in GTDB R226.
- Human genome (T2T-CHM13v2.0) and RefSeq Virus (2026-03-31) are added.
- `build` options: `--space-mask 11101110111 --custom-metamer reduced_15_pattern.txt --syncmer 1 --smer-len 6`
- As many genomes are included, syncmers are used to reduce database size and improve classification speed.
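
To give a rough sense of why syncmers shrink a database, the sketch below applies the generic closed-syncmer rule to a DNA string: a k-mer is kept only if its lexicographically smallest s-mer sits at the first or last position. This is an illustration of the general technique, not Metabuli's implementation. The function names, the lexicographic ordering, the tie-breaking rule, and the use of plain nucleotide strings are all assumptions made for the example.

```python
from typing import List, Tuple


def is_closed_syncmer(kmer: str, s: int) -> bool:
    """Return True if the smallest s-mer of `kmer` is at its first or last position.

    Ties are broken by taking the leftmost smallest s-mer (an assumption).
    """
    if s <= 0 or s > len(kmer):
        raise ValueError("s must be between 1 and len(kmer)")
    smers = [kmer[i:i + s] for i in range(len(kmer) - s + 1)]
    first_min = smers.index(min(smers))  # leftmost occurrence of the minimum
    return first_min == 0 or first_min == len(smers) - 1


def select_syncmers(seq: str, k: int, s: int) -> List[Tuple[int, str]]:
    """List (position, k-mer) pairs of `seq` that pass the closed-syncmer rule."""
    return [
        (i, seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if is_closed_syncmer(seq[i:i + k], s)
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence; real inputs would be reference genomes.
    seq = "GATTACAGATTACACCGGTTAACC"
    kept = select_syncmers(seq, k=12, s=6)
    total = len(seq) - 12 + 1
    print(f"kept {len(kept)} of {total} k-mers")
```

Because only a subset of k-mers passes the rule, fewer entries need to be stored, which is the general reason a syncmer-based index is smaller and faster to search.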
refseq_standard
- Metabuli version of Kraken2's PlusPF database (2026-02-26 update)
- The same set of genomes as Kraken2's PlusPF database is used.
- RefSeq Complete Genome or Chromosome level assemblies: archaea, bacteria, virus, protozoa, fungi, and human
- RefSeq plasmids and UniVec_Core
- Difference from Kraken2's PlusPF:
- Sequences deprecated between 2026-02-26 and 2026-04-02 are excluded.
- List of excluded sequences:
- NC_002193
- NC_010021
- NC_018496
- NC_018497
- NC_024996
- NC_030892
- NZ_CM136992
- NZ_CM136993
- NZ_CM136994
- NZ_CM136995
- NZ_CP103377
- NZ_CP103378
- NZ_CP103379
- NZ_CP103380
- NZ_CP126834
- NZ_CP126841
- NZ_CP126850
- NZ_CP168307
- NZ_CP180736
- NZ_CP180737
- NZ_CP181249
- NZ_CP199310
- NZ_JADRXB020000004
- NZ_JADRXB020000007
- NZ_JAWQLS010000002
- NZ_JAWQLS010000003
- NZ_JAWQLS010000004
- NZ_JAWQLS010000005
- NZ_JBPJAM010000036
- NZ_JBRYHD010000002
- NZ_JBRYHD010000003
- NZ_JBRYHE010000002
- NZ_JBRYHE010000003
- NZ_JBRYHF010000002
- NZ_JBRYHF010000003
- NZ_JBRYHG010000002
- NZ_JBRYHG010000003
- NZ_JBTORD010000003
- NZ_JBTORD010000004
- 4,936 more plasmids are included because the RefSeq plasmid set has been updated.
- Bracken support: The Bracken database is bundled with Kraken2's PlusPF database.
- `build` options: `--space-mask 11101110111 --custom-metamer reduced_15_pattern.txt --syncmer 1 --smer-len 6`
- As many genomes are included, syncmers are used to reduce database size and improve classification speed.
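
The exclusion of deprecated sequences described above can be sketched as a simple FASTA filter. This is a minimal illustration, not the procedure actually used to build the database. It assumes that each FASTA header starts with the accession, that a version suffix such as `.1` can be ignored when matching, and that the function names are hypothetical.

```python
from typing import Iterable, Iterator, Set


def accession_of(header: str) -> str:
    """Extract the unversioned accession from a FASTA header line.

    Example: ">NC_002193.1 some description" -> "NC_002193"
    """
    parts = header[1:].split()
    token = parts[0] if parts else ""
    return token.split(".")[0]


def filter_fasta(lines: Iterable[str], excluded: Set[str]) -> Iterator[str]:
    """Yield FASTA lines, skipping every record whose accession is in `excluded`."""
    keep = True
    for line in lines:
        if line.startswith(">"):
            # A new record starts; decide once whether to keep all of its lines.
            keep = accession_of(line) not in excluded
        if keep:
            yield line


if __name__ == "__main__":
    # Two accessions taken from the list of excluded sequences above.
    excluded = {"NC_002193", "NC_010021"}
    records = [
        ">NC_002193.1 deprecated record\n",
        "ACGTACGT\n",
        ">NC_000001.1 retained record (hypothetical)\n",
        "GGCCGGCC\n",
    ]
    print("".join(filter_fasta(records, excluded)), end="")
```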
hrgm2
- Citation: Human Reference Gut Microbiome v2 (HRGM2).
- HRGM2 statistics:
- Only near-complete genomes (Completeness ≥ 90%, Contamination ≤ 5%, and GUNC CSS < 0.45)
- 155,211 genomes representing 4,824 species.
- Human genome (T2T-CHM13v2.0) and RefSeq Virus (2026-03-31) are added.
- Bracken support:
- Download the Bracken database from the HRGM2 page here.
- NOTE: The HRGM2 Bracken databases only include prokaryotic genomes. Viral and eukaryotic portions of Bracken results should be interpreted with caution.
- `build` options: `--space-mask 11101110111 --custom-metamer reduced_15_pattern.txt`
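
The near-complete genome criteria listed above (Completeness ≥ 90%, Contamination ≤ 5%, GUNC CSS < 0.45) amount to a three-way threshold check. The sketch below shows that check on hypothetical records. The `GenomeQC` class, its field names, and the sample values are assumptions for illustration, and this is not the pipeline used to produce HRGM2.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class GenomeQC:
    """Hypothetical quality record for one genome."""
    name: str
    completeness: float   # percent
    contamination: float  # percent
    gunc_css: float       # GUNC clade separation score


def is_near_complete(g: GenomeQC) -> bool:
    """Apply the thresholds stated in the text above."""
    return g.completeness >= 90.0 and g.contamination <= 5.0 and g.gunc_css < 0.45


def select_near_complete(genomes: Iterable[GenomeQC]) -> List[GenomeQC]:
    """Keep only the genomes that pass all three thresholds."""
    return [g for g in genomes if is_near_complete(g)]


if __name__ == "__main__":
    # Made-up values, chosen only to exercise each threshold.
    candidates = [
        GenomeQC("genome_a", 98.2, 1.1, 0.10),  # passes
        GenomeQC("genome_b", 89.9, 0.5, 0.05),  # fails completeness
        GenomeQC("genome_c", 95.0, 5.1, 0.20),  # fails contamination
        GenomeQC("genome_d", 95.0, 2.0, 0.45),  # fails GUNC CSS (strict <)
    ]
    for g in select_near_complete(candidates):
        print(g.name)
```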
hrom
- Citation: Human Reference Oral Microbiome (HROM).
- HROM statistics:
- 72,641 high-quality genomes representing 3,426 species are used. (Completeness ≥ 90%, Contamination ≤ 5%, and GUNC CSS < 0.45)
- Human genome (T2T-CHM13v2.0) and RefSeq Virus (2026-03-31) are added.
- Bracken support:
- Download the Bracken database from the HROM page here.
- NOTE: The HROM Bracken databases only include prokaryotic genomes. Viral and eukaryotic portions of Bracken results should be interpreted with caution.
- `build` options: `--space-mask 11101110111 --custom-metamer reduced_15_pattern.txt`
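
For readers unfamiliar with spaced masks, the sketch below shows the generic idea behind a pattern such as `11101110111`: positions marked `1` are compared and positions marked `0` are ignored, so two windows that differ only at a masked-out position still match. This page does not specify what unit each mask position covers in Metabuli or how the mask is applied internally, so the example treats the window as a plain string and is an assumption-laden illustration only. The function name is hypothetical.

```python
def apply_space_mask(window: str, mask: str) -> str:
    """Keep the characters of `window` at positions where `mask` has a '1'."""
    if len(window) != len(mask):
        raise ValueError("window and mask must have the same length")
    if set(mask) - {"0", "1"}:
        raise ValueError("mask may contain only '0' and '1'")
    return "".join(ch for ch, bit in zip(window, mask) if bit == "1")


if __name__ == "__main__":
    mask = "11101110111"
    a = "ABCDEFGHIJK"
    b = "ABCXEFGYIJK"  # differs from `a` only at the two masked-out positions
    print(apply_space_mask(a, mask))  # ABCEFGIJK
    print(apply_space_mask(a, mask) == apply_space_mask(b, mask))  # True
```

Ignoring a few positions in this way trades a little specificity for tolerance to mismatches at those positions, which is the usual motivation for spaced patterns.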