Example
This example demonstrates how to classify RNA-seq reads from a COVID-19 patient using the RefSeq Virus database. The whole process takes less than 10 minutes on a personal machine.
Step 1 — Download the RefSeq Virus Database (8.1 GiB)
Link: https://steineggerlab.s3.amazonaws.com/metabuli/archive/refseq_virus.tar.gz
Step 2 — Download RNA-seq reads (SRR14484345)
Tip
Download SRA Toolkit containing fasterq-dump from https://github.com/ncbi/sra-tools/wiki/02.-Installing-SRA-Toolkit.
Step 3 — Classify reads
metabuli classify \
SRR14484345_1.fq SRR14484345_2.fq \
OUTDIR/refseq_virus \
RESULT_DIR JOB_ID \
--max-ram RAM_SIZE
Replace RAM_SIZE with the amount of RAM (in GiB) available on your machine.
Step 4 — Inspect the report
Open RESULT_DIR/JOB_ID_report.tsv and look for a section like the following:
92.2331 510490 442 species 694009 Severe acute respiratory syndrome-related coronavirus
92.1433 509993 509993 no rank 2697049 Severe acute respiratory syndrome coronavirus 2
This shows that ~92% of reads were classified to SARS-CoV-2.
Next Steps
- Use
refine-resultsto filter or adjust the classification results. - Use
extractto pull out reads classified to a specific taxon. - Browse the interactive taxonomy chart in
RESULT_DIR/JOB_ID_krona.html.