sysbio-vo/pannotator

Pannotator: prokaryotic genome annotation at scale

Pannotator is a scalable and robust pangenome-based prokaryotic genome annotation tool, designed to efficiently process hundreds of genomes. It is built upon Bakta to reliably annotate protein-coding and ncRNA genes, while leveraging the workflow scalability and reproducibility of Nextflow.

Description

  • Pannotator orchestrates Bakta annotation steps in a modular Nextflow pipeline. Via Bakta, it annotates tRNA, tmRNA, rRNA and ncRNA genes, ncRNA cis-regulatory regions, CRISPR, CDS and pseudogenes, as well as oriC/oriV/oriT and assembly gaps.
  • To minimise redundant computation, Pannotator clusters CDS features across genomes and annotates only representative sequences from each cluster, propagating annotations back to cluster members.
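The cluster-then-propagate idea can be illustrated with a minimal Python sketch. It is not Pannotator's implementation: clustering here is simplified to grouping identical protein sequences, the first member of each group is taken as the representative, and the function names (`cluster_by_identity`, `annotate_representatives`, `propagate`) as well as the toy sequences and annotator are hypothetical.

```python
from collections import defaultdict


def cluster_by_identity(cds):
    """Group CDS ids by identical sequence (a stand-in for real clustering)."""
    groups = defaultdict(list)
    for cds_id, seq in cds.items():
        groups[seq].append(cds_id)
    # The first member of each group acts as the cluster representative.
    return {members[0]: members for members in groups.values()}


def annotate_representatives(clusters, cds, annotate):
    """Run the (expensive) annotation once per cluster, on the representative only."""
    return {rep: annotate(cds[rep]) for rep in clusters}


def propagate(clusters, rep_annotations):
    """Copy each representative's annotation to every member of its cluster."""
    annotations = {}
    for rep, members in clusters.items():
        for member in members:
            annotations[member] = rep_annotations[rep]
    return annotations


# Toy input: two genomes share an identical CDS, a third has a different one.
cds = {
    "genomeA_0001": "MKTAYIAKQR",
    "genomeB_0042": "MKTAYIAKQR",
    "genomeC_0007": "MSLNFLDFEQ",
}
calls = []


def toy_annotate(seq):
    # Hypothetical annotator that records how often it is invoked.
    calls.append(seq)
    return f"hypothetical protein ({len(seq)} aa)"


clusters = cluster_by_identity(cds)
annotations = propagate(clusters, annotate_representatives(clusters, cds, toy_annotate))
print(len(calls), "annotation calls for", len(annotations), "CDS")
# prints: 2 annotation calls for 3 CDS
```

The saving grows with the number of genomes: CDS shared across many isolates are annotated once rather than once per genome.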

Installation

Prerequisites:

Examples

Annotate a folder of isolate samples with the full pipeline. On the first run, the pipeline looks for a Bakta database in the working directory and, if none is found, downloads the light Bakta database by default:

nextflow run main.nf --indir /path/to/folder/with/isolates/ -profile local

Change the output directory with the --outdir parameter:

nextflow run main.nf --indir /path/to/folder/with/isolates/ -profile local --outdir test_run

For richer output, save intermediate files with --save_intermediate:

nextflow run main.nf --indir /path/to/folder/with/isolates/ -profile local --outdir test_run --save_intermediate

If you have already downloaded a Bakta database, pass its path with --bakta_db. By default, the database is assumed to be of type light, so indicate the correct type with --bakta_db_type if it differs. Indicating the full type is required to run the annotation steps that rely on the full database, such as pseudogene search.

nextflow run main.nf --indir /path/to/folder/with/isolates/ -profile local --outdir test_run --save_intermediate --bakta_db /path/to/full/Bakta/db/ --bakta_db_type full
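When launching many runs from a script, the command line can be assembled programmatically. The sketch below is only an illustration: `build_command` is a hypothetical helper, not part of Pannotator, and it uses only the options shown in the examples above.

```python
import shlex


def build_command(indir, profile="local", outdir=None, save_intermediate=False,
                  bakta_db=None, bakta_db_type=None):
    """Assemble a `nextflow run main.nf` call from the options shown in the README."""
    cmd = ["nextflow", "run", "main.nf", "--indir", indir, "-profile", profile]
    if outdir:
        cmd += ["--outdir", outdir]
    if save_intermediate:
        cmd.append("--save_intermediate")
    if bakta_db:
        cmd += ["--bakta_db", bakta_db]
    if bakta_db_type:
        cmd += ["--bakta_db_type", bakta_db_type]
    return cmd


cmd = build_command(
    "/path/to/folder/with/isolates/",
    outdir="test_run",
    save_intermediate=True,
    bakta_db="/path/to/full/Bakta/db/",
    bakta_db_type="full",
)
# Print the command as a shell-quoted string; pass the list to subprocess.run to execute it.
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell-quoting problems when paths contain spaces.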

Other execution profiles can be selected with -profile:

  • standard (default)
  • docker
  • singularity
  • conda

For more information regarding the profiles, please refer to the base config by PaM.
