nf-core/seqsubmit: nf-core pipeline for data submission to ENA

Introduction

nf-core/seqsubmit is a Nextflow pipeline for submitting sequence data to ENA. The pipeline currently supports three submission modes, handled by two workflows. Each workflow expects its own input samplesheet structure:

`mags`: Metagenome-Assembled Genome (MAG) submission with the `GENOMESUBMIT` workflow
`bins`: bin submission with the `GENOMESUBMIT` workflow
`metagenomic_assemblies`: metagenomic assembly submission with the `ASSEMBLYSUBMIT` workflow
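The mode-to-workflow routing above can be summarised as a small lookup. This is a hypothetical helper for pre-flight scripts, not part of the pipeline: the workflow names and schema paths come from this README, while the function name `seqsubmit_route` is made up for illustration.

```bash
# Hypothetical helper (not part of nf-core/seqsubmit): maps a --mode value
# to the workflow that handles it and the samplesheet schema it expects.
seqsubmit_route() {
  case "$1" in
    mags|bins)
      # MAGs and bins share the genome workflow and samplesheet schema.
      echo "GENOMESUBMIT assets/schema_input_genome.json" ;;
    metagenomic_assemblies)
      echo "ASSEMBLYSUBMIT assets/schema_input_assembly.json" ;;
    *)
      echo "unknown mode: $1 (expected mags, bins or metagenomic_assemblies)" >&2
      return 1 ;;
  esac
}

seqsubmit_route bins   # prints: GENOMESUBMIT assets/schema_input_genome.json
```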

Requirements

Nextflow >=25.04.0
Webin account registered at https://www.ebi.ac.uk/ena/submit/webin/login
Raw reads used to assemble the contigs must already be submitted to INSDC, with their accessions available

Store your Webin credentials as Nextflow secrets before running the pipeline:

nextflow secrets set WEBIN_ACCOUNT "Webin-XXX"

nextflow secrets set WEBIN_PASSWORD "XXX"

Replace the placeholder values in the commands above with your own authorised Webin credentials.

Input samplesheets

`mags` and `bins` modes (`GENOMESUBMIT`)

The input must follow assets/schema_input_genome.json.

Required columns:

sample
fasta (must end with .fa.gz or .fasta.gz)
accession
assembly_software
binning_software
binning_parameters
stats_generation_software
metagenome
environmental_medium
broad_environment
local_environment
co-assembly

The following columns are currently required but are planned to become optional in a future release:

completeness
contamination
genome_coverage
rRNA_presence
NCBI_lineage

These fields are the metadata required by the genome_uploader package; they are described in the documentation.
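Because every column listed above must be present, a quick header check before launching can catch a malformed samplesheet early. This is a minimal sketch under stated assumptions: `missing_genome_columns` is a hypothetical function, it assumes a plain comma-separated header without quoted fields, and it only checks column names. The pipeline's own validation against `assets/schema_input_genome.json` remains authoritative.

```bash
# Hypothetical pre-flight check (not part of the pipeline): reports which
# required GENOMESUBMIT columns are missing from a samplesheet header.
# Assumes a plain comma-separated header with no quoted fields.
GENOME_REQUIRED_COLUMNS="sample fasta accession assembly_software binning_software
binning_parameters stats_generation_software metagenome environmental_medium
broad_environment local_environment co-assembly completeness contamination
genome_coverage rRNA_presence NCBI_lineage"

missing_genome_columns() {
  local header col missing=0
  # Read only the header line and strip a Windows line ending if present.
  header=$(head -n 1 "$1" | tr -d '\r')
  for col in $GENOME_REQUIRED_COLUMNS; do
    # Surround both sides with commas so "sample" cannot match "sample_id".
    case ",$header," in
      *",$col,"*) ;;
      *) echo "missing column: $col"; missing=1 ;;
    esac
  done
  return "$missing"
}

# Usage: missing_genome_columns samplesheet_genome.csv && echo "header OK"
```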

Example samplesheet_genome.csv:

sample,fasta,accession,assembly_software,binning_software,binning_parameters,stats_generation_software,completeness,contamination,genome_coverage,metagenome,co-assembly,broad_environment,local_environment,environmental_medium,rRNA_presence,NCBI_lineage
lachnospira_eligens,data/bin_lachnospira_eligens.fa.gz,SRR24458089,spades_v3.15.5,metabat2_v2.6,default,CheckM2_v1.0.1,61.0,0.21,32.07,sediment metagenome,false,marine,cable_bacteria,marine_sediment,false,d__Bacteria;p__Proteobacteria;s_unclassified_Proteobacteria
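Both samplesheet types require `fasta` paths ending in `.fa.gz` or `.fasta.gz`. A hypothetical awk-based check like the one below can flag offending rows before submission. It locates the `fasta` column by name rather than position and assumes unquoted, comma-separated fields; `check_fasta_suffix` is an illustrative name, not a pipeline command.

```bash
# Hypothetical pre-flight check: prints rows whose fasta value does not end
# in .fa.gz or .fasta.gz and returns non-zero if any are found.
# Assumes unquoted, comma-separated fields.
check_fasta_suffix() {
  awk -F',' '
    { sub(/\r$/, "") }          # tolerate Windows line endings
    NF == 0 { next }            # skip blank lines
    NR == 1 {
      # Locate the "fasta" column by its header name.
      for (i = 1; i <= NF; i++) if ($i == "fasta") col = i
      if (!col) { print "no fasta column"; bad = 1; exit }
      next
    }
    $col !~ /\.(fa|fasta)\.gz$/ {
      print "row " NR ": " $col
      bad = 1
    }
    END { exit bad }
  ' "$1"
}

# Usage: check_fasta_suffix samplesheet_genome.csv && echo "fasta paths OK"
```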

`metagenomic_assemblies` mode (`ASSEMBLYSUBMIT`)

The input must follow assets/schema_input_assembly.json.

Required columns:

sample
fasta (must end with .fa.gz or .fasta.gz)
run_accession
assembler
assembler_version

At least one of the following must be provided per row:

reads (fastq_1, optional fastq_2 for paired-end)
coverage

If coverage is empty and reads are provided, the workflow calculates the average coverage with CoverM.

Example samplesheet_assembly.csv:

sample,fasta,fastq_1,fastq_2,coverage,run_accession,assembler,assembler_version
assembly_1,data/contigs_1.fasta.gz,data/reads_1.fastq.gz,data/reads_2.fastq.gz,,ERR011322,SPAdes,3.15.5
assembly_2,data/contigs_2.fasta.gz,,,42.7,ERR011323,MEGAHIT,1.2.9
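The reads-or-coverage rule can be checked row by row. The sketch below is a hypothetical helper, `classify_coverage`, that reports for each assembly whether coverage is supplied, would be computed from reads, or is missing entirely. It assumes unquoted comma-separated fields and the column names used in the example above, and it does not run CoverM itself.

```bash
# Hypothetical helper: classifies each assembly row by how its coverage
# will be obtained, and fails if a row has neither reads nor coverage.
# Assumes unquoted, comma-separated fields with sample/fastq_1/coverage columns.
classify_coverage() {
  awk -F',' '
    { sub(/\r$/, "") }
    NF == 0 { next }
    NR == 1 {
      # Map header names to column positions.
      for (i = 1; i <= NF; i++) idx[$i] = i
      next
    }
    {
      r1  = ("fastq_1" in idx)  ? $(idx["fastq_1"])  : ""
      cov = ("coverage" in idx) ? $(idx["coverage"]) : ""
      if (cov != "")     print $(idx["sample"]) ": coverage supplied"
      else if (r1 != "") print $(idx["sample"]) ": coverage computed from reads"
      else { print $(idx["sample"]) ": missing both reads and coverage"; bad = 1 }
    }
    END { exit bad }
  ' "$1"
}

# Usage: classify_coverage samplesheet_assembly.csv
```

On the example samplesheet above, this would report that `assembly_1` gets its coverage computed from reads and `assembly_2` has coverage supplied.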

Usage

Note

If you are new to Nextflow and nf-core, please refer to the nf-core documentation on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

Required parameters:

Parameter	Description
`--mode`	Type of the data to be submitted. Options: `[mags, bins, metagenomic_assemblies]`
`--input`	Path to the samplesheet describing the data to be submitted
`--outdir`	Path to the output directory for pipeline results
`--submission_study`	ENA study accession (PRJ/ERP) to submit the data to
`--centre_name`	Name of the submitter's organisation
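Since the samplesheets also contain run accessions (`SRR...`, `ERR...`), it is easy to paste the wrong kind of accession into `--submission_study`. The check below is a rough sketch: the accepted patterns (`PRJE`/`PRJN`/`PRJD` projects and `ERP` studies) are an assumption based on common INSDC accession formats, not the pipeline's own validation rule, and `looks_like_study_accession` is a hypothetical name.

```bash
# Hypothetical sanity check: does a --submission_study value look like a
# project (PRJ...) or study (ERP...) accession? The patterns are an
# assumption based on common INSDC accession formats.
looks_like_study_accession() {
  printf '%s\n' "$1" | grep -Eq '^(PRJ[EDN][A-Z][0-9]+|ERP[0-9]+)$'
}

# Accessions taken from the examples in this README.
for acc in PRJEB98843 ERR011322 SRR24458089; do
  if looks_like_study_accession "$acc"; then
    echo "$acc: looks like a study accession"
  else
    echo "$acc: not a study accession, check --submission_study"
  fi
done
```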

Optional parameters:

Parameter	Description
`--upload_tpa`	Whether the assembly study is a third-party assembly (TPA) study. Default: false
`--test_upload`	Upload to the ENA TEST server instead of the LIVE server. Default: false
`--webincli_submit`	If set to false, submissions are validated but not submitted. Default: true

General command template:

nextflow run nf-core/seqsubmit \
   -profile <docker/singularity/...> \
   --mode <mags|bins|metagenomic_assemblies> \
   --input <samplesheet.csv> \
   --centre_name <your_centre> \
   --submission_study <your_study> \
   --outdir <outdir>

Validation run (submission to the ENA TEST server) in mags mode:

nextflow run nf-core/seqsubmit \
   -profile docker \
   --mode mags \
   --input assets/samplesheet_genomes.csv \
   --submission_study <your_study> \
   --centre_name TEST_CENTER \
   --webincli_submit true \
   --test_upload true \
   --outdir results/validate_mags

Validation run (submission to the ENA TEST server) in metagenomic_assemblies mode:

nextflow run nf-core/seqsubmit \
   -profile docker \
   --mode metagenomic_assemblies \
   --input assets/samplesheet_assembly.csv \
   --submission_study <your_study> \
   --centre_name TEST_CENTER \
   --webincli_submit true \
   --test_upload true \
   --outdir results/validate_assemblies

Live submission example:

nextflow run nf-core/seqsubmit \
   -profile docker \
   --mode metagenomic_assemblies \
   --input assets/samplesheet_assembly.csv \
   --submission_study PRJEB98843 \
   --centre_name <your_centre> \
   --test_upload false \
   --webincli_submit true \
   --outdir results/live_assembly
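The examples above differ mainly in `--test_upload` and `--webincli_submit`. The hypothetical wrapper below (`seqsubmit_cmd`, not part of the pipeline) keeps those flags in one place and prints the resulting command for review rather than executing it. It fixes `-profile docker` as an assumption, and it always passes `--centre_name` because the parameter table lists it as required.

```bash
# Hypothetical wrapper (not shipped with the pipeline): prints the nextflow
# command for one of three targets so the flags that decide where data goes
# are set in a single place.
#   validate -> --webincli_submit false (validate only, nothing submitted)
#   test     -> --webincli_submit true --test_upload true  (ENA TEST server)
#   live     -> --webincli_submit true --test_upload false (ENA LIVE server)
seqsubmit_cmd() {
  local target="$1" mode="$2" input="$3" study="$4" centre="$5" outdir="$6" flags
  case "$target" in
    validate) flags="--webincli_submit false" ;;
    test)     flags="--webincli_submit true --test_upload true" ;;
    live)     flags="--webincli_submit true --test_upload false" ;;
    *) echo "target must be validate, test or live" >&2; return 1 ;;
  esac
  # $flags is intentionally unquoted so it splits into separate arguments.
  echo nextflow run nf-core/seqsubmit -profile docker \
    --mode "$mode" --input "$input" --submission_study "$study" \
    --centre_name "$centre" --outdir "$outdir" $flags
}

seqsubmit_cmd test metagenomic_assemblies assets/samplesheet_assembly.csv \
  PRJEB98843 TEST_CENTER results/validate_assemblies
```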

Warning

Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see the nf-core documentation on custom configuration files.
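Following the warning above, parameters can be kept in a file passed with `-params-file`, which accepts YAML or JSON. The sketch writes a YAML version of the assembly validation example; the values are this README's own placeholders, and the file name `params.yaml` is arbitrary.

```bash
# Sketch: keep pipeline parameters in a YAML params file instead of a
# custom -c config. Values mirror the assembly validation example;
# replace the placeholders with your own.
cat > params.yaml <<'EOF'
mode: metagenomic_assemblies
input: assets/samplesheet_assembly.csv
outdir: results/validate_assemblies
submission_study: "<your_study>"
centre_name: TEST_CENTER
webincli_submit: true
test_upload: true
EOF

# Then launch with:
#   nextflow run nf-core/seqsubmit -profile docker -params-file params.yaml
```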

For more details and further functionality, please refer to the usage documentation and the parameter documentation.

Pipeline output

Key output locations in --outdir:

upload/manifests/: generated manifest files for submission
upload/webin_cli/: ENA Webin CLI reports
multiqc/: MultiQC summary report
pipeline_info/: execution reports, trace, DAG, and software versions
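After a run, a quick look at the locations listed above shows whether manifests and Webin-CLI reports were produced. The helper below is a hypothetical sketch that only counts files in those directories; it does not interpret their contents, and the default directory `results` is an assumption.

```bash
# Hypothetical helper: counts files in the key output locations of a run.
# The first argument is the directory passed to --outdir.
summarise_outputs() {
  local outdir="${1:-results}" sub
  for sub in upload/manifests upload/webin_cli multiqc pipeline_info; do
    if [ -d "$outdir/$sub" ]; then
      # tr removes the padding some wc implementations add.
      echo "$sub: $(find "$outdir/$sub" -type f | wc -l | tr -d ' ') file(s)"
    else
      echo "$sub: not found"
    fi
  done
}

# Usage: summarise_outputs results/validate_assemblies
```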

For full details, see the output documentation.

Credits

nf-core/seqsubmit was originally written by Martin Beracochea, Ekaterina Sakharova, Sofiia Ochkalova and Evangelos Karatzas.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the nf-core Slack #seqsubmit channel.

Citations

If you use this pipeline please make sure to cite all used software. This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

Richardson L, Allen B, Baldi G, Beracochea M, Bileschi ML, Burdett T, et al. MGnify: the microbiome sequence data analysis resource in 2023. Nucleic Acids Res. 2022;51:D753–9. doi: 10.1093/nar/gkac1080.

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

Ewels P, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
