Clarification on high Nfail counts in modkit pileup output despite valid coverage #577

@dasidhartha2018

Hi modkit team,

I am analyzing CpG methylation using modkit pileup on Oxford Nanopore data
(mapped BAMs, reference-guided).

Across multiple samples, I observe that the majority of reads contributing to
coverage end up in the Nfail column, even when valid_coverage ≥ 3.

Some details:

  • Genome: Plasmodium falciparum 3D7

  • Command used (example):

    modkit pileup \
      --reference PlasmoDB-64_Pfalciparum3D7_Genome.fasta \
      --modified-bases C \
      --combine-mods \
      input.bam output.bed

  • In the resulting bedMethyl files:

    • valid_coverage is often high (≥3 for many CpGs)
    • count_modified and count_canonical are low
    • most reads appear to be classified as Nfail
    • fraction of methylated CpGs among covered sites is very low
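To put numbers on the observations above, the per-site counts can be tallied with a short script. This is a minimal sketch, assuming the bedMethyl column order described in the modkit documentation (valid coverage in column 10, Nmod in 12, Ncanonical in 13, Nfail in 16); the column indices should be checked against the modkit version in use. The function name `summarize_bedmethyl` and the two example rows are hypothetical and purely illustrative, not real data.

```python
import io

# Assumed 0-based column indices for modkit bedMethyl output.
# Verify these against the documentation for your modkit version.
COL_VALID, COL_NMOD, COL_NCANON, COL_NFAIL = 9, 11, 12, 15


def summarize_bedmethyl(handle, min_valid_cov=3):
    """Sum per-site counts over sites with valid coverage >= min_valid_cov."""
    totals = {"sites": 0, "valid": 0, "mod": 0, "canonical": 0, "fail": 0}
    for line in handle:
        # split() copes with tab- or space-separated count columns
        fields = line.split()
        # Skip header, blank, or truncated lines
        if len(fields) < 18 or not fields[COL_VALID].isdigit():
            continue
        valid = int(fields[COL_VALID])
        if valid < min_valid_cov:
            continue
        totals["sites"] += 1
        totals["valid"] += valid
        totals["mod"] += int(fields[COL_NMOD])
        totals["canonical"] += int(fields[COL_NCANON])
        totals["fail"] += int(fields[COL_NFAIL])
    # Fraction of passing-plus-failing calls that were filtered out
    attempted = totals["valid"] + totals["fail"]
    totals["fail_fraction"] = totals["fail"] / attempted if attempted else 0.0
    return totals


# Synthetic rows for illustration only (not taken from a real sample).
example = io.StringIO(
    "Pf3D7_01_v3\t100\t101\tm\t5\t+\t100\t101\t255,0,0\t5\t20.00\t1\t4\t0\t0\t7\t0\t0\n"
    "Pf3D7_01_v3\t200\t201\tm\t2\t+\t200\t201\t255,0,0\t2\t0.00\t0\t2\t0\t0\t3\t0\t0\n"
)
print(summarize_bedmethyl(example))
```

For a real file, the same function can be called as `summarize_bedmethyl(open("output.bed"))`. The `fail_fraction` value gives a single per-sample number to compare across samples and coverage cutoffs.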

This behavior is consistent across samples and across different coverage cutoffs.

My questions:

  1. Is a high Nfail count expected behavior in cases of low-confidence or low-level CpG methylation?
  2. Which filters most commonly cause reads to be classified as Nfail?
    (e.g. modification probability threshold, basecall quality, alignment flags, context mismatch)
  3. Is there a recommended way to summarize or interpret datasets where
    valid coverage exists but most reads fail confidence filters?
  4. Would adjusting parameters like min-mod-prob be appropriate to explore this further?
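For context on questions 2 and 4, my current understanding of how a confidence threshold would sort calls into modified, canonical, and Nfail is sketched below. This is only an illustration of my assumption (a call fails when the probability of its most likely class is below the threshold), not modkit's actual implementation; `classify_call` and `count_calls` are hypothetical helper names, and the probabilities in the usage line are made up.

```python
def classify_call(p_mod, threshold):
    """Classify one call given its modification probability.

    Assumption: with a single (combined) modification class, the canonical
    probability is 1 - p_mod, and a call fails when the probability of the
    more likely class is below the threshold.
    """
    p_canonical = 1.0 - p_mod
    confidence = max(p_mod, p_canonical)
    if confidence < threshold:
        return "fail"
    return "modified" if p_mod > p_canonical else "canonical"


def count_calls(p_mods, threshold):
    """Tally call classes for a list of modification probabilities."""
    counts = {"modified": 0, "canonical": 0, "fail": 0}
    for p in p_mods:
        counts[classify_call(p, threshold)] += 1
    return counts


# Made-up probabilities: a stricter threshold moves ambiguous calls to "fail".
print(count_calls([0.95, 0.6, 0.4, 0.05], threshold=0.8))
print(count_calls([0.95, 0.6, 0.4, 0.05], threshold=0.55))
```

If this picture is right, a high Nfail count would mean that many calls have probabilities near 0.5, and lowering the threshold would move them into the modified or canonical counts rather than change the underlying signal.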

The behavior seems biologically plausible for this organism, but I would like
to confirm that my interpretation of Nfail is correct.

Thanks for the great tool, and for any clarification!

Labels: question (Looking for clarification on inputs and/or outputs)
