Hi modkit team,
I am analyzing CpG methylation using modkit pileup on Oxford Nanopore data
(mapped BAMs, reference-guided).
Across multiple samples, I observe that the majority of reads contributing to
coverage end up in the Nfail column, even when valid_coverage ≥ 3.
Some details:
- Genome: Plasmodium falciparum 3D7
- Command used (example):
    modkit pileup \
      --reference PlasmoDB-64_Pfalciparum3D7_Genome.fasta \
      --modified-bases C \
      --combine-mods \
      input.bam output.bed
- In the resulting bedMethyl files:
  - valid_coverage is often high (≥3 for many CpGs)
  - count_modified and count_canonical are low
  - most reads appear to be classified as Nfail
  - fraction of methylated CpGs among covered sites is very low
This behavior is consistent across samples and across different coverage cutoffs.
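To make the observation concrete, the Nfail share can be tallied directly from a bedMethyl file. The following is a minimal sketch, not modkit's own tooling: the function name `summarize_fail_fraction` is hypothetical, and the column positions are an assumption based on the 18-column bedMethyl layout (valid coverage in column 10, modified count in column 12, canonical count in column 13, fail count in column 16), so they should be checked against the modkit version in use.

```python
from typing import Dict, Iterable

# ASSUMPTION: 0-based positions in an 18-column whitespace-delimited
# bedMethyl row; verify against the modkit documentation for your version.
COL_VALID_COV = 9
COL_N_MOD = 11
COL_N_CANONICAL = 12
COL_N_FAIL = 15
EXPECTED_COLUMNS = 18


def summarize_fail_fraction(lines: Iterable[str], min_valid_cov: int = 3) -> Dict[str, float]:
    """Sum modified, canonical, and failed calls over sites passing a coverage cutoff."""
    sites = 0
    n_mod = n_canonical = n_fail = 0
    for line in lines:
        fields = line.split()
        # Skip blank, truncated, or header rows (non-numeric coverage field).
        if len(fields) < EXPECTED_COLUMNS or not fields[COL_VALID_COV].isdigit():
            continue
        if int(fields[COL_VALID_COV]) < min_valid_cov:
            continue
        sites += 1
        n_mod += int(fields[COL_N_MOD])
        n_canonical += int(fields[COL_N_CANONICAL])
        n_fail += int(fields[COL_N_FAIL])
    total_calls = n_mod + n_canonical + n_fail
    return {
        "sites": sites,
        "n_mod": n_mod,
        "n_canonical": n_canonical,
        "n_fail": n_fail,
        # Share of all counted calls at these sites that failed filtering.
        "fail_fraction": n_fail / total_calls if total_calls else 0.0,
    }


if __name__ == "__main__":
    # Synthetic rows for illustration only; not real data.
    demo = [
        "chr1\t10\t11\tm\t5\t+\t10\t11\t255,0,0\t5\t20.00\t1\t4\t0\t0\t15\t0\t0",
        "chr1\t20\t21\tm\t2\t+\t20\t21\t255,0,0\t2\t0.00\t0\t2\t0\t0\t9\t0\t0",
    ]
    print(summarize_fail_fraction(demo))
```

With a real file, the same function can be called as `summarize_fail_fraction(open("output.bed"))`, and rerun with different `min_valid_cov` values to see whether the fail share changes with the coverage cutoff.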
My questions:
- Is a high Nfail count expected behavior in cases of low-confidence or low-level CpG methylation?
- Which filters most commonly cause reads to be classified as Nfail (e.g. modification probability threshold, basecall quality, alignment flags, context mismatch)?
- Is there a recommended way to summarize or interpret datasets where valid coverage exists but most reads fail confidence filters?
- Would adjusting parameters like min-mod-prob be appropriate to explore this further?
The behavior seems biologically plausible for this organism, but I would like
to confirm that my interpretation of Nfail is correct.
Thanks for the great tool, and for any clarification!