feat(chemistries): add 10x-flexv2-gex-3p-config-b preset#201

Open
an-altosian wants to merge 1 commit into COMBINE-lab:main from an-altosian:feat/preset-flexv2-config-b

an-altosian commented May 29, 2026

Summary

Adds a chemistry preset for 10x Genomics GEM-X Flex v2 sequencing Configuration B (R1=28 / R2=90), as documented in 10x's Sequencing Requirements for Single Cell Gene Expression Flex.

Users running Config B today have to override --geometry and --sample-bc-ori by hand (or use the new CLI flag from #199). This preset gives them a one-line --chemistry 10x-flexv2-gex-3p-config-b option that resolves all preset-controlled parameters automatically — geometry, sample BC orientation, cell BC whitelist, sample BC TSV, and per-organism probe sets.

Two fields differ from 10x-flexv2-gex-3p

All other assets (plist_name, remote_url, sample_bc_list.{plist_name,remote_url}, probe_sets.human/mouse, meta) are byte-identical to the Config A preset.

- "geometry": "1{b[16]u[12]x[0-3]f[TTGCTAGGACCG]s[10]x:}2{r:}",
+ "geometry": "1{b[16]u[12]x:}2{r[50]f[CCCATATAAGAAAACCTGAATACGCGGTT]s[10]x:}",
...
-     "sample_bc_ori": "reverse"
+     "sample_bc_ori": "forward"

Why geometry differs: In Config B, R1 ends at 28 bp (cell BC + UMI only, leaving no room for the probe-side anchor TTGCTAGGACCG that Config A uses), and R2 reads 90 bp from the probe end. So the 50 bp probe-mappable region, the 29 bp constant CCCATATAAGAAAACCTGAATACGCGGTT (the RC of 10x's documented sense-strand AACCGCGTATTCAGGTTTTCTTATATGGG), and the 10 bp sample BC all live on R2, in that order. Together they cover 50 + 29 + 10 = 89 of the 90 R2 cycles; the trailing x: discards the rest.
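
For illustration, the R2 layout described above can be sketched as plain fixed-offset slicing in Python. This is not simpleaf's geometry parser: `split_r2`, the exact-match anchor check, and the example read are hypothetical, and a real implementation may tolerate mismatches or shifted anchors. Only the sequences and lengths come from this PR.

```python
# Sketch only: fixed-offset slicing of a Config B R2 read, mirroring
# 2{r[50]f[CCCATATAAGAAAACCTGAATACGCGGTT]s[10]x:}.

ANCHOR = "CCCATATAAGAAAACCTGAATACGCGGTT"  # constant as it appears on R2
SENSE = "AACCGCGTATTCAGGTTTTCTTATATGGG"   # 10x's documented sense strand
PROBE_LEN = 50       # r[50]
SAMPLE_BC_LEN = 10   # s[10]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_r2(read: str):
    """Return (probe, sample_bc), or None if the anchor is not at offset 50.

    Exact matching is a simplification made for this sketch.
    """
    anchor_end = PROBE_LEN + len(ANCHOR)
    bc_end = anchor_end + SAMPLE_BC_LEN
    if len(read) < bc_end or read[PROBE_LEN:anchor_end] != ANCHOR:
        return None
    return read[:PROBE_LEN], read[anchor_end:bc_end]


# Hypothetical 90 bp read: 50 bp probe + 29 bp anchor + 10 bp BC + 1 spare base.
probe = "ACGT" * 12 + "AC"
sample_bc = "GATTACAGAT"
read = probe + ANCHOR + sample_bc + "N"

print(revcomp(SENSE) == ANCHOR)  # the anchor is the RC of the sense strand
print(split_r2(read))
```

The first check confirms the reverse-complement relationship stated above; the second shows that the three fields fit inside a 90 bp R2 with one base to spare.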

Why sample_bc_ori differs: R2 reads the strand opposite to the one R1 reads, so in Config B the sample BC appears in the read exactly as it is listed in the whitelist (forward), whereas in Config A the read on R1 carries the BC's reverse complement.
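
A minimal Python sketch of what the orientation setting implies for barcode lookup follows. The helper `lookup_sample_bc`, the whitelist entries, and the well names are hypothetical, and how simpleaf applies `sample_bc_ori` internally is an assumption here, not something this PR documents.

```python
# Sketch only: how "forward" vs "reverse" changes whitelist lookup.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def lookup_sample_bc(observed: str, whitelist: dict, ori: str):
    """Map a barcode observed in a read to a whitelist entry.

    ori="forward": the read shows the whitelist sequence as-is.
    ori="reverse": the read shows the RC of the whitelist sequence.
    """
    if ori not in ("forward", "reverse"):
        raise ValueError(f"unknown orientation: {ori!r}")
    key = observed if ori == "forward" else revcomp(observed)
    return whitelist.get(key)


# Hypothetical whitelist (canonical sequence -> well name).
WHITELIST = {"ACGTTGCAAC": "BC001", "TTGACCGATA": "BC002"}

config_b_bc = "ACGTTGCAAC"           # Config B: read shows the canonical form
config_a_bc = revcomp("ACGTTGCAAC")  # Config A: read shows its RC

print(lookup_sample_bc(config_b_bc, WHITELIST, "forward"))  # matches
print(lookup_sample_bc(config_b_bc, WHITELIST, "reverse"))  # no match
print(lookup_sample_bc(config_a_bc, WHITELIST, "reverse"))  # matches
```

Under this model, using the wrong orientation for a library yields essentially no whitelist hits, which is consistent with the noise-floor matches reported when Config B data is run through the Config A preset.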

Validation

End-to-end verified by running simpleaf multiplex-quant --chemistry 10x-flexv2-gex-3p-config-b --organism mouse --probe-set <mouse probe CSV> against an internal Config B Flex library and comparing against the corresponding cellranger ground truth:

  • High overall mapping rate (~98%) against the auto-built probe index, consistent with what other Flex presets produce on Config A data.
  • Sample BC demultiplexing recovers exactly the wells declared in the cellranger multi config, with realistic cell counts per well; unused wells stay at the noise floor (≤1 cell / few reads).
  • The same input run against the 10x-flexv2-gex-3p (Config A) preset produces noise-floor sample-BC matches only, confirming the sample_bc_ori/geometry split between the two presets is real, not a CLI quirk.

Relation to #199

#199 exposes --sample-bc-ori as a CLI override, which is the general fix for cycle-plan variants. This PR is complementary, not redundant: the preset gives Config B users --chemistry 10x-flexv2-gex-3p-config-b as a one-line alternative to remembering the right geometry + override combo. Both code paths were validated against the same dataset and produce byte-identical results, confirming the preset's declared fields and the CLI override flow into the same downstream pipeline.

Test plan

  • JSON parses (validated via python3 -m json.tool).
  • Preset entry is appended as a sibling immediately after 10x-flexv2-gex-3p, with the same field order: a minimal diff (+28 lines) with no whitespace churn on unrelated entries.
  • End-to-end run with --chemistry 10x-flexv2-gex-3p-config-b against a real Config B Flex library: ~98% mapping rate, all expected sample wells recover with realistic cell counts, unused wells at noise floor.
  • Byte-identical results vs the CLI-override path in #199, "feat(multiplex-quant): expose --sample-bc-ori as a CLI override" (rules out preset-vs-CLI divergence).
  • simpleaf inspect lists the new preset alongside the existing ones (chemistries.json registry load path works).
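
The "only two fields differ" claim can also be checked mechanically. The Python sketch below diffs two nested dicts and lists the dotted paths that differ. The entry layout is inferred from the field names listed in this description, every value other than the two geometries and orientations is a placeholder, and the helper `diff_paths` is hypothetical. In practice the two entries would be loaded from chemistries.json with `json.load`.

```python
# Sketch only: list the dotted paths at which two preset entries differ.
import copy


def diff_paths(a: dict, b: dict, prefix: str = "") -> list:
    """Return dotted key paths whose values differ between two nested dicts."""
    paths = []
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}{key}"
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            paths.extend(diff_paths(va, vb, path + "."))
        elif va != vb:
            paths.append(path)
    return paths


# Placeholder stand-in for the Config A entry (values are NOT the real ones,
# except for geometry and sample_bc_ori, which are quoted from this PR).
config_a = {
    "geometry": "1{b[16]u[12]x[0-3]f[TTGCTAGGACCG]s[10]x:}2{r:}",
    "plist_name": "<placeholder>",
    "remote_url": "<placeholder>",
    "sample_bc_list": {
        "plist_name": "<placeholder>",
        "remote_url": "<placeholder>",
        "sample_bc_ori": "reverse",
    },
    "probe_sets": {"human": "<placeholder>", "mouse": "<placeholder>"},
    "meta": {},
}

# Config B: a copy of Config A with the two fields changed.
config_b = copy.deepcopy(config_a)
config_b["geometry"] = (
    "1{b[16]u[12]x:}2{r[50]f[CCCATATAAGAAAACCTGAATACGCGGTT]s[10]x:}"
)
config_b["sample_bc_list"]["sample_bc_ori"] = "forward"

print(diff_paths(config_a, config_b))
```

Running the same comparison on the real entries should list exactly `geometry` and `sample_bc_list.sample_bc_ori` if the preset was added as described.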

Closes #198.

Adds a chemistry preset for the 10x Genomics GEM-X Flex v2 sequencing
Configuration B (R1=28 / R2=90), as documented in 10x's "Sequencing
Requirements for Single Cell Gene Expression Flex" guide.

Two fields differ from `10x-flexv2-gex-3p` (Configuration A); all other
assets (cell-BC whitelist, sample-BC TSV, probe sets) are byte-identical:

- `geometry`:
  Config A: 1{b[16]u[12]x[0-3]f[TTGCTAGGACCG]s[10]x:}2{r:}
  Config B: 1{b[16]u[12]x:}2{r[50]f[CCCATATAAGAAAACCTGAATACGCGGTT]s[10]x:}

  In Config B, R1 stops at 28 bp (cell BC + UMI only — no probe-side
  anchor), and R2 reads 90 bp from the probe end, so the probe insert,
  the 29 bp constant `CCCATATAAGAAAACCTGAATACGCGGTT` (the RC of 10x's
  documented `AACCGCGTATTCAGGTTTTCTTATATGGG`), and the 10 bp sample BC
  all live on R2.

- `sample_bc_list.sample_bc_ori`:
  Config A: "reverse"   (whitelist is RC of what R1 reads)
  Config B: "forward"   (R2 reads the opposite strand, so the
                        whitelist matches the read sequence as-is)

Closes COMBINE-lab#198.
Development

Successfully merging this pull request may close these issues.

Add chemistry preset for GEM-X Flex v2 Configuration B (R1=28 / R2=90)
