feat(chemistries): add 10x-flexv2-gex-3p-config-b preset #201
Open
an-altosian wants to merge 1 commit into
Adds a chemistry preset for the 10x Genomics GEM-X Flex v2 sequencing
Configuration B (R1=28 / R2=90), as documented in 10x's "Sequencing
Requirements for Single Cell Gene Expression Flex" guide.
Two fields differ from `10x-flexv2-gex-3p` (Configuration A); all other
assets (cell-BC whitelist, sample-BC TSV, probe sets) are byte-identical:
- `geometry`:
  Config A: `1{b[16]u[12]x[0-3]f[TTGCTAGGACCG]s[10]x:}2{r:}`
  Config B: `1{b[16]u[12]x:}2{r[50]f[CCCATATAAGAAAACCTGAATACGCGGTT]s[10]x:}`
  In Config B, R1 stops at 28 bp (cell BC + UMI only, with no probe-side
  anchor), and R2 reads 90 bp from the probe end, so the probe insert,
  the 29 bp constant `CCCATATAAGAAAACCTGAATACGCGGTT` (the RC of 10x's
  documented `AACCGCGTATTCAGGTTTTCTTATATGGG`), and the 10 bp sample BC
  all live on R2.
- `sample_bc_list.sample_bc_ori`:
  Config A: "reverse" (whitelist is RC of what R1 reads)
  Config B: "forward" (R2 reads the opposite strand, so the
  whitelist matches the read sequence as-is)
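
The reverse-complement relationship and the read-length arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not simpleaf's geometry parser: `fixed_length` is a hypothetical helper whose regex only understands the segment types that appear in the two geometry strings quoted in this PR.

```python
import re

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def fixed_length(read_spec: str) -> int:
    """Sum the fixed-length segments of one read's geometry (hypothetical helper).

    b[n], u[n], s[n], r[n] contribute n bases; f[SEQ] contributes len(SEQ).
    Unbounded segments (x:, r:) and ranged ones (x[0-3]) are ignored.
    """
    total = 0
    for tag, body in re.findall(r"([busrfx])\[([^\]]+)\]", read_spec):
        if tag == "f":
            total += len(body)
        elif body.isdigit():
            total += int(body)
    return total

DOCUMENTED = "AACCGCGTATTCAGGTTTTCTTATATGGG"  # 10x's documented sequence
CONSTANT = "CCCATATAAGAAAACCTGAATACGCGGTT"    # constant used in the Config B geometry

config_b_r1 = "b[16]u[12]x:"
config_b_r2 = "r[50]f[CCCATATAAGAAAACCTGAATACGCGGTT]s[10]x:"

print(reverse_complement(DOCUMENTED) == CONSTANT)  # True
print(fixed_length(config_b_r1))                   # 28
print(fixed_length(config_b_r2))                   # 89
```

The fixed R2 segments add up to 89 bp, so a 90 bp R2 leaves one base for the trailing `x:`.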
Closes COMBINE-lab#198.
Summary
Adds a chemistry preset for 10x Genomics GEM-X Flex v2 sequencing Configuration B (R1=28 / R2=90), as documented in 10x's Sequencing Requirements for Single Cell Gene Expression Flex.
Users running Config B today have to override `--geometry` and `--sample-bc-ori` by hand (or use the new CLI flag from #199). This preset gives them a one-line `--chemistry 10x-flexv2-gex-3p-config-b` option that resolves all preset-controlled parameters automatically: geometry, sample BC orientation, cell BC whitelist, sample BC TSV, and per-organism probe sets.

Two fields differ from `10x-flexv2-gex-3p`:

| Field | Config A (`10x-flexv2-gex-3p`) | Config B (`10x-flexv2-gex-3p-config-b`) |
| --- | --- | --- |
| `geometry` | `1{b[16]u[12]x[0-3]f[TTGCTAGGACCG]s[10]x:}2{r:}` | `1{b[16]u[12]x:}2{r[50]f[CCCATATAAGAAAACCTGAATACGCGGTT]s[10]x:}` |
| `sample_bc_list.sample_bc_ori` | `"reverse"` | `"forward"` |

All other assets (`plist_name`, `remote_url`, `sample_bc_list.{plist_name,remote_url}`, `probe_sets.human/mouse`, `meta`) are byte-identical to the Config A preset.

Why geometry differs: In Config B, R1 ends at 28 bp (cell BC + UMI only, leaving no room for the probe-side anchor `TTGCTAGGACCG` that Config A uses), and R2 reads 90 bp from the probe end. So the 50 bp probe-mappable region, the 29 bp constant `CCCATATAAGAAAACCTGAATACGCGGTT` (the RC of 10x's documented sense-strand `AACCGCGTATTCAGGTTTTCTTATATGGG`), and the 10 bp sample BC all live on R2, in that order.

Why `sample_bc_ori` differs: R2 reads the opposite strand relative to R1's view of the library construct, so the canonical sample BC appears in the read in its forward form (whereas Config A reads the BC's RC on R1's strand).
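
To make the orientation semantics concrete, here is a small sketch of how a whitelist lookup might honor `sample_bc_ori`. It illustrates the behavior described above and is not simpleaf's implementation; `match_sample_bc`, the barcode `ACGTTGCAAC`, and the label `BC001` are all made up for the example.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def match_sample_bc(read_bc: str, whitelist: dict, orientation: str):
    """Look up a sample barcode exactly as sequenced (hypothetical helper).

    "forward": the read is compared to the whitelist as-is (Config B).
    "reverse": the read is reverse-complemented first (Config A).
    Returns the whitelist label, or None when there is no match.
    """
    if orientation == "forward":
        query = read_bc
    elif orientation == "reverse":
        query = reverse_complement(read_bc)
    else:
        raise ValueError(f"unknown orientation: {orientation!r}")
    return whitelist.get(query)

# Made-up 10 bp barcode and label, for illustration only.
whitelist = {"ACGTTGCAAC": "BC001"}

# A Config B style read carries the canonical barcode directly.
print(match_sample_bc("ACGTTGCAAC", whitelist, "forward"))  # BC001
# A Config A style read carries its reverse complement.
print(match_sample_bc("GTTGCAACGT", whitelist, "reverse"))  # BC001
# Using the wrong orientation finds nothing.
print(match_sample_bc("ACGTTGCAAC", whitelist, "reverse"))  # None
```

Under these assumptions, picking the wrong orientation yields only chance matches, which is consistent with the noise-floor behavior reported for the Config A preset on Config B data.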
Validation
End-to-end verified by running `simpleaf multiplex-quant --chemistry 10x-flexv2-gex-3p-config-b --organism mouse --probe-set <mouse probe CSV>` against an internal Config B Flex library and comparing against the corresponding cellranger ground truth.

By contrast, the `10x-flexv2-gex-3p` (Config A) preset produces noise-floor sample-BC matches only, confirming that the `sample_bc_ori`/geometry split between the two presets is real, not a CLI quirk.

Relation to #199
#199 exposes `--sample-bc-ori` as a CLI override, which is the general fix for cycle-plan variants. This PR is complementary, not redundant: the preset gives Config B users `--chemistry 10x-flexv2-gex-3p-config-b` as a one-line alternative to remembering the right geometry + override combo. Both code paths were validated against the same dataset and produce byte-identical results, confirming that the preset's declared fields and the CLI override flow into the same downstream pipeline.

Test plan
- […] (`python3 -m json.tool`).
- […] `10x-flexv2-gex-3p` with the same field order: minimal diff (+28 lines), no whitespace churn on unrelated entries.
- […] `--chemistry 10x-flexv2-gex-3p-config-b` against a real Config B Flex library: ~98% mapping rate, all expected sample wells recover with realistic cell counts, unused wells at noise floor.
- `simpleaf inspect` lists the new preset alongside the existing ones (chemistries.json registry load path works).

Closes #198.
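
The "only two fields differ" property could also be checked mechanically. The sketch below compares two preset entries after flattening nested keys. The field names follow this PR, but the entry layout and every placeholder value are assumptions made for illustration; `flatten` and `differing_fields` are hypothetical helpers, not part of simpleaf.

```python
def flatten(entry: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into {"a.b.c": value} form."""
    flat = {}
    for key, value in entry.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

def differing_fields(a: dict, b: dict) -> list:
    """Return the sorted dotted paths whose values differ between a and b."""
    fa, fb = flatten(a), flatten(b)
    return sorted(k for k in fa.keys() | fb.keys() if fa.get(k) != fb.get(k))

# Placeholder entries: only geometry and sample_bc_ori carry real values from
# this PR; everything marked "<...>" stands in for the shared assets.
config_a = {
    "geometry": "1{b[16]u[12]x[0-3]f[TTGCTAGGACCG]s[10]x:}2{r:}",
    "plist_name": "<shared>",
    "remote_url": "<shared>",
    "sample_bc_list": {
        "plist_name": "<shared>",
        "remote_url": "<shared>",
        "sample_bc_ori": "reverse",
    },
    "probe_sets": {"human": "<shared>", "mouse": "<shared>"},
}
config_b = {
    "geometry": "1{b[16]u[12]x:}2{r[50]f[CCCATATAAGAAAACCTGAATACGCGGTT]s[10]x:}",
    "plist_name": "<shared>",
    "remote_url": "<shared>",
    "sample_bc_list": {
        "plist_name": "<shared>",
        "remote_url": "<shared>",
        "sample_bc_ori": "forward",
    },
    "probe_sets": {"human": "<shared>", "mouse": "<shared>"},
}

print(differing_fields(config_a, config_b))
# ['geometry', 'sample_bc_list.sample_bc_ori']
```

In practice the two dictionaries would be loaded from the registry file with `json.load` rather than written inline.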