Commit 7f15b06

**Usage examples updates**

Halfway through testing and updating the syntax for the usage examples. I still need to add images and fix the table formatting at the end of the document.

Parent: a84a6c9. 2 files changed: 306 additions, 0 deletions.

**docs/source/index.rst** (1 addition, 0 deletions; hunk `@@ -17,4 +17,5 @@ RFDpoly documentation`): adds `rfdpoly_usage_examples.md` after the existing entries `license_link.rst`, `contributing_link.rst`, and `installation_guide.md`.

**New file** (305 additions, 0 deletions; hunk `@@ -0,0 +1,305 @@`):

# RFDpoly usage examples

This document contains several examples of how to use RFDpoly for RNA and DNA generation, including:
- Unconditional generation of multiple biopolymers together
- Motif scaffolding tasks with multiple polymers
- Control of base-pairing secondary structure to design RNA pseudoknots or (mini) DNA origami

````{note}
If you come across this error:
```text
Could not locate file '/current/working/directory/test_data/DBP035.pdb'. Tried...
```
you can fix it by adding
```bash
inference.input_pdb=/path/to/RFDpoly/rf_diffusion/test_data/DBP035.pdb
```
to the end of your list of configuration options.
````

## Table of Contents
- [Example 1: Unconditional RNA generation](#unconditional-rna-generation)
- [Example 2: Unconditional design of one protein chain and two DNA chains](#unconditional-protein-2-dna)
- [Example 3: RNA riboswitch design with conditioning from Eterna puzzle structure string](#rna-riboswitch-conditioning-eterna)
- [Example 4: DNA–protein scaffolding, inpaint two DNA chains](#dna-protein-scaffolding-inpaint-2-dna)
- [Example 5: DNA–protein scaffolding, inpaint two DNA chains and one protein chain](#dna-protein-scaffolding-inpaint-2-dna-1-protein)
- [Example 6: DNA origami with symmetric denoising](#dna-origami-symmetric-denoising)
- [Example 7: RNA design with triple helix](#rna-design-triple-helix)
- [Example 8: Control of RNA tertiary structure with multi-contact specification](#control-rna-tertiary-multi-contact)
- [Example 9: Pseudocyclic symmetry using procedurally generated base-pair patterning](#pseudocyclic-symmetry-base-pair-patterning)
- [Example 10: De novo Holliday junctions using strand exchange](#de-novo-holliday-junctions)
- [Example 11: Sequence specification and sequence design](#sequence-specification-sequence-design)
  - [Unconditional design of RNA and protein](#unconditional-design-rna-protein)
  - [Unconditional design of DNA and protein](#unconditional-design-dna-protein)
- [Residue Specification Arguments](#residue-specification-arguments)

(unconditional-rna-generation)=
## Example 1: Unconditional RNA generation
```bash
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py \
--config-name=multi_polymer \
diffuser.T=50 \
inference.num_designs=5 \
'contigmap.contigs=["90"]' \
contigmap.polymer_chains=['rna'] \
inference.output_prefix='./demo_outputs/RNA_uncond_standard_settings' \
inference.ckpt_path=/path/to/RFDpoly/weights/train_session2024-07-08_1720455712_BFF_3.00.pt
```

(unconditional-protein-2-dna)=
## Example 2: Unconditional design of one protein chain and two DNA chains
```bash
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py \
--config-name=multi_polymer \
diffuser.T=50 \
inference.num_designs=5 \
"contigmap.contigs=[20\ 20\ 75]" \
'contigmap.polymer_chains=["dna","dna","protein"]' \
inference.output_prefix='./demo_outputs/DNA_prot_uncond_standard_settings' \
inference.ckpt_path=/path/to/RFDpoly/weights/train_session2024-07-08_1720455712_BFF_3.00.pt
```

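If you launch runs from a script, passing the overrides as an argument list (for example with Python's `subprocess`) sidesteps the shell escaping used above (`\ `, `\'`), because each override reaches Hydra as a single argument. A minimal sketch, assuming the same placeholder paths as Example 2; `build_rfdpoly_command` is a hypothetical helper, not part of RFDpoly, and the contig list is written in the quoted form that Example 4 uses:

```python
import shlex
import subprocess


def build_rfdpoly_command(sif: str, script: str, overrides: dict) -> list:
    """Return an argv list; each Hydra override is one element, so no shell escaping is needed."""
    cmd = ["apptainer", "run", "--nv", sif, script, "--config-name=multi_polymer"]
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    return cmd


cmd = build_rfdpoly_command(
    "/path/to/SE3nv.sif",
    "/path/to/RFDpoly/rf_diffusion/run_inference.py",
    {
        "diffuser.T": 50,
        "inference.num_designs": 5,
        "contigmap.contigs": "['20 20 75']",  # one string listing all three chain lengths
        "contigmap.polymer_chains": "['dna','dna','protein']",
        "inference.output_prefix": "./demo_outputs/DNA_prot_uncond_standard_settings",
        "inference.ckpt_path": "/path/to/RFDpoly/weights/train_session2024-07-08_1720455712_BFF_3.00.pt",
    },
)
print(shlex.join(cmd))  # shell-quoted version, useful for logging
# subprocess.run(cmd, check=True)  # uncomment to launch; requires Apptainer and real paths
```

Keeping the overrides in a dict also makes it easy to sweep a single setting (for example the contig lengths) in a loop.
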
(rna-riboswitch-conditioning-eterna)=
## Example 3: RNA riboswitch design with conditioning from Eterna puzzle structure string

Puzzle source: [JR Openknot4 Week6 4RZD](https://eternagame.org/labs/13195459)
```text
((((...............))))(((((((((........))))))))).......(((((((........)))))))(((...........))).....
```
Because parentheses are special characters in Hydra's command-line override syntax, replace each `(` with `5` and each `)` with `3`.
```bash
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py \
--config-name=multi_polymer \
diffuser.T=50 \
inference.num_designs=5 \
'contigmap.contigs=["100"]' \
contigmap.polymer_chains=['rna'] \
scaffoldguided.target_ss_string='5555...............3333555555555........333333333.......5555555........3333333555...........333.....' \
inference.output_prefix='./demo_outputs/RNA_eterna_cond_standard_settings' \
inference.ckpt_path=/path/to/RFDpoly/weights/train_session2024-07-08_1720455712_BFF_3.00.pt
```

Try this out on more [Eterna puzzles](https://eternagame.org/challenges/11843006)! To get the secondary-structure string for a given puzzle, start playing, then right-click → “Copy Structure”, or choose it from the menu in the upper-left corner.

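Converting a copied structure by hand is error-prone, so a small script helps. A minimal sketch, assuming the structure uses only `(`, `)`, and `.` (strings with pseudoknot brackets such as `[` and `]` are rejected, because their encoding is not covered here); `dot_bracket_to_hydra` is a hypothetical helper:

```python
def dot_bracket_to_hydra(ss: str) -> str:
    """Convert a dot-bracket structure to the 5/3 notation used above.

    Raises ValueError for unbalanced brackets or unsupported characters,
    which usually indicate a copy-paste problem.
    """
    depth = 0
    for pos, ch in enumerate(ss, start=1):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unmatched ')' at position {pos}")
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r} at position {pos}")
    if depth:
        raise ValueError(f"{depth} unmatched '(' in structure")
    return ss.replace("(", "5").replace(")", "3")


puzzle = "((((...))))..((...))"
print(dot_bracket_to_hydra(puzzle))       # 5555...3333..55...33
print(len(dot_bracket_to_hydra(puzzle)))  # 20
```

The length of the converted string should match the length given in `contigmap.contigs`.
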
**BUT WAIT!** What if you want to insert multiple pseudoknots into your design, or insert them at specific positions in certain chains? You can provide a list of secondary-structure strings as follows:
```bash
scaffoldguided.target_ss_string_list=[\'B1-90:.5555555555555555555..ffffff.33333555....333555....33333333333333333555555..tttttt.333333.\',\'A116-205:.5555555555555555555..ffffff.33333555....333555....33333333333333333555555..tttttt.333333.\']
```
Here, the prefix of each entry (for example `B1-90:`) specifies the chain and residue range in the output design where that secondary-structure string is inserted.

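Each string in the list has to fit its range. A minimal sketch that checks this and assembles the override with the same backslash-escaped quoting as above, assuming ranges of the form chain letter plus `start-end`; `ss_list_override` is a hypothetical helper:

```python
import re


def ss_list_override(entries: list) -> str:
    """Build a scaffoldguided.target_ss_string_list override from (range, ss) pairs.

    Checks that each range such as 'B1-90' spans exactly len(ss) residues.
    Quotes are backslash-escaped, as in the unquoted bash argument above.
    """
    items = []
    for rng, ss in entries:
        m = re.fullmatch(r"([A-Z])(\d+)-(\d+)", rng)
        if m is None:
            raise ValueError(f"cannot parse range {rng!r}")
        span = int(m.group(3)) - int(m.group(2)) + 1
        if span != len(ss):
            raise ValueError(f"{rng} spans {span} residues but its string has {len(ss)}")
        items.append(f"\\'{rng}:{ss}\\'")
    return "scaffoldguided.target_ss_string_list=[" + ",".join(items) + "]"


print(ss_list_override([("B1-6", "55..33"), ("A11-16", ".5.3..")]))
# scaffoldguided.target_ss_string_list=[\'B1-6:55..33\',\'A11-16:.5.3..\']
```
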
(dna-protein-scaffolding-inpaint-2-dna)=
## Example 4: DNA–protein scaffolding, inpaint two DNA chains

The argument `inference.ij_visible` controls which motifs listed in the contigs have their relative orientations locked during inference (credit to [Dr. David Juergens](https://scholar.google.com/citations?user=k6PhNDQAAAAJ&hl=en) for this intuitive system). In the contig string below, lowercase letters label the motifs from the input PDB in the order they occur in the contigs (not to be confused with the uppercase chain IDs):

```
10,D8-13,6,B8-13,10 10,B18-23,6,D18-23,10 A1-56,0 C1-56,0
   a       b           c        d         e       f
```
Motifs B8-13, B18-23, and A1-56 (b, c, e) form the first DNA+binder group, “bce”, while D8-13, D18-23, and C1-56 (a, d, f) form “adf”. With `inference.ij_visible='bce-adf'`, the motifs within each group keep their relative orientations, while the relative orientation of “bce” and “adf” can vary during inference.

```bash
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py \
--config-name=multi_polymer \
diffuser.T=50 \
inference.num_designs=5 \
"contigmap.contigs=['10,D8-13,6,B8-13,10 10,B18-23,6,D18-23,10 A1-56,0 C1-56,0']" \
inference.ij_visible='bce-adf' \
"contigmap.polymer_chains=[dna,dna,protein,protein]" \
inference.input_pdb='/path/to/RFDpoly/rf_diffusion/test_data/combo_DBP009_DBP010_DBP011_with_DNA_v2.pdb' \
inference.output_prefix='./demo_outputs/DNA_binders_scaffolding_test1_standard_settings'
```

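The lettering is mechanical enough to script. A minimal sketch, assuming a motif is any contig segment of the form chain letter plus residue range (such as `D8-13`) and that bare numbers are generated segments; `motif_letters` and `ij_groups` are hypothetical helpers, not part of RFDpoly:

```python
import re
from string import ascii_lowercase

# Assumption: motifs look like "D8-13"; bare numbers such as "10" or "0" are generated segments.
MOTIF = re.compile(r"[A-Z]\d+-\d+")


def motif_letters(contigs: str) -> dict:
    """Label input-PDB motifs a, b, c, ... in the order they appear in the contigs."""
    segments = re.split(r"[,\s]+", contigs.strip())
    motifs = [seg for seg in segments if MOTIF.fullmatch(seg)]
    return dict(zip(ascii_lowercase, motifs))


def ij_groups(contigs: str, ij_visible: str) -> list:
    """Expand an ij_visible string such as 'bce-adf' into lists of motif ranges."""
    letters = motif_letters(contigs)
    return [[letters[ch] for ch in group] for group in ij_visible.split("-")]


contigs = "10,D8-13,6,B8-13,10 10,B18-23,6,D18-23,10 A1-56,0 C1-56,0"
print(motif_letters(contigs))
print(ij_groups(contigs, "bce-adf"))
# [['B8-13', 'B18-23', 'A1-56'], ['D8-13', 'D18-23', 'C1-56']]
```

The same helpers can be used to read the longer `inference.ij_visible='acf-bde-gil-hjk'` string in Example 10, where the six contig entries contain twelve motifs (a through l).
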
(dna-protein-scaffolding-inpaint-2-dna-1-protein)=
## Example 5: DNA–protein scaffolding, inpaint two DNA chains and one protein chain
Here, `contigmap.polymer_chains` lists only three chains because the two binding proteins are merged into a single chain (`A1-52,90,C4-56,0`).
```bash
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py \
--config-name=multi_polymer \
diffuser.T=50 \
inference.num_designs=5 \
contigmap.contigs=[\'5,D8-13,2,B8-13,5\ 5,B18-23,2,D18-23,5\ A1-52,90,C4-56,0\'] \
inference.ij_visible='bce-adf' \
contigmap.polymer_chains=[\'dna\',\'dna\',\'protein\'] \
scaffoldguided.target_ss_pairs=[\'A1-24,B1-24\'] \
inference.input_pdb='/path/to/RFDpoly/rf_diffusion/test_data/combo_DBP009_DBP010_DBP011_with_DNA_v2.pdb' \
inference.output_prefix='./demo_outputs/DNA_binders_scaffolding_test2_standard_settings'
```
The new argument `scaffoldguided.target_ss_pairs` enforces base pairing between polymer ranges (here `A1-24` and `B1-24`). These correspond to the two full DNA chains (A and B) in the output, each 24 nucleotides long (5 + 6 + 2 + 6 + 5 residues in the contig).

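To pick ranges for `scaffoldguided.target_ss_pairs`, it helps to know how long each output chain is. A minimal sketch that adds up a contig string, assuming fixed-length generated segments (plain integers) and motif segments of the form chain letter plus residue range; `chain_lengths` is a hypothetical helper:

```python
import re


def chain_lengths(contigs: str) -> list:
    """Length of each output chain in a space-separated contig string.

    Assumes plain integers are fixed-length generated segments (0 adds nothing)
    and 'X8-13'-style segments are motifs copied from the input PDB.
    """
    lengths = []
    for entry in contigs.split():
        total = 0
        for seg in entry.split(","):
            m = re.fullmatch(r"[A-Z](\d+)-(\d+)", seg)
            if m:
                total += int(m.group(2)) - int(m.group(1)) + 1
            else:
                total += int(seg)
        lengths.append(total)
    return lengths


print(chain_lengths("5,D8-13,2,B8-13,5 5,B18-23,2,D18-23,5 A1-52,90,C4-56,0"))  # [24, 24, 195]
```

The first two values confirm that `A1-24` and `B1-24` span the full DNA chains.
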
(dna-origami-symmetric-denoising)=
## Example 6: DNA origami with symmetric denoising
The `scaffoldguided.target_ss_pairs` argument specifies paired ranges in the design; both ranges in each pair must have the same length. The first range runs 5'→3' and the second 3'→5', so the two ranges pair antiparallel.
```bash
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py \
--config-name=multi_polymer \
diffuser.T=50 \
inference.num_designs=5 \
contigmap.contigs=[\'60\ 60\ 60\ 60\'] \
contigmap.polymer_chains=[\'dna\',\'dna\',\'dna\',\'dna\'] \
scaffoldguided.target_ss_pairs=[\'A1-20,B1-20\',\'A21-40,C21-40\',\'A41-60,D41-60\',\'B21-40,D21-40\',\'B41-60,C41-60\',\'C1-20,D1-20\'] \
inference.symmetry='d2' \
inference.output_prefix='./demo_outputs/DNA_origami_standard_settings'
```

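Read this way, `A1-20,B1-20` pairs A1 with B20, A2 with B19, and so on, and the six pairs in the command above give each of the 4 × 60 = 240 nucleotides exactly one partner. A minimal sketch that expands each pair into residue-level partners and checks this; `parse_range` and `antiparallel_pairs` are hypothetical helpers that follow the 5'→3' / 3'→5' rule described above:

```python
import re
from collections import Counter


def parse_range(rng: str):
    """'A21-40' -> ('A', 21, 40)."""
    m = re.fullmatch(r"([A-Z])(\d+)-(\d+)", rng)
    if m is None:
        raise ValueError(f"cannot parse range {rng!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))


def antiparallel_pairs(spec: str) -> list:
    """Expand 'A1-20,B1-20' into residue pairs: first range read 5'->3', second 3'->5'."""
    (c1, s1, e1), (c2, s2, e2) = (parse_range(r) for r in spec.split(","))
    if e1 - s1 != e2 - s2:
        raise ValueError(f"{spec}: paired ranges must have equal length")
    return [((c1, s1 + k), (c2, e2 - k)) for k in range(e1 - s1 + 1)]


specs = ['A1-20,B1-20', 'A21-40,C21-40', 'A41-60,D41-60',
         'B21-40,D21-40', 'B41-60,C41-60', 'C1-20,D1-20']
pairs = [pair for spec in specs for pair in antiparallel_pairs(spec)]
times_paired = Counter(res for pair in pairs for res in pair)
print(pairs[0])                                                    # (('A', 1), ('B', 20))
print(len(pairs), len(times_paired), max(times_paired.values()))  # 120 240 1
```
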
(rna-design-triple-helix)=
## Example 7: RNA design with triple helix
Just as in the previous example, we can use base-paired sequence ranges to control RNA topology. By default, paired regions are antiparallel, but specific orientations can be assigned, for example to build triple helices that combine parallel and antiparallel pairing.
```bash
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py \
--config-name=multi_polymer \
diffuser.T=50 \
inference.num_designs=5 \
contigmap.contigs=[\'75\'] \
contigmap.polymer_chains=[\'rna\'] \
scaffoldguided.target_ss_pairs=[\'A5-20,A55-70\',\'A55-70,A30-45\'] \
scaffoldguided.target_ss_pair_ori=[\'P\',\'A\'] \
inference.output_prefix='./demo_outputs/Triple_helix_test'
```
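Each orientation in `scaffoldguided.target_ss_pair_ori` applies to the pair at the same index in `scaffoldguided.target_ss_pairs`. Here, `A5-20` pairs with `A55-70` in parallel (`P`), and `A55-70` pairs with `A30-45` antiparallel (`A`), so `A55-70` is the strand shared by both helices.
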
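The same bookkeeping extends to the orientation list. A minimal sketch, assuming `P` means parallel (the k-th residues of both ranges are matched) and `A` means antiparallel (the default described above); `parse_range` and `oriented_pairs` are hypothetical helpers:

```python
import re


def parse_range(rng: str):
    """'A55-70' -> ('A', 55, 70)."""
    m = re.fullmatch(r"([A-Z])(\d+)-(\d+)", rng)
    if m is None:
        raise ValueError(f"cannot parse range {rng!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))


def oriented_pairs(spec: str, ori: str = "A") -> list:
    """Residue pairs for one target_ss_pairs entry; 'A' = antiparallel, 'P' = parallel (assumed)."""
    (c1, s1, e1), (c2, s2, e2) = (parse_range(r) for r in spec.split(","))
    n = e1 - s1 + 1
    if n != e2 - s2 + 1:
        raise ValueError(f"{spec}: paired ranges must have equal length")
    if ori == "P":  # parallel: both ranges read in the same direction
        return [((c1, s1 + k), (c2, s2 + k)) for k in range(n)]
    if ori == "A":  # antiparallel: second range read from its 3' end
        return [((c1, s1 + k), (c2, e2 - k)) for k in range(n)]
    raise ValueError(f"unknown orientation {ori!r}")


specs = ['A5-20,A55-70', 'A55-70,A30-45']
oris = ['P', 'A']
for spec, ori in zip(specs, oris):  # orientations line up with pairs by index
    pairs = oriented_pairs(spec, ori)
    print(spec, ori, pairs[0], pairs[-1])
```

Under these assumptions, A55 is matched with A5 in the parallel helix and with A45 in the antiparallel one.
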
(control-rna-tertiary-multi-contact)=
## Example 8: Control of RNA tertiary structure with multi-contact specification
We can “staple” distal loops in RNA pseudoknots together by specifying groups of residues that form multi-base contacts with `scaffoldguided.force_multi_contacts`. Secondary-structure strings can only encode pairwise (two-base) interactions, so this feature enables higher-order tertiary contacts. We can also force specific ranges to form loops with `scaffoldguided.force_loops_list`.
```bash
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py \
--config-name=multi_polymer \
diffuser.T=50 \
inference.num_designs=5 \
contigmap.contigs=[\'80\'] \
contigmap.polymer_chains=[\'rna\'] \
scaffoldguided.target_ss_pairs=[\'A5-15,A25-35\',\'A45-55,A65-75\'] \
scaffoldguided.force_multi_contacts=[\'A19,A61,A20\',\'A59,A21,A60\'] \
scaffoldguided.force_loops_list=[\'A38-42\'] \
inference.output_prefix='./demo_outputs/loop_touch_test'
```

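To see which loops the contacts staple together, it helps to list the stretches that `scaffoldguided.target_ss_pairs` leaves unpaired. A minimal sketch for the single 80-nt chain above, assuming residues are numbered 1 to 80 along chain A; `unpaired_segments` and `segment_of` are hypothetical helpers:

```python
import re


def unpaired_segments(length: int, pair_specs: list) -> list:
    """1-based (start, end) runs of a single chain that no target_ss_pairs range covers."""
    paired = set()
    for spec in pair_specs:
        for rng in spec.split(","):
            m = re.fullmatch(r"[A-Z](\d+)-(\d+)", rng)
            paired.update(range(int(m.group(1)), int(m.group(2)) + 1))
    segments, start = [], None
    for i in range(1, length + 2):  # i == length + 1 acts as a sentinel that closes the last run
        if i <= length and i not in paired:
            if start is None:
                start = i
        elif start is not None:
            segments.append((start, i - 1))
            start = None
    return segments


def segment_of(residue: int, segments: list):
    """Return the unpaired segment containing residue, or None if it is paired."""
    return next((seg for seg in segments if seg[0] <= residue <= seg[1]), None)


loops = unpaired_segments(80, ['A5-15,A25-35', 'A45-55,A65-75'])
print(loops)  # [(1, 4), (16, 24), (36, 44), (56, 64), (76, 80)]
for group in ['A19,A61,A20', 'A59,A21,A60']:
    print(group, [segment_of(int(res[1:]), loops) for res in group.split(",")])
```

Each `force_multi_contacts` group draws residues from both the A16-24 loop and the A56-64 loop, and the forced loop `A38-42` lies inside the unpaired stretch A36-44.
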
(pseudocyclic-symmetry-base-pair-patterning)=
## Example 9: Pseudocyclic symmetry using procedurally generated base-pair patterning
Below are the arguments used to create pseudo-symmetry in a single chain that forms a cyclic-symmetric shape (here a C2 pseudocycle).

```bash
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py --config-name=multi_polymer \
inference.ckpt_path='/path/to/models/all_na_ss_cond/train_session2024-07-08_1720455712_BFF_5.00.pt' \
diffuser.T=50 \
inference.num_designs=5 \
contigmap.contigs=[\'240\'] \
inference.pseudo_symmetry='c2' \
inference.n_repeats=2 \
scaffoldguided.target_ss_pairs=[\'A4-5,A237-238\',\'A7,A236\',\'A11-12,A230-231\',\'A18-27,A137-146\',\'A44-47,A72-75\',\'A49-44,A64-69\',\'A63-69,A49-55\',\'A72-75,A44-47\',\'A90-99,A197-206\',\'A110-112,A130-132\',\'A115-116,A127-128\',\'A117-118,A124-125\',\'A124-125,A117-118\',\'A127-128,A115-116\',\'A130-132,A110-112\',\'A137-140,A24-27\',\'A141-146,A18-23\',\'A164-168,A191-195\',\'A169-175,A183-189\',\'A183-189,A169-175\',\'A191-195,A164-168\',\'A197-206,A90-99\',\'A210-214,A82-86\',\'A215-219,A77-81\',\'A230-231,A11-12\',\'A236,A7\',\'A237-238,A4-5\'] \
contigmap.polymer_chains=[\'rna\'] \
inference.output_prefix='./outputs_2025-02-03/pC2_test01__BFF_5.00'
```

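Hand-written or procedurally generated pair lists this long are easy to get wrong. A minimal sketch of a consistency check, assuming the same rules as earlier examples (each entry names two ranges of equal length), that a single residue such as `A7` counts as a one-residue range, and that ranges should run from low to high index; `check_pairs` is a hypothetical helper:

```python
import re

RANGE = re.compile(r"([A-Z])(\d+)(?:-(\d+))?")  # 'A4-5' or a single residue such as 'A7'


def check_pairs(specs: list) -> list:
    """Return readable problems: unparsable or reversed ranges and unequal lengths."""
    problems = []
    for spec in specs:
        lengths = []
        for rng in spec.split(","):
            m = RANGE.fullmatch(rng)
            if m is None:
                problems.append(f"{spec}: cannot parse {rng!r}")
                continue
            start = int(m.group(2))
            end = int(m.group(3)) if m.group(3) else start
            if end < start:
                problems.append(f"{spec}: {rng} is reversed")
            lengths.append(end - start + 1)
        if len(set(lengths)) > 1:
            problems.append(f"{spec}: ranges have different lengths {lengths}")
    return problems


print(check_pairs(['A4-5,A237-238', 'A7,A236']))  # []
print(check_pairs(['A1-2,A10-12']))
```

Running the check over a full `target_ss_pairs` list before launching a long job is cheap insurance.
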
(de-novo-holliday-junctions)=
## Example 10: De novo Holliday junctions using strand exchange
We can use symmetry and strand exchange to design Holliday-junction-style complexes. Chain IDs and residue indices in `scaffoldguided.target_ss_pairs` refer to the output structure as defined by the contig topology, not to the input PDB.
```bash
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py \
--config-name=multi_polymer \
diffuser.T=50 \
inference.symmetry='c2' \
contigmap.inpaint_seq=[\'D60\',\'D57\',\'D58\',\'D11\',\'D15\',\'D19\',\'D8\',\'D61\',\'D18\',\'A9\',\'A59\',\'A62\',\'A8\',\'A4\',\'A5\'] \
inference.num_designs=5 \
inference.ckpt_path='/path/to/models/all_na_ss_cond/train_session2024-07-08_1720455712_BFF_5.00.pt' \
contigmap.contigs=[\'A1-61,60,D14-65\ 15,B6-12,4,F1-8,15\ 15,E7-14,4,C4-10,15\ A1-61,60,D14-65\ 15,B6-12,4,F1-8,15\ 15,E7-14,4,C4-10,15\'] \
inference.ij_visible='acf-bde-gil-hjk' \
contigmap.polymer_chains=[\'protein\',\'dna\',\'dna\',\'protein\',\'dna\',\'dna\'] \
scaffoldguided.target_ss_pairs=[\'B1-10,F40-49\',\'B40-49,F1-10\',\'B16-34,C16-34\',\'E16-34,F16-34\',\'C1-10,E40-49\',\'C40-49,E1-10\'] \
inference.input_pdb='/path/to/RFDpoly/rf_diffusion/test_data/DBP35opt_DBP48.pdb' \
inference.output_prefix='./outputs_2025-02-03/DBP_scaffolding_test06__BFF_4.00'
```

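A minimal sketch of the mapping from contig topology to output chains, assuming the output chains are lettered A, B, C, … in the order of the space-separated contig entries (the same convention Example 5 relies on when it calls its two DNA chains A and B); `output_chains` and `out_of_range` are hypothetical helpers:

```python
import re
from string import ascii_uppercase

MOTIF = re.compile(r"[A-Z](\d+)-(\d+)")  # input-PDB motif such as 'D14-65'


def output_chains(contigs: str) -> dict:
    """Assumed convention: output chains are lettered A, B, C, ... in contig order."""
    chains = {}
    for chain_id, entry in zip(ascii_uppercase, contigs.split()):
        total = 0
        for seg in entry.split(","):
            m = MOTIF.fullmatch(seg)
            if m:
                total += int(m.group(2)) - int(m.group(1)) + 1
            else:
                total += int(seg)  # generated segment of fixed length
        chains[chain_id] = total
    return chains


def out_of_range(pair_specs: list, chains: dict) -> list:
    """Ranges in target_ss_pairs that do not fit inside the output chains."""
    bad = []
    for spec in pair_specs:
        for rng in spec.split(","):
            m = re.fullmatch(r"([A-Z])(\d+)-(\d+)", rng)
            if m is None:
                bad.append(rng)
                continue
            chain, start, end = m.group(1), int(m.group(2)), int(m.group(3))
            if chain not in chains or not 1 <= start <= end <= chains[chain]:
                bad.append(rng)
    return bad


contigs = ("A1-61,60,D14-65 15,B6-12,4,F1-8,15 15,E7-14,4,C4-10,15 "
           "A1-61,60,D14-65 15,B6-12,4,F1-8,15 15,E7-14,4,C4-10,15")
chains = output_chains(contigs)
print(chains)  # {'A': 173, 'B': 49, 'C': 49, 'D': 173, 'E': 49, 'F': 49}
pairs = ['B1-10,F40-49', 'B40-49,F1-10', 'B16-34,C16-34',
         'E16-34,F16-34', 'C1-10,E40-49', 'C40-49,E1-10']
print(out_of_range(pairs, chains))  # []
```

Note how output chain A is built from input motifs on chains A and D, which is why input and output chain IDs should not be confused.
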
(sequence-specification-sequence-design)=
## Example 11: Sequence specification and sequence design
These are two features I've added recently. First, we can now specify the sequence of our structures to be whatever we want!

Second, I've trained the model to do sequence prediction, so we can decode a sequence during the denoising trajectory. This lets us generate outputs with all of the base atoms rendered for nucleic acids, as well as nice sidechain interactions for proteins.

The default behavior throughout RFdiffusion is to keep the sequence of diffused regions masked during the trajectory, even if the outputs seem to have residue labels. RoseTTAFold must “see” sequence labels in order to generate sidechains, so I added a flag, `inference.update_seq_t=True`, which lets the model see either a user-specified sequence or its own predicted sequence from the previous timestep. This gives us sidechains, and it is super cool.

There are two ways to control the sequence information during a trajectory:
- Turn on full-sequence visibility at some timestep toward the end of the trajectory, using `inference.show_seq_under_t=15`.
- Gradually decode a random selection of positions at each step below some point in the trajectory, using `diffuser.aa_decode_steps=40`.

Both methods work nicely, so test them both during your design process, and let Andrew know if you find that one works better! Examples using both methods are shown below.

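To build intuition for the second option, here is a toy schedule in which a random permutation of positions is revealed in roughly equal chunks over the last `decode_steps` steps of a `T`-step trajectory. It illustrates the idea only, under assumptions of our own; RFDpoly's actual decoding schedule may differ, and `toy_reveal_schedule` is hypothetical:

```python
import random


def toy_reveal_schedule(n_positions: int, T: int = 50, decode_steps: int = 40, seed: int = 0) -> dict:
    """Map each step t (counting down from T to 1) to the set of positions visible at that step.

    Toy model only: positions stay masked until the last `decode_steps` steps, then a
    random permutation is revealed in roughly equal chunks so that all are visible at t = 1.
    """
    order = list(range(n_positions))
    random.Random(seed).shuffle(order)
    visible, schedule = set(), {}
    for t in range(T, 0, -1):
        step_in_window = decode_steps - t + 1  # 1 at t == decode_steps, decode_steps at t == 1
        if step_in_window > 0:
            target = round(n_positions * step_in_window / decode_steps)
            visible.update(order[len(visible):target])
        schedule[t] = set(visible)
    return schedule


schedule = toy_reveal_schedule(75)
print(len(schedule[50]), len(schedule[40]), len(schedule[20]), len(schedule[1]))  # 0 2 39 75
```

In this toy version nothing is visible during the first 10 of 50 steps and every position is visible by the final step; with `inference.show_seq_under_t`, by contrast, the full sequence becomes visible at once.
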
(unconditional-design-rna-protein)=
### Unconditional design of RNA and protein
This example specifies the RNA sequence, lets the model design the protein sequence, and gradually reveals sequence positions over the last 40 steps:
```bash
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py --config-name=multi_polymer \
diffuser.T=50 \
inference.num_designs=3 \
contigmap.contigs=[\'43\ 20\ 75\'] \
contigmap.polymer_chains=[\'rna\',\'rna\',\'protein\'] \
inference.set_sequence=[\'A1-43:GGAUGUACUACCAGCUGAUGAGUCCCAAAUAGGACGAAACGCC\',\'B1-20:GGCGUCCUGGUAUCCAAUCC\'] \
inference.update_seq_t=True \
diffuser.aa_decode_steps=40 \
inference.output_prefix='./demo_outputs/RNA-prot_seq-spec_and_seq-design_standard_settings'
```

(unconditional-design-dna-protein)=
### Unconditional design of DNA and protein
This example specifies the DNA sequence, lets the model design the protein sequence, and lets the model see the full sequence for the last 15 steps. This mode currently throws an error, but the example works with autoregressive decoding.
```bash
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py --config-name=multi_polymer \
diffuser.T=50 \
inference.num_designs=3 \
contigmap.contigs=[\'33\ 33\ 75\'] \
contigmap.polymer_chains=[\'dna\',\'dna\',\'protein\'] \
scaffoldguided.target_ss_pairs=[\'A11-23,B11-23\'] \
inference.set_sequence=[\'A11-23:TAGCAGGATGTGT\'] \
inference.assume_canonical_pair_seq=True \
inference.update_seq_t=True \
inference.show_seq_under_t=15 \
inference.output_prefix='./demo_outputs/DNA-prot_seq-spec_and_seq-design_standard_settings'
```
Notice that in the DNA example above, only one strand of the dsDNA is specified. Because the paired regions are defined with `scaffoldguided.target_ss_pairs=['A11-23,B11-23']`, the model knows which bases should be paired, and the flag `inference.assume_canonical_pair_seq=True` fills in the canonical base-pair partners automatically.

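Because `A11-23,B11-23` pairs antiparallel by default, the canonical partner of `A11-23` is its reverse complement, which is the sequence expected on `B11-23` (written from B11 to B23). A minimal sketch of that computation, assuming Watson-Crick pairs only; `canonical_partner` is a hypothetical helper, not the RFDpoly implementation:

```python
# Watson-Crick partners; 'U' is accepted so the same table works for RNA input.
COMPLEMENT = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def canonical_partner(seq: str, rna: bool = False) -> str:
    """Reverse complement of seq: the antiparallel partner strand, written 5'->3'."""
    comp = dict(COMPLEMENT, A="U") if rna else COMPLEMENT
    return "".join(comp[base] for base in reversed(seq.upper()))


print(canonical_partner("TAGCAGGATGTGT"))  # ACACATCCTGCTA
```
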
## Further Reading

(residue-specification-arguments)=
## Residue Specification Arguments

Residue ranges in these arguments refer to different structures:

| Reference structure | Argument |
| --- | --- |
| Input PDB | `contigmap.contigs=[\'B1-14,5,…,H1-9,0\']` |
| Output design | `scaffoldguided.target_ss_pairs=…` |
| Output design | `scaffoldguided.target_ss_string_list=…` |
| Output design | `inference.ij_visible=…` |
