# RFDpoly usage examples

This document provides several examples of how to use RFDpoly for RNA and DNA generation, including:
- Unconditional generation of multiple biopolymers together
- Motif scaffolding tasks with multiple polymers
- Control of base-pairing secondary structure to design RNA pseudoknots or (mini) DNA origami

````{note}
If you come across this error:
```text
Could not locate file '/current/working/directory/test_data/DBP035.pdb'. Tried...
```
you can fix it by adding
```bash
inference.input_pdb=/path/to/RFDpoly/rf_diffusion/test_data/DBP035.pdb
```
to the end of your list of configuration options.
````

## Table of Contents
- [Example 1: Unconditional RNA generation](#unconditional-rna-generation)
- [Example 2: Unconditional design of one protein chain and two DNA chains](#unconditional-protein-2-dna)
- [Example 3: RNA riboswitch design with conditioning from Eterna puzzle structure string](#rna-riboswitch-conditioning-eterna)
- [Example 4: DNA–protein scaffolding, inpaint two DNA chains](#dna-protein-scaffolding-inpaint-2-dna)
- [Example 5: DNA–protein scaffolding, inpaint two DNA chains and one protein chain](#dna-protein-scaffolding-inpaint-2-dna-1-protein)
- [Example 6: DNA origami with symmetric denoising](#dna-origami-symmetric-denoising)
- [Example 7: RNA design with triple helix](#rna-design-triple-helix)
- [Example 8: Control of RNA tertiary structure with multi-contact specification](#control-rna-tertiary-multi-contact)
- [Example 9: Pseudocyclic symmetry using procedurally generated base-pair patterning](#pseudocyclic-symmetry-base-pair-patterning)
- [Example 10: De novo Holliday junctions using strand exchange](#de-novo-holliday-junctions)
- [Example 11: Sequence specification and sequence design](#sequence-specification-sequence-design)
  - [Unconditional design of RNA and protein](#unconditional-design-rna-protein)
  - [Unconditional design of DNA and protein](#unconditional-design-dna-protein)
- [Residue Specification Arguments](#residue-specification-arguments)


(unconditional-rna-generation)=
## Example 1: Unconditional RNA generation
```
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py \
--config-name=multi_polymer \
diffuser.T=50 \
inference.num_designs=5 \
'contigmap.contigs=["90"]' \
contigmap.polymer_chains=['rna'] \
inference.output_prefix='./demo_outputs/RNA_uncond_standard_settings' \
inference.ckpt_path=/path/to/RFDpoly/weights/train_session2024-07-08_1720455712_BFF_3.00.pt
```

(unconditional-protein-2-dna)=
## Example 2: Unconditional design of one protein chain and two DNA chains
```
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py \
--config-name=multi_polymer \
diffuser.T=50 \
inference.num_designs=5 \
"contigmap.contigs=[20\ 20\ 75]" \
'contigmap.polymer_chains=["dna","dna","protein"]' \
inference.output_prefix='./demo_outputs/DNA_prot_uncond_standard_settings' \
inference.ckpt_path=/path/to/RFDpoly/weights/train_session2024-07-08_1720455712_BFF_3.00.pt
```

(rna-riboswitch-conditioning-eterna)=
## Example 3: RNA riboswitch design with conditioning from Eterna puzzle structure string

Puzzle source: [JR Openknot4 Week6 4RZD](https://eternagame.org/labs/13195459)
```
((((...............))))(((((((((........))))))))).......(((((((........)))))))(((...........))).....
```
Because parentheses have special meaning in Hydra command-line overrides, replace “(” and “)” with “5” and “3”, marking the 5′ and 3′ sides of each base pair.
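If you convert puzzle strings often, the substitution is a single `tr` call. The sketch below wraps it in a shell function; the name `ss_to_53` is hypothetical (not part of RFDpoly), and it assumes the string uses only `(`, `)` and `.`, since any other bracket type is passed through unchanged.

```bash
# Hypothetical helper, not part of RFDpoly: convert an Eterna dot-bracket
# string to the 5/3 notation used on the command line.
# "(" -> "5", ")" -> "3"; every other character is left unchanged.
ss_to_53() {
    printf '%s\n' "$1" | tr '()' '53'
}

# Convert the 4RZD puzzle string shown above
ss_to_53 '((((...............))))(((((((((........))))))))).......(((((((........)))))))(((...........))).....'
```

Paste the printed string into `scaffoldguided.target_ss_string='...'`.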
```
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py \
--config-name=multi_polymer \
diffuser.T=50 \
inference.num_designs=5 \
'contigmap.contigs=["100"]' \
contigmap.polymer_chains=['rna'] \
scaffoldguided.target_ss_string='5555...............3333555555555........333333333.......5555555........3333333555...........333.....' \
inference.output_prefix='./demo_outputs/RNA_eterna_cond_standard_settings' \
inference.ckpt_path=/path/to/RFDpoly/weights/train_session2024-07-08_1720455712_BFF_3.00.pt
```
Try this out on more [Eterna puzzles](https://eternagame.org/challenges/11843006)! To get the structure string for a given puzzle, start playing, then right-click → “Copy Structure”, or choose it from the menu in the upper-left corner.

**BUT WAIT!** What if you want to insert multiple pseudoknots into your design, or insert them at specific positions of certain chains? You can provide a list of secondary-structure strings as follows:
```
scaffoldguided.target_ss_string_list=[\'B1-90:.5555555555555555555..ffffff.33333555....333555....33333333333333333555555..tttttt.333333.\',\'A116-205:.5555555555555555555..ffffff.33333555....333555....33333333333333333555555..tttttt.333333.\']
```

Here, the prefix on each string (e.g. `B1-90:` or `A116-205:`) gives the chain and residue range in the output design where that pseudoknot is inserted.
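Writing each range prefix by hand is error-prone. Below is a minimal sketch of a helper that builds one list entry from a chain ID, a start index and a 5/3 structure string. It assumes, without confirmation from this document, that the range covers exactly one residue per character of the string; `ss_list_entry` is a hypothetical name.

```bash
# Hypothetical helper, not part of RFDpoly: print one entry for
# scaffoldguided.target_ss_string_list, e.g. B1-11:.555...333.
# Assumption: the range spans one residue per character of the string.
ss_list_entry() {
    chain=$1    # output chain ID, e.g. B
    start=$2    # first residue index in the output chain
    ss=$3       # structure string in 5/3 notation
    end=$((start + ${#ss} - 1))
    printf '%s%s-%s:%s\n' "$chain" "$start" "$end" "$ss"
}

ss_list_entry B 1 '.555...333.'     # prints B1-11:.555...333.
ss_list_entry A 116 '.555...333.'   # prints A116-126:.555...333.
```

Each printed entry can then be quoted and placed in the list in the same way as the command above.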

(dna-protein-scaffolding-inpaint-2-dna)=
## Example 4: DNA–protein scaffolding, inpaint two DNA chains

The argument `inference.ij_visible` controls which motifs listed in the contigs have their relative orientations locked during inference (credit to [Dr. David Juergens](https://scholar.google.com/citations?user=k6PhNDQAAAAJ&hl=en) for this intuitive system). In the following contig string, lowercase letters denote motifs from the input PDB in the order they occur in the contigs (not to be confused with uppercase chain IDs):

```
10,D8-13,6,B8-13,10 10,B18-23,6,D18-23,10 A1-56,0 C1-56,0
   a       b           c        d         e       f
```
Motifs A1-56, B8-13, and B18-23 (b, c, e) form the first DNA+binder group, “bce”, while C1-56, D8-13, and D18-23 (a, d, f) form “adf”. Motifs keep their relative orientations within each group, while the orientation of “bce” relative to “adf” can vary during inference.
```
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py \
--config-name=multi_polymer \
diffuser.T=50 \
inference.num_designs=5 \
"contigmap.contigs=['10,D8-13,6,B8-13,10 10,B18-23,6,D18-23,10 A1-56,0 C1-56,0']" \
inference.ij_visible='bce-adf' \
"contigmap.polymer_chains=[dna,dna,protein,protein]" \
inference.input_pdb='/path/to/RFDpoly/rf_diffusion/test_data/combo_DBP009_DBP010_DBP011_with_DNA_v2.pdb' \
inference.output_prefix='./demo_outputs/DNA_binders_scaffolding_test1_standard_settings'
```
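For longer contigs it helps to print the letter each motif receives before writing the `inference.ij_visible` string. This sketch assumes, based on the labelled contig string above, that a motif is any comma- or space-separated token starting with an uppercase chain ID, and that bare numbers (diffused segments and the trailing `0`) receive no letter; `motif_letters` is hypothetical.

```bash
# Hypothetical helper, not part of RFDpoly: list motifs in contig order
# together with the lowercase letter used for them in inference.ij_visible.
motif_letters() {
    printf '%s\n' "$1" | tr ' ,' '\n\n' | awk '
        BEGIN { letters = "abcdefghijklmnopqrstuvwxyz" }
        /^[A-Z]/ { n++; print substr(letters, n, 1), $0 }'
}

motif_letters '10,D8-13,6,B8-13,10 10,B18-23,6,D18-23,10 A1-56,0 C1-56,0'
# a D8-13, b B8-13, c B18-23, d D18-23, e A1-56, f C1-56 (one per line)
```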

(dna-protein-scaffolding-inpaint-2-dna-1-protein)=
## Example 5: DNA–protein scaffolding, inpaint two DNA chains and one protein chain
Here, `contigmap.polymer_chains` lists only three chains because the binding proteins are merged into one chain.
```
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py \
--config-name=multi_polymer \
diffuser.T=50 \
inference.num_designs=5 \
contigmap.contigs=[\'5,D8-13,2,B8-13,5\ 5,B18-23,2,D18-23,5\ A1-52,90,C4-56,0\'] \
inference.ij_visible='bce-adf' \
contigmap.polymer_chains=[\'dna\',\'dna\',\'protein\'] \
scaffoldguided.target_ss_pairs=[\'A1-24,B1-24\'] \
inference.input_pdb='/path/to/RFDpoly/rf_diffusion/test_data/combo_DBP009_DBP010_DBP011_with_DNA_v2.pdb' \
inference.output_prefix='./demo_outputs/DNA_binders_scaffolding_test2_standard_settings'
```
The new argument `scaffoldguided.target_ss_pairs` enforces base pairing between polymer ranges (here A1-24 and B1-24). These correspond to the two full DNA chains (A and B) in the output, each of length 24 (5 + 6 + 2 + 6 + 5).
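Because `scaffoldguided.target_ss_pairs` refers to output chains, it helps to know each output chain's length. A minimal sketch that totals each space-separated contig chain, assuming output chains are lettered A, B, C, … in contig order (as the A/B ranges in this example suggest) and that every segment has a fixed length; `chain_lengths` is hypothetical and rejects anything else, such as variable-length ranges.

```bash
# Hypothetical helper, not part of RFDpoly: print the output chain ID and
# total length of each space-separated chain in a contig string.
# Handles fixed-length gaps (e.g. 5, 90, 0) and motifs (e.g. B8-13) only.
chain_lengths() {
    printf '%s\n' "$1" | tr ' ' '\n' | awk '
        BEGIN { ids = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" }
        NF == 0 { next }
        {
            n++; total = 0
            nseg = split($0, seg, ",")
            for (i = 1; i <= nseg; i++) {
                s = seg[i]
                if (s ~ /^[0-9]+$/) {
                    total += s                       # diffused segment
                } else if (s ~ /^[A-Z][0-9]+-[0-9]+$/) {
                    split(substr(s, 2), r, "-")      # motif range from the input PDB
                    total += r[2] - r[1] + 1
                } else {
                    print "unsupported segment: " s | "cat 1>&2"
                    exit 1
                }
            }
            print substr(ids, n, 1), total
        }'
}

chain_lengths '5,D8-13,2,B8-13,5 5,B18-23,2,D18-23,5 A1-52,90,C4-56,0'
# A 24
# B 24
# C 195
```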

(dna-origami-symmetric-denoising)=
## Example 6: DNA origami with symmetric denoising
The `scaffoldguided.target_ss_pairs` argument specifies paired ranges in the design; both ranges in each pair must have equal length. The first range is read 5′→3′ and the second 3′→5′, so by default the two strands pair antiparallel.
```
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py \
--config-name=multi_polymer \
diffuser.T=50 \
inference.num_designs=5 \
contigmap.contigs=[\'60\ 60\ 60\ 60\'] \
contigmap.polymer_chains=[\'dna\',\'dna\',\'dna\',\'dna\'] \
scaffoldguided.target_ss_pairs=[\'A1-20,B1-20\',\'A21-40,C21-40\',\'A41-60,D41-60\',\'B21-40,D21-40\',\'B41-60,C41-60\',\'C1-20,D1-20\'] \
inference.symmetry='d2' \
inference.output_prefix='./demo_outputs/DNA_origami_standard_settings'
```
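Since both ranges in a pair must have equal length, a quick check before launching a long run can save time. This sketch assumes each entry is two ranges such as `A1-20,B1-20` (single residues like `A7` count as length 1); `check_ss_pairs` is hypothetical and only checks lengths, not whether the ranges fit inside the output chains.

```bash
# Hypothetical helper, not part of RFDpoly: verify that both ranges in each
# scaffoldguided.target_ss_pairs entry have the same length.
check_ss_pairs() {
    status=0
    for pair in "$@"; do
        printf '%s\n' "$pair" | awk -F, '
            function span(r,   x) {
                sub(/^[A-Z]/, "", r)                 # drop the chain ID
                if (split(r, x, "-") == 1) return 1  # single residue
                return x[2] - x[1] + 1
            }
            NF != 2 || span($1) != span($2) { print "length mismatch: " $0; exit 1 }
        ' || status=1
    done
    return $status
}

# The six pairs from the origami command above
check_ss_pairs A1-20,B1-20 A21-40,C21-40 A41-60,D41-60 \
               B21-40,D21-40 B41-60,C41-60 C1-20,D1-20 && echo "all pairs OK"
```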

(rna-design-triple-helix)=
## Example 7: RNA design with triple helix
Just as in the previous example, we can use base-paired sequence ranges to control RNA topology. By default, paired regions are antiparallel, but specific orientations can be assigned, for example to build a triple helix from a combination of parallel and antiparallel pairings.
```
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py \
--config-name=multi_polymer \
diffuser.T=50 \
inference.num_designs=5 \
contigmap.contigs=[\'75\'] \
contigmap.polymer_chains=[\'rna\'] \
scaffoldguided.target_ss_pairs=[\'A5-20,A55-70\',\'A55-70,A30-45\'] \
scaffoldguided.target_ss_pair_ori=[\'P\',\'A\'] \
inference.output_prefix='./demo_outputs/Triple_helix_test'
```
Each orientation in `scaffoldguided.target_ss_pair_ori` (`P` for parallel, `A` for antiparallel) applies to the element at the same index in `scaffoldguided.target_ss_pairs`. Here, A55-70 pairs in parallel with A5-20 and antiparallel with A30-45.
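To see what an orientation means residue by residue, the sketch below expands one pair into explicit partners. It rests on an assumed reading of the arguments: `A` (antiparallel, the default) pairs the first residue of the first range with the last residue of the second, and `P` (parallel) pairs first with first. Both ranges are assumed to be written as `start-end` and to have equal length; `expand_pair` is a hypothetical helper.

```bash
# Hypothetical helper, not part of RFDpoly: expand one paired-range entry
# into residue-level partners under an assumed orientation convention.
#   A (default): antiparallel, first residue of range 1 <-> last of range 2
#   P          : parallel,     first residue of range 1 <-> first of range 2
expand_pair() {
    printf '%s\n' "$1" | awk -F, -v ori="${2:-A}" '{
        c1 = substr($1, 1, 1); split(substr($1, 2), r1, "-")
        c2 = substr($2, 1, 1); split(substr($2, 2), r2, "-")
        n = r1[2] - r1[1] + 1
        for (i = 0; i < n; i++) {
            j = (ori == "P") ? r2[1] + i : r2[2] - i
            print c1 (r1[1] + i) ":" c2 j
        }
    }'
}

expand_pair A5-20,A55-70 P | head -n 2    # A5:A55, A6:A56
expand_pair A55-70,A30-45 A | head -n 2   # A55:A45, A56:A44
```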

(control-rna-tertiary-multi-contact)=
## Example 8: Control of RNA tertiary structure with multi-contact specification
We can “staple” distal loops in RNA pseudoknots together by specifying regions of multi-base contacts with `scaffoldguided.force_multi_contacts`. Secondary-structure strings can only encode pairwise (two-base) interactions, so this feature enables higher-order tertiary interactions. We can also force loop placement with `scaffoldguided.force_loops_list`.
```
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py \
--config-name=multi_polymer \
diffuser.T=50 \
inference.num_designs=5 \
contigmap.contigs=[\'80\'] \
contigmap.polymer_chains=[\'rna\'] \
scaffoldguided.target_ss_pairs=[\'A5-15,A25-35\',\'A45-55,A65-75\'] \
scaffoldguided.force_multi_contacts=[\'A19,A61,A20\',\'A59,A21,A60\'] \
scaffoldguided.force_loops_list=[\'A38-42\'] \
inference.output_prefix='./demo_outputs/loop_touch_test'
```

(pseudocyclic-symmetry-base-pair-patterning)=
## Example 9: Pseudocyclic symmetry using procedurally generated base-pair patterning
The arguments below create pseudo-symmetry within a single chain that folds into a cyclic-symmetric shape (here a C2 pseudocycle, set by `inference.pseudo_symmetry='c2'` and `inference.n_repeats=2`).

```
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py --config-name=multi_polymer \
inference.ckpt_path='/models/all_na_ss_cond/train_session2024-07-08_1720455712_BFF_5.00.pt' \
diffuser.T=50 \
inference.num_designs=5 \
contigmap.contigs=[\'240\'] \
inference.pseudo_symmetry='c2' \
inference.n_repeats=2 \
scaffoldguided.target_ss_pairs=[\'A4-5,A237-238\',\'A7,A236\',\'A11-12,A230-231\',\'A18-27,A137-146\',\'A44-47,A72-75\',\'A49-55,A63-69\',\'A63-69,A49-55\',\'A72-75,A44-47\',\'A90-99,A197-206\',\'A110-112,A130-132\',\'A115-116,A127-128\',\'A117-118,A124-125\',\'A124-125,A117-118\',\'A127-128,A115-116\',\'A130-132,A110-112\',\'A137-140,A24-27\',\'A141-146,A18-23\',\'A164-168,A191-195\',\'A169-175,A183-189\',\'A183-189,A169-175\',\'A191-195,A164-168\',\'A197-206,A90-99\',\'A210-214,A82-86\',\'A215-219,A77-81\',\'A230-231,A11-12\',\'A236,A7\',\'A237-238,A4-5\'] \
contigmap.polymer_chains=[\'rna\'] \
inference.output_prefix='./outputs_2025-02-03/pC2_test01__BFF_5.00'
```
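Most pairings in this list appear twice, once in each direction (for example `'A4-5,A237-238'` and `'A237-238,A4-5'`), which is the kind of pattern that is easier to generate than to type. A minimal sketch, assuming you want every pair listed in both directions (whether the model requires both is not stated here); `both_directions` is hypothetical, and it groups each pair with its reverse, whereas the list above is ordered by position.

```bash
# Hypothetical helper, not part of RFDpoly: emit a Hydra list body in which
# every pair appears in both directions, e.g.
#   A4-5,A237-238  ->  'A4-5,A237-238','A237-238,A4-5'
both_directions() {
    for pair in "$@"; do
        printf '%s\n' "$pair"
    done | awk -F, -v q="'" '
        NF == 2 {
            out = out sep q $1 "," $2 q "," q $2 "," $1 q
            sep = ","
        }
        END { print out }'
}

pairs=$(both_directions A4-5,A237-238 A7,A236 A11-12,A230-231)
# Double quotes keep the single quotes intact, as in the Example 4 command
echo "scaffoldguided.target_ss_pairs=[$pairs]"
```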

(de-novo-holliday-junctions)=
## Example 10: De novo Holliday junctions using strand exchange
We can use symmetry and strand exchange to design Holliday-junction-style complexes. Chain IDs and residue indices in `scaffoldguided.target_ss_pairs` refer to the output structure, whose chains are defined by the contig topology.
```
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py \
--config-name=multi_polymer \
diffuser.T=50 \
inference.symmetry='c2' \
contigmap.inpaint_seq=[\'D60\',\'D57\',\'D58\',\'D11\',\'D15\',\'D19\',\'D8\',\'D61\',\'D18\',\'A9\',\'A59\',\'A62\',\'A8\',\'A4\',\'A5\'] \
inference.num_designs=5 \
inference.ckpt_path='/projects/ml/afavor/RFD/models/all_na_ss_cond/train_session2024-07-08_1720455712_BFF_5.00.pt' \
contigmap.contigs=[\'A1-61,60,D14-65\ 15,B6-12,4,F1-8,15\ 15,E7-14,4,C4-10,15\ A1-61,60,D14-65\ 15,B6-12,4,F1-8,15\ 15,E7-14,4,C4-10,15\'] \
inference.ij_visible='acf-bde-gil-hjk' \
contigmap.polymer_chains=[\'protein\',\'dna\',\'dna\',\'protein\',\'dna\',\'dna\'] \
scaffoldguided.target_ss_pairs=[\'B1-10,F40-49\',\'B40-49,F1-10\',\'B16-34,C16-34\',\'E16-34,F16-34\',\'C1-10,E40-49\',\'C40-49,E1-10\'] \
inference.input_pdb='/projects/ml/afavor/test_data/DBP35opt_DBP48.pdb' \
inference.output_prefix='./outputs_2025-02-03/DBP_scaffolding_test06__BFF_4.00'
```
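Strand exchange shows up in the pair list as DNA chains that pair with more than one partner. The sketch below tabulates partner chains from the `target_ss_pairs` entries of this example, using only the leading chain ID of each range; `pair_partners` is a hypothetical helper, not part of RFDpoly.

```bash
# Hypothetical helper, not part of RFDpoly: for each output chain, list the
# chains it is paired with in scaffoldguided.target_ss_pairs.
pair_partners() {
    for pair in "$@"; do
        printf '%s\n' "$pair"
    done | awk -F, '
        {
            a = substr($1, 1, 1); b = substr($2, 1, 1)
            # record each partner once, in both directions
            if (!((a, b) in seen)) { seen[a, b] = 1; partners[a] = partners[a] " " b }
            if (!((b, a) in seen)) { seen[b, a] = 1; partners[b] = partners[b] " " a }
        }
        END { for (c in partners) print c ":" partners[c] }' | sort
}

pair_partners B1-10,F40-49 B40-49,F1-10 B16-34,C16-34 \
              E16-34,F16-34 C1-10,E40-49 C40-49,E1-10
```

For these six entries it reports two partner chains for each of B, C, E and F, consistent with the four DNA strands exchanging partners at the junction.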

(sequence-specification-sequence-design)=
## Example 11: Sequence specification and sequence design
Sequence specification and sequence design are two recently added features. First, we can now set the sequence of our structures to whatever we want.

Second, the model has been trained to predict sequence, so a sequence can be decoded during the denoising trajectory. This lets us generate outputs with all base atoms rendered for nucleic acids, as well as sidechain interactions for proteins.

The default behavior throughout RFdiffusion is to keep the sequence of diffused regions masked during the trajectory, even if the outputs appear to have residue labels. RoseTTAFold must “see” sequence labels in order to generate sidechains, so the flag `inference.update_seq_t=True` lets the model see either a user-specified sequence or its own predicted sequence from the previous timestep. This gives us sidechains.

There are two ways to control the sequence information during a trajectory:
- Turn on full-sequence visibility at some timestep towards the end of the trajectory, using `inference.show_seq_under_t=15`.
- Gradually decode a random selection of positions at each step below some point in the trajectory, using `diffuser.aa_decode_steps=40`.

Both methods work well, so test both during your design process, and let Andrew know if you find that one works better! Examples using both methods are shown below.
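To make the difference between the two options concrete, here is a toy model of the fraction of sequence positions visible at each timestep t, counting down from `diffuser.T=50` to 1. It is not RFDpoly's code: the strict `t < show_seq_under_t` cutoff and the linear reveal over the last `aa_decode_steps` steps are both assumptions chosen for illustration, and `visible_fraction` is a hypothetical name.

```bash
# Toy illustration only: both schedules are assumptions made to contrast
# the two options, not RFDpoly's implementation.
# Columns: timestep t, fraction visible with show_seq_under_t,
#          fraction visible with aa_decode_steps.
visible_fraction() {
    awk -v T="$1" -v su="$2" -v ds="$3" 'BEGIN {
        for (t = T; t >= 1; t--) {
            full = (t < su) ? 1 : 0                      # all positions at once
            grad = (t > ds) ? 0 : (ds - t + 1) / ds      # linear ramp to 1 at t = 1
            printf "%d %.2f %.2f\n", t, full, grad
        }
    }'
}

# diffuser.T=50, inference.show_seq_under_t=15, diffuser.aa_decode_steps=40
visible_fraction 50 15 40 | awk '$1 == 50 || $1 == 14 || $1 == 1'
```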

(unconditional-design-rna-protein)=
### Unconditional design of RNA and protein
This example specifies the RNA sequence, lets the model design the protein sequence, and gradually reveals sequence positions over the last 40 steps:
```
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py --config-name=multi_polymer \
diffuser.T=50 \
inference.num_designs=3 \
contigmap.contigs=[\'43\ 20\ 75\'] \
contigmap.polymer_chains=[\'rna\',\'rna\',\'protein\'] \
inference.set_sequence=[\'A1-43:GGAUGUACUACCAGCUGAUGAGUCCCAAAUAGGACGAAACGCC\',\'B1-20:GGCGUCCUGGUAUCCAAUCC\'] \
inference.update_seq_t=True \
diffuser.aa_decode_steps=40 \
inference.output_prefix='./demo_outputs/RNA-prot_seq-spec_and_seq-design_standard_settings'
```

(unconditional-design-dna-protein)=
### Unconditional design of DNA and protein
This example specifies the DNA sequence, lets the model design the protein sequence, and lets the model see the full sequence for the last 15 steps (this currently throws an error, but works with autoregressive decoding):
```
apptainer run --nv /path/to/SE3nv.sif /path/to/RFDpoly/rf_diffusion/run_inference.py --config-name=multi_polymer \
diffuser.T=50 \
inference.num_designs=3 \
contigmap.contigs=[\'33\ 33\ 75\'] \
contigmap.polymer_chains=[\'dna\',\'dna\',\'protein\'] \
scaffoldguided.target_ss_pairs=[\'A11-23,B11-23\'] \
inference.set_sequence=[\'A11-23:TAGCAGGATGTGT\'] \
inference.assume_canonical_pair_seq=True \
inference.update_seq_t=True \
inference.show_seq_under_t=15 \
inference.output_prefix='./demo_outputs/DNA-prot_seq-spec_and_seq-design_standard_settings'
```
Notice that in the DNA example above, only one of the two DNA strands has a specified sequence. Because the paired regions are defined with `scaffoldguided.target_ss_pairs=['A11-23,B11-23']`, the model knows which bases should be paired, and the flag `inference.assume_canonical_pair_seq=True` fills in the canonical base-pair partners automatically.
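As a sanity check on what the filled-in partner strand should look like: with the default antiparallel pairing, and assuming residues are numbered 5′→3′, chain B residues 11-23 should carry the reverse complement of `TAGCAGGATGTGT`. A minimal sketch for plain A/C/G/T DNA (no ambiguity codes); `revcomp_dna` is hypothetical, not part of RFDpoly.

```bash
# Hypothetical helper, not part of RFDpoly: reverse complement of a DNA
# sequence containing only A, C, G and T.
revcomp_dna() {
    printf '%s\n' "$1" | tr 'ACGT' 'TGCA' | awk '{
        out = ""
        for (i = length($0); i >= 1; i--) out = out substr($0, i, 1)
        print out
    }'
}

# Expected partner for A11-23:TAGCAGGATGTGT under antiparallel pairing
revcomp_dna TAGCAGGATGTGT   # prints ACACATCCTGCTA
```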

## Further Reading

(residue-specification-arguments)=
## Residue Specification Arguments

Depending on the argument, residue specifications refer either to the input PDB or to the output design:

| Reference structure | Argument |
| --- | --- |
| Input PDB | `contigmap.contigs=[\'B1-14,5,…,H1-9,0\']` |
| Output design | `scaffoldguided.target_ss_pairs=…` |
| Output design | `scaffoldguided.target_ss_string_list=…` |
| Output design | `inference.ij_visible=…` |
