LearningToOptimize · andrewrosemberg · Jul 1, 2026 · Jul 2, 2026 · Jul 2, 2026 · Jul 3, 2026
diff --git a/.github/dependabot.yml b/.github/dependabot.yml
@@ -11,3 +11,9 @@ updates:
     directory: "/"
     schedule:
       interval: "weekly"
+    ignore:
+      # The v5→v7 bump silently broke coverage uploads ("Missing Head Commit"
+      # on PRs). Keep codecov-action pinned until a deliberate, verified
+      # migration — see the comment in .github/workflows/CI.yml.
+      - dependency-name: "codecov/codecov-action"
+        update-types: ["version-update:semver-major"]
diff --git a/.github/workflows/CI.yml b/.github/workflows/CI.yml
@@ -24,7 +24,7 @@ jobs:
       matrix:
         version:
           - '1.10'
-          # - 'nightly'
+          - '1.12'
         os:
           - ubuntu-latest
         arch:
@@ -39,8 +39,15 @@ jobs:
       - uses: julia-actions/julia-buildpkg@v1
       - uses: julia-actions/julia-runtest@v1
       - uses: julia-actions/julia-processcoverage@v1
-      - uses: codecov/codecov-action@v7
+      # Pinned to v5: the dependabot bump to v7 (2026-06-25) silently broke
+      # uploads — Codecov has no commit newer than 2026-06-15, which is what
+      # produces "Missing Head Commit" on PRs. v5 is the last version verified
+      # to upload from this workflow. Before re-bumping, migrate deliberately
+      # (e.g. OIDC: `use_oidc: true` + `id-token: write` permission) and
+      # confirm a commit appears on Codecov.
+      - uses: codecov/codecov-action@v5
         with:
           files: lcov.info
           token: ${{ secrets.CODECOV_TOKEN }}
-          fail_ci_if_error: false
+          # Fail loudly: a silent upload failure hid this breakage for weeks.
+          fail_ci_if_error: true
diff --git a/README.md b/README.md
@@ -54,22 +54,25 @@ using DecisionRules, JuMP, DiffOpt, Flux
 using SCS
 
 # 1) Build per-stage subproblems (DiffOpt-enabled) and collect:
-#    subproblems, state_params_in, state_params_out, uncertainty_sampler, uncertainties_structure
+#    subproblems, state_params_in, state_params_out, uncertainty_samples
 
 # 2) Build the deterministic equivalent over the full horizon
 det = DiffOpt.diff_model(() -> DiffOpt.diff_optimizer(SCS.Optimizer))
 
-det, uncertainties_structure_det = DecisionRules.deterministic_equivalent!(
+det, uncertainty_samples_det = DecisionRules.deterministic_equivalent!(
     det,
     subproblems,
     state_params_in,
     state_params_out,
     Float64.(initial_state),
-    uncertainties_structure,
+    uncertainty_samples,
 )
 
+# deterministic_equivalent! remaps state_params_in/state_params_out in place.
+# Copy those arrays first if you also need the original stage-wise refs later.
+
 # 3) Train a TS-DDR policy end-to-end
-num_uncertainties = length(uncertainty_sampler()[1])  # number of uncertainty components per stage
+num_uncertainties = length(uncertainty_samples[1])  # number of uncertainty components per stage
 policy = Chain(
     Dense(DecisionRules.policy_input_dim(num_uncertainties, length(initial_state)), 64, relu),
     Dense(64, length(initial_state)),
@@ -79,9 +82,9 @@ DecisionRules.train_multistage(
     policy,
     initial_state,
     det,
-    state_in_det,
-    state_out_det,
-    uncertainty_sampler;
+    state_params_in,
+    state_params_out,
+    uncertainty_samples_det;
     num_batches=100,
     num_train_per_batch=32,
     optimizer=Flux.Adam(1e-3),
@@ -97,7 +100,7 @@ Single shooting solves one optimization per stage and rolls forward using the re
 ```julia
 using DecisionRules, Flux
 
-num_uncertainties = length(uncertainty_sampler()[1])
+num_uncertainties = length(uncertainty_samples[1])
 policy = Chain(
     Dense(DecisionRules.policy_input_dim(num_uncertainties, length(initial_state)), 64, relu),
     Dense(64, length(initial_state)),
@@ -109,7 +112,7 @@ DecisionRules.train_multistage(
     subproblems,
     state_params_in,
     state_params_out,
-    uncertainty_sampler;
+    uncertainty_samples;
     num_batches=100,
     num_train_per_batch=32,
     optimizer=Flux.Adam(1e-3),
@@ -126,7 +129,7 @@ Multiple shooting partitions the horizon into windows of length `window_size`. E
 using DecisionRules, Flux, DiffOpt
 using SCS
 
-num_uncertainties = length(uncertainty_sampler()[1])
+num_uncertainties = length(uncertainty_samples[1])
 policy = Chain(
     Dense(DecisionRules.policy_input_dim(num_uncertainties, length(initial_state)), 64, relu),
     Dense(64, length(initial_state)),
@@ -149,10 +152,7 @@ DecisionRules.train_multiple_shooting(
     policy,
     initial_state,
     windows,
-    state_params_in,
-    state_params_out,
-    uncertainty_sampler;
-    window_size=24,  # e.g., 6, 24, ...
+    uncertainty_samples;
     num_batches=100,
     num_train_per_batch=32,
     optimizer=Flux.Adam(1e-3),
@@ -168,7 +168,8 @@ The training loops record metrics through a per-sample `SampleLog` cache and a p
 ```julia
 using DecisionRules, Random
 
-# Materialize a FIXED held-out evaluation set once, before training
+# Materialize a FIXED held-out evaluation set once, before training.
+# Use stage-wise subproblems and parameter refs, not DE-remapped refs.
 Random.seed!(1234)
 eval_scenarios = [DecisionRules.sample(uncertainty_samples) for _ in 1:8]
 
@@ -178,7 +179,7 @@ rollout_eval = RolloutEvaluation(
     policy_state=:realized,
 )
 
-train_multistage(policy, initial_state, det, state_in_det, state_out_det, uncertainty_sampler;
+train_multistage(policy, initial_state, det, state_params_in, state_params_out, uncertainty_samples_det;
     num_batches=100,
     record=(sample_log, iter, model) -> begin
         rollout_eval(iter, model)
@@ -202,6 +203,48 @@ Each evaluation reports (a) the rollout objective **excluding** the target-slack
 
 Per-sample debugging hooks can be attached with `SampleLog(on_sample=(s, models, log) -> ...)`; the training loop calls the hook after each sample's solve with the live JuMP model(s). The previous `record_loss=(iter, model, loss, tag) -> ...` keyword keeps working as a deprecated adapter.
 
+## Strict mode and reachable policies
+
+The standard TS-DDR target constraint uses slack:
+
+```math
+x_t + \delta_t = \hat{x}_t,
+\qquad
+\text{objective} += C_\delta \|\delta_t\|.
+```
+
+Slack keeps training robust to unreachable targets, but it also makes the dual
+signal depend on how the penalty weight $C_\delta$ is calibrated. Strict mode removes the slack:
+
+```math
+x_t = \hat{x}_t.
+```
+
+The resulting dual is the clean shadow price of imposing the target. The price
+of that cleaner signal is feasibility: every policy target must be reachable
+from the state used to condition the policy.
+
+This is automatic in the hydro strict subproblem path because each stage is
+solved sequentially and the policy receives the realized previous reservoir
+state. It is also possible in regular deterministic equivalents when the target
+trajectory is rolled out from the true initial state using a reachable policy:
+
+```math
+\hat{x}_0 = x_0,\qquad
+\hat{x}_t = \pi_\theta(w_t, \hat{x}_{t-1}),\qquad
+\hat{x}_t \in R(\hat{x}_{t-1}, w_t).
+```
+
+Here $R(x, w)$ is the set of states reachable in one stage from state $x$ under uncertainty $w$.
+By induction, every target is feasible, and the strict equalities force the realized trajectory to
+match that reachable path. See the hydro example and the DecisionRulesExa.jl companion for the GPU strict regular-DE implementation.
+
+The policy helpers separate two architectural choices:
+
+- recurrent `layers` / `DR_ENCODER_LAYERS` process uncertainty history only;
+- `combiner_layers` / `DR_HEAD_LAYERS` add a nonlinear feed-forward
+  state-to-target head without recurrence over the state input.
+
 ## GPU acceleration with DecisionRulesExa.jl
 
 For large-scale problems where the inner NLP solve is the bottleneck (e.g., AC-OPF with hundreds of buses), [DecisionRulesExa.jl](https://github.com/LearningToOptimize/DecisionRulesExa.jl) provides a GPU-accelerated backend that replaces JuMP with [ExaModels.jl](https://github.com/exanauts/ExaModels.jl) and solves with [MadNLP.jl](https://github.com/MadNLP/MadNLP.jl) + CUDSS on GPU.
@@ -222,6 +265,26 @@ Examples live in `examples/`. Run tests with:
 julia --project -e 'using Pkg; Pkg.test()'
 ```
 
+## Repository map
+
+| Path | Purpose |
+|---|---|
+| `src/DecisionRules.jl` | Module entrypoint and exports |
+| `src/dense_multilayer_nn.jl` | MLP helpers, state-conditioned recurrent policies, nonlinear target heads |
+| `src/simulate_multistage.jl` | Stage-wise and deterministic-equivalent simulation logic |
+| `src/multiple_shooting.jl` | Windowed multiple-shooting setup, simulation, and training |
+| `src/utils.jl` | Target-parameter utilities, deficit construction, rollout evaluation |
+| `src/integer_strategies.jl` | Strategies for extracting gradients from integer/mixed-integer models |
+| `src/score_function.jl` | Score-function gradient correction for nonsmooth/integer problems |
+| `src/parameter_duals.jl` | Dual/sensitivity helpers for parameterized JuMP models |
+| `docs/src/` | Documenter.jl manual pages |
+| `examples/HydroPowerModels/` | Bolivia hydrothermal scheduling, strict reachable policies, SDDP comparisons |
+| `examples/inventory_control/` | Inventory-control example and dynamic-programming/SDDP comparisons |
+| `examples/rocket_control/` | Rocket MPC/control example |
+| `examples/RL/` | Reinforcement-learning-style hydro scripts |
+| `examples/Experimental/` | Research prototypes and robotics/control explorations |
+| `test/runtests.jl` | Package test suite |
+
 ## Citation
 
 If you use this package in academic work, please cite:

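Note on the README's new "Strict mode and reachable policies" section: the "by induction" step there can be spelled out as a short derivation. This is only a sketch under two assumptions that the README does not state explicitly: $R(x, w)$ is read as the set of states reachable in one stage from state $x$ under uncertainty $w$, and $T$ (a symbol introduced here for illustration) is the horizon length.

```math
\begin{aligned}
&\text{Base case:} && x_0 = \hat{x}_0 \quad \text{(targets are rolled out from the true initial state).} \\
&\text{Inductive step:} && x_{t-1} = \hat{x}_{t-1}
  \;\Longrightarrow\; \hat{x}_t \in R(\hat{x}_{t-1}, w_t) = R(x_{t-1}, w_t)
  \;\Longrightarrow\; x_t = \hat{x}_t \text{ is attainable.} \\
&\text{Conclusion:} && x_t = \hat{x}_t \text{ is feasible for every } t = 1, \dots, T.
\end{aligned}
```

Under this reading, the argument depends on the policy being conditioned on the previous target $\hat{x}_{t-1}$. If the policy were conditioned on a state that differs from the realized one, the equality $R(\hat{x}_{t-1}, w_t) = R(x_{t-1}, w_t)$ would no longer be guaranteed, and the strict constraint could become infeasible.
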
diff --git a/docs/make.jl b/docs/make.jl
@@ -20,6 +20,7 @@ makedocs(;
     format=Documenter.HTML(;
         prettyurls=get(ENV, "CI", nothing) == "true",
         canonical="https://LearningToOptimize.github.io/DecisionRules.jl",
+        size_threshold=300 * 1024,
     ),
     pages=[
         "Home" => "index.md",