6 changes: 6 additions & 0 deletions .github/dependabot.yml
@@ -11,3 +11,9 @@ updates:
directory: "/"
schedule:
interval: "weekly"
ignore:
# The v5→v7 bump silently broke coverage uploads ("Missing Head Commit"
# on PRs). Keep codecov-action pinned until a deliberate, verified
# migration — see the comment in .github/workflows/CI.yml.
- dependency-name: "codecov/codecov-action"
update-types: ["version-update:semver-major"]
13 changes: 10 additions & 3 deletions .github/workflows/CI.yml
@@ -24,7 +24,7 @@ jobs:
matrix:
version:
- '1.10'
# - 'nightly'
- '1.12'
os:
- ubuntu-latest
arch:
@@ -39,8 +39,15 @@ jobs:
- uses: julia-actions/julia-buildpkg@v1
- uses: julia-actions/julia-runtest@v1
- uses: julia-actions/julia-processcoverage@v1
- uses: codecov/codecov-action@v7
# Pinned to v5: the dependabot bump to v7 (2026-06-25) silently broke
# uploads — Codecov has no commit newer than 2026-06-15, which is what
# produces "Missing Head Commit" on PRs. v5 is the last version verified
# to upload from this workflow. Before re-bumping, migrate deliberately
# (e.g. OIDC: `use_oidc: true` + `id-token: write` permission) and
# confirm a commit appears on Codecov.
- uses: codecov/codecov-action@v5
with:
files: lcov.info
token: ${{ secrets.CODECOV_TOKEN }}
fail_ci_if_error: false
# Fail loudly: a silent upload failure hid this breakage for weeks.
fail_ci_if_error: true
95 changes: 79 additions & 16 deletions README.md
@@ -54,22 +54,25 @@ using DecisionRules, JuMP, DiffOpt, Flux
using SCS

# 1) Build per-stage subproblems (DiffOpt-enabled) and collect:
# subproblems, state_params_in, state_params_out, uncertainty_sampler, uncertainties_structure
# subproblems, state_params_in, state_params_out, uncertainty_samples

# 2) Build the deterministic equivalent over the full horizon
det = DiffOpt.diff_model(() -> DiffOpt.diff_optimizer(SCS.Optimizer))

det, uncertainties_structure_det = DecisionRules.deterministic_equivalent!(
det, uncertainty_samples_det = DecisionRules.deterministic_equivalent!(
det,
subproblems,
state_params_in,
state_params_out,
Float64.(initial_state),
uncertainties_structure,
uncertainty_samples,
)

# deterministic_equivalent! remaps state_params_in/state_params_out in place.
# Copy those arrays first if you also need the original stage-wise refs later.

# 3) Train a TS-DDR policy end-to-end
num_uncertainties = length(uncertainty_sampler()[1]) # number of uncertainty components per stage
num_uncertainties = length(uncertainty_samples[1]) # number of uncertainty components per stage
policy = Chain(
Dense(DecisionRules.policy_input_dim(num_uncertainties, length(initial_state)), 64, relu),
Dense(64, length(initial_state)),
@@ -79,9 +82,9 @@ DecisionRules.train_multistage(
policy,
initial_state,
det,
state_in_det,
state_out_det,
uncertainty_sampler;
state_params_in,
state_params_out,
uncertainty_samples_det;
num_batches=100,
num_train_per_batch=32,
optimizer=Flux.Adam(1e-3),
@@ -97,7 +100,7 @@ Single shooting solves one optimization per stage and rolls forward using the re
```julia
using DecisionRules, Flux

num_uncertainties = length(uncertainty_sampler()[1])
num_uncertainties = length(uncertainty_samples[1])
policy = Chain(
Dense(DecisionRules.policy_input_dim(num_uncertainties, length(initial_state)), 64, relu),
Dense(64, length(initial_state)),
@@ -109,7 +112,7 @@ DecisionRules.train_multistage(
subproblems,
state_params_in,
state_params_out,
uncertainty_sampler;
uncertainty_samples;
num_batches=100,
num_train_per_batch=32,
optimizer=Flux.Adam(1e-3),
@@ -126,7 +129,7 @@ Multiple shooting partitions the horizon into windows of length `window_size`. E
using DecisionRules, Flux, DiffOpt
using SCS

num_uncertainties = length(uncertainty_sampler()[1])
num_uncertainties = length(uncertainty_samples[1])
policy = Chain(
Dense(DecisionRules.policy_input_dim(num_uncertainties, length(initial_state)), 64, relu),
Dense(64, length(initial_state)),
@@ -149,10 +152,7 @@ DecisionRules.train_multiple_shooting(
policy,
initial_state,
windows,
state_params_in,
state_params_out,
uncertainty_sampler;
window_size=24, # e.g., 6, 24, ...
uncertainty_samples;
num_batches=100,
num_train_per_batch=32,
optimizer=Flux.Adam(1e-3),
@@ -168,7 +168,8 @@ The training loops record metrics through a per-sample `SampleLog` cache and a p
```julia
using DecisionRules, Random

# Materialize a FIXED held-out evaluation set once, before training
# Materialize a FIXED held-out evaluation set once, before training.
# Use stage-wise subproblems and parameter refs, not DE-remapped refs.
Random.seed!(1234)
eval_scenarios = [DecisionRules.sample(uncertainty_samples) for _ in 1:8]

@@ -178,7 +179,7 @@ rollout_eval = RolloutEvaluation(
policy_state=:realized,
)

train_multistage(policy, initial_state, det, state_in_det, state_out_det, uncertainty_sampler;
train_multistage(policy, initial_state, det, state_params_in, state_params_out, uncertainty_samples_det;
num_batches=100,
record=(sample_log, iter, model) -> begin
rollout_eval(iter, model)
@@ -202,6 +203,48 @@ Each evaluation reports (a) the rollout objective **excluding** the target-slack

Per-sample debugging hooks can be attached with `SampleLog(on_sample=(s, models, log) -> ...)`; the training loop calls the hook after each sample's solve with the live JuMP model(s). The previous `record_loss=(iter, model, loss, tag) -> ...` keyword keeps working as a deprecated adapter.

## Strict mode and reachable policies

The standard TS-DDR target constraint uses slack:

```math
x_t + \delta_t = \hat{x}_t,
\qquad
\text{objective} += C_\delta \|\delta_t\|.
```

Slack makes training robust to unreachable targets, but it also makes the dual
signal depend on the target-penalty calibration. Strict mode removes the slack:

```math
x_t = \hat{x}_t.
```

The resulting dual is the clean shadow price of imposing the target. The price
of that cleaner signal is feasibility: every policy target must be reachable
from the state used to condition the policy.
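
To make the calibration dependence concrete, here is a minimal sketch on a hypothetical one-variable stage problem that is not part of DecisionRules.jl: minimize `c*x + C_delta*|delta|` subject to `x + delta = x_hat` and `0 <= x <= x_max`. The names `relaxed_value`, `strict_value`, and `slope` are illustrative assumptions, and a central finite difference stands in for the dual of the target constraint.

```julia
# Hypothetical toy problem, for illustration only (not the package API).

# Cost of the slack-relaxed problem at a candidate decision x, with delta = x_hat - x.
relaxed_cost(x, x_hat, c, C_delta) = c * x + C_delta * abs(x_hat - x)

# The cost is convex and piecewise linear in x, so its minimum over [0, x_max]
# is attained at a bound or at the kink x = x_hat (clamped into the interval).
function relaxed_value(x_hat, c, C_delta, x_max)
    candidates = (0.0, x_max, clamp(x_hat, 0.0, x_max))
    return minimum(relaxed_cost(x, x_hat, c, C_delta) for x in candidates)
end

# Strict version: x = x_hat is forced, so targets outside [0, x_max] are infeasible.
strict_value(x_hat, c, x_max) = 0 <= x_hat <= x_max ? c * x_hat : Inf

# Central finite difference, used here as a stand-in for the target dual.
slope(f, x; h=1e-6) = (f(x + h) - f(x - h)) / (2h)

c, x_max = 2.0, 10.0
for C_delta in (1.0, 5.0, 50.0)
    inside = slope(x -> relaxed_value(x, c, C_delta, x_max), 4.0)    # reachable target
    outside = slope(x -> relaxed_value(x, c, C_delta, x_max), 12.0)  # unreachable target
    println("C_delta = $C_delta: slope inside = $inside, slope outside = $outside")
end
println("strict slope inside = ", slope(x -> strict_value(x, c, x_max), 4.0))
println("strict value outside = ", strict_value(12.0, c, x_max))
```

In this toy problem the relaxed slope equals the marginal cost `c` only when the target is reachable and `C_delta > c`; in the other cases it equals `C_delta`. The strict version returns `c` whenever it is feasible and has no finite value otherwise, which is the trade-off described above.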

This is automatic in the hydro strict subproblem path because each stage is
solved sequentially and the policy receives the realized previous reservoir
state. It is also possible in regular deterministic equivalents when the target
trajectory is rolled out from the true initial state using a reachable policy:

```math
\hat{x}_0 = x_0,\qquad
\hat{x}_t = \pi_\theta(w_t, \hat{x}_{t-1}),\qquad
\hat{x}_t \in R(\hat{x}_{t-1}, w_t).
```

By induction, all targets are feasible, and the strict equalities force the
realized trajectory to match that reachable path. See the hydro example and the
DecisionRulesExa.jl companion for the GPU strict regular-DE implementation.
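
The induction argument can be sketched in plain Julia. This is a minimal illustration under assumed dynamics, not the package's implementation: a single reservoir whose water can always be released or spilled, so the reachable set is `[0, min(x_prev + w, capacity)]`. The helpers `reachable_interval`, `reachable_policy`, and `rollout_targets` are hypothetical names introduced for this example.

```julia
# Hypothetical single-reservoir dynamics (an assumption for illustration):
# from storage x_prev with inflow w >= 0, any next storage in
# [0, min(x_prev + w, capacity)] is reachable by releasing or spilling water.
reachable_interval(x_prev, w; capacity=10.0) = (0.0, min(x_prev + w, capacity))

# Wrap a raw policy so that its output is projected into the reachable set.
function reachable_policy(raw_policy)
    return (w, x_prev) -> begin
        lo, hi = reachable_interval(x_prev, w)
        clamp(raw_policy(w, x_prev), lo, hi)
    end
end

# Roll targets forward from the true initial state:
# x_hat_0 = x0 and x_hat_t = policy(w_t, x_hat_{t-1}).
function rollout_targets(policy, x0, inflows)
    targets = Float64[]
    x_prev = x0
    for w in inflows
        x_prev = policy(w, x_prev)   # conditioned on the previous target, not a solve
        push!(targets, x_prev)
    end
    return targets
end

raw_policy = (w, x_prev) -> 100.0    # deliberately asks for an unreachable level
policy = reachable_policy(raw_policy)
inflows = [1.0, 4.0, 5.0]
targets = rollout_targets(policy, 2.0, inflows)

# Check the induction step: every target lies in the set reachable from its predecessor.
previous = vcat(2.0, targets[1:end-1])
feasible = all(
    reachable_interval(xp, w)[1] <= xt <= reachable_interval(xp, w)[2]
    for (xp, w, xt) in zip(previous, inflows, targets)
)
println(targets, " feasible = ", feasible)
```

Because each target is projected using the previous target rather than a realized state from a solve, the whole trajectory can be computed before the deterministic equivalent is solved, and the strict equalities then pin the realized states to it.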

The policy helpers separate two architectural choices:

- recurrent `layers` / `DR_ENCODER_LAYERS` process uncertainty history only;
- `combiner_layers` / `DR_HEAD_LAYERS` add a nonlinear feed-forward
state-to-target head without recurrence over the state input.

## GPU acceleration with DecisionRulesExa.jl

For large-scale problems where the inner NLP solve is the bottleneck (e.g., AC-OPF with hundreds of buses), [DecisionRulesExa.jl](https://github.com/LearningToOptimize/DecisionRulesExa.jl) provides a GPU-accelerated backend that replaces JuMP with [ExaModels.jl](https://github.com/exanauts/ExaModels.jl) and solves with [MadNLP.jl](https://github.com/MadNLP/MadNLP.jl) + CUDSS on GPU.
@@ -222,6 +265,26 @@ Examples live in `examples/`. Run tests with:
julia --project -e 'using Pkg; Pkg.test()'
```

## Repository Map

| Path | Purpose |
|---|---|
| `src/DecisionRules.jl` | Module entrypoint and exports |
| `src/dense_multilayer_nn.jl` | MLP helpers, state-conditioned recurrent policies, nonlinear target heads |
| `src/simulate_multistage.jl` | Stage-wise and deterministic-equivalent simulation logic |
| `src/multiple_shooting.jl` | Windowed multiple-shooting setup, simulation, and training |
| `src/utils.jl` | Target-parameter utilities, deficit construction, rollout evaluation |
| `src/integer_strategies.jl` | Strategies for extracting gradients from integer/mixed-integer models |
| `src/score_function.jl` | Score-function gradient correction for nonsmooth/integer problems |
| `src/parameter_duals.jl` | Dual/sensitivity helpers for parameterized JuMP models |
| `docs/src/` | Documenter.jl manual pages |
| `examples/HydroPowerModels/` | Bolivia hydrothermal scheduling, strict reachable policies, SDDP comparisons |
| `examples/inventory_control/` | Inventory-control example and dynamic-programming/SDDP comparisons |
| `examples/rocket_control/` | Rocket MPC/control example |
| `examples/RL/` | Reinforcement-learning style hydro scripts |
| `examples/Experimental/` | Research prototypes and robotics/control explorations |
| `test/runtests.jl` | Package test suite |

## Citation

If you use this package in academic work, please cite:
1 change: 1 addition & 0 deletions docs/make.jl
@@ -20,6 +20,7 @@ makedocs(;
format=Documenter.HTML(;
prettyurls=get(ENV, "CI", nothing) == "true",
canonical="https://LearningToOptimize.github.io/DecisionRules.jl",
size_threshold=300 * 1024,
),
pages=[
"Home" => "index.md",