@FBumann FBumann commented Jan 27, 2026

Description

Brief description of the changes in this PR.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Code refactoring

Related Issues

Closes #(issue number)

Testing

  • I have tested my changes
  • Existing tests still pass

Checklist

  • My code follows the project style
  • I have updated documentation if needed
  • I have added tests for new functionality (if applicable)

Summary by CodeRabbit

  • New Features

    • Modular clustering toolkit: aggregation, iteration, expansion, and inter-cluster helpers for multi-period/multi-scenario workflows.
    • Expanded variable expansion options (segment interpolation, first-timestep handling, final-state append).
  • Refactor

    • Core clustering, aggregation, and expansion logic reorganized into reusable modules and a unified dimension abstraction.
  • Tests

    • Added tests for extremes handling and startup/shutdown expansion (segmented and non-segmented).
  • Chores

    • CI dependency references updated.


  1. Fixed _interpolate_charge_state_segmented Bug (Critical)

  File: flixopt/transform_accessor.py:1965-2043

  The function was incorrectly decoding timestep_mapping using timesteps_per_cluster. For segmented systems, timestep_mapping encodes cluster * n_segments + segment_idx, so
  this produced wrong cluster indices.

  Fix: Compute original period index and position directly from timestep indices instead of decoding from timestep_mapping.
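  The difference between the two decoding strategies can be sketched with plain numpy. All names and array shapes here are illustrative, not the actual flixopt internals:

  ```python
  import numpy as np

  # Hypothetical setup: 2 clusters, 4 segments per cluster, and two original
  # periods of 6 timesteps each.
  n_segments = 4
  timesteps_per_cluster = 6
  n_original = 12

  # For segmented systems, timestep_mapping encodes
  # cluster * n_segments + segment_idx per original timestep:
  timestep_mapping = np.array([0, 0, 1, 2, 3, 3, 4, 5, 5, 6, 7, 7])

  # Buggy decode: dividing by timesteps_per_cluster produces wrong cluster indices
  wrong_cluster = timestep_mapping // timesteps_per_cluster

  # Fixed approach: derive period index and position directly from the
  # original timestep index, bypassing timestep_mapping entirely
  original_idx = np.arange(n_original)
  period_idx = original_idx // timesteps_per_cluster    # [0]*6 + [1]*6
  pos_in_period = original_idx % timesteps_per_cluster  # 0..5 twice
  ```

  With the buggy decode, timesteps belonging to cluster 1 (encoded values 4 and 5) still map to cluster 0, which is exactly the kind of misassignment the fix removes.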

  2. Implemented EXPAND_FIRST_TIMESTEP for Segmented Systems Only

  File: flixopt/transform_accessor.py:2045-2100

  Added _expand_first_timestep_only() method that places startup/shutdown events at the first timestep of each segment (not cluster).

  Key design decisions:
  - Segmented systems: Events placed at first timestep of each segment (timing within segment is lost)
  - Non-segmented systems: Normal expansion preserves timing within cluster (no special handling needed)
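  The segmented placement rule can be sketched as follows; `segment_of_timestep` and `segment_values` are assumed names standing in for the mapping data the method receives:

  ```python
  import numpy as np

  # One cluster with 3 segments covering 8 original timesteps.
  segment_of_timestep = np.array([0, 0, 0, 1, 1, 2, 2, 2])
  segment_values = np.array([1.0, 0.0, 1.0])  # e.g. a startup in segments 0 and 2

  expanded = np.zeros(segment_of_timestep.shape, dtype=float)
  # The first timestep of each segment is where the segment id changes
  first_of_segment = np.r_[True, segment_of_timestep[1:] != segment_of_timestep[:-1]]
  # Place each segment's event value only at that first timestep
  expanded[first_of_segment] = segment_values[segment_of_timestep[first_of_segment]]
  ```

  The remaining timesteps stay zero, reflecting that timing within a segment is lost, while event counts per segment are preserved.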

  3. Added Tests

  - test_startup_shutdown_first_timestep_only: Verifies segmented systems place events at segment boundaries
  - test_startup_timing_preserved_non_segmented: Verifies non-segmented systems preserve timing within clusters

  4. Updated Docstrings

  Clarified in expand() docstring that:
  - Binary events in segmented systems go to first timestep of each segment
  - Non-segmented systems preserve timing via normal expansion
…scenario) dimensions:

  1. For each missing (period_label, scenario_label) key, it selects the specific period/scenario slice from the original variable using .sel(..., drop=True)
  2. Then it slices and reshapes that specific slice
  3. All slices in filled_slices now have consistent dimensions ['cluster', 'time'] + other_dims without 'period' or 'scenario' coordinates

  This ensures all DataArrays being concatenated have the same structure. Can you try running your clustering code again?
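  A toy example of why `drop=True` matters here: selecting a scalar label without it leaves a dangling coordinate that makes later concatenation against coordinate-free slices inconsistent.

  ```python
  import numpy as np
  import xarray as xr

  da = xr.DataArray(
      np.arange(12).reshape(2, 6),
      dims=('period', 'time'),
      coords={'period': ['p1', 'p2']},
  )

  with_coord = da.sel(period='p1')        # keeps a scalar 'period' coordinate
  clean = da.sel(period='p1', drop=True)  # dims ('time',), no 'period' left over
  ```

  Only the `drop=True` variant has the same structure as a slice that never carried a `period` dimension, which is what makes the subsequent `concat` safe.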
  New Modules Created

  1. flixopt/clustering/iteration.py - Shared iteration infrastructure:
    - DimSliceContext dataclass for (period, scenario) slice context
    - DimInfo dataclass for dimension metadata
    - iter_dim_slices() generator for standardized iteration
    - iter_dim_slices_simple() convenience function
  2. flixopt/clustering/aggregation.py - Aggregation helpers:
    - combine_slices_to_dataarray() - Unified slice combination function
    - build_typical_dataarrays() - Build typical periods DataArrays
    - build_segment_durations() - Build segment duration DataArray
    - build_cluster_weights() - Build cluster weight DataArray
    - build_clustering_metrics() - Build metrics Dataset
    - calculate_clustering_weights() - Calculate clustering weights from data
    - build_cluster_config_with_weights() - Merge weights into ClusterConfig
    - accuracy_to_dataframe() - Convert tsam AccuracyMetrics to DataFrame
  3. flixopt/clustering/expansion.py - Expansion helpers:
    - VariableExpansionHandler class - Handles variable-specific expansion logic
    - interpolate_charge_state_segmented() - Interpolate state variables in segments
    - expand_first_timestep_only() - Expand binary events to first timestep
    - build_segment_total_varnames() - Build segment total variable names
    - append_final_state() - Append final state value to expanded data
  4. Updated flixopt/clustering/intercluster_helpers.py:
    - Added combine_intercluster_charge_states() - Combine charge_state with SOC_boundary
    - Added apply_soc_decay() - Apply self-discharge decay to SOC values
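  The iteration infrastructure in item 1 can be sketched roughly like this; the dataclass fields and function signatures are assumptions based on the names above, not the actual flixopt API:

  ```python
  from dataclasses import dataclass
  from itertools import product
  from typing import Iterator, Optional

  @dataclass
  class DimSliceContext:
      """Context for one (period, scenario) slice; None means the dim is absent."""
      period: Optional[str]
      scenario: Optional[str]

      @property
      def selector(self) -> dict:
          # Build the .sel(...) kwargs for this slice
          sel = {}
          if self.period is not None:
              sel['period'] = self.period
          if self.scenario is not None:
              sel['scenario'] = self.scenario
          return sel

  def iter_dim_slices(periods, scenarios) -> Iterator[DimSliceContext]:
      # A missing dimension contributes a single None entry, so a system
      # without periods/scenarios still yields exactly one slice.
      for p, s in product(periods or [None], scenarios or [None]):
          yield DimSliceContext(period=p, scenario=s)

  contexts = list(iter_dim_slices(['p1', 'p2'], None))
  ```

  Centralizing this loop is what lets `cluster()`, `expand()`, and the reduced-dataset builders share one key/selector convention instead of each re-deriving it.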

  Refactored flixopt/transform_accessor.py

  - cluster() method: Now uses DimInfo and iter_dim_slices() for standardized iteration
  - expand() method: Uses VariableExpansionHandler and extracted helper functions
  - _build_reduced_flow_system(): Signature changed to use DimInfo instead of separate periods/scenarios lists
  - _build_reduced_dataset(): Updated to use DimInfo and combine_slices_to_dataarray()
  - apply_clustering(): Uses DimInfo for cleaner key conversion

  Removed from transform_accessor.py

  - _calculate_clustering_weights() → moved to clustering/aggregation.py
  - _build_cluster_config_with_weights() → moved to clustering/aggregation.py
  - _accuracy_to_dataframe() → moved to clustering/aggregation.py
  - _build_cluster_weight_da() → moved to clustering/aggregation.py
  - _build_typical_das() → moved to clustering/aggregation.py
  - _build_segment_durations_da() → moved to clustering/aggregation.py
  - _build_clustering_metrics() → moved to clustering/aggregation.py
  - _combine_slices_to_dataarray_generic() → replaced by unified combine_slices_to_dataarray()
  - _combine_slices_to_dataarray_2d() → replaced by unified combine_slices_to_dataarray()
  - _combine_intercluster_charge_states() → moved to clustering/intercluster_helpers.py
  - _apply_soc_decay() → moved to clustering/intercluster_helpers.py
  - _build_segment_total_varnames() → moved to clustering/expansion.py
  - _interpolate_charge_state_segmented() → moved to clustering/expansion.py
  - _expand_first_timestep_only() → moved to clustering/expansion.py

  Updated flixopt/clustering/__init__.py

  - Exports all new helpers from the new modules

  Tests

  - All 102 tests in test_cluster_reduce_expand.py pass
  - All 43 tests in test_clustering_io.py pass
coderabbitai bot commented Jan 27, 2026

Warning

Rate limit exceeded

@FBumann has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 17 minutes and 10 seconds before requesting another review.


📝 Walkthrough

Modularizes clustering, expansion, and inter-cluster operations by extracting implementations from transform_accessor into dedicated clustering submodules; adds DimInfo for multi-dimensional (period/scenario) handling; re-exports new public APIs via flixopt.clustering.__init__; updates consumers to use the new helpers.

Changes

Cohort / File(s) Summary
New Clustering Infrastructure
flixopt/clustering/aggregation.py
New module (≈445 lines) with helpers to combine per-(period,scenario) slices into xarray DataArrays/Datasets: combine_slices_to_dataarray, build_typical_dataarrays, build_segment_durations, build_cluster_weights, build_clustering_metrics, build_cluster_assignments_dataarray, calculate_clustering_weights, build_cluster_config_with_weights, accuracy_to_dataframe.
Expansion Utilities
flixopt/clustering/expansion.py
New module (≈324 lines) introducing VariableExpansionHandler and functions build_segment_total_varnames, interpolate_charge_state_segmented, expand_first_timestep_only, append_final_state for expanding clustered variables back to original timesteps, with compatibility fallbacks.
Iteration Abstraction
flixopt/clustering/iteration.py
New module (≈211 lines) adding DimInfo and DimSliceContext dataclasses plus iter_dim_slices / iter_dim_slices_simple to standardize (period, scenario) iteration and key/selector translation.
Inter-cluster Storage
flixopt/clustering/intercluster_helpers.py
New helpers (≈134 lines): combine_intercluster_charge_states and apply_soc_decay to reconstruct SOC across original timesteps, map cluster/timestep indices, apply self-discharge decay, and clean intermediate boundary vars.
Public API Expansion
flixopt/clustering/__init__.py
Re-exports new symbols from aggregation, expansion, iteration, and intercluster_helpers (adds VariableExpansionHandler, DimInfo, CapacityBounds, many helper functions, and clustering result types).
Major Refactor / Consumer Update
flixopt/transform_accessor.py
Replaces in-file clustering/aggregation/expansion logic with imports from new clustering modules; adopts DimInfo + iter_dim_slices for selection; delegates aggregation, expansion, and inter-cluster SOC reconstruction to modular helpers (net: +194/-837 lines).
Tests
tests/test_cluster_reduce_expand.py
Adds tests (≈240 lines) for extremes handling (new_cluster, replace) across single/multi-period and segmented cases, plus startup/shutdown expansion tests (first-timestep placement and timing preservation).
CI / Pinning
.github/workflows/*.yaml, pyproject.toml
Updated Git commit references for installing tsam v3 in CI and commented optional dependency lines (small commit-hash updates).

Sequence Diagram(s)

sequenceDiagram
  participant TA as transform_accessor
  participant DIM as clustering.iteration::DimInfo / iter_dim_slices
  participant TSAM as tsam.aggregate
  participant AGG as clustering.aggregation
  participant EXP as clustering.expansion::VariableExpansionHandler
  participant ICC as clustering.intercluster_helpers

  TA->>DIM: build DimInfo from flow_system
  TA->>DIM: iterate slices via iter_dim_slices
  loop per (period,scenario)
    TA->>TSAM: call aggregate on selected data
    TSAM-->>TA: Aggregation results (per-slice)
    TA->>AGG: combine_slices_to_dataarray / build_typical_dataarrays
    AGG-->>TA: reduced DataArrays/Datasets
  end
  TA->>EXP: decide expansion paths via VariableExpansionHandler
  EXP-->>TA: expanded per-variable DataArrays
  TA->>ICC: combine_intercluster_charge_states (apply_soc_decay if needed)
  ICC-->>TA: SOC-reconstructed solution
  TA->>TA: assemble final expanded FlowSystem / dataset

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐇✨ I hopped through modules, tidy and neat,
Moved scattered carrots into one neat heap,
DimInfo guides my seasoned quests,
Expand, combine — I build my nests,
A little rabbit cheers this refactor feat!

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
  • Description check — ⚠️ Warning: The PR description is a template with mostly unchecked boxes and minimal actual content; only a few checklist items are marked and no substantive description of changes is provided. Resolution: Fill in the Description field with a brief summary of changes (e.g., modularizing clustering into submodules), specify the Type of Change, provide an issue reference, and confirm testing status.
  • Title check — ❓ Inconclusive: The title "Fix/expand flow system rework" is vague and generic, using non-descriptive terms that don't clearly convey the specific changes or main purpose of the changeset. Resolution: Replace it with a more specific, descriptive title that captures the main change, such as "Modularize clustering, expansion, and aggregation utilities" or "Refactor clustering operations into dedicated modules".
✅ Passed checks (1 passed)
  • Docstring Coverage — ✅ Passed: Docstring coverage is 97.67%, which is sufficient. The required threshold is 80.00%.




@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
flixopt/transform_accessor.py (2)

376-387: Update _build_cluster_assignments_da to use the unified combine_slices_to_dataarray function.

The method calls self._combine_slices_to_dataarray_generic(), which no longer exists—it was unified into combine_slices_to_dataarray() (see comment at line 1371-1372). This will raise AttributeError when invoked.

Proposed fix
-            return self._combine_slices_to_dataarray_generic(
-                cluster_assignments_slices, ['original_cluster'], periods, scenarios, 'cluster_assignments'
-            )
+            dim_info = DimInfo(periods=periods, scenarios=scenarios)
+            return combine_slices_to_dataarray(
+                slices=cluster_assignments_slices,
+                dim_info=dim_info,
+                base_dims=['original_cluster'],
+                name='cluster_assignments',
+            )

1320-1351: Add explicit period/scenario validation before calling clustering.results.apply().

The docstring promises a ValueError if clustering dimensions don't match, but currently validation is implicit via xarray's .sel(), which raises KeyError with an unclear message. Add a preemptive guard that validates clustering.coords (e.g., checking that all periods/scenarios in clustering exist in the dataset) before line 1340, with a clear error message matching the docstring's contract.

🤖 Fix all issues with AI agents
In `@flixopt/clustering/intercluster_helpers.py`:
- Around line 277-335: The code assumes loss_value has a .values attribute which
fails for scalar returns from _scalar_safe_reduce; update apply_soc_decay to
handle both scalars and DataArray: after loss_value = _scalar_safe_reduce(...),
normalize it by wrapping scalars into an xarray/DataArray or use numpy
conversion (e.g., loss_arr = np.asarray(loss_value)) and replace the check if
not np.any(loss_value.values > 0) with if not np.any(loss_arr > 0); then use
loss_arr (or xr.DataArray(loss_value) if you prefer) for the decay calculation
(decay_da = (1 - loss_arr) ** time_within_period_da) so both scalar floats and
DataArrays from storage.relative_loss_per_hour work without AttributeError.
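  The normalization described above is straightforward: `np.asarray` accepts both plain floats and xarray DataArrays, so one code path serves both. A minimal sketch, with `has_positive_loss` and `decay_factor` as illustrative stand-ins for the relevant pieces of apply_soc_decay:

  ```python
  import numpy as np

  def has_positive_loss(loss_value) -> bool:
      # np.asarray handles scalars, numpy arrays, and xarray DataArrays alike,
      # so no .values attribute is ever required
      loss_arr = np.asarray(loss_value)
      return bool(np.any(loss_arr > 0))

  def decay_factor(loss_value, hours):
      # Self-discharge decay: (1 - loss_per_hour) ** hours
      loss_arr = np.asarray(loss_value)
      return (1 - loss_arr) ** hours
  ```

  With this shape, a scalar `relative_loss_per_hour` of 0.0 short-circuits the decay step, while a per-timestep array flows through the same exponentiation.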

In `@flixopt/transform_accessor.py`:
- Around line 1164-1176: The current check uses is_multi_dimensional =
dim_info.has_periods or dim_info.has_scenarios which wrongly blocks
extremes.method 'new_cluster'/'append' even for a single period/scenario slice;
change the guard to detect multiple period/scenario combinations instead (e.g.,
compute total_slices from dim_info — using available attributes like
dim_info.periods or dim_info.n_periods and dim_info.scenarios or
dim_info.n_scenarios — and treat multi-dimensional only when total_slices > 1),
then only raise the ValueError for extremes.method in ('new_cluster', 'append')
when total_slices > 1. Ensure you reference the same variables used in the diff
(dim_info, extremes, extreme_method) when updating the condition.
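  The corrected guard amounts to counting slices rather than testing for dimension presence. A sketch under the same assumptions as the comment above (the real DimInfo attributes may differ):

  ```python
  def should_block_extremes(periods, scenarios, extreme_method) -> bool:
      # An absent dimension contributes exactly one slice
      n_periods = len(periods) if periods else 1
      n_scenarios = len(scenarios) if scenarios else 1
      total_slices = n_periods * n_scenarios
      # Only block 'new_cluster'/'append' when more than one
      # (period, scenario) combination actually exists
      return extreme_method in ('new_cluster', 'append') and total_slices > 1
  ```

  A system with a single period and no scenarios yields `total_slices == 1` and is no longer rejected, which is the behavior change the review asks for.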
🧹 Nitpick comments (3)
flixopt/clustering/aggregation.py (1)

98-100: Preserve full base_dims ordering after concatenation.

Only the first base dimension is forced to the front, so other base dims (e.g., time) can end up after period/scenario, contradicting the stated order. Consider transposing with all base dims up front.

♻️ Proposed change
-    result = result.transpose(base_dims[0], ...)
+    result = result.transpose(*base_dims, ...)
tests/test_cluster_reduce_expand.py (2)

841-870: Align the new_cluster expectation with the assertion.

The description says new_cluster should increase n_clusters, but the assertion allows no increase. Consider tightening the assertion or updating the wording to match the intended behavior.

💡 Possible tweak
-        assert fs_clustered.clustering.n_clusters >= 2
+        assert fs_clustered.clustering.n_clusters > 2

908-923: Test name/docstring doesn’t match the ExtremeConfig method used.

The test is named for new_cluster, but it configures method='append'. Consider renaming the test/docstring or switching the method to keep intent unambiguous.

💡 Possible rename
-    def test_extremes_new_cluster_with_segments(self, solver_fixture, timesteps_8_days):
-        """Test that method='new_cluster' works correctly with segmentation."""
+    def test_extremes_append_with_segments(self, solver_fixture, timesteps_8_days):
+        """Test that method='append' works correctly with segmentation."""

… to handle both scalar and DataArray returns from _scalar_safe_reduce using np.asarray().

  2. flixopt/transform_accessor.py: Fixed the multi-dimensional check to only block extremes.method='new_cluster'/'append' when there are actually multiple slices (total_slices
   > 1), not just when periods or scenarios exist.
  3. flixopt/clustering/aggregation.py: Fixed combine_slices_to_dataarray to transpose with all base dims in order (*base_dims, ...) instead of just the first one.
  4. tests/test_cluster_reduce_expand.py:
    - Updated test_extremes_new_cluster_increases_n_clusters docstring to clarify that tsam may or may not add clusters (the >= 2 assertion was actually correct)
    - Renamed test_extremes_new_cluster_with_segments to test_extremes_append_with_segments and updated docstring to match the actual method='append' used
…m-rework

# Conflicts:
#	flixopt/transform_accessor.py
#	tests/test_cluster_reduce_expand.py
@FBumann FBumann merged commit cd0e576 into main Jan 27, 2026
8 checks passed