
Conversation

@enlorenz

Note: not tested "in production" yet, but continuity checks between consecutive episodes have passed so far.

@coderabbitai

coderabbitai bot commented Jan 15, 2026

📝 Walkthrough


Adds a carry-over state option to the environment, separates cumulative vs. episode-level metrics, and guards episode-level metric updates (baseline and normal paths) to avoid KeyError while threading per-episode scaffolding through reset and step flows.
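For orientation, a minimal usage sketch of the new flag (ComputeClusterEnv is referenced elsewhere in this PR; the import path, exact constructor signature, and helper body are assumptions):

from src.environment import ComputeClusterEnv  # import path assumed

def make_env_with_carry(**kwargs):
    # mirrors the make_env_with_carry() helper added in test/test_sanity_env.py
    return ComputeClusterEnv(carry_over_state=True, **kwargs)

env = make_env_with_carry()
obs, info = env.reset(seed=0)
# ... step through one full episode ...
obs, info = env.reset()  # with carry_over_state=True, nodes/queue/running jobs persist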

Changes

  • Environment / carry-over (src/environment.py): Adds carry_over_state ctor flag, reset_timeline_state(), conditional reset behavior (carry-over vs. full reinit), initializes per-episode scaffolding, and propagates episode_* metrics through step() outputs.
  • Metrics refactor / episode vs. cumulative (src/metrics_tracker.py): Introduces reset_cumulative_metrics(), separates cumulative and episode-level attributes (adds many total_* and episode_* fields), moves initialization, and updates emitted episode data to use episode-scoped fields.
  • Baseline metric guards (src/baseline.py): Adds presence-guarded updates for episode_baseline_* metrics when submitting, rejecting, or updating max queue size, to avoid KeyError.
  • Job completions / drops tracking (src/job_management.py): Adds presence-guarded increments for episode_* and episode_baseline_* metrics on job completion and drop paths.
  • Tests / sanity harness (test/test_sanity_env.py): Adds carry_over_test(), a make_env_with_carry() helper, a CLI --carry-over-state flag, and wires carry-over into sanity checks and environment construction.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Env as Environment
    participant Jobs as JobManagement
    participant Metrics as MetricsTracker

    User->>Env: __init__(..., carry_over_state)
    Env->>Metrics: reset_cumulative_metrics()
    Env->>Env: reset_timeline_state() / init episode scaffolding

    User->>Env: step(action)
    Env->>Jobs: assign_jobs_to_available_nodes()
    Jobs->>Metrics: update completion/drop / wait-time (guarded if episode keys exist)
    Env->>Metrics: update episode_total_cost & max_queue_size (guarded)
    Env->>User: return (obs, reward, done, info with episode metrics)

    User->>Env: reset()
    alt carry_over_state == True
        Env->>Metrics: reset_episode_metrics() only
    else
        Env->>Metrics: reset_cumulative_metrics()
        Env->>Env: full state reinitialization
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • rbx
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 36.67%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Title check (✅ Passed): The title clearly and specifically describes the main change: carry-over of state between episodes instead of resetting to zero.
  • Description check (✅ Passed): The description is related to the changeset, acknowledging testing status and mentioning continuity checks, though it lacks detailed explanation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/environment.py (1)

296-301: Episode increment and termination logic breaks in carry-over mode due to unresolved current_hour state.

When carry_over_state=True, reset() calls only reset_episode_metrics(), which doesn't reset current_hour. Since reset_timeline_state() (which sets current_hour = 0) is called only once in __init__, subsequent episodes in carry-over mode start with current_hour at its final value from the previous episode (336).

This causes three problems:

  1. After episode 1, current_hour remains at 336 and increments to 337, 338, etc. in episode 2.
  2. The termination check at line 495 (if self.metrics.current_hour == EPISODE_HOURS) never triggers again, so episodes after the first one never terminate.
  3. The episode increment check at line 299 (if self.metrics.current_hour == 1) also fails in subsequent episodes, since current_hour never returns to 1.

In non-carry-over mode, reset() calls reset_state_metrics() which correctly resets current_hour = 0, avoiding this issue.

🤖 Fix all issues with AI agents
In `@src/plotter.py`:
- Around line 90-93: The metric calculations currently read attributes directly
from env (completion_rate, baseline_completion_rate, avg_wait,
baseline_avg_wait) but those values live under env.metrics; update each
expression to use env.metrics.jobs_completed, env.metrics.jobs_submitted,
env.metrics.total_job_wait_time and the baseline counterparts
(env.metrics.baseline_jobs_completed, env.metrics.baseline_jobs_submitted,
env.metrics.baseline_total_job_wait_time) and keep the same guard logic (getattr
checks or existence checks) to avoid division by zero.
🧹 Nitpick comments (12)
src/environment.py (1)

218-265: Significant code duplication between reset() and reset_timeline_state().

Lines 238-263 in reset() duplicate the state initialization logic from reset_timeline_state() (lines 270-294). This violates DRY and creates maintenance risk—if state structure changes, both locations must be updated.

♻️ Suggested refactor to eliminate duplication

Consider extracting the common state initialization to a private helper method:

+    def _init_state_structures(self):
+        """Initialize state structures for nodes, queues, and job tracking."""
+        self.state = {
+            'nodes': np.zeros(MAX_NODES, dtype=np.int32),
+            'job_queue': np.zeros((MAX_QUEUE_SIZE * 4), dtype=np.int32),
+            'predicted_prices': self.prices.predicted_prices.copy(),
+        }
+        self.baseline_state = {
+            'nodes': np.zeros(MAX_NODES, dtype=np.int32),
+            'job_queue': np.zeros((MAX_QUEUE_SIZE * 4), dtype=np.int32),
+        }
+        self.cores_available = np.full(MAX_NODES, CORES_PER_NODE, dtype=np.int32)
+        self.baseline_cores_available = np.full(MAX_NODES, CORES_PER_NODE, dtype=np.int32)
+        self.running_jobs = {}
+        self.baseline_running_jobs = {}
+        self.next_job_id = 0
+        self.next_empty_slot = 0
+        self.baseline_next_empty_slot = 0

     def reset_timeline_state(self):
         self.metrics.current_hour = 0
-        self.state = {
-            # ... all the initialization code ...
-        }
-        # ... rest of initialization ...
+        self._init_state_structures()

     def reset(self, seed=None, options=None):
         # ... existing code ...
         if not self.carry_over_state:
             # ... price reset logic ...
-            self.state = {
-                # ... all the initialization code ...
-            }
-            # ... rest of initialization ...
+            self._init_state_structures()
src/plotter.py (3)

46-58: Potential off-by-one in monthly percentage calculation.

The monthly percentage is calculated every 2 episodes (assuming 2-week episodes = 1 month), but each value is appended twice (extend([pct, pct])). This produces a step-like visualization but can cause array length mismatches. The padding logic at lines 66-70 handles this, but it's fragile.

Consider documenting this behavior more clearly or simplifying by computing monthly percentages at the end rather than during iteration.
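One possible shape for the "compute at the end" variant, assuming per-episode agent and baseline cost lists are available (names here are hypothetical):

import numpy as np

def monthly_savings_series(episode_costs, episode_baseline_costs):
    """Pair up 2-week episodes into months and compute savings % once, after the loop."""
    monthly = []
    for i in range(0, len(episode_costs) - 1, 2):  # 2 episodes ≈ 1 month
        agent = sum(episode_costs[i:i + 2])
        base = sum(episode_baseline_costs[i:i + 2])
        monthly.append(100.0 * (base - agent) / base if base else 0.0)
    # repeat each monthly value twice to keep the step-like, per-episode shape
    return np.repeat(monthly, 2)[:len(episode_costs)]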


81-81: Unused parameter episode_costs should be removed.

The docstring at line 84-85 acknowledges this is kept for backward compatibility. Consider removing it entirely if it's truly unused, or add _ = episode_costs to silence the warning if backward compatibility is required.

♻️ Proposed fix
-def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
+def plot_dashboard(env, num_hours, max_nodes, save=True, show=True, suffix=""):
     """
     Per-hour dashboard: price, nodes, queue, reward components, etc.
-    NOTE: episode_costs is accepted for backwards compatibility but NOT used here anymore.
-          Cumulative savings now lives in plot_cumulative_savings().
     """

Or if backward compatibility is truly needed:

 def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
+    del episode_costs  # Unused, kept for backward compatibility

290-290: Ambiguous variable name l.

The variable l can be confused with 1 (one) in certain fonts. Use a more descriptive name.

♻️ Proposed fix
-    labels = [l.get_label() for l in lines]
+    labels = [line.get_label() for line in lines]
train.py (4)

156-159: Remove extraneous f-prefix from plain string.

Line 156 uses an f-string without any placeholders.

Suggested fix
-        print(f"Starting a new model training...")
+        print("Starting a new model training...")

259-275: Avoid hardcoding max_nodes=335; retrieve from environment.

The hardcoded value will break if the environment's node count changes. The commented-out code shows the right approach.

Suggested fix
                 plot_dashboard(
                     env,
                     num_hours=args.dashboard_hours,
-                 #   max_nodes=env.max_nodes if hasattr(env, "max_nodes") else env.num_nodes if hasattr(env, "num_nodes") else 0,
-                    max_nodes=335,
+                    max_nodes=getattr(env, "max_nodes", getattr(env, "num_nodes", 335)),

287-295: Minor issues in training loop instrumentation.

  1. Line 288-289: The progress log is confusingly worded - it says "at {iters + 1}" but iters was already incremented on line 286.
  2. Line 294: f-string without placeholders on tb_log_name=f"PPO".
Suggested fixes
             iters += 1
             t0 = time.time()
-            if (iters+1)%10==0:
-                print(f"Running... at {iters + 1} of {STEPS_PER_ITERATION * (iters + 1)} steps")
+            if iters % 10 == 0:
+                print(f"Running iteration {iters}, cumulative steps: {STEPS_PER_ITERATION * iters}")
             if args.iter_limit > 0 and iters > args.iter_limit:
                 print(f"iterations limit ({args.iter_limit}) reached: {iters}.")
                 break
             try:
-                model.learn(total_timesteps=STEPS_PER_ITERATION, reset_num_timesteps=False, tb_log_name=f"PPO", callback=ComputeClusterCallback())
+                model.learn(total_timesteps=STEPS_PER_ITERATION, reset_num_timesteps=False, tb_log_name="PPO", callback=ComputeClusterCallback())

152-159: Consider increasing n_steps or documenting the rationale for the small value.

The n_steps=64 is significantly smaller than stable-baselines3's default of 2048. For this environment with long episodes (336 hours), SB3 best practices recommend larger n_steps (1024–2048) to reduce correlated data and improve sample efficiency. The current configuration results in ~5 updates per episode, compared to ~1 per 6 episodes with the default, which increases variance and may impact learning stability. If the intent is faster iteration for development/testing, consider adding a comment explaining the choice or making it configurable via command-line argument.
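If the small value is intentional for quick iteration, one way to make it configurable is a CLI flag; a minimal sketch (the flag name and policy string are assumptions, other PPO kwargs elided):

import argparse
from stable_baselines3 import PPO

def build_model(env, n_steps):
    # policy string is a placeholder; remaining hyperparameters as in train.py
    return PPO("MultiInputPolicy", env, n_steps=n_steps)

parser = argparse.ArgumentParser()
parser.add_argument("--n-steps", type=int, default=64,
                    help="PPO rollout length per policy update (SB3 default: 2048)")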

train_iter.py (1)

175-178: Consider improving readability of long function call.

Line 178 is quite long and could benefit from line breaks for readability.

Suggested improvement
     for combo in combinations:
         efficiency_weight, price_weight, idle_weight, job_age_weight, drop_weight = combo
         print(f"Running with weights: efficiency={efficiency_weight}, price={price_weight}, idle={idle_weight}, job_age={job_age_weight}, drop={drop_weight}")
-        run(efficiency_weight, price_weight, idle_weight, job_age_weight, drop_weight, args.iter_limit_per_step, args.session, args.prices, args.job_durations, args.jobs, args.hourly_jobs,plot_dashboard=args.plot_dashboard,dashboard_hours=args.dashboard_hours)
+        run(
+            efficiency_weight, price_weight, idle_weight, job_age_weight, drop_weight,
+            args.iter_limit_per_step, args.session,
+            args.prices, args.job_durations, args.jobs, args.hourly_jobs,
+            plot_dashboard=args.plot_dashboard,
+            dashboard_hours=args.dashboard_hours,
+        )
test/test_sanity_env.py (3)

48-101: Consider using imported constants instead of hardcoded values.

The invariant checks use hardcoded values (170, 96, 168, 16) that should match environment constants. The comment at line 84 acknowledges this, but since you already import MAX_JOB_DURATION, MAX_NODES_PER_JOB, CORES_PER_NODE, etc., consider using them for consistency.

Example for one check
-    assert np.all((dur >= 0) & (dur <= 170)), f"duration out of bounds min={dur.min()} max={dur.max()}"
+    assert np.all((dur >= 0) & (dur <= MAX_JOB_DURATION)), f"duration out of bounds min={dur.min()} max={dur.max()}"

243-246: Fix indentation of nested function.

The norm_path function has inconsistent indentation (extra leading space).

Suggested fix
-     # Train.py passes strings; the env treats "" as falsy in some places and truthy in others.
+    # Train.py passes strings; the env treats "" as falsy in some places and truthy in others.
     # To be safe: normalize "" -> None here.
     def norm_path(x):
         return None if (x is None or str(x).strip() == "") else x

336-359: Consider removing or documenting inline debugging test.

Lines 336-359 appear to be inline debugging/verification code that duplicates the formal determinism_test. If this is intentional for quick sanity checks, consider adding a comment explaining its purpose, or removing it if it's leftover debugging code.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f1cefff and dd7d5cd.

📒 Files selected for processing (8)
  • src/baseline.py
  • src/environment.py
  • src/job_management.py
  • src/metrics_tracker.py
  • src/plotter.py
  • test/test_sanity_env.py
  • train.py
  • train_iter.py
🧰 Additional context used
📓 Path-based instructions (2)
train.py

📄 CodeRabbit inference engine (CLAUDE.md)

train.py: Implement stable-baselines3 PPO integration in the training script with tensorboard logging and model checkpointing every 100K steps
Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)

Files:

  • train.py
{train.py,weights.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)

Files:

  • train.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Applied to files:

  • src/job_management.py
  • src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to {train.py,weights.py} : Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)

Applied to files:

  • train_iter.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)

Applied to files:

  • train_iter.py
  • train.py
  • src/plotter.py
  • src/environment.py
  • src/metrics_tracker.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Implement stable-baselines3 PPO integration in the training script with tensorboard logging and model checkpointing every 100K steps

Applied to files:

  • train.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node

Applied to files:

  • src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_hourly.py : Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler

Applied to files:

  • src/environment.py
🧬 Code graph analysis (3)
train_iter.py (1)
src/plotter.py (1)
  • plot_dashboard (81-222)
train.py (2)
src/plotter.py (2)
  • plot_dashboard (81-222)
  • plot_cumulative_savings (225-316)
src/plot.py (1)
  • plot_cumulative_savings (136-276)
src/environment.py (3)
src/metrics_tracker.py (2)
  • reset_episode_metrics (43-80)
  • reset_state_metrics (38-41)
test/test_sanity_env.py (1)
  • reset (321-327)
src/prices_deterministic.py (1)
  • reset (52-76)
🪛 Ruff (0.14.11)
test/test_sanity_env.py

126-126: Unpacked variable obs is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


150-150: Unpacked variable obs is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


150-150: Unpacked variable r is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


150-150: Unpacked variable info is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


281-281: Unpacked variable nodes is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


281-281: Unpacked variable prices is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


341-341: Unpacked variable o1 is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


342-342: Unpacked variable t1 is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


342-342: Unpacked variable tr1 is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


344-344: Unpacked variable o2 is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


345-345: Unpacked variable t2 is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


345-345: Unpacked variable tr2 is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


383-383: Unpacked variable r is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


383-383: Unpacked variable info is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

train_iter.py

119-119: subprocess call: check for execution of untrusted input

(S603)

train.py

156-156: f-string without any placeholders

Remove extraneous f prefix

(F541)


276-276: Do not catch blind exception: Exception

(BLE001)


294-294: f-string without any placeholders

Remove extraneous f prefix

(F541)


317-317: Do not catch blind exception: Exception

(BLE001)

src/plotter.py

81-81: Unused function argument: episode_costs

(ARG001)


290-290: Ambiguous variable name: l

(E741)

🔇 Additional comments (23)
src/environment.py (3)

175-176: Initialization order may cause issues with self.prices.

reset_timeline_state() at line 175 accesses self.prices.predicted_prices (line 276), but self.prices is initialized at line 121, so this should work. However, the call order means the state is initialized twice—once in reset_timeline_state() during __init__, and again in reset() when the environment is first reset. This is intentional for the carry-over flow but worth noting.


331-333: LGTM - Episode-level metrics propagation is consistent.

The episode-level metrics (episode_jobs_submitted, episode_jobs_rejected_queue_full, episode_jobs_completed, episode_total_job_wait_time, episode_jobs_dropped) are correctly incremented alongside their cumulative counterparts, maintaining parallel tracking.

Also applies to: 377-379


401-402: LGTM - Baseline and cost episode metrics are properly tracked.

Episode-level baseline metrics and costs are correctly propagated from the baseline_metrics dict and accumulated into episode_total_cost, episode_baseline_cost, and episode_baseline_cost_off.

Also applies to: 437-448, 446-447, 468-470

src/metrics_tracker.py (3)

7-14: LGTM - Clean initialization pattern.

The constructor properly separates cumulative metrics initialization from state/episode metrics, establishing a clear hierarchy.


62-76: LGTM - Comprehensive episode-level metrics.

The episode-level metrics for both agent and baseline are well-structured and mirror the cumulative metrics, enabling accurate per-episode reporting.


93-101: LGTM - Correct episode-level calculations.

The calculations for avg_wait_time, completion_rate, and drop_rate properly use episode-level counters with appropriate zero-division guards.
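For concreteness, the guarded per-episode aggregates amount to something like the following sketch (attribute names follow the episode_* fields mentioned above, not necessarily the actual implementation):

completed = self.episode_jobs_completed
submitted = self.episode_jobs_submitted
avg_wait_time = self.episode_total_job_wait_time / completed if completed else 0.0
completion_rate = 100.0 * completed / submitted if submitted else 0.0
drop_rate = 100.0 * self.episode_jobs_dropped / submitted if submitted else 0.0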

src/job_management.py (2)

159-167: LGTM - Backward-compatible episode metrics updates.

The presence checks (if 'episode_*' in metrics) allow this function to work with both old and new metrics dictionaries, maintaining backward compatibility.
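The guard pattern itself is simple; a minimal sketch assuming a dict-style metrics object (key names illustrative):

def record_completion(metrics, wait_time):
    metrics['jobs_completed'] += 1              # cumulative counters are always present
    metrics['total_job_wait_time'] += wait_time
    if 'episode_jobs_completed' in metrics:     # episode keys may be missing in older dicts
        metrics['episode_jobs_completed'] += 1
        metrics['episode_total_job_wait_time'] += wait_time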


187-193: LGTM - Consistent pattern for drop metrics.

The episode-level drop tracking mirrors the completion tracking pattern, with proper presence checks for both baseline and non-baseline paths.

src/baseline.py (2)

42-46: LGTM - Episode metrics for baseline submissions.

Correctly increments episode_baseline_jobs_submitted and episode_baseline_jobs_rejected_queue_full with presence checks for backward compatibility.


61-63: LGTM - Episode max queue size tracking.

Properly tracks per-episode maximum queue size for baseline, with the inner condition correctly checking against the episode-level threshold.

src/plotter.py (2)

8-16: LGTM - Robust series normalization helper.

The _as_series function correctly handles None, arrays of varying lengths, and padding with NaN values.
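A sketch of what such a normalizer typically looks like (the real signature in src/plotter.py may differ):

import numpy as np

def _as_series(values, length):
    """Return a float array of exactly `length` entries, NaN-padded or truncated."""
    out = np.full(length, np.nan, dtype=float)
    if values is None:
        return out
    vals = np.asarray(values, dtype=float)[:length]
    out[:len(vals)] = vals
    return out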


225-316: LGTM - Comprehensive cumulative savings visualization.

The function properly handles dual-axis plotting, combines legends, and provides a summary statistics box. The return value with computed statistics is useful for programmatic access.
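The dual-axis/combined-legend pattern it relies on is standard matplotlib; a self-contained sketch with dummy data (variable names illustrative):

import numpy as np
import matplotlib.pyplot as plt

cum_savings = np.cumsum(np.random.rand(20))   # placeholder for cumulative savings (EUR)
savings_pct = np.random.rand(20) * 10         # placeholder for per-episode savings (%)

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()                             # second y-axis for the percentage series
line1, = ax1.plot(cum_savings, label="Cumulative savings (EUR)")
line2, = ax2.plot(savings_pct, "--", label="Savings (%)")
lines = [line1, line2]
ax1.legend(lines, [ln.get_label() for ln in lines])  # combine legends from both axes
plt.show()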

train.py (4)

5-11: LGTM!

Import changes are appropriate - plot_dashboard and plot_cumulative_savings from the new plotter module, and time for performance instrumentation.


60-62: LGTM!

New CLI options for dashboard plotting are well-defined with sensible defaults (336 hours = 2 weeks).


298-318: LGTM on dashboard plotting integration.

The exception handling around plot_dashboard is appropriate for non-fatal plotting operations. Using Exception here is acceptable since you want to catch any plotting failure without crashing training.


218-251: Good implementation of evaluation metrics.

The evaluation output correctly includes per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections), aligning with the coding guidelines.

train_iter.py (2)

84-98: LGTM!

The extended function signature with plot_dashboard and dashboard_hours parameters is clean and maintains backward compatibility with defaults.


114-122: LGTM on subprocess forwarding.

The subprocess call constructs commands from user-provided CLI arguments, which is appropriate for this local training orchestration tool. The Ruff S603 warning is a false positive in this context.

test/test_sanity_env.py (5)

1-11: Good documentation and usage examples.

The docstrings provide clear usage examples for different test scenarios (quick run, full belt-and-braces, with external data).


111-142: Well-designed determinism test.

The test correctly pins price_start_index to ensure external price variation doesn't affect determinism checking. The unused obs variable from unpacking (line 126) is a common pattern in RL code and can be safely ignored.


144-177: Excellent carry-over continuity test.

This test directly validates the PR's core objective - carrying state between episodes rather than resetting to zero. The comprehensive snapshot comparison ensures all critical state components (nodes, job_queue, predicted_prices, running_jobs, price_index, etc.) are preserved across resets.
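The snapshot/compare idea behind the test, sketched with illustrative field names (the actual test may snapshot more fields, e.g. predicted_prices and the price index):

import numpy as np

def check_carry_over(env):
    """Snapshot key state, reset with carry-over enabled, and assert it persisted."""
    snapshot = {
        "nodes": env.state["nodes"].copy(),
        "job_queue": env.state["job_queue"].copy(),
        "running_jobs": dict(env.running_jobs),
    }
    env.reset()  # carry-over reset: internal state should persist
    assert np.array_equal(env.state["nodes"], snapshot["nodes"])
    assert np.array_equal(env.state["job_queue"], snapshot["job_queue"])
    assert env.running_jobs == snapshot["running_jobs"]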


377-377: Clarify why check_obs_is_copy is commented out.

If there's a known issue with observation aliasing, consider adding a TODO or comment explaining why this check is disabled.


398-406: Good test orchestration structure.

The conditional test execution (gym compliance → invariants → determinism → carry-over) provides flexibility for different testing scenarios.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines 38 to 42
def reset_state_metrics(self):
    """Reset timeline-dependent metrics."""
    self.current_hour = 0
    self.reset_episode_metrics()

⚠️ Potential issue | 🟠 Major

reset_state_metrics resets current_hour but reset_episode_metrics does not.

This is the expected behavior for the non-carry-over case, but confirms the issue noted in environment.py: in carry-over mode, calling only reset_episode_metrics() will not reset current_hour, causing episode boundary detection to fail.

If carry-over mode is intended to span multiple episodes without resetting time, then current_hour should be reset separately in reset() when carry_over_state=True:

🐛 Proposed fix in environment.py reset()
     def reset(self, seed=None, options=None):
         # ...
         # Reset metrics
         if self.carry_over_state:
+            self.metrics.current_hour = 0  # Reset hour for new episode
             self.metrics.reset_episode_metrics()
         else:
             self.metrics.reset_state_metrics()


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@src/environment.py`:
- Around line 175-177: The constructor calls reset_timeline_state() before
Prices.reset(), so reset_timeline_state() tries to copy
self.prices.predicted_prices while it's still None; fix by invoking
self.prices.reset(start_index=0) before calling self.reset_timeline_state() in
__init__ (same order used in reset()), ensuring predicted_prices is initialized
before reset_timeline_state() uses self.prices.predicted_prices.copy().

In `@src/plotter.py`:
- Around line 81-86: The parameter episode_costs on plot_dashboard is
intentionally unused but triggers Ruff ARG001; silence the warning by explicitly
using it in the function body (e.g., assign it to a throwaway variable or delete
it) so the signature stays the same but the linter treats it as used; update the
body of plot_dashboard to include a line like "_ = episode_costs" (or "del
episode_costs") immediately after the docstring.

In `@test/test_sanity_env.py`:
- Around line 118-137: In the rollout() function and other places where you
unpack but don't use values (e.g., the env.reset and env.step returns and
variables like obs, r, info, nodes, prices), prefix those unused variable names
with an underscore (e.g., _obs, _r, _info, _nodes, _prices) so the linter RUF059
is satisfied; update every unpacking site (including the env.reset(...) return
and env.step(...) tuples in rollout and the other similar blocks referenced) to
use the underscore-prefixed names consistently.
- Around line 166-176: Update the carry-over test to expect metrics.current_hour
to be reset to 0 after calling ComputeClusterEnv.reset() instead of matching the
saved snapshot; specifically modify the assertion that references
env.metrics.current_hour (in test/test_sanity_env.py) so it asserts
env.metrics.current_hour == 0 (reflecting the reset behavior in
ComputeClusterEnv.reset) rather than comparing to snapshot["current_hour"].
♻️ Duplicate comments (1)
src/plotter.py (1)

89-179: Use env.metrics for costs, job counts, and time‑series (current code can crash or show zeros).

plot_dashboard() still reads env.jobs_completed, env.total_cost, env.price_stats, etc., but these live on env.metrics. This will either raise AttributeError or render empty panels. Please route all metric reads through metrics.

🛠️ Proposed fix
-    base_cost = float(getattr(env, "baseline_cost", 0.0))
-    base_cost_off = float(getattr(env, "baseline_cost_off", 0.0))
-    agent_cost = float(getattr(env, "total_cost", 0.0))
+    base_cost = float(getattr(metrics, "baseline_cost", 0.0))
+    base_cost_off = float(getattr(metrics, "baseline_cost_off", 0.0))
+    agent_cost = float(getattr(metrics, "total_cost", 0.0))
@@
-        f"Jobs: {env.jobs_completed}/{env.jobs_submitted} ({completion_rate:.0f}%, wait={avg_wait:.1f}h, Q={env.max_queue_size_reached}) | "
-        f"Base: {env.baseline_jobs_completed}/{env.baseline_jobs_submitted} ({baseline_completion_rate:.0f}%, wait={baseline_avg_wait:.1f}h, Q={env.baseline_max_queue_size_reached})"
+        f"Jobs: {getattr(metrics, 'jobs_completed', 0)}/{getattr(metrics, 'jobs_submitted', 0)} "
+        f"({completion_rate:.0f}%, wait={avg_wait:.1f}h, Q={getattr(metrics, 'max_queue_size_reached', 0)}) | "
+        f"Base: {getattr(metrics, 'baseline_jobs_completed', 0)}/{getattr(metrics, 'baseline_jobs_submitted', 0)} "
+        f"({baseline_completion_rate:.0f}%, wait={baseline_avg_wait:.1f}h, Q={getattr(metrics, 'baseline_max_queue_size_reached', 0)})"
@@
-        add_panel("Electricity price", getattr(env, "price_stats", None), "€/MWh", None)
+        add_panel("Electricity price", getattr(metrics, "price_stats", None), "€/MWh", None)
@@
-        add_panel("Online nodes", getattr(env, "on_nodes", None), "count", (0, max_nodes * 1.1))
+        add_panel("Online nodes", getattr(metrics, "on_nodes", None), "count", (0, max_nodes * 1.1))
@@
-        add_panel("Used nodes", getattr(env, "used_nodes", None), "count", (0, max_nodes))
+        add_panel("Used nodes", getattr(metrics, "used_nodes", None), "count", (0, max_nodes))
@@
-        add_panel("Job queue & running jobs",
-                  getattr(env, "job_queue_sizes", None),
+        add_panel("Job queue & running jobs",
+                  getattr(metrics, "job_queue_sizes", None),
                   "jobs",
                   None,
                   overlay=("Running jobs", running_series),
         )
@@
-        add_panel("Efficiency reward (%)", getattr(env, "eff_rewards", None), "score", None)
+        add_panel("Efficiency reward (%)", getattr(metrics, "eff_rewards", None), "score", None)
@@
-        add_panel("Price reward (%)", getattr(env, "price_rewards", None), "score", None)
+        add_panel("Price reward (%)", getattr(metrics, "price_rewards", None), "score", None)
@@
-        add_panel("Idle penalty (%)", getattr(env, "idle_penalties", None), "score", None)
+        add_panel("Idle penalty (%)", getattr(metrics, "idle_penalties", None), "score", None)
@@
-        add_panel("Job-age penalty (%)", getattr(env, "job_age_penalties", None), "score", None)
+        add_panel("Job-age penalty (%)", getattr(metrics, "job_age_penalties", None), "score", None)
@@
-        add_panel("Total reward", getattr(env, "rewards", None), "reward", None)
+        add_panel("Total reward", getattr(metrics, "rewards", None), "reward", None)
🧹 Nitpick comments (1)
src/plotter.py (1)

305-308: Minor readability: avoid single‑letter l in list comprehension.

Ruff flags this as ambiguous (E741).

✏️ Suggested tweak
-    labels = [l.get_label() for l in lines]
+    labels = [line.get_label() for line in lines]
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e15d30 and 2c9ade7.

📒 Files selected for processing (6)
  • src/baseline.py
  • src/environment.py
  • src/job_management.py
  • src/metrics_tracker.py
  • src/plotter.py
  • test/test_sanity_env.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/baseline.py
  • src/job_management.py
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Applied to files:

  • src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node

Applied to files:

  • src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_hourly.py : Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler

Applied to files:

  • src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)

Applied to files:

  • src/environment.py
  • src/plotter.py
  • src/metrics_tracker.py
🧬 Code graph analysis (1)
src/environment.py (3)
src/metrics_tracker.py (2)
  • reset_episode_metrics (43-80)
  • reset_state_metrics (38-41)
test/test_sanity_env.py (1)
  • reset (321-327)
src/prices_deterministic.py (1)
  • reset (52-76)
🪛 Ruff (0.14.11)
test/test_sanity_env.py

126-126: Unpacked variable obs is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


150-150: Unpacked variable obs is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


150-150: Unpacked variable r is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


150-150: Unpacked variable info is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


281-281: Unpacked variable nodes is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


281-281: Unpacked variable prices is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


341-341: Unpacked variable o1 is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


342-342: Unpacked variable t1 is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


342-342: Unpacked variable tr1 is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


344-344: Unpacked variable o2 is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


345-345: Unpacked variable t2 is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


345-345: Unpacked variable tr2 is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


383-383: Unpacked variable r is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


383-383: Unpacked variable info is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

src/plotter.py

81-81: Unused function argument: episode_costs

(ARG001)


307-307: Ambiguous variable name: l

(E741)

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +175 to +177
        self.reset_timeline_state()
        self.metrics.reset_episode_metrics()


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, find the Prices class and check __init__
rg -n "class Prices" src/ -A 20 | head -50

Repository: FairRootGroup/powersched

Length of output: 2426


🏁 Script executed:

# Also check the environment.py file at the mentioned lines
sed -n '160,190p' src/environment.py

Repository: FairRootGroup/powersched

Length of output: 1410


🏁 Script executed:

# And check lines 268-277 as mentioned
sed -n '260,285p' src/environment.py

Repository: FairRootGroup/powersched

Length of output: 1050


🏁 Script executed:

# Now let's look at the reset_timeline_state implementation
rg -n "def reset_timeline_state" src/environment.py -A 10

Repository: FairRootGroup/powersched

Length of output: 577


🏁 Script executed:

# Search for reset method in Prices class
rg -n "def reset" src/prices_deterministic.py -A 15

Repository: FairRootGroup/powersched

Length of output: 796


🏁 Script executed:

# Check the full environment.__init__ to see the complete sequence
rg -n "def __init__" src/environment.py -A 80 | head -120

Repository: FairRootGroup/powersched

Length of output: 3707


🏁 Script executed:

# Also check if Prices.reset() is called anywhere in environment.__init__
rg -n "self.prices.reset" src/environment.py

Repository: FairRootGroup/powersched

Length of output: 183


🏁 Script executed:

# Get full context around these critical lines
sed -n '115,250p' src/environment.py | cat -n

Repository: FairRootGroup/powersched

Length of output: 6999


reset_timeline_state() attempts to copy self.prices.predicted_prices while it is still None.

In __init__ at line 175, reset_timeline_state() is called before Prices.reset() is invoked. However, Prices.__init__ initializes predicted_prices = None (line 19), and it only gets populated when Prices.reset() is called. This causes AttributeError: 'NoneType' object has no attribute 'copy' at line 277 when trying to execute self.prices.predicted_prices.copy().

Call self.prices.reset(start_index=0) before self.reset_timeline_state() in __init__, matching the pattern used in the reset() method (lines 120–123).

🤖 Prompt for AI Agents
In `@src/environment.py` around lines 175 - 177, The constructor calls
reset_timeline_state() before Prices.reset(), so reset_timeline_state() tries to
copy self.prices.predicted_prices while it's still None; fix by invoking
self.prices.reset(start_index=0) before calling self.reset_timeline_state() in
__init__ (same order used in reset()), ensuring predicted_prices is initialized
before reset_timeline_state() uses self.prices.predicted_prices.copy().

Comment on lines 81 to 86
def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
    """
    Per-hour dashboard: price, nodes, queue, reward components, etc.
    NOTE: episode_costs is accepted for backwards compatibility but NOT used here anymore.
          Cumulative savings now lives in plot_cumulative_savings().
    """

⚠️ Potential issue | 🟡 Minor

Silence Ruff’s unused-parameter warning for episode_costs.

The parameter is kept for backwards compatibility but is unused, so Ruff (ARG001) will complain. Mark it intentionally unused to keep lint clean without breaking the signature.

🧹 Proposed fix
 def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
     """
     Per-hour dashboard: price, nodes, queue, reward components, etc.
     NOTE: episode_costs is accepted for backwards compatibility but NOT used here anymore.
           Cumulative savings now lives in plot_cumulative_savings().
     """
+    _ = episode_costs  # backward-compat placeholder
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
    """
    Per-hour dashboard: price, nodes, queue, reward components, etc.
    NOTE: episode_costs is accepted for backwards compatibility but NOT used here anymore.
          Cumulative savings now lives in plot_cumulative_savings().
    """
    _ = episode_costs  # backward-compat placeholder
🧰 Tools
🪛 Ruff (0.14.11)

81-81: Unused function argument: episode_costs

(ARG001)

🤖 Prompt for AI Agents
In `@src/plotter.py` around lines 81 - 86, The parameter episode_costs on
plot_dashboard is intentionally unused but triggers Ruff ARG001; silence the
warning by explicitly using it in the function body (e.g., assign it to a
throwaway variable or delete it) so the signature stays the same but the linter
treats it as used; update the body of plot_dashboard to include a line like "_ =
episode_costs" (or "del episode_costs") immediately after the docstring.

Comment on lines 118 to 137
def rollout():
    env = make_env()
    # Pin external price window so determinism doesn't vary by episode.
    obs, _ = env.reset(seed=seed, options={"price_start_index": 0})
    traj = []
    done = False
    i = 0
    while not done and i < n_steps:
        obs, r, term, trunc, info = env.step(actions[i])
        # record a small fingerprint
        traj.append((
            float(r),
            float(info.get("step_cost", 0.0)),
            int(info.get("num_unprocessed_jobs", -1)),
            int(info.get("num_on_nodes", -1)),
            int(info.get("dropped_this_episode", -1)),
        ))
        done = term or trunc
        i += 1
    env.close()

⚠️ Potential issue | 🟡 Minor

Ruff RUF059: unused unpacked variables.

Several unpacked values aren’t used (e.g., obs, r, info, nodes, prices). Prefix them with _ to keep lint clean.

🧹 Suggested clean‑up
-            obs, r, term, trunc, info = env.step(actions[i])
+            _obs, r, term, trunc, info = env.step(actions[i])
@@
-        obs, r, term, trunc, info = env.step(action)
+        _obs, _r, term, trunc, _info = env.step(action)
@@
-    nodes, q, prices = _extract(obs)
+    _nodes, q, _prices = _extract(obs)
@@
-    o1, _ = env.reset(seed=seed)
+    _o1, _ = env.reset(seed=seed)
-    o1s, r1, t1, tr1, i1 = env.step(action)
+    o1s, r1, _t1, _tr1, i1 = env.step(action)
@@
-    o2, _ = env.reset(seed=seed)
+    _o2, _ = env.reset(seed=seed)
-    o2s, r2, t2, tr2, i2 = env.step(action)
+    o2s, r2, _t2, _tr2, i2 = env.step(action)
@@
-            obs, r, term, trunc, info = env.step(action)
+            obs, _r, term, trunc, _info = env.step(action)

Also applies to: 145-152, 281-283, 341-346, 383-384

🧰 Tools
🪛 Ruff (0.14.11)

126-126: Unpacked variable obs is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

🤖 Prompt for AI Agents
In `@test/test_sanity_env.py` around lines 118 - 137, In the rollout() function
and other places where you unpack but don't use values (e.g., the env.reset and
env.step returns and variables like obs, r, info, nodes, prices), prefix those
unused variable names with an underscore (e.g., _obs, _r, _info, _nodes,
_prices) so the linter RUF059 is satisfied; update every unpacking site
(including the env.reset(...) return and env.step(...) tuples in rollout and the
other similar blocks referenced) to use the underscore-prefixed names
consistently.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@test/test_sanity_env.py`:
- Around line 376-378: The test was commented out because observations alias
env.state; fix by returning an isolated copy of the environment state in both
reset() and step() so callers cannot mutate internal state. Locate the reset()
and step() methods (they currently return self.state directly) and change them
to return a defensive copy (e.g., use copy.deepcopy(self.state) or an
appropriate shallow copy if state is a simple dict/ndarray) for the observation
value; ensure any other places that currently expose self.state (observation
construction paths) also return copies so check_obs_is_copy will pass.
♻️ Duplicate comments (4)
src/plotter.py (1)

81-86: Silence the unused episode_costs parameter to avoid lint noise.
Ruff flags ARG001 here; the placeholder line keeps the API stable without warnings.

♻️ Proposed fix
 def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
     """
     Per-hour dashboard: price, nodes, queue, reward components, etc.
     NOTE: episode_costs is accepted for backwards compatibility but NOT used here anymore.
           Cumulative savings now lives in plot_cumulative_savings().
     """
+    _ = episode_costs  # backward-compat placeholder
test/test_sanity_env.py (2)

166-177: Verify intended behavior: should current_hour persist or reset in carry-over mode?

Line 176 asserts that env.metrics.current_hour equals the snapshotted value after reset(). However, semantically, current_hour tracks hours within an episode and is used for episode termination (line 495 in environment.py: if self.metrics.current_hour == EPISODE_HOURS).

If current_hour should reset to 0 per episode (even in carry-over mode), update the assertion:

#!/bin/bash
# Check what reset_episode_metrics does to current_hour
rg -n "current_hour" src/metrics_tracker.py -B2 -A2

144-152: Prefix unused unpacked variables with _ to satisfy linter.

Multiple locations have unused unpacked variables flagged by Ruff RUF059 (lines 150, 281, 341-345, 383). Prefix them with _ for lint compliance.

src/environment.py (1)

175-177: reset_timeline_state() is called before self.prices.reset(), causing AttributeError.

At line 175, reset_timeline_state() is invoked, which at line 276 attempts self.prices.predicted_prices.copy(). However, Prices.__init__ initializes predicted_prices = None, and it only gets populated when Prices.reset() is called. Since no self.prices.reset() call precedes line 175, this will raise AttributeError: 'NoneType' object has no attribute 'copy'.

🐛 Proposed fix
+        self.prices.reset(start_index=0)
         self.reset_timeline_state()
         self.metrics.reset_episode_metrics()
🧹 Nitpick comments (3)
src/plotter.py (1)

306-308: Avoid ambiguous variable name l.
This trips Ruff E741; a clearer name improves readability.

♻️ Proposed refactor
-    labels = [l.get_label() for l in lines]
+    labels = [line.get_label() for line in lines]
train.py (1)

259-265: Avoid hard-coded max_nodes=335 for dashboard plots.
Use env/config values so plots stay accurate if cluster size changes.

♻️ Proposed refactor
-                    max_nodes=335,
+                    max_nodes=getattr(env, "max_nodes", getattr(env, "num_nodes", 0)),

Also applies to: 300-305

test/test_sanity_env.py (1)

336-359: Consider removing or guarding the debug comparison block.

Lines 336-359 appear to be development/debug code that runs unconditionally, printing comparison results. Consider either:

  1. Removing it if no longer needed
  2. Guarding it behind a --verbose or --debug flag
♻️ Suggested guard
+    if args.render == "human":  # or add --debug flag
         seed = 123
         action = np.array([1, 0], dtype=np.int64)
         ...
         print("info.current_price", i1.get("current_price"), i2.get("current_price"))

Comment on lines 376 to 378
check_invariants(env, obs)
#check_obs_is_copy(env, obs)


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's locate and read the test file around the commented line
find . -name "test_sanity_env.py" -type f

Repository: FairRootGroup/powersched

Length of output: 93


🏁 Script executed:

# Search for the check_obs_is_copy function definition
rg -n "def check_obs_is_copy" -A 10

Repository: FairRootGroup/powersched

Length of output: 750


🏁 Script executed:

# Search for all usages and references to check_obs_is_copy
rg -n "check_obs_is_copy" -B 2 -A 2

Repository: FairRootGroup/powersched

Length of output: 658


🏁 Script executed:

# Search for environment files and how observations are returned
fd -name "*environment*" -o -name "*env*" | grep -E "\.(py)$" | head -20

Repository: FairRootGroup/powersched

Length of output: 298


🏁 Script executed:

# Read the test file around line 376-378 with more context
sed -n '360,385p' test/test_sanity_env.py

Repository: FairRootGroup/powersched

Length of output: 1026


🏁 Script executed:

# Search for environment implementations
find . -name "*.py" -type f | xargs grep -l "def reset\|def step" | grep -v test | head -10

Repository: FairRootGroup/powersched

Length of output: 165


🏁 Script executed:

# Look for how observations are created/returned
rg -n "def reset\|def step" --type py | grep -v test | head -20

Repository: FairRootGroup/powersched

Length of output: 50


🏁 Script executed:

# Search for observation creation in the environment
rg -n "observation\|obs.*=" --type py -i | grep -v test | head -30

Repository: FairRootGroup/powersched

Length of output: 50


🏁 Script executed:

# Read the environment implementation - look for reset and step methods
grep -n "def reset\|def step" src/environment.py

Repository: FairRootGroup/powersched

Length of output: 189


🏁 Script executed:

# Get the full reset and step implementations
sed -n '1,50p' src/environment.py

Repository: FairRootGroup/powersched

Length of output: 1496


🏁 Script executed:

# Look for observation-related code in environment
rg -n "observation\|return.*obs\|self.obs" src/environment.py -A 2 | head -50

Repository: FairRootGroup/powersched

Length of output: 50


🏁 Script executed:

# Check the actual structure - read the environment file
wc -l src/environment.py

Repository: FairRootGroup/powersched

Length of output: 90


🏁 Script executed:

# Read the reset method implementation
sed -n '207,266p' src/environment.py

Repository: FairRootGroup/powersched

Length of output: 2674


🏁 Script executed:

# Read the step method implementation
sed -n '296,350p' src/environment.py

Repository: FairRootGroup/powersched

Length of output: 3394


🏁 Script executed:

# Check if there's any .copy() or deepcopy usage
rg -n "\.copy\(\)|deepcopy" src/environment.py

Repository: FairRootGroup/powersched

Length of output: 217


🏁 Script executed:

# Read the step method return statement
sed -n '296,400p' src/environment.py | tail -100

Repository: FairRootGroup/powersched

Length of output: 6201


🏁 Script executed:

# Search for the full step method - get line count up to next method or end
sed -n '296,$p' src/environment.py | head -120

Repository: FairRootGroup/powersched

Length of output: 7332


🏁 Script executed:

# Check if step returns self.state directly
rg -n "return.*state" src/environment.py -A 1 -B 1

Repository: FairRootGroup/powersched

Length of output: 191


🏁 Script executed:

# Look for any comments or git history around line 377 in test file
git log -p --all -S "check_obs_is_copy" -- test/test_sanity_env.py 2>/dev/null | head -100

Repository: FairRootGroup/powersched

Length of output: 50


Uncomment and fix the observation aliasing bug: observations must not alias env.state.

The check_obs_is_copy function is commented out because it fails — the environment returns direct references to self.state instead of copies (see reset() line 265 and step() line 529). This allows users to corrupt internal state by modifying the returned observation.

Return a properly isolated copy of state from both reset() and step() methods.

🤖 Prompt for AI Agents
In `@test/test_sanity_env.py` around lines 376 - 378, The test was commented out
because observations alias env.state; fix by returning an isolated copy of the
environment state in both reset() and step() so callers cannot mutate internal
state. Locate the reset() and step() methods (they currently return self.state
directly) and change them to return a defensive copy (e.g., use
copy.deepcopy(self.state) or an appropriate shallow copy if state is a simple
dict/ndarray) for the observation value; ensure any other places that currently
expose self.state (observation construction paths) also return copies so
check_obs_is_copy will pass.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/plotter.py`:
- Around line 306-307: Rename the ambiguous loop variable `l` in the list
comprehension that builds `labels` to a clearer identifier (e.g., `line` or
`item`) to satisfy Ruff E741; locate the expression where `lines` is created
from `line1`, `line1b`, `line2`, `line2b` and change the comprehension
`[l.get_label() for l in lines]` to use the clearer name so it reads like
`[line.get_label() for line in lines]` (or similar) ensuring callers of `labels`
remain unaffected.
- Around line 119-124: The header f-string currently references deprecated
top-level attributes (env.session, env.current_episode, env.current_step,
env.weights, env.jobs_completed, env.jobs_submitted, env.max_queue_size_reached,
env.baseline_jobs_completed, env.baseline_jobs_submitted,
env.baseline_max_queue_size_reached) and should be switched to read from
env.metrics with safe fallbacks; update the header construction (the variable
named header) to pull values from env.metrics (e.g. env.metrics.get('session'),
.get('current_episode'), .get('current_step'), .get('weights'),
.get('jobs_completed'), .get('jobs_submitted'), .get('max_queue_size_reached'),
.get('baseline_jobs_completed'), .get('baseline_jobs_submitted'),
.get('baseline_max_queue_size_reached')) and use sensible defaults when keys are
missing to avoid AttributeError or stale values, keeping the same formatting and
percentage calculations but sourcing all counters/averages from env.metrics.
♻️ Duplicate comments (7)
src/plotter.py (1)

81-86: Silence unused episode_costs to keep Ruff clean.
The parameter is intentionally unused but still triggers ARG001.

🧹 Proposed fix
 def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
     """
     Per-hour dashboard: price, nodes, queue, reward components, etc.
     NOTE: episode_costs is accepted for backwards compatibility but NOT used here anymore.
           Cumulative savings now lives in plot_cumulative_savings().
     """
+    _ = episode_costs  # backward-compat placeholder
src/environment.py (3)

175-177: reset_timeline_state() reads predicted_prices before prices are reset.
If Prices.__init__ doesn’t populate predicted_prices, this raises at Line 276. Initialize prices before copying.

🐛 Proposed fix
-        self.reset_timeline_state()
+        self.prices.reset(start_index=0)
+        self.reset_timeline_state()
         self.metrics.reset_episode_metrics()
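
If reordering in __init__ is undesirable, a lazy guard inside reset_timeline_state() would also work. The classes below are stand-ins that only mirror the names used in this comment (Prices.reset(start_index=0), predicted_prices); the real implementations differ:

class Prices:
    # Stand-in: only models that predicted_prices is None until reset() runs.
    def __init__(self):
        self.predicted_prices = None

    def reset(self, start_index=0):
        self.predicted_prices = [10.0, 12.0, 9.5]  # placeholder forecast


class Env:
    def __init__(self):
        self.prices = Prices()
        self.reset_timeline_state()  # safe even though Prices.reset() was never called

    def reset_timeline_state(self):
        # Lazily populate the forecast so the copy below never sees None.
        if self.prices.predicted_prices is None:
            self.prices.reset(start_index=0)
        self.predicted_prices = list(self.prices.predicted_prices)


env = Env()
assert env.predicted_prices == [10.0, 12.0, 9.5]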

218-221: Carry‑over mode doesn’t reset current_hour, so episodes won’t terminate after the first.
step() only truncates when current_hour == EPISODE_HOURS. Without resetting, future episodes never end.

🐛 Proposed fix
         if self.carry_over_state:
+            self.metrics.current_hour = 0  # reset per-episode hour
             self.metrics.reset_episode_metrics()
         else:
             self.metrics.reset_state_metrics()
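
To illustrate the failure mode, here is a toy model of the interaction. EPISODE_HOURS, carry_over_state, current_hour, and reset_episode_metrics follow the names above; everything else is a stand-in:

EPISODE_HOURS = 3  # tiny value purely for illustration

class Metrics:
    def __init__(self):
        self.current_hour = 0

    def reset_episode_metrics(self):
        pass  # per-episode counters would be cleared here

class Env:
    def __init__(self, carry_over_state=False):
        self.carry_over_state = carry_over_state
        self.metrics = Metrics()

    def reset(self):
        if self.carry_over_state:
            self.metrics.current_hour = 0  # without this, episode 2 never truncates
            self.metrics.reset_episode_metrics()
        else:
            self.metrics = Metrics()

    def step(self):
        self.metrics.current_hour += 1
        return self.metrics.current_hour == EPISODE_HOURS  # truncated?

env = Env(carry_over_state=True)
env.reset()
assert [env.step() for _ in range(EPISODE_HOURS)][-1]  # episode 1 truncates
env.reset()
assert env.metrics.current_hour == 0                    # episode 2 can truncate too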

265-265: Return defensive copies of observations to avoid state aliasing.
Returning self.state directly allows callers to mutate internal state.

🐛 Proposed fix
+    def _copy_obs(self):
+        return {k: v.copy() for k, v in self.state.items()}
...
-        return self.state, {}
+        return self._copy_obs(), {}
...
-        return self.state, step_reward, terminated, truncated, info
+        return self._copy_obs(), step_reward, terminated, truncated, info

Also applies to: 529-529

test/test_sanity_env.py (3)

118-137: Prefix unused unpacked variables with _ to satisfy Ruff RUF059.

🧹 Suggested clean‑up
-            _obs, r, term, trunc, info = env.step(actions[i])
+            _obs, r, term, trunc, _info = env.step(actions[i])
...
-    nodes, q, prices = _extract(obs)
+    _nodes, q, _prices = _extract(obs)
...
-    o1, _ = env.reset(seed=seed)
+    _o1, _ = env.reset(seed=seed)
-    o1s, r1, t1, tr1, i1 = env.step(action)
+    o1s, r1, _t1, _tr1, i1 = env.step(action)
...
-    o2, _ = env.reset(seed=seed)
+    _o2, _ = env.reset(seed=seed)
-    o2s, r2, t2, tr2, i2 = env.step(action)
+    o2s, r2, _t2, _tr2, i2 = env.step(action)
...
-            obs, r, term, trunc, info = env.step(action)
+            obs, _r, term, trunc, _info = env.step(action)

Also applies to: 145-152, 281-283, 341-346, 383-384


166-176: Carry‑over test should expect current_hour to reset per episode.
This aligns with the episode‑length termination logic.

🧪 Proposed fix
-    assert env.metrics.current_hour == snapshot["current_hour"], "carry-over failed: current_hour reset"
+    assert env.metrics.current_hour == 0, "carry-over failed: current_hour should reset per episode"
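
As a rough shape for the continuity part of the test, a sketch is shown below; the snapshot keys other than current_hour are assumptions about which cumulative attributes live on env.metrics:

def check_carry_over_continuity(env):
    # Cumulative counters should survive a carry-over reset(),
    # while the per-episode clock starts over.
    snapshot = {
        "jobs_completed": env.metrics.jobs_completed,
        "total_cost": env.metrics.total_cost,
    }
    env.reset()
    assert env.metrics.jobs_completed == snapshot["jobs_completed"], "carry-over failed: jobs_completed reset"
    assert env.metrics.total_cost == snapshot["total_cost"], "carry-over failed: total_cost reset"
    assert env.metrics.current_hour == 0, "carry-over failed: current_hour should reset per episode"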

371-378: Re‑enable the copy‑safety check once observations are defensive copies.
Right now it’s commented out, so the aliasing bug can regress silently.
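
A minimal version of such a check, assuming a Gymnasium-style env whose observations are dicts of NumPy arrays and which exposes env.state and env.action_space (the helper name matches the one referenced above; the body is a sketch):

import numpy as np

def check_obs_is_copy(env):
    # Fail loudly if any observation array shares memory with env.state.
    obs, _ = env.reset()
    for key, value in obs.items():
        assert not np.shares_memory(value, env.state[key]), f"reset() obs '{key}' aliases env.state"
    obs, _reward, _term, _trunc, _info = env.step(env.action_space.sample())
    for key, value in obs.items():
        assert not np.shares_memory(value, env.state[key]), f"step() obs '{key}' aliases env.state"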

… now checks for determinism and carry-over continuity.

Add carry-over state handling in the refactored environment by initializing timeline state on construction and skipping resets when enabled, while separating cumulative from episode-scoped metrics. Update MetricsTracker to track episode_* counters, propagate those counters through job assignment and baseline steps, and ensure episode summaries use per-episode values. Extend sanity tests and metrics usage to align with the refactor (metrics.current_hour, carry-over continuity checks).
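
A compact sketch of the cumulative-vs-episode split described here (a stand-in, not the project's MetricsTracker; only a couple of representative counters are shown):

class MetricsTracker:
    def __init__(self):
        self.reset_cumulative_metrics()
        self.reset_episode_metrics()

    def reset_cumulative_metrics(self):
        # Persist across episodes; cleared only on a full (non-carry-over) reset.
        self.jobs_completed = 0
        self.total_cost = 0.0
        self.episode_costs = []

    def reset_episode_metrics(self):
        # Cleared at the start of every episode, even in carry-over mode.
        self.episode_jobs_completed = 0
        self.episode_total_cost = 0.0

    def record_completion(self, cost):
        # The real code guards the episode_* updates on key presence;
        # this sketch simply bumps both cumulative and episode-scoped counters.
        self.jobs_completed += 1
        self.total_cost += cost
        self.episode_jobs_completed += 1
        self.episode_total_cost += cost

tracker = MetricsTracker()
tracker.record_completion(cost=5.0)
tracker.reset_episode_metrics()          # carry-over style reset between episodes
assert tracker.jobs_completed == 1       # cumulative counter survives
assert tracker.episode_jobs_completed == 0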
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/metrics_tracker.py`:
- Around lines 15-36: The reset_cumulative_metrics method resets numeric counters
but does not clear the episode_costs history; modify reset_cumulative_metrics
(in class where reset_cumulative_metrics is defined) to also clear the
cumulative history by resetting self.episode_costs to an empty list (or calling
self.episode_costs.clear()), and similarly clear any other cumulative history
lists if present (e.g., baseline_episode_costs) to avoid mixing old episode
history with fresh counters.
♻️ Duplicate comments (3)
src/environment.py (2)

175-176: Reset order still risks predicted_prices being None.

reset_timeline_state() copies self.prices.predicted_prices, but Prices.reset() isn’t called in __init__ before this, so predicted_prices may still be None. Consider initializing prices first.

🐛 Proposed fix
-        self.reset_timeline_state()
+        self.prices.reset(start_index=0)
+        self.reset_timeline_state()

267-267: Return defensive copies of observations.

reset() and step() still return self.state directly, so callers can mutate env internals. Please return copies.

🛡️ Proposed fix
-        return self.state, {}
+        obs = {k: (v.copy() if hasattr(v, "copy") else v) for k, v in self.state.items()}
+        return obs, {}
@@
-        return self.state, step_reward, terminated, truncated, info
+        obs = {k: (v.copy() if hasattr(v, "copy") else v) for k, v in self.state.items()}
+        return obs, step_reward, terminated, truncated, info

Also applies to: 532-532

test/test_sanity_env.py (1)

144-152: Rename unused step outputs to satisfy RUF059.

obs, r, and info aren’t used in the loop. Prefix with _ to keep Ruff clean.

🧹 Proposed fix
-    obs, _ = env.reset(seed=seed)
+    _obs, _ = env.reset(seed=seed)
@@
-        obs, r, term, trunc, info = env.step(action)
+        _obs, _r, term, trunc, _info = env.step(action)

Comment on lines +15 to 36
def reset_cumulative_metrics(self):
    """Reset metrics that persist across episodes."""
    # Job tracking metrics for agent (cumulative across episodes)
    # Cost tracking (cumulative across episodes)
    self.total_cost = 0
    self.baseline_cost = 0
    self.baseline_cost_off = 0

    # Agent job metrics (cumulative across episodes)
    self.jobs_submitted = 0
    self.jobs_completed = 0
    self.total_job_wait_time = 0
    self.max_queue_size_reached = 0
    self.jobs_dropped = 0
    self.jobs_rejected_queue_full = 0

    # Job tracking metrics for baseline (cumulative across episodes)
    # Baseline job metrics (cumulative across episodes)
    self.baseline_jobs_submitted = 0
    self.baseline_jobs_completed = 0
    self.baseline_total_job_wait_time = 0
    self.baseline_max_queue_size_reached = 0
    self.baseline_jobs_dropped = 0
    self.baseline_jobs_rejected_queue_full = 0

⚠️ Potential issue | 🟡 Minor

Reset cumulative history list alongside counters.

reset_cumulative_metrics() resets totals but leaves episode_costs intact, which can mix old episode history with fresh counters if this method is reused mid-run. Consider clearing it here.

🧹 Proposed fix
 def reset_cumulative_metrics(self):
     """Reset metrics that persist across episodes."""
@@
     self.baseline_jobs_rejected_queue_full = 0
+    self.episode_costs = []
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 def reset_cumulative_metrics(self):
     """Reset metrics that persist across episodes."""
-    # Job tracking metrics for agent (cumulative across episodes)
     # Cost tracking (cumulative across episodes)
     self.total_cost = 0
     self.baseline_cost = 0
     self.baseline_cost_off = 0
     # Agent job metrics (cumulative across episodes)
     self.jobs_submitted = 0
     self.jobs_completed = 0
     self.total_job_wait_time = 0
     self.max_queue_size_reached = 0
     self.jobs_dropped = 0
     self.jobs_rejected_queue_full = 0
-    # Job tracking metrics for baseline (cumulative across episodes)
     # Baseline job metrics (cumulative across episodes)
     self.baseline_jobs_submitted = 0
     self.baseline_jobs_completed = 0
     self.baseline_total_job_wait_time = 0
     self.baseline_max_queue_size_reached = 0
     self.baseline_jobs_dropped = 0
     self.baseline_jobs_rejected_queue_full = 0
+    self.episode_costs = []
🤖 Prompt for AI Agents
In `@src/metrics_tracker.py` around lines 15-36: the reset_cumulative_metrics
method resets numeric counters but does not clear the episode_costs history;
modify reset_cumulative_metrics (in class where reset_cumulative_metrics is
defined) to also clear the cumulative history by resetting self.episode_costs to
an empty list (or calling self.episode_costs.clear()), and similarly clear any
other cumulative history lists if present (e.g., baseline_episode_costs) to
avoid mixing old episode history with fresh counters.
