Feature: Rule-based Fact Engine #59

Merged
hariharan-devarajan merged 14 commits into llnl:develop from izzet:feat/facts-report
Jun 25, 2026
@izzet izzet commented Jun 24, 2026
Collaborator

This pull request introduces a major new feature: integration of an "analysis facts" pipeline into DFAnalyzer, enabling machine-readable bottleneck signals that can be consumed by DFDiagnoser and DFOptimizer. This is an opt-in, configurable system for producing compact, actionable findings from analysis runs. The PR also adds the configuration and infrastructure needed to support this feature, including a sample ruleset for DLIO workloads.

The most important changes are:

Analysis Facts Pipeline Integration:

  • Added a new FactsConfig dataclass and corresponding configuration options to enable, configure, and control the emission of analysis facts. Facts can be generated in rule-based (YAML) or metric-based evaluation mode, and the feature is opt-in: it is disabled by default. (python/dftracer/analyzer/config.py [1] [2])
  • Updated the Analyzer class to initialize and run the facts pipeline when enabled, evaluating facts over flat views and including them in the analysis result. (python/dftracer/analyzer/analyzer.py [1] [2] [3] [4] [5])
  • Integrated facts configuration into Hydra initialization and the main config schema. (python/dftracer/analyzer/__init__.py [1] python/dftracer/analyzer/config.py [2])

Output and Configuration Enhancements:

  • Added a FileOutputConfig for writing output bundles, including facts and supporting files, for downstream consumption by DFDiagnoser and DFOptimizer. (python/dftracer/analyzer/config.py [1] [2])

Ruleset Example:

Documentation:

  • Expanded the README.md with detailed instructions and examples for using the analysis facts feature, including the end-to-end workflow and configuration options. (README.md R119-R207)

These changes collectively enable DFAnalyzer to produce actionable, machine-readable findings for downstream diagnosis and optimization, with a flexible and extensible configuration system.

Izzet Yildirim added 8 commits June 24, 2026 11:39
…+ rules

Leaf additions for the facts feature, re-ported onto current main:
- types.py: AnalysisFact / FactWindow / FactScope / FactSeverity / FactProvenance /
  FactEnvelope(+Context) (analyzer.fact-envelope.v1), appended after Views; main's
  Output*/AnalysisResult types untouched.
- scoring.py: continuous slope-severity (normalize_slope) re-centered on the
  proportional baseline.
- configs/schemas/analyzer.fact-envelope.v1.schema.json + configs/fact_rules/*.yaml.
- meson.build packages scoring.py + the rule/schema data.
Nothing is wired into the analyzer yet, so view/HLM output is structurally identical
to main (verified). Leaf unit tests pass (scoring + fact dataclass roundtrip).

fact_engine.py (FactEngine rule builder, MetricFactBuilder slope builder, FactEmitter
aggregate/detail + TEMPORAL_VIEW_TYPES, FactPipeline) and fact_rules.py (rule compile
+ validation). Self-contained on types/scoring/fact_rules; driven by synthetic
flat_views, not wired into the analyzer -- view/HLM output stays structurally identical
to main (verified). Tests: test_fact_engine / test_fact_rules / test_aggregate_facts +
test_metric_facts (22 pass; 2 pipeline tests deferred to stage 3 FactsConfig).

Add FactsConfig (enabled / eval_mode / eval_rule_file / emit_flat_views / emit_mode /
strict_time_semantics / allow_mixed_time_aggregates) and a `facts` field on Config.
Additive: the analyzer does not consume cfg.facts yet (stage 4 wires it), so Hydra
composition and view/HLM output are unchanged (verified). Unblocks the 2 deferred
FactPipeline.from_facts_config tests -> full fact-engine suite now 27 pass.
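
A minimal sketch of the configuration shape this commit describes. The field names and the `facts` field on `Config` come from the commit message, and `enabled` defaulting to false is stated later in the PR. Every type, every other default, and the allowed string values are assumptions made for illustration, not the definitions in python/dftracer/analyzer/config.py.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class FactsConfig:
    """Illustrative stand-in; only the field names are taken from the PR."""

    enabled: bool = False  # stated in the PR: facts are off unless enabled
    eval_mode: str = "rule"  # assumed values: "rule" or "metric"
    eval_rule_file: Optional[str] = None  # assumed: path to a YAML ruleset
    emit_flat_views: bool = False  # assumed type and default
    emit_mode: str = "aggregate"  # assumed values: "aggregate" or "detail"
    strict_time_semantics: bool = False  # assumed type and default
    allow_mixed_time_aggregates: bool = False  # assumed type and default


@dataclass
class Config:
    """Hypothetical top-level config carrying the new `facts` field."""

    facts: FactsConfig = field(default_factory=FactsConfig)


if __name__ == "__main__":
    # With no overrides the feature stays off, so existing runs are unaffected.
    print(asdict(Config()))
```

Keeping every option under a single `facts` field is what makes the change additive: a configuration that never mentions `facts` still composes to the same defaults.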
…s-off-safe)

Analyzer gains facts_config + fact_pipeline (built only when facts.enabled) and the
_build_facts_config / _evaluate_analysis_facts / _materialize_output_artifacts methods;
_analyze_hlm evaluates facts over the flat views and gates them by emit_flat_views.
AnalysisResult gains analysis_facts + get_analysis_facts/iter_analysis_facts/
to_fact_envelope. With facts disabled (the default) the pipeline is None and flat views
pass through untouched -> view/HLM output is structurally identical to main (verified
on the dlio trace: file_name 48x1794, proc_name 6x1782). Facts-on path (reader window
column + output envelope) follows in 4b/4c.
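
The "facts-off-safe" behaviour above can be sketched as follows. This is a simplified illustration, not the Analyzer code: `build_pipeline`, `analyze`, and the fact dictionary keys are hypothetical names, and the `emit_flat_views` gating is left out. It only shows the stated property that a disabled feature yields a `None` pipeline and leaves the flat views untouched.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

FlatViews = Dict[str, Any]
Pipeline = Callable[[FlatViews], List[dict]]


def build_pipeline(enabled: bool) -> Optional[Pipeline]:
    """Return a pipeline only when facts are enabled (hypothetical helper)."""
    if not enabled:
        return None

    def evaluate(flat_views: FlatViews) -> List[dict]:
        # Placeholder evaluation: emit one dummy fact per view.
        return [{"fact_type": "example", "view": name} for name in flat_views]

    return evaluate


def analyze(
    flat_views: FlatViews, pipeline: Optional[Pipeline]
) -> Tuple[FlatViews, List[dict]]:
    """Pass views through unchanged; add facts only if a pipeline exists."""
    if pipeline is None:
        return flat_views, []
    return flat_views, pipeline(flat_views)


if __name__ == "__main__":
    views = {"file_name": object(), "proc_name": object()}
    same_views, facts = analyze(views, build_pipeline(enabled=False))
    print(same_views is views, facts)
```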
…=file)

Wire facts_config from Hydra into the analyzer (init_with_hydra passes
hydra_config.facts), and add FileOutput (output=file): writes the offline bundle
facts.jsonl + detail_view_*.parquet + raw_stats.json that dfdiagnoser input=file
consumes. Verified end-to-end on main's reader (dftracer-dlio, dftracer-utils 0.0.10):
facts.enabled=true on the time_range view -> 84 analysis_facts -> FileOutput bundle ->
diagnoser -> io_present finding. Facts-off remains structurally identical to main
(file_name 48x1794, proc_name 6x1782). Streaming/window output (ZMQOutput/MofkaOutput)
+ the window view follow in stage 5.
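
As a rough illustration of the facts.jsonl part of the bundle, the sketch below writes and reads facts assuming one JSON object per line, as the .jsonl extension suggests. The helper names and the fact fields are hypothetical; the envelope layout actually written by FileOutput is defined by the analyzer.fact-envelope.v1 schema and is not reproduced here.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable, List


def write_facts(bundle_dir: Path, facts: Iterable[dict]) -> Path:
    """Write facts as JSON Lines: one JSON object per line (assumed layout)."""
    path = Path(bundle_dir) / "facts.jsonl"
    with path.open("w", encoding="utf-8") as fh:
        for fact in facts:
            fh.write(json.dumps(fact, sort_keys=True) + "\n")
    return path


def read_facts(path: Path) -> List[dict]:
    """Read a JSON Lines file back into a list, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    # Hypothetical fact fields, for illustration only.
    sample = [{"fact_type": "reader_pressure", "view": "time_range"}]
    with tempfile.TemporaryDirectory() as tmp:
        written = write_facts(Path(tmp), sample)
        print(read_facts(written))
```

A line-oriented format lets a consumer stream facts one at a time instead of loading the whole bundle into memory.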
…ptimizer chain

Add a Facts section to the README: the opt-in facts.enabled model (additive; default
output unchanged), rule vs metric builders, output=file bundle (facts.jsonl +
detail_view_*.parquet + raw_stats.json), the full offline chain
(dfanalyzer output=file -> dfdiagnoser input=file -> dfoptimizer), the time_range/window
temporal axis note, and a facts.* config table.
…numbers)

Correct the optimizer invocation (python main.py --transport file, not python -m
dfoptimizer), add the eval_rule_file flag, and cite the verified end-to-end run
(reader_pressure time_range -> 76 facts -> finding persistence 39 -> 2 ActionPlans).
Clarify time_range as the offline axis; epoch/window via streaming.
@izzet izzet self-assigned this Jun 24, 2026
@izzet izzet added the enhancement New feature or request label Jun 24, 2026
…ule layer names)

Pair with the dftracer-utils distributed-scan epoch assignment:
- _postread_hlm_config passes epoch_query (the preset's epoch layer def) so the scan
  assigns per-pid epochs; add epoch/step to HLM_INT_INDEX_COLS so they index the HLM.
- Reconcile the dlio* fact rules to main's AILogging preset layer naming:
  fetch_iter -> fetch_data (main renamed the per-iteration fetch layer; 0 fetch_iter
  remain in the preset). source_view stays epoch.
Verified: offline view_types=[epoch] + shipped dlio.yaml -> fetch_pressure fact;
facts-off structurally unchanged; time_range path unaffected (76 facts); 27 fact tests.
codecov Bot commented Jun 24, 2026

Codecov Report

❌ Patch coverage is 28.03468% with 498 lines in your changes missing coverage. Please review.
✅ Project coverage is 30.18%. Comparing base (97d5c7f) to head (439cb25).
⚠️ Report is 4 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| python/dftracer/analyzer/fact_engine.py | 15.49% | 300 Missing ⚠️ |
| python/dftracer/analyzer/types.py | 53.07% | 61 Missing ⚠️ |
| python/dftracer/analyzer/fact_rules.py | 40.50% | 47 Missing ⚠️ |
| python/dftracer/analyzer/output.py | 16.21% | 31 Missing ⚠️ |
| python/dftracer/analyzer/analyzer.py | 22.22% | 28 Missing ⚠️ |
| python/dftracer/analyzer/scoring.py | 24.32% | 28 Missing ⚠️ |
| python/dftracer/analyzer/dftracer.py | 33.33% | 2 Missing ⚠️ |
| python/dftracer/analyzer/config.py | 93.33% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop      #59      +/-   ##
===========================================
+ Coverage    26.37%   30.18%   +3.81%     
===========================================
  Files           27       30       +3     
  Lines         3667     3671       +4     
===========================================
+ Hits           967     1108     +141     
+ Misses        2700     2563     -137     

Izzet Yildirim added 5 commits June 25, 2026 10:34
The analyzer CI never ran the fact pipeline, so a facts regression could pass CI.
Add a facts run to "Run DFAnalyzer with external cluster": facts.enabled=true on the
time_range axis (the offline axis the released reader supports) with a CI rule fixture
(tests/data/ci_facts_rules.yaml: reader_pressure), output=file, and assert the bundle's
facts.jsonl is non-empty -- so CI fails loudly if facts stop emitting. Exercises the
fact engine + FileOutput end-to-end on every run.
…t CI rule

The CLI (__main__.py) never forwarded cfg.facts to the analyzer, so
`dfanalyzer facts.enabled=true` was a silent no-op (only init_with_hydra wired it in
stage 4). Pass facts_config=cfg.facts in __main__'s instantiate, matching
init_with_hydra. Facts-off is unchanged (enabled defaults false). Also switch the CI
fact rule to depend on a single dense metric (reader_posix_time_proc_max) instead of a
ratio against the near-empty app_time_proc_max, so the CI facts run reliably emits facts
(verified: dfanalyzer ... facts.enabled=true -> facts.jsonl written, 77 facts on the
time_range view). Fixes the red "Run DFAnalyzer with external cluster" CI step.
…acts in CI

ConsoleOutput previously printed only the view/layer tables, so `facts.enabled=true
output=console` produced facts internally but displayed nothing. Add an "Analysis Facts"
summary table (grouped by fact_type + view: count, peak severity, opportunity tags);
additive -- nothing prints when there are no facts (facts-off console unchanged). Update the CI
facts step to run output=console (so the table is visible in the CI log) with a grep
assertion, alongside the output=file bundle check. Verified locally: facts.enabled on
the dlio time_range view prints "Analysis Facts (77 total)".
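
The grouping behind such a summary table might look like the sketch below. The keys `fact_type`, `view`, `severity`, and `opportunity_tags` are assumed names, and severity is treated as a plain number for simplicity; the real AnalysisFact and FactSeverity types may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SummaryRow = Tuple[str, str, int, float, List[str]]


def summarize_facts(facts: Iterable[dict]) -> List[SummaryRow]:
    """Group facts by (fact_type, view) and report count, peak severity, tags."""
    groups: Dict[Tuple[str, str], dict] = defaultdict(
        lambda: {"count": 0, "peak": 0.0, "tags": set()}
    )
    for fact in facts:
        group = groups[(fact["fact_type"], fact["view"])]
        group["count"] += 1
        # Assumed: severity is numeric, so the peak is simply the maximum.
        group["peak"] = max(group["peak"], float(fact.get("severity", 0.0)))
        group["tags"].update(fact.get("opportunity_tags", []))
    return [
        (fact_type, view, g["count"], g["peak"], sorted(g["tags"]))
        for (fact_type, view), g in sorted(groups.items())
    ]


if __name__ == "__main__":
    # Hypothetical facts, for illustration only.
    rows = summarize_facts(
        [
            {"fact_type": "reader_pressure", "view": "time_range", "severity": 0.5},
            {"fact_type": "reader_pressure", "view": "time_range", "severity": 0.75},
        ]
    )
    for row in rows:
        print(row)
```

Because an empty input produces no rows, a caller can skip printing the table entirely when no facts exist, which matches the additive behaviour described above.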
…alysis

The facts step ran dfanalyzer twice (output=console then output=file), printing the
full analysis + log twice. Collapse to one output=console run (Analysis Facts table
visible in the log + grep assertion). FileOutput's facts.jsonl is covered by the
diagnoser CI (which consumes a committed bundle), so dropping the second run loses no
coverage.

The step ran dftracer/dlio twice -- once plain (in the smoke/full branch) and once
facts-enabled -- so its output printed twice. Make the existing dftracer/dlio run
facts-enabled (output=console, time_range) in both branches and drop the separate facts
run. Now dftracer/dlio runs once and still shows the Analysis Facts table; grep asserts
facts emitted. darshan/recorder unchanged (full only).
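
The grep assertion can be approximated in Python as below. The header text "Analysis Facts (N total)" is taken from the verified console output quoted earlier in this PR; the exact pattern the CI step greps for is not shown here, so the regular expression and the function name are assumptions.

```python
import re

# Assumed pattern, modelled on the quoted header "Analysis Facts (77 total)".
FACTS_HEADER = re.compile(r"Analysis Facts \((\d+) total\)")


def count_emitted_facts(console_text: str) -> int:
    """Return the fact count from the summary header, or 0 if it is absent."""
    match = FACTS_HEADER.search(console_text)
    return int(match.group(1)) if match else 0


if __name__ == "__main__":
    log = "...view tables...\nAnalysis Facts (77 total)\n"
    # A CI step would fail here if facts stopped being emitted.
    assert count_emitted_facts(log) > 0, "no Analysis Facts table in output"
    print(count_emitted_facts(log))
```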
@hariharan-devarajan hariharan-devarajan merged commit efb51a3 into llnl:develop Jun 25, 2026
4 checks passed

Labels

enhancement New feature or request

2 participants