Feature: Rule-based Fact Engine by izzet · Pull Request #59 · llnl/dfanalyzer

izzet · 2026-06-24T11:42:12Z

This pull request introduces a major new feature: integration of an "analysis facts" pipeline into DFAnalyzer, enabling machine-readable bottleneck signals that can be consumed by DFDiagnoser and DFOptimizer. This is an opt-in, configurable system for producing compact, actionable findings from analysis runs. The PR also adds the configuration and infrastructure needed to support this feature, including a sample ruleset for DLIO workloads.

The most important changes are:

Analysis Facts Pipeline Integration:

Added a new FactsConfig dataclass and corresponding configuration options to enable, configure, and control the emission of analysis facts. Facts can be generated in rule-based (YAML) or metric-based evaluation mode, and the feature is opt-in (disabled by default). (python/dftracer/analyzer/config.py [1] [2])
Updated the Analyzer class to initialize and run the facts pipeline when enabled, evaluating facts over flat views and including them in the analysis result. (python/dftracer/analyzer/analyzer.py [1] [2] [3] [4] [5])
Integrated facts configuration into Hydra initialization and the main config schema. (python/dftracer/analyzer/__init__.py [1], python/dftracer/analyzer/config.py [2])

Output and Configuration Enhancements:

Added a FileOutputConfig for writing output bundles, including facts and supporting files, for downstream consumption by DFDiagnoser and DFOptimizer. (python/dftracer/analyzer/config.py [1] [2])

Ruleset Example:

Added a comprehensive DLIO-specific YAML ruleset (dlio-all.yaml) for fact generation, covering various bottleneck types and optimization opportunities. (python/dftracer/analyzer/configs/fact_rules/dlio-all.yaml, lines 1-194)

Documentation:

Expanded the README.md with detailed instructions and examples for using the analysis facts feature, including end-to-end workflow and configuration options. (README.md, lines 119-207)

These changes collectively enable DFAnalyzer to produce actionable, machine-readable findings for downstream diagnosis and optimization, with a flexible and extensible configuration system.

…+ rules

Leaf additions for the facts feature, re-ported onto current main:
- types.py: AnalysisFact / FactWindow / FactScope / FactSeverity / FactProvenance / FactEnvelope(+Context) (analyzer.fact-envelope.v1), appended after Views; main's Output*/AnalysisResult types untouched.
- scoring.py: continuous slope-severity (normalize_slope) re-centered on the proportional baseline.
- configs/schemas/analyzer.fact-envelope.v1.schema.json + configs/fact_rules/*.yaml.
- meson.build packages scoring.py + the rule/schema data.

Nothing is wired into the analyzer yet, so view/HLM output is structurally identical to main (verified). Leaf unit tests pass (scoring + fact dataclass roundtrip).

fact_engine.py (FactEngine rule builder, MetricFactBuilder slope builder, FactEmitter aggregate/detail + TEMPORAL_VIEW_TYPES, FactPipeline) and fact_rules.py (rule compile + validation). Self-contained on types/scoring/fact_rules; driven by synthetic flat_views, not wired into the analyzer -- view/HLM output stays structurally identical to main (verified). Tests: test_fact_engine / test_fact_rules / test_aggregate_facts + test_metric_facts (22 pass; 2 pipeline tests deferred to stage 3 FactsConfig).

Add FactsConfig (enabled / eval_mode / eval_rule_file / emit_flat_views / emit_mode / strict_time_semantics / allow_mixed_time_aggregates) and a `facts` field on Config. Additive: the analyzer does not consume cfg.facts yet (stage 4 wires it), so Hydra composition and view/HLM output are unchanged (verified). Unblocks the 2 deferred FactPipeline.from_facts_config tests -> full fact-engine suite now 27 pass.
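As a rough illustration of the opt-in model described above, the following is a minimal sketch of what such a config dataclass might look like. The field names are taken from the commit message, and `enabled` defaulting to false is stated elsewhere in this PR; every type, every other default, the example mode values, and the `build_pipeline` helper are assumptions made for illustration and are not the actual code in config.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FactsConfig:
    # Field names follow the commit message. Types and defaults are
    # assumptions, except that `enabled` is off by default.
    enabled: bool = False
    eval_mode: str = "rule"  # assumed values: "rule" or "metric"
    eval_rule_file: Optional[str] = None  # assumed: path to a YAML ruleset
    emit_flat_views: bool = True  # assumed default
    emit_mode: str = "aggregate"  # assumed values: "aggregate" or "detail"
    strict_time_semantics: bool = False  # assumed default
    allow_mixed_time_aggregates: bool = False  # assumed default


def build_pipeline(cfg: FactsConfig) -> Optional[dict]:
    """Hypothetical stand-in for pipeline construction.

    Returns None when facts are disabled, mirroring the "pipeline is None
    when facts are off" behaviour described in this PR.
    """
    if not cfg.enabled:
        return None
    return {"mode": cfg.eval_mode, "rules": cfg.eval_rule_file}
```

Keeping the disabled path as a plain `None` is one simple way to make the feature additive: callers that never look at the pipeline see no change in behaviour.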

…s-off-safe)

Analyzer gains facts_config + fact_pipeline (built only when facts.enabled) and the _build_facts_config / _evaluate_analysis_facts / _materialize_output_artifacts methods; _analyze_hlm evaluates facts over the flat views and gates them by emit_flat_views. AnalysisResult gains analysis_facts + get_analysis_facts / iter_analysis_facts / to_fact_envelope. With facts disabled (the default) the pipeline is None and flat views pass through untouched -> view/HLM output is structurally identical to main (verified on the dlio trace: file_name 48x1794, proc_name 6x1782). Facts-on path (reader window column + output envelope) follows in 4b/4c.

…=file)

Wire facts_config from Hydra into the analyzer (init_with_hydra passes hydra_config.facts), and add FileOutput (output=file): writes the offline bundle facts.jsonl + detail_view_*.parquet + raw_stats.json that dfdiagnoser input=file consumes. Verified end-to-end on main's reader (dftracer-dlio, dftracer-utils 0.0.10): facts.enabled=true on the time_range view -> 84 analysis_facts -> FileOutput bundle -> diagnoser -> io_present finding. Facts-off remains structurally identical to main (file_name 48x1794, proc_name 6x1782). Streaming/window output (ZMQOutput/MofkaOutput) + the window view follow in stage 5.
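For readers who want to inspect such a bundle by hand, here is a small sketch of loading facts.jsonl and checking that it is non-empty. It assumes the file follows the usual JSON Lines convention (one JSON object per line), which the extension suggests but this PR does not spell out; `load_facts` and `assert_facts_emitted` are hypothetical helper names, not part of DFAnalyzer.

```python
import json
from pathlib import Path


def load_facts(bundle_dir):
    """Read <bundle_dir>/facts.jsonl, assuming one JSON object per line.

    Blank lines are skipped. The keys inside each fact are not interpreted.
    """
    facts_path = Path(bundle_dir) / "facts.jsonl"
    with facts_path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def assert_facts_emitted(bundle_dir):
    """Fail loudly if the bundle holds no facts; return the fact count."""
    facts = load_facts(bundle_dir)
    if not facts:
        raise AssertionError(f"no facts found in {bundle_dir}/facts.jsonl")
    return len(facts)
```

A check of this kind is enough to catch the "facts silently stopped emitting" class of regression without depending on the exact fact schema.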

…ptimizer chain

Add a Facts section to the README: the opt-in facts.enabled model (additive; default output unchanged), rule vs metric builders, output=file bundle (facts.jsonl + detail_view_*.parquet + raw_stats.json), the full offline chain (dfanalyzer output=file -> dfdiagnoser input=file -> dfoptimizer), the time_range/window temporal axis note, and a facts.* config table.

…numbers)

Correct the optimizer invocation (python main.py --transport file, not python -m dfoptimizer), add the eval_rule_file flag, and cite the verified end-to-end run (reader_pressure time_range -> 76 facts -> finding persistence 39 -> 2 ActionPlans). Clarify time_range as the offline axis; epoch/window via streaming.

…ule layer names)

Pair with the dftracer-utils distributed-scan epoch assignment:
- _postread_hlm_config passes epoch_query (the preset's epoch layer def) so the scan assigns per-pid epochs; add epoch/step to HLM_INT_INDEX_COLS so they index the HLM.
- Reconcile the dlio* fact rules to main's AILogging preset layer naming: fetch_iter -> fetch_data (main renamed the per-iteration fetch layer; 0 fetch_iter remain in the preset). source_view stays epoch.

Verified: offline view_types=[epoch] + shipped dlio.yaml -> fetch_pressure fact; facts-off structurally unchanged; time_range path unaffected (76 facts); 27 fact tests.

codecov · 2026-06-24T12:35:25Z

Codecov Report

❌ Patch coverage is 28.03468% with 498 lines in your changes missing coverage. Please review.
✅ Project coverage is 30.18%. Comparing base (97d5c7f) to head (439cb25).
⚠️ Report is 4 commits behind head on develop.

Files with missing lines	Patch %	Lines
python/dftracer/analyzer/fact_engine.py	15.49%	300 Missing ⚠️
python/dftracer/analyzer/types.py	53.07%	61 Missing ⚠️
python/dftracer/analyzer/fact_rules.py	40.50%	47 Missing ⚠️
python/dftracer/analyzer/output.py	16.21%	31 Missing ⚠️
python/dftracer/analyzer/analyzer.py	22.22%	28 Missing ⚠️
python/dftracer/analyzer/scoring.py	24.32%	28 Missing ⚠️
python/dftracer/analyzer/dftracer.py	33.33%	2 Missing ⚠️
python/dftracer/analyzer/config.py	93.33%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop      #59      +/-   ##
===========================================
+ Coverage    26.37%   30.18%   +3.81%     
===========================================
  Files           27       30       +3     
  Lines         3667     3671       +4     
===========================================
+ Hits           967     1108     +141     
+ Misses        2700     2563     -137


The analyzer CI never ran the fact pipeline, so a facts regression could pass CI. Add a facts run to "Run DFAnalyzer with external cluster": facts.enabled=true on the time_range axis (the offline axis the released reader supports) with a CI rule fixture (tests/data/ci_facts_rules.yaml: reader_pressure), output=file, and assert the bundle's facts.jsonl is non-empty -- so CI fails loudly if facts stop emitting. Exercises the fact engine + FileOutput end-to-end on every run.

…t CI rule

The CLI (__main__.py) never forwarded cfg.facts to the analyzer, so `dfanalyzer facts.enabled=true` was a silent no-op (only init_with_hydra wired it in stage 4). Pass facts_config=cfg.facts in __main__'s instantiate, matching init_with_hydra. Facts-off is unchanged (enabled defaults false). Also switch the CI fact rule to depend on a single dense metric (reader_posix_time_proc_max) instead of a ratio against the near-empty app_time_proc_max, so the CI facts run reliably emits facts (verified: dfanalyzer ... facts.enabled=true -> facts.jsonl written, 77 facts on the time_range view). Fixes the red "Run DFAnalyzer with external cluster" CI step.

…acts in CI

ConsoleOutput previously printed only the view/layer tables, so `facts.enabled=true output=console` produced facts internally but displayed nothing. Add an "Analysis Facts" summary table (grouped by fact_type + view: count, peak severity, opportunity tags); additive -- nothing prints when there are no facts (facts-off console unchanged). Update the CI facts step to run output=console (so the table is visible in the CI log) with a grep assertion, alongside the output=file bundle check. Verified locally: facts.enabled on the dlio time_range view prints "Analysis Facts (77 total)".
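The grouping behind such a summary table could look roughly like the sketch below. It works on plain dicts and assumes the keys `fact_type`, `view`, `severity` (a non-negative number), and `opportunity_tags` (a list of strings); those key names and the `summarize_facts` helper are hypothetical and may not match the actual AnalysisFact fields or the ConsoleOutput implementation.

```python
from collections import defaultdict


def summarize_facts(facts):
    """Group facts by (fact_type, view) and collect count, peak severity, tags.

    Returns an empty list for empty input, so a caller can skip printing
    the table entirely when no facts were produced.
    """
    groups = defaultdict(lambda: {"count": 0, "peak": 0.0, "tags": set()})
    for fact in facts:
        group = groups[(fact["fact_type"], fact["view"])]
        group["count"] += 1
        # Assumed: severity is a non-negative number; missing means 0.0.
        group["peak"] = max(group["peak"], fact.get("severity", 0.0))
        group["tags"].update(fact.get("opportunity_tags", []))
    return [
        {
            "fact_type": fact_type,
            "view": view,
            "count": group["count"],
            "peak_severity": group["peak"],
            "opportunity_tags": sorted(group["tags"]),
        }
        for (fact_type, view), group in sorted(groups.items())
    ]
```

Sorting the groups by key keeps the rows in a stable order between runs, which makes a grep-style assertion on the console log less brittle.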

…alysis

The facts step ran dfanalyzer twice (output=console then output=file), printing the full analysis + log twice. Collapse to one output=console run (Analysis Facts table visible in the log + grep assertion). FileOutput's facts.jsonl is covered by the diagnoser CI (which consumes a committed bundle), so dropping the second run loses no coverage.

The step ran dftracer/dlio twice -- once plain (in the smoke/full branch) and once facts-enabled -- so its output printed twice. Make the existing dftracer/dlio run facts-enabled (output=console, time_range) in both branches and drop the separate facts run. Now dftracer/dlio runs once and still shows the Analysis Facts table; grep asserts facts emitted. darshan/recorder unchanged (full only).

Izzet Yildirim added 8 commits June 24, 2026 11:39

docs(facts): inline a copy-paste time_range rule in the offline example

59786aa

izzet self-assigned this Jun 24, 2026

izzet added the enhancement New feature or request label Jun 24, 2026

Izzet Yildirim added 5 commits June 25, 2026 10:34

hariharan-devarajan merged commit efb51a3 into llnl:develop Jun 25, 2026
4 checks passed

hariharan-devarajan merged 14 commits into llnl:develop from izzet:feat/facts-report
