Relativistic 2e integrals #399

Draft
kshitij-05 wants to merge 21 commits into master from kshitij/feature/2e_rkb_ints

@kshitij-05 kshitij-05 commented Feb 9, 2026

  • Implement 2-electron 4-center relativistic integrals with the restricted kinetic balance (RKB) condition.

    • (LL|SS)
    • (SS|SS)
  • Implement 2e 3-center relativistic integrals with RKB.

    • (X|SS)
    • (X|LS)

@loriab loriab left a comment

I just noticed this and saw some tweaks to propose. You might want to add the new class to INSTALL.md, too.

message(VERBOSE "setting components ${_amlist}")

-foreach(_cls ONEBODY;ERI;ERI3;ERI2;G12;G12DKH)
+foreach(_cls ONEBODY;ERI;RKB_ERI;ERI3;ERI2;G12;G12DKH)

There's a slight disadvantage to the underscore if people are splitting the integral codes (e.g., rkb_eri_ffff_d1) on underscores, but I think RKB_ERI is fine.

… unique am shell sets and phase change for this operator
…+ progress bar + sign fix

- ShellQuartetSetPredicate: add braket-swap tiebreaker for bra_ket_coswappable
  operators (σpσpCoulombσpσp). When la+lb == lc+ld, use max(la,lb) <= lc to
  pick one canonical representative, reducing duplicate quartet generation.

- Engine (engine.impl.h): update swap_braket logic for opop_coulomb_opop to
  match the new predicate tiebreaker. Add coupled-swap sign correction in the
  swap_braket branch (was missing — exposed by d-shell testing).

- build_libint.cc: disable CSE (do_cse/condense_expr) for multi-component
  operators since their 16 components share no intermediates at the expression
  level. This eliminates the superlinear optimize_rr_out bottleneck (e.g.,
  8.8s → 71ms for (ss|ds) prerequisite DAG).

- build_libint.cc: fix compilation when only LIBINT_INCLUDE_RKB_ERI is defined
  (without LIBINT_INCLUDE_ERI): extend #ifdef guards for build_TwoPRep_2b_2k,
  add forward declaration, move make_descr to detail namespace, use if constexpr
  for component descriptor construction.

- buildtest.h: add CodeGenProgress spinner showing elapsed time, function count,
  and current task name on stderr during code generation.

- int_am.cmake: fix typo in OPT_AM variable reference.
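
The braket-swap tiebreaker from the first item can be sketched as a small predicate. This is a minimal sketch, not libint's actual ShellQuartetSetPredicate: the name `keep_quartet` is hypothetical, and the rule for unequal sums (keep the quartet whose bra total AM is smaller) is an assumption about the surrounding ordering, since the commit message only describes the tie case.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper illustrating the tiebreaker; not libint's real predicate.
// la, lb are the bra angular momenta; lc, ld are the ket angular momenta.
bool keep_quartet(int la, int lb, int lc, int ld) {
  assert(la >= 0 && lb >= 0 && lc >= 0 && ld >= 0);
  const int bra = la + lb;
  const int ket = lc + ld;
  // ASSUMPTION: for unequal sums, the quartet with the smaller bra sum is kept.
  if (bra != ket) return bra < ket;
  // Tie case from the commit message: keep the representative
  // that satisfies max(la, lb) <= lc.
  return std::max(la, lb) <= lc;
}
```

Under these assumptions, of the braket-swapped pair (2,0|1,1) and (1,1|2,0), only the second passes the predicate, so the swapped duplicate is not generated.
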

Add a static type_index → visit function cache in optimal_rr(). After the
first vertex of each C++ type is matched via the linear mpl::for_each scan
over MasterIntegralTypeList (~48 types with dynamic_pointer_cast each),
subsequent vertices of the same type dispatch directly to the matching
handler. This eliminates ~31 wasted dynamic_pointer_cast calls per vertex
in RKB prerequisite DAGs where all vertices are TwoPRep_11_11_sq.
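
The caching idea can be sketched independently of libint. In the sketch below, every name (`Vertex`, `TypeA`, `TypeB`, `dispatch`, `scan_count`) is hypothetical, and a two-entry table stands in for the mpl::for_each scan over MasterIntegralTypeList. It only shows the technique: do the linear dynamic_cast scan once per dynamic type, then look up the handler by std::type_index.

```cpp
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for DAG vertex types.
struct Vertex { virtual ~Vertex() = default; };
struct TypeA : Vertex {};
struct TypeB : Vertex {};

using Handler = int (*)(const Vertex&);

// True if v can be cast to T (the expensive check done during the scan).
template <typename T>
bool matches(const Vertex& v) { return dynamic_cast<const T*>(&v) != nullptr; }

int handle_a(const Vertex&) { return 1; }
int handle_b(const Vertex&) { return 2; }

struct Entry { bool (*matches)(const Vertex&); Handler handle; };

int scan_count = 0;  // counts dynamic_cast attempts, for illustration only

int dispatch(const Vertex& v) {
  static const std::vector<Entry> table = {{&matches<TypeA>, &handle_a},
                                           {&matches<TypeB>, &handle_b}};
  static std::unordered_map<std::type_index, Handler> cache;

  const std::type_index key(typeid(v));  // dynamic type of the vertex
  const auto hit = cache.find(key);
  if (hit != cache.end()) return hit->second(v);  // fast path: no casts

  for (const Entry& e : table) {  // slow path: linear scan, first match wins
    ++scan_count;
    if (e.matches(v)) {
      cache.emplace(key, e.handle);
      return e.handle(v);
    }
  }
  return -1;  // no handler registered for this type
}
```

Calling `dispatch` twice on a `TypeB` vertex performs two casts on the first call and none on the second. Because the cache is keyed on the exact dynamic type, each type always resolves to the same handler the scan would have picked.
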

Add LIBINT_NUM_WORKERS/LIBINT_WORKER_ID env vars to partition shell quartets
across multiple build_libint processes. Each worker generates code for its
subset of RKB quartets and writes iface fragments to separate files. Worker 0
then merges all fragments and produces the final interface headers.

Non-RKB integrals (onebody, ERI) are generated by all workers (duplicated
but fast). Only the RKB quartet loop is partitioned since it dominates
generation time at higher AM.

Includes bin/build_libint_parallel.sh wrapper script that manages the
two-phase workflow: workers 1..N-1 run in parallel, then worker 0 merges.

Measured 2.8x speedup with 4 workers at RKB_MAX_AM=2 (334s -> 118s).
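
One way such a partition could look is sketched below. The commit message does not say how quartets are assigned to workers, so the round-robin rule here is purely an assumption, and the helper names `env_int` and `owns_quartet` are hypothetical. Only std::getenv and std::atoi are real library calls.

```cpp
#include <cstdlib>

// Read an integer environment variable, falling back to a default when unset.
int env_int(const char* name, int fallback) {
  const char* value = std::getenv(name);
  if (value == nullptr || *value == '\0') return fallback;
  return std::atoi(value);
}

// ASSUMPTION: round-robin assignment by running quartet index.
// With a single worker (or an invalid count) every quartet is owned.
bool owns_quartet(int quartet_index, int worker_id, int num_workers) {
  if (num_workers <= 1) return true;
  return quartet_index % num_workers == worker_id;
}
```

A generator loop would then read `env_int("LIBINT_NUM_WORKERS", 1)` and `env_int("LIBINT_WORKER_ID", 0)` once and skip every quartet for which `owns_quartet` returns false.
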

When LIBINT2_NUM_WORKERS > 1, the export step runs build_libint via the
build_libint_parallel.sh wrapper, launching N parallel workers.

Usage: cmake -S . -B build -DLIBINT2_NUM_WORKERS=4

Default is 1 (serial, same behavior as before).

Removes LIBINT2_NUM_WORKERS, WorkerConfig, build_libint_parallel.sh, and
all worker partitioning logic. The process-level parallelism produced
incomplete output (missing CR header files) because generate_rr_code needs
external symbols from ALL quartets but workers only discover their subset.

Retains: type dispatch cache, CSE disable, braket tiebreaker, progress bar.

kshitij-05 force-pushed the kshitij/feature/2e_rkb_ints branch from 25265a6 to 41f1dad on March 22, 2026 18:19

Replace derivative-based recurrence relations for CoulombσpσpOper and
σpσpCoulombσpσpOper with direct AM-shift expansion using the Gaussian
derivative identity: ∂/∂R_i G(f) = 2α_f·G(f+1_i) - l_{f,i}·G(f-1_i).
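
The identity can be checked numerically in one dimension. The sketch below assumes an unnormalized primitive Cartesian Gaussian G_l(x) = (x - R)^l · exp(-α(x - R)²); the function names are hypothetical and none of this is libint code. It compares the AM-shift expression against a central finite difference in R.

```cpp
#include <cassert>
#include <cmath>

// 1D unnormalized primitive Cartesian Gaussian centered at R (assumed form).
double gauss(int l, double alpha, double R, double x) {
  assert(alpha > 0.0);
  if (l < 0) return 0.0;  // the G(f - 1_i) term vanishes when l = 0
  const double d = x - R;
  return std::pow(d, l) * std::exp(-alpha * d * d);
}

// Right-hand side of the identity: 2*alpha*G(l+1) - l*G(l-1).
double dgauss_dR_shift(int l, double alpha, double R, double x) {
  return 2.0 * alpha * gauss(l + 1, alpha, R, x) - l * gauss(l - 1, alpha, R, x);
}

// Left-hand side approximated by a central finite difference in R.
double dgauss_dR_fd(int l, double alpha, double R, double x, double h = 1e-5) {
  return (gauss(l, alpha, R + h, x) - gauss(l, alpha, R - h, x)) / (2.0 * h);
}
```

For example, with l = 2, α = 0.7, R = 0.3 and x = 1.1, the two functions should agree to well within 1e-6, which is the kind of check the derivative-ERI reference comparison performs at the integral level.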

This eliminates all DerivGauss intermediate code for RKB integrals,
reducing generated files by ~74% (2058 fewer .cc files at RKB_MAX_AM=1).

Key changes:
- comp_11_Coulombσpσp_11.h: Expand ket (c,d) derivatives into AM-shifted
  TwoPRep children (4 quaternion components, up to 8 children per term)
- comp_11_σpσpCoulombσpσp_11.h: Expand all 4 center derivatives (a,b,c,d)
  into AM-shifted TwoPRep children (16 components, factored bra×ket)
- build_libint.cc: Enable both operators, extend shell range (lmax+1),
  force unrolling for AM-shift operators, disable CSE for multi-component
- buildtest.h: Extract external symbols before DAG optimization to prevent
  loss of high-order Boys function symbols; add prereq depth safety limit
- test-2body.cc: Enable 16-component σpσpCoulombσpσp unit test

All 5120 test assertions pass (1024 for coulomb_opop + 4096 for
opop_coulomb_opop) at RKB_MAX_AM=1, verified against reference
derivative ERIs.

kshitij-05 commented Mar 27, 2026

RKB AM-Shift Rewrite: Generated Code Reduction

Table 1: RKB_MAX_AM=1&2, LIBINT2_MAX_AM=2

| | MAX_AM=1 Old | MAX_AM=1 New | MAX_AM=2 Old | MAX_AM=2 New |
|---|---|---|---|---|
| .cc files | 2,777 | 719 (-74%) | 7,121 | 819 (-88%) |
| RKB files | 254 | 50 (-80%) | 1,188 | 196 (-84%) |
| DerivGauss files | 1,132 | 45 (-96%) | 4,233 | 45 (-99%) |
| Export time | ~46s | ~39s (-15%) | ~10m06s | ~4m28s (-56%) |

…ation

Integrate three Python post-processing scripts into the CMake export
pipeline to reduce generated code size and compile time for RKB
integrals:

1. inline_single_use.py: Inlines single-use fp variables (recursive,
   with address-of exclusion). Reduces biggest file 184MB -> 75MB.

2. cse_postprocess.py: Extracts repeated two_alpha product pairs to
   local variables, eliminates 1.0*x, shortens float constants.
   Reduces 75MB -> 49MB.

3. split_prereq.py: Splits large prereq functions into smaller
   compilation units by dependency analysis. Each part compiles
   independently, excluded from unity batching for parallelism.
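
To make the first two transformations concrete, here is a hypothetical before/after pair written by hand. It is not output of the scripts, and the variable names only mimic the style of generated code; it shows what inlining single-use temporaries and dropping a `1.0*x` factor do to a function body.

```cpp
// Before: every intermediate is a named temporary that is used exactly once,
// and one of them is a redundant multiplication by 1.0.
double before(double a, double b, double c) {
  const double fp0 = a * b;
  const double fp1 = fp0 + c;
  const double fp2 = 1.0 * fp1;
  return fp2;
}

// After: single-use temporaries are inlined and the 1.0* factor is removed.
double after(double a, double b, double c) {
  return a * b + c;
}
```

Both functions compute the same expression; the second simply has fewer declarations for the compiler to parse, which is where the file-size reduction comes from.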

Also fixes:
- cmake/modules/int_am.cmake: Set ${class}_MAX_AM when explicitly
  specified (was only setting LIBINT_ prefixed version, causing
  RKB config to register only ssss instead of all AM levels)
- export/CMakeLists.txt.export: Skip unity build for split part files
- export/tests/unit/test-2body.cc: Add d-shell to RKB test