
commit-reach: add general graph traversal find_reachable() #2142

Draft
spkrka wants to merge 1 commit into gitgitgadget:master from spkrka:krka/reachability-wins


@spkrka spkrka commented Jun 8, 2026

In 2018, Stolee consolidated commit walks into commit-reach.c and
extracted can_all_from_reach_with_flag() from upload-pack's
ok_to_give_up() with the observation that we can reuse its
commit walking logic for many other callers (ba3ca1e).
In 4fbcca4 it also got optimized with a memoized DFS so
subsequent from-commits benefit from shared ancestry
(very cool optimization!).

This patch continues that idea by generalizing the algorithm into
find_reachable() and rolling it out to the remaining callers. Most
conversions are just code reuse with preserved performance. The big
win is ref-filter branch --contains, where batching N per-ref DFS
walks into a single call with shared RESULT memoization gives a
14.5x speedup on gitgitgadget/git.

This makes can_all_from_reach(), contains_tag_algo, and its
infrastructure redundant, so all of it is deleted. The contains_cache
commit slab is replaced by temporary flag bits on commit->object.flags.

Benchmarks on gitgitgadget/git (v2.48, ~85k commits, 370 branches,
730 tags), median of 5-10 sequential runs on a quiet machine:

  branch -r --contains v2.30.0:   13.49s -> 928ms  (14.5x faster)
  branch -r --contains v2.47.0:    6.19s -> 1.05s  ( 5.9x faster)
  tag --contains v2.30.0:          1.27s -> 1.32s  (neutral)
  tag --contains v2.47.0:          1.40s -> 1.41s  (neutral)
  merge-base --is-ancestor:        682ms -> 678ms  (neutral)

The branch --contains speedup comes from the O(N*D)->O(D+N) batch
change. tag --contains is neutral because the old contains_tag_algo
already had per-commit slab caching. merge-base --is-ancestor is
neutral since the bottleneck is commit-graph object loading, not
the walk pattern.
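
To illustrate why batching helps, here is a minimal sketch of a memoized
DFS on a toy graph. It is not the code in this patch: struct node, the
flag names, reaches_target() and find_reachable_batch() are all
hypothetical stand-ins, and the recursion is used only for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits, standing in for bits on commit->object.flags. */
#define SEEN      (1u << 0) /* node was already visited by some walk */
#define REACHES   (1u << 1) /* memoized result: a target is reachable */
#define IS_TARGET (1u << 2) /* node is one of the commits we look for */

#define MAX_PARENTS 2

/* Toy commit: just flags and parent pointers. */
struct node {
	unsigned flags;
	int nr_parents;
	struct node *parents[MAX_PARENTS];
};

/*
 * Memoized DFS over parents. Recursive for brevity; a walk over real
 * history would need an explicit stack to avoid deep recursion.
 */
static int reaches_target(struct node *n, size_t *visited)
{
	int i;

	if (n->flags & SEEN)
		return !!(n->flags & REACHES); /* reuse earlier answer */

	assert(n->nr_parents <= MAX_PARENTS);
	n->flags |= SEEN;
	(*visited)++;

	if (n->flags & IS_TARGET) {
		n->flags |= REACHES;
		return 1;
	}
	for (i = 0; i < n->nr_parents; i++) {
		if (reaches_target(n->parents[i], visited)) {
			n->flags |= REACHES;
			return 1;
		}
	}
	return 0;
}

/*
 * Answer all starts in one batch; returns how many distinct nodes
 * were visited in total across every start.
 */
static size_t find_reachable_batch(struct node **starts, size_t nr, int *out)
{
	size_t i, visited = 0;

	for (i = 0; i < nr; i++)
		out[i] = reaches_target(starts[i], &visited);
	return visited;
}
```

Because the SEEN and REACHES bits persist between starts, each node is
expanded at most once per batch, so N starts over D reachable nodes cost
roughly O(D + N) rather than O(N * D). A caller would clear the flag bits
once the batch is done.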

Add find_reachable(), a generalization of the memoized DFS in
can_all_from_reach_with_flag().

The benefits are two-fold:
1. make the code more uniform - fewer core graph traversals
   to maintain and reason about.
2. optimization for ref-filter --contains

Most converted callers get equal or slightly better performance,
since this standardizes on the generally "best" implementation.

The big win is ref-filter --contains/--no-contains.
Instead of calling commit_contains() once per ref, we batch the refs
into calls to find_reachable(), changing the time complexity
from O(N * D) to O(D + N) where D is the reachable graph depth and
N is the number of refs.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>