Merged
19 changes: 19 additions & 0 deletions src/sentry/integrations/github/multi_platform_detection.py
@@ -483,7 +483,9 @@ def detect_platforms_multi(
     languages: dict[str, int] = client.get_languages(repo)
     active_platforms = _select_active_platforms(languages)

+    tree_start = time.monotonic()
     entries, is_truncated = _get_tree(client, repo, ref)
+    tree_duration_ms = (time.monotonic() - tree_start) * 1000
     index = _build_tree_index(entries)

     results: list[DetectedPlatform] = []
@@ -524,11 +526,13 @@ def detect_platforms_multi(
     # are always within the cap before subdirectory files from monorepo workspaces.
     capped_paths = sorted(needed_paths, key=lambda p: (p.count("/"), p))[:MAX_CONTENT_READS]

+    content_reads_start = time.monotonic()
     content_by_path: dict[str, str] = {}
     for path in capped_paths:
         content = _get_repo_file_content(client, repo, path, ref)
         if content is not None:
             content_by_path[path] = content
+    content_reads_duration_ms = (time.monotonic() - content_reads_start) * 1000

     manifests_by_path: dict[str, _PackageManifest] = {}
     for path, content in content_by_path.items():
@@ -624,6 +628,21 @@ def detect_platforms_multi(
         f"{_MULTI_METRICS_PREFIX}.k_reads_realized",
         k_reads_realized,
     )
+    sentry_sdk.metrics.distribution(
+        f"{_MULTI_METRICS_PREFIX}.tree.duration",
+        tree_duration_ms,
+        unit="millisecond",
+    )
+    sentry_sdk.metrics.distribution(
+        f"{_MULTI_METRICS_PREFIX}.content_reads.duration",
+        content_reads_duration_ms,
+        unit="millisecond",
+    )
+    for needed_path in needed_paths:
+        sentry_sdk.metrics.distribution(
+            f"{_MULTI_METRICS_PREFIX}.needed_path_depth",
+            needed_path.count("/"),
+        )

Unbounded per-path metric emissions

Medium Severity

Each detection run loops over every path in needed_paths and calls sentry_sdk.metrics.distribution once per path. That set is uncapped and can include every matching manifest or match_ext file in a large tree, so a single request may emit thousands of metric samples and add noticeable latency on top of an already expensive GitHub tree fetch.

Reviewed by Cursor Bugbot for commit 6c23be0.
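
One way to bound the emission count described in this comment is to cap the number of depth samples per run. The sketch below is illustrative only: `MAX_DEPTH_SAMPLES`, `bounded_depth_samples`, `emit_depths`, and the `emit` callback are hypothetical names that do not appear in the PR, and `emit` stands in for whatever metrics call the real code would use. The ordering mirrors the `(depth, path)` sort key that the diff already uses for `capped_paths`.

```python
MAX_DEPTH_SAMPLES = 50  # hypothetical cap, not a constant from the PR


def bounded_depth_samples(paths, limit=MAX_DEPTH_SAMPLES):
    """Return at most `limit` path depths, shallowest paths first."""
    # Same ordering idea as capped_paths in the diff: depth, then path name.
    ordered = sorted(paths, key=lambda p: (p.count("/"), p))
    return [p.count("/") for p in ordered[:limit]]


def emit_depths(paths, emit, limit=MAX_DEPTH_SAMPLES):
    """Call emit(name, value) once per retained sample; return the sample count."""
    samples = bounded_depth_samples(paths, limit)
    for depth in samples:
        emit("needed_path_depth", depth)
    return len(samples)
```

Note that a cap of this kind keeps the shallowest paths, so the recorded distribution would be biased toward low depths whenever the cap is hit.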

@Abdkhan14 (Contributor, Author) Jun 19, 2026

This is okay. The max `needed_paths` length over the last 2 days was 41, with a p90 of 4.

Comment on lines +641 to +645

Bug: The loop over needed_paths emits a separate metric for each path, which can cause high metric volume and unnecessary overhead in large repositories.
Severity: MEDIUM

Suggested Fix

Instead of emitting a metric for every path in needed_paths, aggregate the data first. For example, you could calculate the maximum or average path depth and emit a single metric. This would capture the intended insight without the performance overhead of numerous individual calls.
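
The aggregation suggested above could look roughly like the following sketch. It is not the change that was merged: `summarize_path_depths`, `emit_depth_summary`, the metric names, and the `emit` callback are all hypothetical, with `emit` standing in for a call such as the `sentry_sdk.metrics.distribution` calls shown in the diff.

```python
from statistics import mean


def summarize_path_depths(paths):
    """Return (count, max_depth, mean_depth) for repo-relative paths."""
    depths = [p.count("/") for p in paths]
    if not depths:
        return 0, 0, 0.0
    return len(depths), max(depths), mean(depths)


def emit_depth_summary(paths, emit):
    """Emit a fixed number of samples per run, regardless of len(paths)."""
    count, max_depth, mean_depth = summarize_path_depths(paths)
    if count == 0:
        return
    # Hypothetical metric names, chosen for illustration only.
    emit("needed_paths.count", count)
    emit("needed_path_depth.max", max_depth)
    emit("needed_path_depth.mean", mean_depth)
```

The trade-off is that per-path samples are lost: the backend would see the distribution of per-run maxima and means rather than the distribution of individual path depths.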

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: src/sentry/integrations/github/multi_platform_detection.py#L641-L645

Potential issue: The code iterates over `needed_paths`, an uncapped list of file paths,
to emit a metric for each path's depth. In large monorepos with many packages (e.g.,
50+), this can result in a high volume of `sentry_sdk.metrics.distribution()` calls for
a single platform detection run. While each metric call is individually lightweight and
non-blocking, the aggregate overhead from dozens of calls is unnecessary and introduces
performance inefficiency and metric noise. This behavior is reachable in common
scenarios involving monorepos with numerous workspaces.


     sentry_sdk.metrics.count(
         f"{_MULTI_METRICS_PREFIX}.completed",
         1,