feat(scm-multi-platform-detection): Adding new metrics#118099
feat(scm-multi-platform-detection): Adding new metrics#118099Abdkhan14 wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 6c23be0. Configure here.
| sentry_sdk.metrics.distribution( | ||
| f"{_MULTI_METRICS_PREFIX}.needed_path_depth", | ||
| needed_path.count("/"), | ||
| ) |
There was a problem hiding this comment.
Unbounded per-path metric emissions
Medium Severity
Each detection run loops over every path in needed_paths and calls sentry_sdk.metrics.distribution once per path. That set is uncapped and can include every matching manifest or match_ext file in a large tree, so a single request may emit thousands of metric samples and add noticeable latency on top of an already expensive GitHub tree fetch.
Reviewed by Cursor Bugbot for commit 6c23be0. Configure here.
There was a problem hiding this comment.
This is okay. Max needed length over the last 2 days was 41, with a p90 of 4
| for needed_path in needed_paths: | ||
| sentry_sdk.metrics.distribution( | ||
| f"{_MULTI_METRICS_PREFIX}.needed_path_depth", | ||
| needed_path.count("/"), | ||
| ) |
There was a problem hiding this comment.
Bug: The loop over needed_paths emits a separate metric for each path, which can cause high metric volume and unnecessary overhead in large repositories.
Severity: MEDIUM
Suggested Fix
Instead of emitting a metric for every path in needed_paths, aggregate the data first. For example, you could calculate the maximum or average path depth and emit a single metric. This would capture the intended insight without the performance overhead of numerous individual calls.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: src/sentry/integrations/github/multi_platform_detection.py#L641-L645
Potential issue: The code iterates over `needed_paths`, an uncapped list of file paths,
to emit a metric for each path's depth. In large monorepos with many packages (e.g.,
50+), this can result in a high volume of `sentry_sdk.metrics.distribution()` calls for
a single platform detection run. While each metric call is individually lightweight and
non-blocking, the aggregate overhead from dozens of calls is unnecessary and introduces
performance inefficiency and metric noise. This behavior is reachable in common
scenarios involving monorepos with numerous workspaces.
Did we get this right? 👍 / 👎 to inform future reviews.


Motive: