Feat reads regional benchmark #16795

Draft
chandra-siri wants to merge 4 commits into googleapis:main from chandra-siri:feat-reads-regional-benchmark

@chandra-siri
Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request implements time-based microbenchmarks for regional bucket read operations using the JSON and gRPC APIs, and updates AsyncGrpcClient to support custom API endpoints and quota project IDs. The review feedback highlights four improvements:

  • Move authentication token retrieval out of module level, to speed up multiprocessing startup and prevent test collection failures.
  • Specify an explicit file encoding when opening configuration files.
  • Remove an unused import.
  • Add HTTP status checks so the benchmark fails fast on unauthorized or missing-resource errors instead of reporting inaccurate results.

Comment on lines +29 to +34
token = subprocess.run(
    ["gcloud", "auth", "print-access-token"],
    capture_output=True,
    text=True,
    check=True,
).stdout.strip()

high

Executing subprocess.run at the module level to fetch an authentication token is problematic for several reasons:

  1. Test Collection: This command runs whenever the module is imported, which can fail the entire test suite during collection if gcloud is not installed or the user is not authenticated.
  2. Multiprocessing Overhead: Since the benchmark uses the spawn start method (line 214), this module is re-imported in every worker process. For a workload with many processes (e.g., 96), this results in 96 concurrent, redundant calls to gcloud, causing significant startup delay and potential rate limiting.

Consider fetching the token once in the parent process and passing it to the workers via the initializer or as an argument to the worker function.

"""Generates a dictionary of benchmark parameters for time based read operations."""
params: Dict[str, List[TimeBasedReadParameters]] = {}
config_path = os.path.join(os.path.dirname(__file__), "config.yaml")
with open(config_path, "r") as f:

medium

It is recommended to specify an explicit encoding (e.g., encoding="utf-8") when opening files to ensure consistent behavior across different platforms and locales.

Suggested change
- with open(config_path, "r") as f:
+ with open(config_path, "r", encoding="utf-8") as f:
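To illustrate why the explicit encoding matters, the sketch below writes a small UTF-8 file and compares reading it as UTF-8 against decoding the same bytes as cp1252, a common default on some Windows locales. The helper `read_text_utf8` and the `bucket_label` key are hypothetical and not taken from the PR's config.yaml.

```python
import os
import tempfile


def read_text_utf8(path: str) -> str:
    """Read a text file with an explicit encoding, independent of the locale."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Hypothetical config content containing a non-ASCII character.
raw = "bucket_label: café\n".encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.yaml")
    with open(path, "wb") as f:
        f.write(raw)
    text = read_text_utf8(path)

# The same bytes decoded with a different codec yield different text,
# which is what an implicit platform default can silently do.
mis_decoded = raw.decode("cp1252")

print(repr(text))         # 'bucket_label: café\n'
print(repr(mis_decoded))  # 'bucket_label: cafÃ©\n'
```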

import pytest
import aiohttp
import subprocess
import math

medium

The math module is imported but not used in this file. Removing unused imports helps maintain code cleanliness.
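Linters such as pyflakes report this kind of issue automatically. As a rough sketch of the underlying idea, the hypothetical helper `unused_imports` below uses the standard `ast` module to list imported names that are never referenced. It is a simplification (it ignores `__all__`, star imports, and names used only inside strings), and the sample source is invented for illustration.

```python
import ast


def unused_imports(source: str) -> list:
    """Return imported names that never appear as a Name node in the source."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the top-level name "a".
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return sorted(imported - used)


# Invented sample mirroring the import block under review.
SAMPLE = '''
import pytest
import aiohttp
import subprocess
import math


@pytest.fixture
def token():
    return subprocess.run(["true"]).stdout


async def fetch(url):
    async with aiohttp.ClientSession() as session:
        return await session.get(url)
'''

print(unused_imports(SAMPLE))  # ['math']
```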

Comment on lines +119 to +121
async with session.get(url, headers=headers) as response:
    data = await response.read()
    bytes_in_iteration += len(data)

medium

The code reads the response body without checking the HTTP status code. If the request fails (e.g., 401 Unauthorized, 404 Not Found), response.read() will return the error message body, which will then be counted as downloaded data, leading to incorrect benchmark results. Adding response.raise_for_status() ensures fail-fast behavior and prevents potential issues with unexpected states.

Suggested change
- async with session.get(url, headers=headers) as response:
-     data = await response.read()
-     bytes_in_iteration += len(data)
+ async with session.get(url, headers=headers) as response:
+     response.raise_for_status()
+     data = await response.read()
+     bytes_in_iteration += len(data)
References
  1. When a function encounters an unexpected state or unsupported input, it should raise an error instead of silently continuing to ensure fail-fast behavior.
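The effect of the missing status check can be sketched without a network call. In the example below, `FakeResponse` is a hypothetical stand-in that mimics only the two response methods used in the snippet under review (`raise_for_status()` and `read()`); it is not aiohttp's class, and `checked_body_size` and `unchecked_body_size` are illustrative names.

```python
import asyncio


class FakeResponse:
    """Hypothetical stand-in for an HTTP response, for illustration only."""

    def __init__(self, status: int, body: bytes):
        self.status = status
        self._body = body

    def raise_for_status(self) -> None:
        # Mimics fail-fast behavior: any 4xx/5xx status raises.
        if self.status >= 400:
            raise RuntimeError(f"HTTP {self.status}")

    async def read(self) -> bytes:
        return self._body


async def unchecked_body_size(response) -> int:
    """Counts whatever body arrives, including an error message."""
    data = await response.read()
    return len(data)


async def checked_body_size(response) -> int:
    """Raises on an error status before any bytes are counted."""
    response.raise_for_status()
    data = await response.read()
    return len(data)


ok = FakeResponse(200, b"x" * 1024)
missing = FakeResponse(404, b"Not Found")

print(asyncio.run(checked_body_size(ok)))         # 1024
print(asyncio.run(unchecked_body_size(missing)))  # 9 bytes of error text counted

try:
    asyncio.run(checked_body_size(missing))
except RuntimeError as exc:
    print(f"failed fast: {exc}")  # failed fast: HTTP 404
```

Without the check, the 9-byte error body would be added to the benchmark's byte count as if it were object data; with it, the iteration aborts before anything is counted.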
