feat(BigQuery): Add support for Stateless Queries: Stateless query#9022
feat(BigQuery): Add support for Stateless Queries: Stateless query#9022Hectorhammett wants to merge 4 commits intomainfrom
Conversation
bshaffer
left a comment
There was a problem hiding this comment.
Looks great! A couple nits, but my main concern is with the QueryResults::fromStatelessQuery method. I think there may be an opportunity for improvement there.
Also, could we add a SystemTest for this? Even if it's not run on CI it would be nice to have it as a sanity check.
| isset($queryConfig['priority']) && | ||
| $queryConfig['priority'] !== 'INTERACTIVE' | ||
| ) || | ||
| (isset($queryConfig['useLegacySql']) && $queryConfig['useLegacySql']) || |
There was a problem hiding this comment.
nit: Consider this more concise variant?
| (isset($queryConfig['useLegacySql']) && $queryConfig['useLegacySql']) || | |
| ($queryConfig['useLegacySql'] ?? false) || |
| return false; | ||
| } | ||
|
|
||
| if (isset($config['configuration']['dryRun']) && $config['configuration']['dryRun']) { |
There was a problem hiding this comment.
Same here
| if (isset($config['configuration']['dryRun']) && $config['configuration']['dryRun']) { | |
| if ($config['configuration']['dryRun'] ?? false) { |
There was a problem hiding this comment.
I know you're right, I know you're right haha, but my brain always liked the other one:
Hey, is this set? Yes? ok so... is this true?
And the bottom one reads like "This true ?? FALSE" to me haha.
But I think is more readable your way haha.
| if (isset($config['jobReference']['jobId']) && !$this->isJobIdGenerated() | ||
| ) { |
There was a problem hiding this comment.
nit: the closing parenthesis should only on a newline if the if statement spans multiple lines
| if (isset($config['jobReference']['jobId']) && !$this->isJobIdGenerated() | |
| ) { | |
| if (isset($config['jobReference']['jobId']) && !$this->isJobIdGenerated()) | |
| { |
| ( | ||
| isset($queryConfig['priority']) && | ||
| $queryConfig['priority'] !== 'INTERACTIVE' | ||
| ) || |
There was a problem hiding this comment.
nit: I find it a tiny bit hard to read that we have a bunch of isset calls, but then we have this block, and the block below, that are different. Also, we have dryRun, which follows the same logic as useLegacySql, but isn't in this block.
My preference would be that we separate out the more complex checks from this large if statement, for readability.
| if ($query instanceof QueryJobConfiguration && $query->isStateless()) { | ||
| $queryRequest = $query->toQueryRequest(); | ||
|
|
||
| if (isset($queryResultsOptions['formatOptions.useInt64Timestamp'])) { |
There was a problem hiding this comment.
nit: This is for backwards compatiblity, correct? Maybe add a comment saying that, for clarity
There was a problem hiding this comment.
This is because for some reason, the RequestBuilder handles this in a dot notation for get requests but not for post request, which in a stateless query is a post request. In the jobs.query api, this is expected to be in the body instead of the query parameters. So I am normalizing the regular options passed by the user into a json body for the request.
This lives on cloud-core::RequestBuilder::build line 115.
| $statelessArgs = $queryRequest + $queryResultsOptions + [ | ||
| 'projectId' => $this->projectId | ||
| ] + $options; |
There was a problem hiding this comment.
I don't love that we are just adding $options at the end like this... can we validate the keys before or after doing this, so we know for sure what values are being added?
| $this->projectId, | ||
| $statelessResponse, | ||
| $this->mapper, | ||
| $queryResultsOptions + $options |
There was a problem hiding this comment.
Same here... we did all the work to validate $queryResultOptions and $queryRequest, it would be nice to do the same to these $options and the ones above, so we know what keys are being passed in.
| */ | ||
| private $config = []; | ||
|
|
||
| private bool $isJobIdGenerated = false; |
There was a problem hiding this comment.
since this is only used in QueryJobConfiguration, I'd advocate for moving it out of the trait and instead keeping it entirely in QueryJobConfiguration. I don't see any need to have it in the trait if it isn't used in the other classes which use the trait.
| if (!isset($this->info['jobReference'])) { | ||
| return $this->info; | ||
| } |
There was a problem hiding this comment.
If we had a stateless query, the response would be in the info field. If it has not JobReference, it means that it was in fact, completely stateless and we cannot "reload" by sending the request as there is no jobId to getQueryResults from.
Maybe we could throw and error if you want to reload a stateless query results? But that means that the user would have to be aware of the internal changes and it would be a breaking change to some people.
| * [documentation](https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/getQueryResults#query-parameters) | ||
| * for available options. | ||
| */ | ||
| public static function fromStatelessQuery( |
There was a problem hiding this comment.
I have to say, I don't love this implementation. It seems like we are creating a Job for no reason, where we should make the Job nullable (Stateless queries don't have jobs, right? You run them immediately?).
This may require further discussion, but IMHO it seems like it would be better to instantiate QueryResults in the runQuery method, and then have waitUntilComplete be a no-op (or throw an exception) for Stateless Queries, since they're not applicable.
There was a problem hiding this comment.
A Sateless query may or may not create a job. If we take a look at QueryJobConfiguration::toQueryRequest() line 678, the request has the field:
jobCreationMode => self::JOB_CREATION_MODE_OPTIONAL enum here
So even though we run the stateless query path, BigQuery itself may decide that the best course of action for this query is to actually create a Job, meaning that we do have a job reference and jobId when the job.query endpoint responds.
We cannot force BigQuery to run it stateless.
Add support for stateless queries.
b/331273989