Skip to content

feat: migrate ClickHouse adapter to utopia-php/query 0.3.x builder#4

Open
lohanidamodar wants to merge 7 commits into
mainfrom
feat/utopia-query-0.3.x
Open

feat: migrate ClickHouse adapter to utopia-php/query 0.3.x builder#4
lohanidamodar wants to merge 7 commits into
mainfrom
feat/utopia-query-0.3.x

Conversation

@lohanidamodar
Copy link
Copy Markdown
Contributor

Summary

Migrates src/Usage/Adapter/ClickHouse.php (3252 → 2854 lines) to consume utopia-php/query 0.3.x. The adapter is now a thin HTTP runtime — every DDL, INSERT, SELECT, and DELETE goes through Schema\ClickHouse / Builder\ClickHouse.

Depends on utopia-php/query#11 (branch feat/clickhouse-insert-delete-settings-mv). Once that PR is tagged 0.3.2, flip composer from dev-feat/clickhouse-insert-delete-settings-mv as 0.3.2 to ^0.3.2.

What changed

Schema (DDL via Utopia\Query\Schema\ClickHouse)

  • setup() emits events/gauges tables through Table\ClickHouse with typed columns, Engine::MergeTree, monthly partitioning, ORDER BY, bloom_filter skip indexes, and SETTINGS.
  • createDailyTable() uses Engine::SummingMergeTree('value').
  • createDailyMaterializedView() uses Schema\ClickHouse::createMaterializedView; the MV body remains a hand SELECT because subquery-aggregation MV bodies do not round-trip cleanly through the builder yet (deferred upstream).

Reads (SELECT via Utopia\Query\Builder\ClickHouse)

  • Each builder is initialised with useNamedBindings() + withParamTypes() so every ? placeholder rewrites to ClickHouse {paramN:Type} form (DateTime64(3), Int64, String, Nullable(String)).
  • Migrated: find / findFromTable, findAggregatedFromTable, count / countFromTable, sum / sumFromTable, findDaily (FINAL), sumDaily, sumDailyBatch, getTotal / getTotalFromEvents / getTotalFromGauges, getTotalBatch, getTimeSeries / getTimeSeriesFromTable.
  • Time-bucketed aggregation consumes Query::groupByTimeBucket via the builder's native GROUP BY emission, paired with selectRaw/orderByRaw for the bucket projection and ordering.
  • Cursor pagination keeps the tuple-keyset comparison as a whereRaw fragment and merges its named bindings into the builder Statement at HTTP execute time (upstream gap, deferred).

Writes (INSERT via Builder\ClickHouse::insertFormat)

  • addBatch() builds INSERT INTO t (...) FORMAT JSONEachRow; row payload still streamed in HTTP body.

Deletes (DELETE via Builder\ClickHouse::delete)

  • purge() and purgeDaily() emit lightweight DELETE FROM through the builder.

Removed

  • src/Usage/UsageQuery.phpgroupByInterval replaced by Utopia\Query\Query::groupByTimeBucket. BREAKING: downstream callers must switch from UsageQuery::groupByInterval('time', '1h') to Query::groupByTimeBucket('time', '1h').

Adapters: Method enum migration

  • convertQueriesToDatabase in Adapter/Database.php switched to Method enum cases. Same for the ClickHouse adapter's filter-method handling.

Infra

  • Dockerfile bumped to php:8.4.21-cli-alpine3.23.
  • composer.json PHP constraint bumped to >=8.4 (matches the 0.3.x query lib requirement).
  • phpstan bumped to ^2.0 for PHP 8.4 compatibility.

Test plan

  • composer lint clean
  • composer check (PHPStan max) clean
  • Snapshot tests ClickHouseSqlSnapshotTest: 9/9 green (DDL, daily, MV, find with typed bindings, groupByTimeBucket aggregation, INSERT FORMAT, getTimeSeries, FINAL, lightweight DELETE)
  • Integration tests against the Docker ClickHouse — to run in CI

Still deferred upstream (follow-up PRs)

  • Tuple-cursor builder helper (cursor pagination still uses whereRaw)
  • LowCardinality column type
  • SummingMergeTree shorthand
  • Materialized view body via builder (currently still raw SQL for the subquery-aggregation case)
  • Eventual HTTP transport extraction (future home: utopia-php/database)

Related

lohanidamodar and others added 5 commits May 17, 2026 09:37
Pin `utopia-php/query` to `dev-feat/clickhouse-insert-delete-settings-mv as
0.3.2` to consume the new ClickHouse builder + schema layer
(groupByTimeBucket, named-typed bindings, Schema\ClickHouse,
Builder\ClickHouse insertFormat/deleteMode). Flip composer to
`minimum-stability: dev` with `prefer-stable: true` so the alias resolves.

Tracks utopia-php/query#11; flip to `^0.3.2` once tagged.
Replace Query::TYPE_* string constants (removed upstream in 0.3.x) with
Method enum cases. Behaviour unchanged; the GroupByTimeBucket case is the
new name for the deferred groupByInterval/intervalGroupBy bucket — still a
no-op for the Database adapter since it has no time-bucketed aggregation
path.
PHPStan 1.x ships php-parser 4.x which cannot parse the asymmetric
visibility (`public protected(set)`) used by utopia-php/query 0.3.x's
ClickHouse schema layer (Schema\Table\ClickHouse), causing an internal
"Lexer::getNextToken returned null" crash on the usage adapter source
file.

Bump to PHPStan ^2.0 (ships php-parser 5.x, understands PHP 8.4) and
capture the pre-existing "mixed array access" findings from json_decode
result handling in a baseline so net analysis stays green while the
adapter migrates. Wire up a phpstan.neon that runs at level max over
src + tests.
Replaces the hand-rolled SQL strings in src/Usage/Adapter/ClickHouse.php
with the utopia-php/query 0.3.x builder + schema layer. The adapter is
now a thin HTTP runtime: every DDL statement, INSERT, SELECT, and DELETE
fragment is produced by Schema\ClickHouse or Builder\ClickHouse.

Schema (DDL via Utopia\Query\Schema\ClickHouse):
- setup() emits events/gauges tables through Table\ClickHouse with typed
  columns (string, datetime, bigInteger), Engine::MergeTree, monthly
  partitioning, ORDER BY, bloom_filter skip indexes, and SETTINGS.
- createDailyTable() emits the SummingMergeTree daily aggregation
  table via Engine::SummingMergeTree('value').
- createDailyMaterializedView() emits the MV via
  Schema\ClickHouse::createMaterializedView. The MV body remains a hand
  SELECT because subquery-aggregation MV bodies do not round-trip
  cleanly through the builder yet (deferred upstream).

Reads (SELECT via Utopia\Query\Builder\ClickHouse):
- A per-call builder is initialised with `useNamedBindings()` plus
  `withParamTypes()` so every `?` placeholder rewrites to ClickHouse
  `{paramN:Type}` form (DateTime64(3), Int64, String, Nullable(String)).
- find/findFromTable, findAggregatedFromTable, count/countFromTable,
  sum/sumFromTable, findDaily (FINAL), sumDaily, sumDailyBatch,
  getTotal/getTotalFromEvents/getTotalFromGauges, getTotalBatch,
  getTimeSeries/getTimeSeriesFromTable all compose filters via
  $builder->filter([Query::...]) and emit aggregations through
  $builder->sum()/->count()/selectRaw.
- Time-bucketed aggregation consumes Query::groupByTimeBucket via the
  builder's native GROUP BY emission, paired with selectRaw/orderByRaw
  for the bucket projection and ordering.
- Cursor pagination keeps the tuple-keyset comparison as a whereRaw
  fragment and merges its named bindings into the builder Statement at
  HTTP execute time.

Writes (INSERT via Utopia\Query\Builder\ClickHouse::insertFormat):
- addBatch() builds `INSERT INTO t (...) FORMAT JSONEachRow` via the
  builder's insertFormat helper; the JSONEachRow payload is streamed in
  the HTTP body.

Deletes (DELETE via Utopia\Query\Builder\ClickHouse::delete):
- purge() and purgeDaily() emit lightweight DELETE through the builder.

UsageQuery is removed. Downstream callers must switch from
`UsageQuery::groupByInterval('time', '1h')` to
`Query::groupByTimeBucket('time', '1h')`. The 0.3.x library rejects
non-enumerated intervals through ValidationException at construction.

HTTP transport and configuration setters (validateHost/validatePort,
escapeIdentifier, setNamespace/setTenant/setSharedTables/setDatabase/
setSecure/setTimeout/setCompression/setKeepAlive/setMaxRetries/
setRetryDelay/setAsyncInserts, healthCheck, query/insert/executeWithRetry,
logQuery, getConnectionStats/getQueryLog/clearQueryLog) stay untouched -
the migration is strictly to the SQL-building layer.

LowCardinality wrapping on the country column is dropped (the schema
layer wraps `LowCardinality` inside `Nullable`, but ClickHouse only
accepts `LowCardinality(Nullable(T))`; the column stays a plain
Nullable(String) until the upstream wrapping order is fixed - deferred
upstream).

References utopia-php/query#11.
Adds tests/Usage/Adapter/ClickHouseSqlSnapshotTest.php covering DDL
(events table, daily SummingMergeTree, materialized view), the
named-typed binding read path, groupByTimeBucket aggregation,
INSERT FORMAT JSONEachRow, getTimeSeries shape, FINAL on daily reads,
and the lightweight DELETE form. 9 snapshot tests / 30 assertions.

Bumps Dockerfile from php:8.3.3-cli-alpine3.19 to php:8.4.21-cli-alpine3.23
and composer.json's php constraint from >=8.0 to >=8.4 so CI matches the
utopia-php/query 0.3.x asymmetric-visibility requirement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 17, 2026

Greptile Summary

Migrates ClickHouse.php from hand-rolled SQL string concatenation to the utopia-php/query 0.3.x builder/schema layer, covering DDL, SELECT with named typed bindings, INSERT FORMAT JSONEachRow, and lightweight DELETE. UsageQuery is deleted and replaced by upstream Query::groupByTimeBucket.

  • ClickHouse.php shrinks ~400 lines; every SQL emission now goes through a builder, eliminating manual parameter serialization across find, count, sum, daily, total, time-series, and purge paths.
  • Database.php switches from Query::TYPE_* string constants to Method enum cases.
  • New ClickHouseSqlSnapshotTest adds 9 builder-level snapshots covering DDL, aggregation with groupByTimeBucket, INSERT FORMAT, FINAL reads, and DELETE.

Confidence Score: 3/5

The migration is structurally sound, but getTimeSeries rejects valid bucket intervals that find()+groupByTimeBucket accepts, breaking any caller that previously used those intervals.

The INTERVAL_FUNCTIONS constant used by getTimeSeries covers only 1h and 1d, while the new bucketFunctionFor() helper supports seven. Callers migrating from the deleted UsageQuery::groupByInterval to Query::groupByTimeBucket with intervals other than 1h/1d will get a runtime InvalidArgumentException from getTimeSeries despite those intervals working correctly in the find() path.

src/Usage/Adapter/ClickHouse.php — the INTERVAL_FUNCTIONS constant and getTimeSeries validation logic.

Important Files Changed

Filename Overview
src/Usage/Adapter/ClickHouse.php Large migration from hand-rolled SQL to utopia-php/query builder; INTERVAL_FUNCTIONS is narrower than bucketFunctionFor(), making getTimeSeries reject intervals that find()+groupByTimeBucket accepts.
src/Usage/Adapter/Database.php Mechanical switch from Query::TYPE_* string constants to Method enum cases; added GroupByTimeBucket no-op case replacing UsageQuery::TYPE_GROUP_BY_INTERVAL. Clean migration.
src/Usage/UsageQuery.php Deleted; functionality replaced by upstream Query::groupByTimeBucket. Breaking change is documented in PR description.
tests/Usage/Adapter/ClickHouseSqlSnapshotTest.php New snapshot test suite covering DDL, INSERT FORMAT, aggregated SELECT with groupByTimeBucket, FINAL reads, and DELETE; single-shot filter() in aggregation test doesn't mirror the adapter's multi-call pattern.
composer.json PHP requirement bumped to >=8.4, utopia-php/query upgraded to ^0.3.2, phpstan bumped to ^2.0. Dependency is now stable-tagged.
phpstan-baseline.neon New baseline suppressing 34 type errors, mostly json_decode mixed-type accesses; previously discussed and deferred as a follow-up cleanup task.
tests/Usage/UsageQueryTest.php Deleted alongside UsageQuery.php; tests are no longer relevant after migrating to upstream groupByTimeBucket.

Reviews (3): Last reviewed commit: "chore(deps): bump utopia-php/query pin t..." | Re-trigger Greptile

Comment thread src/Usage/Adapter/ClickHouse.php Outdated
Comment thread composer.json Outdated
Comment on lines +19 to +21
"prefer-stable": true,
"require": {
"php": ">=8.0",
"php": ">=8.4",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Dev-branch dependency and loosened stability setting

"minimum-stability": "dev" means Composer can resolve any transitive dependency to a dev release. Combined with "utopia-php/query": "dev-feat/clickhouse-insert-delete-settings-mv as 0.3.2", the locked revision is a mutable branch tip — if the branch is force-pushed or rebased after composer.lock is committed, any fresh composer update would silently pull a different commit. The PR description acknowledges this needs flipping to ^0.3.2 once tagged.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged — this is intentional during the rollout and is called out in the PR description. The plan is:

  1. Merge the upstream utopia-php/query PR and tag 0.3.2 (or whatever the next stable carries the ClickHouse builder + Method enum surface used here).
  2. Flip this PR's composer.json back to "utopia-php/query": "^0.3.2" and restore "minimum-stability": "stable" (dropping prefer-stable) in the same commit, refresh composer.lock, and only then merge.

Until the tag exists the dev-branch alias is required so CI can resolve the dependency at all. Leaving this thread open so it stays visible as a pre-merge gate.

Comment thread phpstan-baseline.neon
…ment import

Inline `$this->query(...)` sites already pass merged named bindings explicitly,
so the wrapper is dead. Removes the corresponding PHPStan baseline entry.
Comment on lines 156 to 170
@@ -172,49 +166,24 @@ public function setTimeout(int $milliseconds): self
return $this;
}

/**
* Enable or disable query logging for debugging.
*
* @param bool $enable Whether to enable query logging
* @return self
*/
public function enableQueryLogging(bool $enable = true): self
{
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the docs on public methods

Comment on lines +977 to +980
'integer' => $table->addColumn($id, ColumnType::BigInteger),
'float' => $table->addColumn($id, ColumnType::Float),
'boolean' => $table->addColumn($id, ColumnType::Boolean),
'datetime' => $table->datetime($id, 3),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be shorthand methods we can use for integer, float and boolean too

utopia-php/query 0.3.2 is now tagged and published with all ClickHouse
builder/schema features this PR depends on. Switch from the temporary
dev-branch alias to the stable semver constraint and restore
minimum-stability: stable (drop the temporary 'dev' relaxation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment on lines +2223 to +2224
if (!isset(self::INTERVAL_FUNCTIONS[$interval])) {
throw new InvalidArgumentException("Invalid interval '{$interval}'. Allowed: " . implode(', ', array_keys(self::INTERVAL_FUNCTIONS)));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 getTimeSeries interval set is narrower than find() + groupByTimeBucket

INTERVAL_FUNCTIONS only covers 1h and 1d, so calling getTimeSeries($metrics, '5m', ...) throws InvalidArgumentException even though find([Query::groupByTimeBucket('time', '5m'), ...]) succeeds. The deleted UsageQuery::VALID_INTERVALS supported seven intervals (1m, 5m, 15m, 1h, 1d, 1w, 1M), and bucketFunctionFor() (added in this PR) supports the same seven. Any caller that migrated from UsageQuery::groupByInterval('time', '5m') to Query::groupByTimeBucket('time', '5m') and uses getTimeSeries will hit a runtime error after this change.

Comment on lines +46 to 50
/** @var array<string, string> Maps interval strings to ClickHouse time bucket functions */
private const INTERVAL_FUNCTIONS = [
'1h' => 'toStartOfHour',
'1d' => 'toStartOfDay',
];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 getTimeSeriesFromTable doesn't use bucketFunctionFor() for unsupported intervals

INTERVAL_FUNCTIONS is used both as the allowlist and as the function-name lookup inside getTimeSeriesFromTable. The new bucketFunctionFor() already covers all seven intervals and would make the behaviour consistent with findAggregatedFromTable. Delegating to bucketFunctionFor() eliminates the divergence.

Suggested change
/** @var array<string, string> Maps interval strings to ClickHouse time bucket functions */
private const INTERVAL_FUNCTIONS = [
'1h' => 'toStartOfHour',
'1d' => 'toStartOfDay',
];
/** @var array<string, string> Maps interval strings to ClickHouse time bucket functions */
private const INTERVAL_FUNCTIONS = [
'1m' => 'toStartOfMinute',
'5m' => 'toStartOfFiveMinutes',
'15m' => 'toStartOfFifteenMinutes',
'1h' => 'toStartOfHour',
'1d' => 'toStartOfDay',
'1w' => 'toStartOfWeek',
'1M' => 'toStartOfMonth',
];

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants