feat(clickhouse): INSERT FORMAT, DELETE forms+settings, MV, groupByTimeBucket, typed bindings #11
lohanidamodar wants to merge 14 commits
Adds `Builder\ClickHouse::insertFormat(string $format, array $columns = [])` which flips the builder into FORMAT-pragma mode for the next `insert()` call. The compiled output is ``INSERT INTO `t` (`col1`, `col2`) FORMAT <name>`` with no VALUES and no bindings; the row payload is streamed into the HTTP body by the calling adapter. The returned `FormattedInsertStatement` extends `Statement` with two extra read-only properties, `columns` and `format`, so adapters can map row arrays to the correct column order and pick the right body encoder without having to re-parse the SQL. This unblocks the next step: migrating utopia-php/audit's `INSERT INTO t FORMAT JSONEachRow` POSTs to the ClickHouse HTTP interface onto the builder.
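To make the adapter side concrete, here is a minimal sketch of how a `columns` list could drive the HTTP body for `FORMAT JSONEachRow`. `encodeJsonEachRow()` is a hypothetical helper, not part of the library, and it assumes rows arrive as positional arrays in the same order as `columns`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical adapter helper (not part of utopia-php/query): turn positional
 * rows into a JSONEachRow body, i.e. one JSON object per line.
 *
 * @param list<string>      $columns column order, e.g. read from the statement's `columns`
 * @param list<list<mixed>> $rows    each row lists its values in that same order
 */
function encodeJsonEachRow(array $columns, array $rows): string
{
    $lines = [];
    foreach ($rows as $row) {
        if (count($row) !== count($columns)) {
            throw new InvalidArgumentException('Row width does not match column count');
        }
        // array_combine pairs each column name with the value at the same index.
        $lines[] = json_encode(array_combine($columns, $row), JSON_THROW_ON_ERROR);
    }

    // JSONEachRow is newline-delimited JSON.
    return implode("\n", $lines);
}

$body = encodeJsonEachRow(['event', 'count'], [['login', 3], ['logout', 1]]);
echo $body, "\n";
```

The adapter would then POST `$body` after the compiled `INSERT ... FORMAT JSONEachRow` statement; how the two are combined on the wire is left to the transport.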
`Builder\ClickHouse::delete()` now appends the same SETTINGS fragment as SELECT when `hint()` or `settings()` has been called on the builder. The compiled output becomes ``ALTER TABLE `t` DELETE WHERE ... SETTINGS k1 = v1, k2 = v2``. This is what utopia-php/audit's async cleanup needs to emit so the HTTP DELETE returns as soon as the mutation is scheduled rather than after it runs to completion, i.e. `lightweight_deletes_sync = 0`. The two stores stay merged (a `hint()` validated as `key=value` is just a SETTINGS entry on ClickHouse), so no parallel `deleteSettings()` API is introduced.
Adds `Schema\ClickHouse::createMaterializedView()` and `dropMaterializedView()`. `createMaterializedView(string $name, string $targetTable, Builder|string $body, bool $ifNotExists = true)` emits ``CREATE MATERIALIZED VIEW [IF NOT EXISTS] `name` TO `target` AS <body>``. The body accepts either a `Builder` (its compiled SQL is inlined and its bindings ride the returned `Statement`) or a raw SQL string, mirroring the flexibility we need for MV bodies whose subqueries do not yet round-trip through the builder. `dropMaterializedView(string $name, bool $ifExists = true)` emits the symmetric `` DROP VIEW [IF EXISTS] `name` ``; ClickHouse uses the regular `DROP VIEW` form for both regular and materialized views. Drop-in replacement for the inline DDL utopia-php/usage builds today for its SummingMergeTree daily rollup MV.
Greptile Summary
This PR closes six ClickHouse builder/schema gaps needed to migrate
Confidence Score: 5/5
All six features are narrowly scoped to ClickHouse-specific paths and leave every other dialect's behavior unchanged; the existing 5000+ test suite plus the new snapshot tests cover the critical new code paths thoroughly. The implementation is internally consistent: bindingMeta and bindings are kept in lockstep by the overridden addBinding/addBindings (the base addBindings uses array_push directly, so there is no double-metadata accumulation), reset() clears all new fields, Statement::withExecutor forwards namedBindings, and FormattedInsertStatement::withExecutor is correctly covariant. No stale-state, type-mismatch, or double-compile paths were identified. No files require special attention.
Reviews (4): Last reviewed commit: "test(clickhouse): move new builder + sch..."
…nctions
Add a first-class base-library method for time bucketing so adapters do not
have to subclass `Query` to model `GROUP BY toStartOfHour(time)`-style
clauses. The new method is dialect-aware: only ClickHouse implements it
today; other dialects throw `UnsupportedException` from the base builder.
API
- `Method::GroupByTimeBucket` enum case + `Query::groupByTimeBucket($attr,
$interval)` factory; intervals validated against
`Query::GROUP_BY_TIME_BUCKET_INTERVALS` (`1m`, `5m`, `15m`, `1h`, `1d`,
`1w`, `1M`).
- `Feature\Aggregates::groupByTimeBucket()` interface method + trait
implementation that pushes a `GroupByTimeBucket` query onto the pending
list.
AST shape
- `ParsedQuery` gains a new readonly `timeBuckets` field
(`list<array{attribute, interval}>`) rather than folding into `groupBy`:
bucket call sites are structurally different from plain columns
(function call vs identifier) and downstream builders need to dispatch
on that distinction without parsing strings. `Query::groupByType` routes
`Method::GroupByTimeBucket` queries into this field; `Builder::compile()`
routes the method to `compileGroupBy`.
Compilation
- `Builder::compileGroupByTimeBucket()` is `protected` and throws by
default; `buildGroupByClause` calls it for every entry in
`$grouped->timeBuckets`, so unsupported dialects fail loudly at
build-time rather than silently dropping the clause.
- `Builder\ClickHouse::compileGroupByTimeBucket()` maps each allowed
interval to its `toStartOf*` function name via a closed lookup table.
Selecting / ordering on the bucket follows the same pattern as
`groupByRaw`: callers re-emit the bucket expression through `selectRaw`
or `orderByRaw` when they need to reference it in the SELECT list or
ORDER BY. Keeping the call sites explicit avoids ambiguity about which
alias the GROUP BY clause is referring to.
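A closed lookup table of this kind might look like the sketch below. The commit only states that each interval maps to a `toStartOf*` function; the specific names used here are standard ClickHouse date functions and are an assumption about the actual table. `compileTimeBucket()` is a hypothetical standalone function, not the builder method, and it uses `InvalidArgumentException` as a stand-in for the library's exceptions.

```php
<?php
declare(strict_types=1);

// Assumed mapping from the allowed intervals to ClickHouse functions.
// Keys are case-sensitive: '1m' is one minute, '1M' is one month.
const TIME_BUCKET_FUNCTIONS = [
    '1m'  => 'toStartOfMinute',
    '5m'  => 'toStartOfFiveMinutes',
    '15m' => 'toStartOfFifteenMinutes',
    '1h'  => 'toStartOfHour',
    '1d'  => 'toStartOfDay',
    '1w'  => 'toStartOfWeek',
    '1M'  => 'toStartOfMonth',
];

function compileTimeBucket(string $attribute, string $interval): string
{
    if (!isset(TIME_BUCKET_FUNCTIONS[$interval])) {
        throw new InvalidArgumentException("Unsupported interval: {$interval}");
    }
    // Simplification: accept only plain identifiers so no escaping is needed.
    if (preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $attribute) !== 1) {
        throw new InvalidArgumentException("Invalid attribute: {$attribute}");
    }

    return TIME_BUCKET_FUNCTIONS[$interval] . '(`' . $attribute . '`)';
}

echo 'GROUP BY ' . compileTimeBucket('time', '1h'), "\n";
```

Because the table is closed, an interval outside the seven allowed values fails at compile time instead of producing an arbitrary function name.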
Tests
- `tests/Query/Builder/ClickHouse/GroupByTimeBucketTest.php` snapshots
the compiled SQL for all seven intervals, exercises composition with
plain `groupBy()` and `selectRaw/orderByRaw`, and pins the
`ParsedQuery::timeBuckets` shape.
- `tests/Query/Builder/MariaDBTest.php` covers the unsupported-dialect
path with an `UnsupportedException` assertion.
README updated with a Time bucketing subsection under ClickHouse and a
`groupByTimeBucket` row in the feature matrix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ClickHouse over HTTP requires `{name:Type}`-style parameter placeholders for
type-safe parameterization. The builder previously emitted positional `?`
only, which forced adapters to post-process the compiled SQL — fragile
and easy to get wrong against complex predicates. This commit makes the
named-typed form a first-class, opt-in feature of the ClickHouse builder
without disturbing the positional contract every other dialect relies on.
API
- `Builder\ClickHouse::useNamedBindings(bool $enabled = true)` — toggle.
Off by default; positional `?` and `Statement::$bindings` keep working
unchanged.
- `Builder\ClickHouse::withParamType(string $column, string $type)` /
`withParamTypes(array $map)` — register a ClickHouse type for a column.
Type strings are validated against
`^[A-Za-z][A-Za-z0-9_]*(?:\([^)]*\))?$` so we reject anything that
isn't a plain type name with an optional parenthesised parameter list
(e.g. `DateTime64(3)`, `Nullable(String)`).
- New `Builder\Binding` value object scaffolds the binding-with-metadata
shape for future per-call type overrides; the placeholder rewriter
uses a parallel `list<?string>` keyed by binding index for now.
Wiring
- Base `Builder::addBinding(mixed $value, ?string $column = null)` takes
an optional column hint. Existing callers pass nothing and continue to
push to `list<mixed> $bindings` unchanged.
- Base `Builder::compileFilter()` snapshots `$bindingColumn` from the
current query attribute before dispatching the match, and restores in
a `finally` so nested filters (AND/OR/Having) don't leak column hints.
- `Builder\ClickHouse` overrides `addBinding`/`addBindings` to mirror
the column hint into `$bindingMeta` in lockstep with `$bindings`.
Index N in either array always corresponds to the N-th `?` in the
compiled SQL.
Statement
- `Statement` gains a readonly `?array $namedBindings` (default null) so
callers that read the typed map directly don't have to parse the SQL.
`FormattedInsertStatement` keeps working — its positional `parent::`
call hits `Statement::__construct` with the new param defaulted.
Rewriter
- `ClickHouse::applyNamedTypedBindings(Statement)` runs at every CH
Statement boundary (`build()`, `insert()`, `update()`, `delete()`),
walks `?` placeholders left-to-right with the same regex
`AssertsBindingCount` uses, looks up `$paramTypes[$column]` per
binding, and falls back to `inferClickHouseType($value)` when no
registration matches. The positional `$bindings` payload is untouched
so consumers can keep using it.
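The rewrite step can be approximated in isolation as follows. This is a sketch under stated assumptions, not the library's implementation: `rewritePlaceholders()` and `inferClickHouseTypeSketch()` are hypothetical names, parameter numbering is assumed to start at 0, and the `?` scan is deliberately naive (it would also match a `?` inside a string literal). The inference rules follow the list in the PR summary.

```php
<?php
declare(strict_types=1);

// Value-based fallback, following the inference rules listed in the PR summary.
function inferClickHouseTypeSketch(mixed $value): string
{
    return match (true) {
        is_int($value) => 'Int64',
        is_float($value) => 'Float64',
        is_bool($value) => 'UInt8',
        $value === null => 'Nullable(String)',
        $value instanceof DateTimeInterface => 'DateTime64(3)',
        default => 'String',
    };
}

/**
 * @param list<mixed>           $bindings   positional values, one per `?`
 * @param list<?string>         $columns    column hint per binding (null if unknown)
 * @param array<string, string> $paramTypes registered column => ClickHouse type
 * @return array{0: string, 1: array<string, mixed>} rewritten SQL and named values
 */
function rewritePlaceholders(string $sql, array $bindings, array $columns, array $paramTypes): array
{
    $index = 0;
    $named = [];
    $rewritten = preg_replace_callback(
        '/\?/',
        function () use (&$index, &$named, $bindings, $columns, $paramTypes): string {
            $value = $bindings[$index] ?? null;
            $column = $columns[$index] ?? null;
            // A registered column type wins; otherwise infer from the value.
            $type = ($column !== null && isset($paramTypes[$column]))
                ? $paramTypes[$column]
                : inferClickHouseTypeSketch($value);
            $name = 'param' . $index; // numbering from 0 is an assumption
            $named[$name] = $value;
            $index++;

            return '{' . $name . ':' . $type . '}';
        },
        $sql
    );

    return [(string) $rewritten, $named];
}

[$sql, $named] = rewritePlaceholders(
    'SELECT * FROM `t` WHERE `time` < ? LIMIT ?',
    ['2024-01-01 00:00:00', 10],
    ['time', null],
    ['time' => 'DateTime64(3)']
);
echo $sql, "\n";
```

Here the first placeholder takes its type from the registered `time` column, while the `LIMIT` value has no column hint and falls back to inference.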
Tests
- `tests/Query/Builder/ClickHouse/NamedBindingsTest.php` snapshots both
paths — explicit registration and value-based inference — plus the
DELETE rewrite, LIMIT/OFFSET inference, default-off behavior, type
validation, and `reset()` clearing of binding metadata.
README updated with a Named-typed bindings subsection under ClickHouse
and a new row in the feature matrix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…E DELETE
`Builder\ClickHouse::delete()` previously emitted only the mutation form
(`ALTER TABLE … DELETE`), which rewrites parts asynchronously. That is not
equivalent to the lightweight form (`DELETE FROM … WHERE …`), which marks
rows deleted via a mask and is async by default. Adapters that expected the
lightweight semantics (e.g. audit's `cleanup()` pre-migration) would
observe a silent storage-path regression after switching to this builder.
Make the choice explicit, with lightweight as the default to match the
ClickHouse server default and the audit baseline.
API
- `Builder\ClickHouse::deleteMode(string $mode)`: pick either
`DELETE_MODE_LIGHTWEIGHT` (`'lightweight'`, the default) or
`DELETE_MODE_MUTATION` (`'mutation'`, opt-in). Unknown modes throw
`ValidationException`.
- Class constants `DELETE_MODE_LIGHTWEIGHT` and `DELETE_MODE_MUTATION`
expose the wire strings so call sites can avoid magic strings.
Compilation
- `delete()` branches on `$deleteMode` to emit either
``DELETE FROM `table` WHERE …`` or ``ALTER TABLE `table` DELETE WHERE …``.
- The trailing `SETTINGS …` clause is unchanged: the builder emits
whatever `settings()`/`hint()` registered. We do not pair
`lightweight_deletes_sync = 0` with the lightweight mode nor
`mutations_sync = 0` with the mutation mode automatically; callers pick
the setting that matches their chosen storage path.
- `reset()` restores the lightweight default.
Tests
- `tests/Query/Builder/ClickHouse/DeleteSettingsTest.php` extended with
coverage of both forms, the explicit mutation opt-in, settings-clause
composition for each form, the validation error path, and reset behavior.
Renamed the original `testDeleteWithoutSettingsEmitsAlterTableDelete` to
`testDefaultDeleteEmitsLightweightDeleteFrom` to reflect the new default.
- `tests/Query/Builder/ClickHouseTest.php`: existing tests that asserted
the mutation SQL now either call
`deleteMode(Builder::DELETE_MODE_MUTATION)` explicitly (when they are
there to lock down the mutation form) or assert the new lightweight SQL
(when they were testing generic delete behavior).
- `tests/Query/Builder/ClickHouse/NamedBindingsTest.php`:
`testDeleteUsesNamedTypedPlaceholdersWhenEnabled` updated to match the
new default DELETE form.
README updated with a DELETE subsection describing both forms and the
storage-path tradeoff.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
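The mode branch and the unpaired SETTINGS clause can be sketched as a standalone function. `compileDeleteSketch()` is hypothetical and only mirrors the behavior described in this commit; it takes the WHERE fragment as a pre-compiled string and throws `InvalidArgumentException` as a stand-in for the library's `ValidationException`.

```php
<?php
declare(strict_types=1);

const DELETE_MODE_LIGHTWEIGHT = 'lightweight';
const DELETE_MODE_MUTATION = 'mutation';

/**
 * @param array<string, string> $settings emitted verbatim, never auto-paired to the mode
 */
function compileDeleteSketch(
    string $table,
    string $where,
    string $mode = DELETE_MODE_LIGHTWEIGHT,
    array $settings = []
): string {
    $sql = match ($mode) {
        DELETE_MODE_LIGHTWEIGHT => "DELETE FROM `{$table}` WHERE {$where}",
        DELETE_MODE_MUTATION => "ALTER TABLE `{$table}` DELETE WHERE {$where}",
        default => throw new InvalidArgumentException("Unknown delete mode: {$mode}"),
    };

    if ($settings !== []) {
        $pairs = [];
        foreach ($settings as $key => $value) {
            $pairs[] = "{$key} = {$value}";
        }
        // The caller chooses which setting fits the chosen storage path.
        $sql .= ' SETTINGS ' . implode(', ', $pairs);
    }

    return $sql;
}

echo compileDeleteSketch('audit', '`time` < ?'), "\n";
echo compileDeleteSketch('audit', '`time` < ?', DELETE_MODE_MUTATION, ['mutations_sync' => '0']), "\n";
```

Keeping the settings independent of the mode means a caller who switches modes must also switch the sync setting, which is the tradeoff this commit accepts in exchange for not hiding behavior.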
Standard ClickHouse formats are CamelCase, but user-registered or future format names may use underscores (e.g. `My_Format`). The previous regex threw `ValidationException` for valid identifiers.
The previous regex only allowed a single set of parentheses, so common ClickHouse types like `Nullable(DateTime64(3))` or `Array(Decimal(38, 18))` were rejected. Widened the pattern to allow one level of nested parentheses, which covers every ClickHouse type that has a parameterized inner type.
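To see why the single-parenthesis pattern rejects nested types, the sketch below compares it with one possible widened pattern. `FLAT_TYPE_PATTERN` is the regex quoted in the named-bindings commit; `NESTED_TYPE_PATTERN` is a hypothetical reconstruction that allows exactly one level of nesting, since the actual widened regex is not shown in this PR text.

```php
<?php
declare(strict_types=1);

// Pattern quoted in the named-bindings commit: at most one pair of parentheses.
const FLAT_TYPE_PATTERN = '/^[A-Za-z][A-Za-z0-9_]*(?:\([^)]*\))?$/';

// Hypothetical widened pattern: the argument list may itself contain
// parenthesised groups, but those inner groups may not nest further.
const NESTED_TYPE_PATTERN = '/^[A-Za-z][A-Za-z0-9_]*(?:\((?:[^()]|\([^()]*\))*\))?$/';

function isValidType(string $type, string $pattern): bool
{
    return preg_match($pattern, $type) === 1;
}

foreach (['DateTime64(3)', 'Nullable(DateTime64(3))', 'Array(Decimal(38, 18))'] as $type) {
    printf(
        "%-26s flat=%s nested=%s\n",
        $type,
        var_export(isValidType($type, FLAT_TYPE_PATTERN), true),
        var_export(isValidType($type, NESTED_TYPE_PATTERN), true)
    );
}
```

With the flat pattern, `[^)]*` stops at the first `)`, so the trailing `)` of a nested type is left unmatched and the anchor `$` fails. A type nested two levels deep would still be rejected by this sketch's widened pattern.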
Previously a reused builder kept emitting `{paramN:Type}` placeholders
after `reset()` even when the caller expected fresh positional bindings,
and stale entries in `$paramTypes` could attach the wrong type to a
column that shared a name across queries. Reset now restores both fields
to their defaults.
…ment::withExecutor
`FormattedInsertStatement` previously inherited `Statement::withExecutor()`, which called `new self()` on the parent class and silently dropped the `columns` and `format` properties, returning a plain `Statement`. Adapters that chain `withExecutor()` on the result of a FORMAT INSERT would then crash on property-access. Added a covariant override that rebuilds a full `FormattedInsertStatement`, plus a regression test that asserts the returned instance keeps both fields. The constructor docblock is also realigned with the actual parameter order.
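The failure mode and the fix can be reproduced with two small stand-in classes. `SketchStatement` and `SketchFormattedInsert` are hypothetical; their constructor shapes are assumptions and do not reflect the real `Statement` signature.

```php
<?php
declare(strict_types=1);

class SketchStatement
{
    /** @param list<mixed> $bindings */
    public function __construct(
        public readonly string $query,
        public readonly array $bindings = [],
        public readonly ?Closure $executor = null,
    ) {
    }

    public function withExecutor(Closure $executor): self
    {
        // `new self` always builds the parent class, so an inheriting
        // subclass would lose its extra properties here.
        return new self($this->query, $this->bindings, $executor);
    }
}

class SketchFormattedInsert extends SketchStatement
{
    /** @param list<string> $columns */
    public function __construct(
        string $query,
        public readonly array $columns = [],
        public readonly string $format = '',
        ?Closure $executor = null,
    ) {
        parent::__construct($query, [], $executor);
    }

    // Covariant override: `self` here is the subclass, so the rebuilt
    // instance keeps `columns` and `format`.
    #[\Override]
    public function withExecutor(Closure $executor): self
    {
        return new self($this->query, $this->columns, $this->format, $executor);
    }
}

$stmt = new SketchFormattedInsert('INSERT INTO `t` (`a`) FORMAT JSONEachRow', ['a'], 'JSONEachRow');
$bound = $stmt->withExecutor(fn () => null);
echo get_class($bound), "\n";
```

Narrowing the return type in the subclass is legal because PHP allows covariant return types, and it lets static analysis see that the chained result still exposes both fields.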
…aller-trusted
The string overload of `createMaterializedView()` inlines its argument verbatim, so a caller who derives the body from any external source can inject SQL into the resulting DDL. Added an `@security` docblock notice that points callers at the Builder overload for parameterised inputs and makes the trust boundary explicit at the call site.
The ClickHouse builder kept a parallel `list<?string> $bindingMeta` to
remember which column produced each `?` placeholder, while
`Builder\Binding` sat declared but unused. Replace the bare string array
with `list<?Binding>` and trim `Binding` to the fields actually read at
the rewrite site — `value` plus `column`. The `name` and `type` fields
were never set by any caller and the `withName`/`withType` factories
were never invoked.
`resolveBindingType()` now reads `$bindingMeta[$index]->{column,value}`
directly instead of indexing `$this->bindings` in parallel, and
`addBinding()` / `addBindings()` construct the typed value objects so
the meta list and the positional bindings list stay in lockstep.
`createMaterializedView` and `dropMaterializedView` landed as inline public methods on `Schema\ClickHouse`, which deviated from the established Feature interface + Trait pattern that every other Schema feature (Views, Databases, Triggers, …) follows. There is no precedent for dialect-scoped Schema features in the codebase today, but the Builder side already mirrors `Feature/ClickHouse` + `Trait/ClickHouse` for CH-only features (ApproximateAggregates, ArrayJoins, AsofJoins, LimitBy, WithFill). Adopt the same segments here: new `Schema\Feature\ClickHouse\MaterializedViews` interface and `Schema\Trait\ClickHouse\MaterializedViews` trait. `Schema\ClickHouse` now `implements MaterializedViews` and `use`s the trait, matching how it already consumes `Views` and `Databases`. The `Builder|string` body union is left as-is — no `RawExpression` wrapper exists yet that could replace it, and introducing one would expand the change well beyond this refactor.
…House/
Four new builder tests landed at `tests/Query/Builder/ClickHouse/`, a third layout next to the existing `tests/Query/Builder/Feature/ClickHouse/` (ApproximateAggregatesTest, ArrayJoinsTest, AsofJoinsTest). Move them into the established location and re-namespace from `Tests\Query\Builder\ClickHouse` to `Tests\Query\Builder\Feature\ClickHouse`:
- InsertFormatTest
- DeleteSettingsTest
- GroupByTimeBucketTest
- NamedBindingsTest
The schema MV test (previously `tests/Query/Schema/ClickHouse/MaterializedViewTest`) also moves to `tests/Query/Schema/Feature/ClickHouse/MaterializedViewsTest` to mirror the new `Schema\Feature\ClickHouse\MaterializedViews` location that the previous commit introduced. Class renamed to `MaterializedViewsTest` to match the source feature name.
All moves use `git mv` so file history follows. Test count is unchanged (5227 tests, 12166 assertions).
    /**
     * Track each binding's value + column hint in lockstep with the positional
     * list so the placeholder rewriter can attach the right ClickHouse type to
     * the right `?`.
     */
    #[\Override]
    protected function addBinding(mixed $value, ?string $column = null): void
    {
        parent::addBinding($value, $column);
        $this->bindingMeta[] = new Binding($value, $column ?? $this->bindingColumn);
    }
Can we instead store it at bind time of the query, even if that means adding an unused name param in other dialects? I'd prefer to avoid bind-matching like this; it has caused a few bugs in the current DB library.
We can add this at a higher level, Postgres supports materialized views too
Summary
Closes six ClickHouse builder/schema gaps that block migrating `utopia-php/audit` and `utopia-php/usage` onto `utopia-php/query` 0.3.x. The first three commits land the originally-scoped capabilities; the next three close gaps surfaced by the migration dry-run on `audit` and `usage`.
1. `INSERT ... FORMAT JSONEachRow` on the ClickHouse builder
- `Builder\ClickHouse::insertFormat(string $format, array $columns = [])` flips the next `insert()` into FORMAT-pragma mode. Output: ``INSERT INTO `t` (`col1`, `col2`) FORMAT JSONEachRow`` with no VALUES and no bindings; the row payload is streamed into the HTTP body by the calling adapter.
- Returns a `FormattedInsertStatement` (extends `Statement`) exposing read-only `columns` and `format` properties so adapters can map row arrays to the correct column order and pick the right body encoder without re-parsing the SQL.
2. DELETE with trailing `SETTINGS ...` clause
- `Builder\ClickHouse::delete()` appends the same SETTINGS fragment as SELECT when `hint()`/`settings()` has been called. Output: ``ALTER TABLE `t` DELETE WHERE ... SETTINGS k=v, ...``.
- The two stores stay merged: a `hint()` validated as `key=value` is just a SETTINGS entry on ClickHouse, so no parallel `deleteSettings()` API is introduced.
3. `CREATE MATERIALIZED VIEW ... TO target_table AS ...` on `Schema\ClickHouse`
- `createMaterializedView(string $name, string $targetTable, Builder|string $body, bool $ifNotExists = true)` and `dropMaterializedView(string $name, bool $ifExists = true)`.
- The body accepts either a `Builder` (its compiled SQL is inlined and bindings ride the returned `Statement`) or a raw SQL string for MV bodies whose subqueries don't yet round-trip through the builder.
- Drop-in replacement for the inline DDL `utopia-php/usage` builds today for its SummingMergeTree daily rollup MV.
4. `Query::groupByTimeBucket($attr, $interval)` (new base method) + ClickHouse compilation
- `Method::GroupByTimeBucket` enum case and `Query::groupByTimeBucket(string $attribute, string $interval)` factory. Allowed intervals: `1m`, `5m`, `15m`, `1h`, `1d`, `1w`, `1M`.
- `ParsedQuery` gains a readonly `timeBuckets` field; `Builder\ClickHouse::compileGroupByTimeBucket` maps each interval to its `toStartOf*` function. Other dialects throw `UnsupportedException` from base `Builder::compileGroupByTimeBucket` at build-time.
- Replaces the `UsageQuery::groupByInterval` subclass pattern, which no longer works on 0.3.x because `Query::__construct` calls `Method::from()` unconditionally and `Method` is a backed enum.
- Selecting/ordering on the bucket follows the same pattern as `groupByRaw`: re-emit the function through `selectRaw`/`orderByRaw` when you need to reference it in the SELECT list or ORDER BY.
5. Named-typed `{name:Type}` placeholder bindings on `Builder\ClickHouse`
- `Builder\ClickHouse::useNamedBindings()` toggle (off by default; positional `?` remains the default for parity with every other dialect). When enabled, `?` placeholders are rewritten to ClickHouse `{paramN:Type}` form at Statement-emission time.
- `withParamType($column, $type)` / `withParamTypes($map)` for registering column → ClickHouse type. Type strings are validated against `^[A-Za-z][A-Za-z0-9_]*(?:\([^)]*\))?$`.
- Fallback inference when no type is registered: `int → Int64`, `float → Float64`, `bool → UInt8`, `null → Nullable(String)`, `DateTimeInterface → DateTime64(3)`, default → `String`.
- `Statement` gains a parallel readonly `?array $namedBindings`. The positional `$bindings` is left intact so existing callers keep working.
6. Lightweight `DELETE FROM` alongside `ALTER TABLE DELETE`
- `Builder\ClickHouse::deleteMode($mode)` picks between `DELETE_MODE_LIGHTWEIGHT` (`DELETE FROM t WHERE …`, the new default) and `DELETE_MODE_MUTATION` (`ALTER TABLE t DELETE WHERE …`, opt-in).
- The trailing `SETTINGS` clause is whatever the caller registers: `lightweight_deletes_sync = 0` and `mutations_sync = 0` are not auto-paired to the chosen mode.
- The mutation form (`ALTER TABLE … DELETE`, item 2 above) is now reachable explicitly via `deleteMode(Builder::DELETE_MODE_MUTATION)`; the lightweight form is the new default to match the audit `cleanup()` baseline before the migration.
Architectural follow-ups (3 commits)
A read-only audit pass after item 6 flagged three deviations from the library's established patterns. All three are landed in this PR so the architecture stays consistent:
- `refactor(clickhouse): wire Binding value object through the rewriter`: `src/Query/Builder/Binding.php` shipped declared-but-unused while the rewriter kept a parallel `list<?string>` of column hints. Replace the bare string array with `list<?Binding>`, trim `Binding` to `{value, column}` (the only fields read at the rewrite site), and remove the unused `name`/`type`/`withName`/`withType` surface so we don't ship a public shape we never exercise.
- `refactor(clickhouse): factor materialized views into Feature/Trait`: `createMaterializedView`/`dropMaterializedView` landed inline on `Schema\ClickHouse`, deviating from the `Feature\Views` + `Trait\Views` pattern every other Schema feature follows. Factor into new `Schema\Feature\ClickHouse\MaterializedViews` interface + `Schema\Trait\ClickHouse\MaterializedViews` trait, mirroring how the Builder side already scopes CH-only features under `Feature/ClickHouse/` and `Trait/ClickHouse/`. `Schema\ClickHouse` now `implements MaterializedViews` and `use`s the trait.
- `test(clickhouse): move new builder + schema tests under Feature/ClickHouse/`: Move `InsertFormatTest`, `DeleteSettingsTest`, `GroupByTimeBucketTest`, `NamedBindingsTest` from `tests/Query/Builder/ClickHouse/` to `tests/Query/Builder/Feature/ClickHouse/` (where `ApproximateAggregatesTest`, `ArrayJoinsTest`, `AsofJoinsTest` already live), and `MaterializedViewTest` to `tests/Query/Schema/Feature/ClickHouse/MaterializedViewsTest.php` (mirroring the new feature location). All moves use `git mv` so file history follows; test count unchanged.
Downstream migration plan
- `utopia-php/audit` PR #120 will get follow-up commits to (a) drop the `whereRaw` escape hatch in favor of typed `Query::lessThan('time', …)` filters paired with `withParamType('time', 'DateTime64(3)')`, and (b) switch back to the lightweight DELETE form (now the default) so the audit storage path matches the pre-migration behavior. No code change is needed inside audit for that second item: the default behavior just becomes correct again after this PR ships.
- `utopia-php/usage` migration PR (not yet open) will land once this PR is tagged. It replaces the local `UsageQuery::groupByInterval` subclass with `Query::groupByTimeBucket`, drops the local SELECT/ORDER raw expressions in favor of ``selectRaw('toStartOfHour(time) AS `bucket`')`` plus `groupByTimeBucket('time', '1h')`, and opts into `useNamedBindings()` so the existing HTTP transport stops post-processing positional placeholders.
Out of scope (deferred, explicitly not in this PR)
Test plan
- `composer format` clean (Pint).
- `composer lint` clean.
- `composer check` clean (PHPStan max level).
- `composer test` green: 5227 tests, 12166 assertions.
- New tests: `tests/Query/Builder/Feature/ClickHouse/{InsertFormatTest,DeleteSettingsTest,GroupByTimeBucketTest,NamedBindingsTest}.php` and `tests/Query/Schema/Feature/ClickHouse/MaterializedViewsTest.php`.
- Unsupported-dialect path covered in `tests/Query/Builder/MariaDBTest.php`.