ClickHouse usage analytics: events/gauges tables with daily MV #3
Merged
- Database adapter
- ClickHouse adapter
- Removed hardcoded column definitions in Usage class, replacing with dynamic schema derived from SQL adapter.
- Introduced new Query class for building ClickHouse queries with fluent interface.
- Added support for advanced query operations including find and count methods.
- Enhanced error handling and SQL injection prevention mechanisms.
- Created comprehensive usage guide for ClickHouse adapter.
- Added unit tests for Query class to ensure functionality and robustness.
- Maintained backward compatibility with existing methods while improving overall architecture.
…metric logging with deterministic IDs
…ed tags in ClickHouse and Database adapters
…pdate tests for new behavior
…on, getTotal ambiguity
- Buffer key now includes tag hash so events with same metric but different tags (e.g. different paths) stay as separate entries instead of silently discarding the second call's tags
- Daily table queries (findDaily, sumDaily, sumDailyBatch) now validate attributes against the daily schema (metric, value, time, tenant) instead of the full event schema. Querying path/method/status on the daily table now throws immediately instead of causing a ClickHouse "No such column" runtime error
- Changed (int) cast to (float) for agg_value in getTimeSeries to avoid truncating fractional gauge values or large event sums
- getTotal() now throws when a metric exists in both event and gauge tables instead of silently adding incompatible aggregations
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
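The buffer-key change in the first bullet can be illustrated with a minimal sketch. The helper names (`bufferKey`, `collectSketch`), the key layout, the `ksort` normalisation, and the choice to sum values for identical keys are all assumptions made for illustration, not the library's actual implementation.

```php
<?php

/**
 * Hypothetical helper: derives a buffer key from the metric name plus a hash of its tags.
 * Sorting the tags by key is an assumption, so that tag order does not change the hash.
 */
function bufferKey(string $metric, array $tags): string
{
    ksort($tags);
    $encoded = json_encode($tags);
    if ($encoded === false) {
        // Guarantees md5() receives a string rather than false.
        throw new InvalidArgumentException('Tags could not be JSON-encoded.');
    }

    return $metric . ':' . md5($encoded);
}

/**
 * Hypothetical collect(): entries with the same metric AND the same tags are merged;
 * entries with the same metric but different tags stay separate.
 */
function collectSketch(array &$buffer, string $metric, int $value, array $tags = []): void
{
    $key = bufferKey($metric, $tags);
    if (!isset($buffer[$key])) {
        $buffer[$key] = ['metric' => $metric, 'value' => 0, 'tags' => $tags];
    }
    $buffer[$key]['value'] += $value;
}

$buffer = [];
collectSketch($buffer, 'requests', 1, ['path' => '/a']);
collectSketch($buffer, 'requests', 2, ['path' => '/b']);
collectSketch($buffer, 'requests', 3, ['path' => '/a']);
// $buffer now holds two entries: one for /a (value 4) and one for /b (value 2).
```

Keying on the metric name alone would have merged all three calls into one entry and kept only the first call's tags.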
loks0n
reviewed
Apr 16, 2026
PHPStan level max flagged getTimeSeries() annotations as int while the ClickHouse adapter emits floats via agg_value cast. Updates the abstract, both adapters, the Usage facade, and zeroFillTimeSeries to float. Also throws on json_encode failure in Usage::collect so the md5() input is guaranteed string instead of string|false. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ts-only sum
- extractGroupByInterval: match by method string, not instanceof (parsed queries are base Query)
- flush(): selectively clear buffer on per-batch success (retry preserved on failure)
- collect(): use TYPE_EVENT constant instead of string literal
- addBatch(): require explicit $type param (no default)
- sum(): events-only by default (summing gauges is meaningless)
- sumDaily*: document as events-only (daily MV has only events)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
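The selective-clear behaviour of flush() might look roughly like the sketch below. `flushBuffer`, the `$write` callback, and the batch size are hypothetical stand-ins; the real method's signature and batching strategy are not shown in this PR text.

```php
<?php

/**
 * Hypothetical flush: writes buffered entries in batches and removes only the
 * entries whose batch was written successfully, so failed batches can be retried.
 *
 * @param array<string, array<string, mixed>> $buffer
 * @param callable(array<string, array<string, mixed>>): bool $write
 */
function flushBuffer(array &$buffer, callable $write, int $batchSize): bool
{
    $allOk = true;

    // array_chunk() with preserve_keys=true keeps the buffer keys inside each batch.
    foreach (array_chunk($buffer, $batchSize, true) as $batch) {
        if ($write($batch)) {
            foreach (array_keys($batch) as $key) {
                unset($buffer[$key]);
            }
        } else {
            $allOk = false; // leave this batch in the buffer for the next flush
        }
    }

    return $allOk;
}

$buffer = ['a' => ['value' => 1], 'b' => ['value' => 2], 'c' => ['value' => 3]];

// Simulated writer: fails for any batch that contains key 'c'.
$writer = fn (array $batch): bool => !array_key_exists('c', $batch);

$ok = flushBuffer($buffer, $writer, 2);
// 'a' and 'b' were written and cleared; 'c' remains buffered for retry.
```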
Push the count cap down into the DB layer so callers that only need a capped total (e.g. rendering "5000+") can stop ClickHouse early instead of scanning the full filtered set. ClickHouse wraps the count in a LIMIT-bounded subquery; Database delegates to utopia-php/database's existing $max arg. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
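As a rough illustration of the LIMIT-bounded subquery, the sketch below builds the two query shapes. `buildCountSql` is a hypothetical name, and `$table` and `$where` are assumed to have been validated and parameterised by the caller already; the adapter's real SQL may differ.

```php
<?php

/**
 * Hypothetical builder for a (optionally capped) count query.
 * With $max set, ClickHouse can stop reading once $max matching rows are found,
 * because the outer count() only sees the LIMIT-bounded inner result.
 */
function buildCountSql(string $table, string $where = '', ?int $max = null): string
{
    $whereSql = $where === '' ? '' : ' WHERE ' . $where;

    if ($max === null) {
        return "SELECT count() AS total FROM {$table}{$whereSql}";
    }
    if ($max < 0) {
        throw new InvalidArgumentException('Max must be zero or greater.');
    }

    return "SELECT count() AS total FROM (SELECT 1 FROM {$table}{$whereSql} LIMIT {$max})";
}

$sql = buildCountSql('usage_events', "metric = 'requests'", 5000);
```

A caller rendering "5000+" would compare the returned total against the cap it passed in.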
Adds keyset-pagination cursor support (cursorAfter / cursorBefore) to the ClickHouse adapter via parseQueries. Cursor values accept Metric/ArrayObject or plain associative arrays; an `id` tiebreaker is auto-appended to ORDER BY so pagination is deterministic on non-unique columns. cursorBefore flips direction at SQL build time and reverses results post-fetch. Rejects two unsafe combinations: cursor + groupByInterval (no stable identity on aggregated rows), and cursor + null type (paginating across events and gauges has no coherent ordering). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small follow-ups based on review feedback on the audit PR's twin implementation:
- Drop the always-true `!empty($orderAttributes)` guard inside the cursor branch. resolveCursorOrder() always appends an `id` tiebreaker, so the guard is dead code and was misleading.
- normalizeCursorRow now removes `$id` after copying it to `id`, so cursor state is no longer carrying both keys.
- Throw an explicit Exception when a cursor value is null. The previous path silently routed null `time` cursors through formatDateTime(null), which returns the current timestamp, so a misconfigured cursor would filter on `time < now()` and produce wrong pages instead of failing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
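The cursor behaviour described in the commits above (an `id` tiebreaker, direction flip for cursorBefore, and an explicit failure on null cursor values) can be sketched as follows. `cursorPredicate` is a hypothetical function that assumes an ascending `time, id` base ordering; the adapter's real buildCursorWhere() handles arbitrary order attributes and is not reproduced here.

```php
<?php

/**
 * Hypothetical keyset-pagination predicate for rows ordered by (time, id) ascending.
 * Uses ClickHouse-style named placeholders of the form {name:Type}.
 *
 * @param array<string, mixed> $cursor row the page should start after/before
 * @return array{where: string, order: string, params: array<string, string>, reverse: bool}
 */
function cursorPredicate(array $cursor, bool $before = false): array
{
    foreach (['time', 'id'] as $column) {
        // isset() is false for both missing and null values: fail loudly instead of
        // letting a null time silently turn into "now".
        if (!isset($cursor[$column])) {
            throw new InvalidArgumentException("Cursor value for '{$column}' must not be null.");
        }
    }

    $op = $before ? '<' : '>';
    $where = sprintf(
        '(time %1$s {cursor_time:DateTime64(3)} OR (time = {cursor_time:DateTime64(3)} AND id %1$s {cursor_id:String}))',
        $op
    );

    return [
        'where' => $where,
        // cursorBefore scans backwards in SQL; the caller reverses the rows after fetching.
        'order' => $before ? 'time DESC, id DESC' : 'time ASC, id ASC',
        'params' => ['cursor_time' => (string) $cursor['time'], 'cursor_id' => (string) $cursor['id']],
        'reverse' => $before,
    ];
}

$after = cursorPredicate(['time' => '2026-04-16 00:00:00.000', 'id' => 'abc']);
$before = cursorPredicate(['time' => '2026-04-16 00:00:00.000', 'id' => 'abc'], true);
```

The `time = … AND id > …` branch is what makes pagination deterministic when several rows share the same timestamp.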
Adds notEqual, notContains, notBetween, isNull, isNotNull, startsWith, endsWith, keeping the supported Query method set in line with the audit ClickHouse adapter. startsWith / endsWith use ClickHouse's built-in functions of the same name; isNull / isNotNull emit `IS NULL` / `IS NOT NULL` (no value binding); the rest follow the existing param-bound pattern.
The shared parseQueries logic is now consistent across both adapters:
- getParamType() centralises the column → ClickHouse-type mapping (time → DateTime64(3), value → Int64, default → String). Previously each case had an inline `if ($attribute === 'time')` branch.
- formatTypedValue() routes DateTime-typed values through formatDateTime and everything else through formatParamValue, so each case has one code path.
- buildCursorWhere() uses the same dispatch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
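A minimal sketch of the centralised type mapping and how a filter clause could consume it is shown below. The function names (`paramType`, `startsWithClause`, `isNullClause`) are hypothetical, the attribute is assumed to be validated against the schema beforehand, and `match` requires PHP 8.0 or later.

```php
<?php

/**
 * Hypothetical equivalent of the column → ClickHouse-type mapping named above:
 * time → DateTime64(3), value → Int64, anything else → String.
 */
function paramType(string $attribute): string
{
    return match ($attribute) {
        'time' => 'DateTime64(3)',
        'value' => 'Int64',
        default => 'String',
    };
}

/**
 * Builds a startsWith() filter with a typed named placeholder ({name:Type}).
 * $attribute is assumed to be a validated column name.
 */
function startsWithClause(string $attribute, string $paramName): string
{
    return sprintf('startsWith(%s, {%s:%s})', $attribute, $paramName, paramType($attribute));
}

/** IS NULL needs no value binding, so no placeholder is emitted. */
function isNullClause(string $attribute): string
{
    return $attribute . ' IS NULL';
}

$clause = startsWithClause('path', 'param_0');
```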
…e strictness
- ClickHouse purge() now also deletes from the daily aggregated table when purging events. Materialized views are forward-only, so purges on the source table left stale daily rows behind. Daily delete is skipped if any query references an event-only column (path/method/etc).
- ClickHouse getTotalBatch() now raises when a metric appears in both the event and gauge tables under $type=null, matching the existing safeguard in getTotal(). Mixing SUM (events) with argMax (gauges) silently produced meaningless totals.
- Usage::setNamespace/setTenant/setSharedTables now flush the buffer before changing adapter context. Buffered metrics carry no context, so changing it pre-flush would write them under the new context.
- Database adapter now stores a 'type' field per document and filters by it in find/count/purge/getTotal when $type is non-null. Previously the $type argument was accepted but ignored, returning rows of both kinds.
- composer.json: add 'test' script.
- .github/workflows: bump actions/checkout v3 -> v4.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the validator pattern in utopia-php/database
(Validator/Query/Filter.php): contains/notContains/equal/etc. queries
must have at least one value; an empty values array is rejected up front
with `<Method> queries require at least one value.` instead of silently
producing a "no filter applied" WHERE clause.
Without the guard, `Query::contains('metric', [])` would skip the IN
clause entirely and return all rows — exactly the opposite of the
intended IN () semantics, which should match nothing.
Applies the same VALUE_REQUIRED_METHODS allow-list and pre-switch check
that the audit adapter uses, so both libraries reject the same set of
empty-value filter methods consistently.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
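The pre-switch guard could be sketched like this. The contents of `VALUE_REQUIRED_METHODS` shown here are an assumed subset (the commit message only lists "contains/notContains/equal/etc."), and `assertHasValues` is a hypothetical name.

```php
<?php

// Assumed subset of methods that must carry at least one value.
const VALUE_REQUIRED_METHODS = ['equal', 'notEqual', 'contains', 'notContains'];

/**
 * Hypothetical guard run before the query is translated to SQL.
 * Rejects empty value lists up front instead of emitting no WHERE fragment.
 *
 * @param array<int, mixed> $values
 */
function assertHasValues(string $method, array $values): void
{
    if (in_array($method, VALUE_REQUIRED_METHODS, true) && $values === []) {
        throw new InvalidArgumentException(ucfirst($method) . ' queries require at least one value.');
    }
}

assertHasValues('contains', ['requests']); // passes
assertHasValues('isNull', []);             // passes: isNull takes no values

$message = null;
try {
    assertHasValues('contains', []);
} catch (InvalidArgumentException $e) {
    $message = $e->getMessage();
}
```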
… retry dedup, gauge order, cross-type validation, Database value check)
The Client::fetch(url:, method:, body:) call surface and METHOD_* constants used by the ClickHouse adapter are unchanged between 0.5 and 1.1. Bumping to ^1.1 so the library is installable alongside appwrite/server-ce 1.9.x, which now requires utopia-php/fetch ^1.1 (appwrite/appwrite#12252).
Database adapter: convertQueriesToDatabase() previously dropped TYPE_NOT_EQUAL, TYPE_NOT_BETWEEN, TYPE_NOT_CONTAINS, TYPE_STARTS_WITH, and TYPE_ENDS_WITH silently. The switch had no case for them, so the WHERE fragment was skipped, turning a "not X" or "starts with Y" filter into a full-collection match. Adds all five.
ClickHouse adapter: find()/count()/getTimeSeries() with $type=null queried both events and gauges. When the query referenced an event-only attribute like 'path', the gauge iteration would throw "Invalid attribute name: path" via parseQueries(). Adds a private queriesMatchType() helper that pre-checks each filter attribute against the type's schema and skips the table when the check fails. The caller now gets the events side without the gauge crash, which is what null-type semantics should mean.
sum() takes type=TYPE_EVENT as a hard default, with no null-type path.
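The null-type pre-check can be sketched as below. The column lists in `SCHEMAS` are assumptions (only path/method/status are named as event-only columns in this PR), and `queriesMatchTypeSketch` / `typesToQuery` are hypothetical stand-ins for the private helper.

```php
<?php

// Assumed schemas; the gauge column list in particular is a guess for illustration.
const SCHEMAS = [
    'event' => ['metric', 'value', 'time', 'tenant', 'path', 'method', 'status'],
    'gauge' => ['metric', 'value', 'time', 'tenant'],
];

/**
 * True when every filtered attribute exists in the given type's schema.
 *
 * @param array<int, string> $attributes attribute names referenced by the queries
 */
function queriesMatchTypeSketch(array $attributes, string $type): bool
{
    return array_diff($attributes, SCHEMAS[$type]) === [];
}

/**
 * With an explicit type, that table is always queried (invalid attributes still
 * surface as errors during parsing). With a null type, tables whose schema cannot
 * satisfy the filters are skipped rather than crashing.
 *
 * @param array<int, string> $attributes
 * @return array<int, string>
 */
function typesToQuery(array $attributes, ?string $type): array
{
    if ($type !== null) {
        return [$type];
    }

    return array_values(array_filter(
        array_keys(SCHEMAS),
        fn (string $candidate): bool => queriesMatchTypeSketch($attributes, $candidate)
    ));
}

$withPath = typesToQuery(['metric', 'path'], null);
$withoutPath = typesToQuery(['metric', 'time'], null);
```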
…-range scan efficiency
Previous ORDER BY of (tenant, id) had id (random UUID) as the primary
sort key, so ClickHouse stored rows in essentially random physical
order. Time-range predicates like WHERE time > X had to scan every
granule because the primary index had no time information to skip on.
Re-key to (tenant, metric, time, id) so the primary index matches
how the data is actually queried:
- tenant: multi-tenant isolation (cheap first-level filter)
- metric: per-metric series (most queries are scoped to one)
- time: range scans now hit a small contiguous span instead of
the whole table
- id: tiebreaker for stable physical ordering
Gauges get the same shape. Daily MV already had the right key.
Drop the now-redundant bloom_filter indexes on metric and time
(primary key already covers them).
Pre-prod schema change — no migration path needed, just DROP+CREATE
on next deploy.
Updates MetricTest counts to match the trimmed index lists.
… Adapter
These were declared abstract on the base, forcing every implementation to provide them even when the underlying backend has no multi-tenant or namespace concept. No caller types against the abstract Adapter to invoke them; every consumer goes through the Usage facade.
- Drop the three abstract method declarations from Adapter.
- Both ClickHouse and Database adapters keep their concrete impls (the methods are still needed for current usage).
- Facade now forwards via method_exists, so a future minimal adapter (no multi-tenancy, no namespacing) can extend Adapter without implementing dead stubs.
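A rough sketch of the method_exists forwarding follows. All class names here (`AdapterSketch`, `MinimalAdapter`, `TenantAdapter`, `UsageSketch`) are hypothetical, and `setTenant` is used only as an example of an optional method; the sketch relies on PHP 8 constructor promotion.

```php
<?php

/** Hypothetical base adapter with no tenant/namespace methods declared. */
abstract class AdapterSketch
{
    abstract public function getName(): string;
}

/** A minimal adapter: no multi-tenancy, so no setTenant() stub is needed. */
class MinimalAdapter extends AdapterSketch
{
    public function getName(): string
    {
        return 'minimal';
    }
}

/** An adapter that does support tenants keeps its concrete implementation. */
class TenantAdapter extends AdapterSketch
{
    private ?string $tenant = null;

    public function getName(): string
    {
        return 'tenant-aware';
    }

    public function setTenant(?string $tenant): void
    {
        $this->tenant = $tenant;
    }

    public function getTenant(): ?string
    {
        return $this->tenant;
    }
}

/** Hypothetical facade: forwards only when the adapter implements the method. */
class UsageSketch
{
    public function __construct(private AdapterSketch $adapter)
    {
    }

    public function setTenant(?string $tenant): static
    {
        if (method_exists($this->adapter, 'setTenant')) {
            $this->adapter->setTenant($tenant);
        }

        return $this;
    }
}

$tenantAdapter = new TenantAdapter();
(new UsageSketch($tenantAdapter))->setTenant('team-1'); // forwarded
(new UsageSketch(new MinimalAdapter()))->setTenant('team-1'); // silently ignored
```

The trade-off of this design is that a typo in an adapter's method name becomes a silent no-op rather than a compile-time error, which an interface per capability would catch.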
Summary
Complete rewrite of the usage analytics library with a two-table architecture optimized for both real-time analytics and billing.
Architecture
Key Changes
API
Write
Read
Billing (Daily MV)
Test Plan