98 changes: 97 additions & 1 deletion docs/impulse/docs/config/configuration.md
@@ -62,6 +62,7 @@ Maps the silver-layer input tables.
| `container_tags_table` | `str` | No | Full Unity Catalog path. Container EAV tags. |
| `channel_tags_table` | `str` | No | Full Unity Catalog path. Channel EAV tags. |
| `channel_mapping_table` | `str` | No | Full Unity Catalog path. Logical-to-physical channel alias table. Required when using `QueryBuilder.channel_with_alias()` (currently supported by `KeyValueStoreSolver`). |
| `unit_conversion_table` | `str` | No | Full Unity Catalog path. Per-unit-family conversion factors. When configured together with a `channel_mapping_table` whose rows carry `source_unit` / `target_unit` columns, aliased selectors auto-convert values from source to target unit during `solve()` (currently supported by `KeyValueStoreSolver`). |

Tag tables are required for solvers that consume tag-based filters
(`DeltaSolver` with tag filters, `KeyValueStoreSolver`).
@@ -172,8 +173,9 @@ Per-table sections (each a `TableConfig`):
| `container_metrics`| All solvers | Custom container_id column, custom timestamp columns |
| `channel_tags` | DeltaSolver | Tag key/value column renames |
| `channel_metrics` | All solvers | Custom channel_id column, custom value/timestamp columns |
| `channel_mapping` | KeyValueStoreSolver | Alias-table column renames; `priority` column |
| `channel_mapping` | KeyValueStoreSolver | Alias-table column renames; `priority` column; optional `join_keys` for non-default alias-resolution composite keys |
| `channels` | All solvers | RLE column renames (`tstart`/`tend`/`value`) |
| `unit_conversion` | KeyValueStoreSolver | Unit-conversion table column renames (`unit`, `group_id`, `conversion_factor`) |

Internal column names that mappings can target:

@@ -187,6 +189,14 @@ Internal column names that mappings can target:
| `priority` | Tie-breaker column on the `channel_mapping` table |
| `project_id` | Project scoping column |
| `parent_id` | Parent/scope identifier |
| `source_channel`| Source-channel identifier on the `channel_mapping` table |
| `data_key` | Data-key identifier (default present on both `channel_mapping` and `channel_metrics`) |
| `channel_alias` | Alias identifier on the `channel_mapping` table |
| `channel_name` | Channel-name identifier on the `channel_metrics` table |
| `source_unit`, `target_unit` | Source/target unit columns on the `channel_mapping` table |
| `unit` | Unit name column on the `unit_conversion` table |
| `group_id` | Unit-family identifier on the `unit_conversion` table |
| `conversion_factor` | Per-unit factor on `unit_conversion`; also the per-channel factor name carried into the solve UDF |

:::note Per-solver feature support

@@ -235,6 +245,92 @@ However, only the parts each solver supports are actually consumed:

Sections you don't customize can be omitted; defaults are an empty mapping and no filters.

### Unit conversion (optional)

Set `source.unit_conversion_table` and extend `channel_mapping` with `source_unit` / `target_unit` columns
to have aliased selectors auto-convert values from source to target unit during `solve()`. Direct selectors
via `query.channel(...)` always return raw values, even on a channel that an aliased sibling converts —
conversion is a property of the alias, not of the channel. See
[`unit_conversion`](../data_model/silver_layer_schema.md#unit_conversion-optional) for the table schema.

```python
"source": {
"container_metrics_table": "my_catalog.silver.container_metrics",
"channel_metrics_table": "my_catalog.silver.channel_metrics",
"channels_uri": "my_catalog.silver.channels",
"channel_mapping_table": "my_catalog.silver.channel_mapping",
"unit_conversion_table": "my_catalog.silver.unit_conversion"
},
"query_engine": {
"solver": "KeyValueStoreSolver",
"solver_config": {
"unit_conversion": {
"column_name_mapping": {}
}
}
}
```

### Alias-resolution join keys (optional)

`KeyValueStoreSolver.filter_aliased_channel_metrics` joins `channel_mapping`
to `channel_metrics` to resolve aliased selectors. The default composite key
is `(source_channel, channel_name) + (data_key, data_key)`. Override
`channel_mapping.join_keys` to change the arity or column choice — for
example, a single-column join when `data_key` is not part of the channel
identity in your silver layout:

```python
"solver_config": {
"channel_mapping": {
"join_keys": [
{"mapping_col": "source_channel", "metrics_col": "channel_name"}
]
}
}
```
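
The effect of changing the join arity can be illustrated with a minimal plain-Python sketch. It is not the solver's implementation: `resolve_aliases` is a hypothetical helper, the rows are made-up sample data, and the real join runs on Spark DataFrames.

```python
# Illustrative only: a plain-Python stand-in for the alias-resolution join.
# `resolve_aliases` is a hypothetical helper, not part of the library.

DEFAULT_JOIN_KEYS = [
    {"mapping_col": "source_channel", "metrics_col": "channel_name"},
    {"mapping_col": "data_key", "metrics_col": "data_key"},
]


def resolve_aliases(mapping_rows, metrics_rows, join_keys=DEFAULT_JOIN_KEYS):
    """Inner-join mapping rows to metrics rows on every configured key pair."""
    matches = []
    for m in mapping_rows:
        for c in metrics_rows:
            if all(m[k["mapping_col"]] == c[k["metrics_col"]] for k in join_keys):
                matches.append((m["channel_alias"], c["channel_id"]))
    return matches


# Made-up sample rows using the internal column names.
mapping_rows = [
    {"channel_alias": "speed", "source_channel": "VehSpd", "data_key": "A"},
]
metrics_rows = [
    {"channel_id": 1, "channel_name": "VehSpd", "data_key": "A"},
    {"channel_id": 2, "channel_name": "VehSpd", "data_key": "B"},
]

# Default two-column key: only the metrics row with the same data_key matches.
default_matches = resolve_aliases(mapping_rows, metrics_rows)

# Single-column key: data_key is ignored, so both physical channels match.
single_key = [{"mapping_col": "source_channel", "metrics_col": "channel_name"}]
single_matches = resolve_aliases(mapping_rows, metrics_rows, single_key)
```

In this sample, dropping `data_key` from the key widens the alias from one physical channel to two, so a single-column join is only appropriate when `data_key` really is not part of the channel identity.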

Each `mapping_col` / `metrics_col` is an **internal** name (the name as the
solver sees the column **after** `column_name_mapping` has been applied on
the respective table). The two sides of a pair are independent, so the same
column can carry different names on the two tables. For instance, when the
data-key column has different physical names on the two tables, either of
two equivalent configurations works:

```python
# Path 1 — rename both physical columns to the same internal name; the
# default join_keys then works unchanged.
"solver_config": {
"channel_mapping": {
"column_name_mapping": {"mapping_data_key": "data_key"}
},
"channel_metrics": {
"column_name_mapping": {"metrics_data_key": "data_key"}
}
}

# Path 2 — leave the physical names as-is and reference them directly.
"solver_config": {
"channel_mapping": {
"join_keys": [
{"mapping_col": "source_channel", "metrics_col": "channel_name"},
{"mapping_col": "mapping_data_key", "metrics_col": "metrics_data_key"}
]
}
}
```
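
Why the two paths are equivalent can be checked with a small plain-Python sketch. It assumes `column_name_mapping` maps physical names to internal names, as in the snippets above; `apply_column_name_mapping` and `keys_match` are hypothetical helpers written for this illustration, not library functions.

```python
# Illustrative only: both helpers are hypothetical, not library functions.

def apply_column_name_mapping(row, column_name_mapping):
    """Rename physical columns to internal names; unmapped columns keep their name."""
    return {column_name_mapping.get(col, col): val for col, val in row.items()}


def keys_match(mapping_row, metrics_row, join_keys):
    """True when the two rows agree on every configured key pair."""
    return all(
        mapping_row[k["mapping_col"]] == metrics_row[k["metrics_col"]]
        for k in join_keys
    )


# Physical rows as they would sit in the silver layer (made-up values).
mapping_row = {"source_channel": "VehSpd", "mapping_data_key": "A"}
metrics_row = {"channel_name": "VehSpd", "metrics_data_key": "A"}

# Path 1: rename both columns to `data_key`, then use the default join keys.
path1 = keys_match(
    apply_column_name_mapping(mapping_row, {"mapping_data_key": "data_key"}),
    apply_column_name_mapping(metrics_row, {"metrics_data_key": "data_key"}),
    [
        {"mapping_col": "source_channel", "metrics_col": "channel_name"},
        {"mapping_col": "data_key", "metrics_col": "data_key"},
    ],
)

# Path 2: no renames; the join keys reference the physical names directly.
path2 = keys_match(
    mapping_row,
    metrics_row,
    [
        {"mapping_col": "source_channel", "metrics_col": "channel_name"},
        {"mapping_col": "mapping_data_key", "metrics_col": "metrics_data_key"},
    ],
)
```

Both paths compare the same pair of values; they differ only in the name under which the solver sees the column.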

The keyword arguments of `query.channel(...)` and
`query.channel_with_alias(...)` are column references against the
**post-`column_name_mapping`** schema. If you override `join_keys` (or skip
renames) so that the solver sees a column under a non-default name, use
that same name as the kwarg. For example, if `join_keys` references
`metrics_col: "my_chan_name"` and the column is not renamed via
`column_name_mapping`, call `query.channel(my_chan_name=...)`. The
internal-name properties on `SolverConfig` exist primarily to remove magic
strings from the solver code; the user-facing contract is "kwarg name ==
column name as the solver sees it".

### When to use what

- **`solver_config.<table>.column_name_mapping`** — your silver-layer column is named differently from
45 changes: 45 additions & 0 deletions docs/impulse/docs/data_model/silver_layer_schema.md
@@ -183,6 +183,14 @@ pre-filtering before scanning the much larger `channels` table.
| `pz90` | `float` | Yes | 90th percentile. |
| `pz99` | `float` | Yes | 99th percentile. |

An optional `unit: string` column may also be present. When the report
config sets a `unit_conversion_table` and the solver resolves an aliased
selector, this column is treated as the authoritative source unit of the
physical channel and takes precedence over `channel_mapping.source_unit`
via `COALESCE(channel_metrics.unit, channel_mapping.source_unit)`. The
column is not part of the canonical schema — omit it for layouts that
don't need per-channel physical units.
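
The precedence rule can be restated as a one-line Python sketch. `effective_source_unit` is a hypothetical name used only for this illustration; the solver expresses the same rule as a `COALESCE` over Spark columns.

```python
# Illustrative only: hypothetical helper mirroring
# COALESCE(channel_metrics.unit, channel_mapping.source_unit).

def effective_source_unit(metrics_unit, mapping_source_unit):
    """Return the first non-null unit, preferring channel_metrics.unit."""
    return metrics_unit if metrics_unit is not None else mapping_source_unit


# channel_metrics.unit is set, so it wins over the mapping-level fallback.
from_metrics = effective_source_unit("km/h", "mph")

# channel_metrics.unit is null or absent, so the mapping value is used.
from_mapping = effective_source_unit(None, "mph")
```

As with SQL `COALESCE`, only null counts as missing here; an empty string on `channel_metrics.unit` would still take precedence.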

---

## channel_tags
@@ -260,7 +268,44 @@ channel name to one or more physical channels keyed by `project_id` /
| `channel_name` | `string` | No | Logical channel name to match against `channel_with_alias` selectors. |
| `data_key` | `string` | No | Physical lookup key joined to `channel_metrics`. |
| `priority` | `int` | Yes | Tie-breaker when multiple physical channels match a logical name. |
| `source_unit` | `string` | Yes | **Fallback** source unit for aliased reads of this mapping. The solver resolves the effective source unit as `COALESCE(channel_metrics.unit, channel_mapping.source_unit)`, so `channel_mapping.source_unit` only takes effect when `channel_metrics.unit` is null or absent. When configured together with `target_unit` and a `unit_conversion_table`, the solver converts values from source to target unit on aliased reads. |
| `target_unit` | `string` | Yes | Target unit for aliased reads of this mapping. Always taken from the mapping (there is no analogous column on `channel_metrics`). |

Configured via `source.channel_mapping_table` (see
[Configuration](../config/configuration.md)). Joins to `channel_metrics`
on `(project_id, data_key, channel_name)`.

**Per-channel unit conversion is single-target per query.** Storing two
distinct aliases that resolve to the same physical channel (same
`(source_channel, data_key)` → same `channel_metrics.channel_id`) with
different `target_unit` (or different `source_unit`) values is allowed at
the table level. The constraint only applies at query time: if a single
query selects **both** such aliases via `channel_with_alias()`, the solver
raises `ValueError`. The current per-channel factor model attaches one
conversion factor per physical channel and cannot apply two distinct
conversions to the same channel in the same query. Workarounds: select
the conflicting aliases in **separate queries**, or align the mapping rows
so they agree on the unit pair per physical channel.
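
The query-time check can be pictured with a plain-Python sketch. It is not the solver's code: `validate_unit_pairs` is a hypothetical helper, the error message wording is invented, and ignoring null units (as produced by direct selectors) is an assumption of this sketch.

```python
# Illustrative only: hypothetical helper, not the solver's implementation.
from collections import defaultdict


def validate_unit_pairs(resolved_rows, max_listed=3):
    """Raise ValueError if a physical channel carries conflicting units."""
    sources = defaultdict(set)
    targets = defaultdict(set)
    for row in resolved_rows:
        key = (row["container_id"], row["channel_id"])
        # Assumption: null units do not count as a distinct value.
        if row["source_unit"] is not None:
            sources[key].add(row["source_unit"])
        if row["target_unit"] is not None:
            targets[key].add(row["target_unit"])
    offending = sorted(
        key
        for key in set(sources) | set(targets)
        if len(sources.get(key, ())) > 1 or len(targets.get(key, ())) > 1
    )
    if offending:
        listed = ", ".join(str(key) for key in offending[:max_listed])
        raise ValueError(f"Conflicting unit pairs for channels: {listed}")


# Two aliases resolve to channel 10 with different target units (made-up data).
conflicting_rows = [
    {"container_id": 1, "channel_id": 10, "source_unit": "km/h", "target_unit": "m/s"},
    {"container_id": 1, "channel_id": 10, "source_unit": "km/h", "target_unit": "mph"},
]

try:
    validate_unit_pairs(conflicting_rows)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Running the two aliases in separate queries means each call sees only one row per physical channel, which is why that workaround avoids the error.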

---

## unit_conversion (optional)

Per-unit-family conversion factors. Read by `KeyValueStoreSolver` at
solve time when `source.unit_conversion_table` is configured and the
`channel_mapping` table carries `source_unit` / `target_unit` columns.

| Column | Type | Nullable | Description |
|---------------------|----------|----------|------------------------------------------------------------------------------------------------------------|
| `group_id` | `string` | No | Unit family identifier (e.g. `speed`, `rotation`). Only units within the same family can convert into each other. |
| `unit` | `string` | No | Unit name. Matches the `source_unit` / `target_unit` values on `channel_mapping`. |
| `conversion_factor` | `double` | No | Multiplier that converts a value in this unit to the family's base unit. The base unit has factor `1.0`. **Required to be a positive non-null number** — a row with `conversion_factor` null, zero, or negative is rejected at query time with `ValueError` (validation runs once per query that uses unit conversion). |

For each aliased channel the solver looks up `source_factor` (the row
whose `unit` matches `source_unit`) and `target_factor` (the row whose
`unit` matches `target_unit`, constrained to the same `group_id`) and
multiplies values by `source_factor / target_factor`. Missing rows or a
`group_id` mismatch yield a null factor and no conversion.
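
The lookup and the factor validation described above can be sketched in plain Python. The helper names (`validate_factors`, `conversion_factor`, `convert`), the sample rows, and the error message are assumptions made for this illustration; the solver performs the equivalent steps on Spark DataFrames.

```python
# Illustrative only: hypothetical helpers and made-up sample rows.
import math

UNIT_ROWS = [
    {"group_id": "speed", "unit": "m/s", "conversion_factor": 1.0},  # base unit
    {"group_id": "speed", "unit": "km/h", "conversion_factor": 1000.0 / 3600.0},
    {"group_id": "rotation", "unit": "rad/s", "conversion_factor": 1.0},  # base unit
    {"group_id": "rotation", "unit": "rpm", "conversion_factor": 2.0 * math.pi / 60.0},
]


def validate_factors(unit_rows):
    """Reject null, zero, or negative conversion factors."""
    for row in unit_rows:
        factor = row["conversion_factor"]
        if factor is None or factor <= 0:
            raise ValueError(f"Invalid conversion_factor for unit {row['unit']!r}")


def conversion_factor(unit_rows, source_unit, target_unit):
    """Return source_factor / target_factor, or None if no same-family pair exists."""
    for src in unit_rows:
        if src["unit"] != source_unit:
            continue
        for tgt in unit_rows:
            if tgt["unit"] == target_unit and tgt["group_id"] == src["group_id"]:
                return src["conversion_factor"] / tgt["conversion_factor"]
    return None


def convert(value, factor):
    """Apply the factor; a null factor leaves the value unchanged."""
    return value if factor is None else value * factor


validate_factors(UNIT_ROWS)

# Same family: 36 km/h is converted to m/s.
speed_in_ms = convert(36.0, conversion_factor(UNIT_ROWS, "km/h", "m/s"))

# Different families: no factor is found, so the value passes through.
unconverted = convert(36.0, conversion_factor(UNIT_ROWS, "km/h", "rpm"))
```

Because every factor is relative to the family's base unit, a single division covers any source/target pair within a family without storing pairwise factors.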

Configured via `source.unit_conversion_table` (see
[Configuration](../config/configuration.md)).
Original file line number Diff line number Diff line change
@@ -150,6 +150,16 @@ columns, then applies the top-level ``project_id`` filter and any
per-table ``channel_mapping.filters``, and finally joins with
channel_metrics to resolve aliases.

When the database is configured with a ``unit_conversion_table`` and
the ``channel_mapping`` table carries ``source_unit`` / ``target_unit``
columns, this method also propagates the effective unit pair on each
resolved row. The effective ``source_unit`` is computed as
``COALESCE(channel_metrics.unit, channel_mapping.source_unit)`` so
that the authoritative per-channel physical unit on
``channel_metrics`` takes precedence over the mapping-level default
when present. ``target_unit`` is always taken from the mapping —
there is no analogous column on ``channel_metrics``.

**Arguments**:

- `spark` (`SparkSession`): Spark session used for query execution.
@@ -160,7 +170,9 @@ channel_metrics to resolve aliases.
**Returns**:

`pyspark.sql.DataFrame`: DataFrame with ``(container_id, channel_id, selector_ids)``
where ``selector_ids`` is an array column.
where ``selector_ids`` is an array column. When unit conversion
is active (see above), also carries ``source_unit`` and
``target_unit`` columns.

#### resolve\_channel\_selections

@@ -171,15 +183,37 @@ def resolve_channel_selections(spark, channel_metrics_df,

Union direct and aliased channel metrics, combining selector_ids.

When the aliased side carries ``source_unit`` / ``target_unit``
columns (added by :meth:`filter_aliased_channel_metrics` when a
unit conversion table is configured), those columns are preserved
through the union and aggregation. Direct selectors produce null
unit columns, which causes the downstream conversion-factor join
in :meth:`solve` to leave their values unchanged.

Validates that each ``(container_id, channel_id)`` carries at most
one distinct ``source_unit`` and one distinct ``target_unit``. Per
physical channel the unit-conversion model can attach only one
factor; conflicting aliases would otherwise pick an arbitrary
target and silently mis-convert one of them.

**Arguments**:

- `spark` (`SparkSession`): Spark session used for query execution.
- `channel_metrics_df` (`pyspark.sql.DataFrame`): Direct channel metrics with ``selector_ids`` array column.
- `aliased_channel_metrics_df` (`pyspark.sql.DataFrame`): Aliased channel metrics with ``selector_ids`` array column.

**Raises**:

- `ValueError`: If two or more aliased selectors resolve to the same physical
channel with conflicting ``source_unit`` or ``target_unit``
values. Up to three offending channels are listed in the
message.

**Returns**:

`pyspark.sql.DataFrame`: Merged DataFrame with ``(container_id, channel_id, selector_ids)``.
`pyspark.sql.DataFrame`: Merged DataFrame with ``(container_id, channel_id, selector_ids)``
(plus ``source_unit`` / ``target_unit`` when present on the
aliased side).

#### solve

@@ -189,6 +223,13 @@ def solve(query, channels_df, selections, dtypes) -> DataFrame

Solve the query by grouping channels and applying selections.

When a ``unit_conversion_table`` is configured on the database and
*channels_df* carries ``source_unit`` / ``target_unit`` columns
(added upstream by :meth:`filter_aliased_channel_metrics`),
per-channel conversion factors are computed and propagated into
the grouped-map UDF so that time-series values are converted from
the source to the target unit on the fly.

**Arguments**:

- `query` (`QueryBuilder`): Query object containing database and filter information.