I am going to use Cube Cloud to query ClickHouse, but some of the data currently lives in Postgres (account master, metric master, ad label config) and some lives in ClickHouse (fact tables with IDs that point to the master tables). As of today we first query Postgres, resolve things to UUIDs, and then query ClickHouse. We are going to use Cube Cloud as a semantic layer to make things like derived measures and SQL generation easier. One thing I am exploring is the PostgreSQL table engine in ClickHouse. Can you explain how this may work? Then Cube could simply treat everything as ClickHouse and I avoid multi-data-source chaos. Should I use the MaterializedPostgreSQL engine or this one? What are the tradeoffs? Ideally I would like to run a single query on ClickHouse via Cube-generated SQL. Here is some more context:


We are evaluating Cube Cloud as the semantic layer for analytics queries that read fact data from ClickHouse.

Our current production model is split across two systems:

- **Postgres is GCP-managed** and is the source of truth for account metadata, metric definitions, label configuration, report configuration, user/account access, and sync status.
- **ClickHouse is self-hosted** and stores high-volume analytical facts: ad creatives, ad labels, daily metric facts, and breakdown metric facts.

Today the app first queries Postgres to resolve business objects into internal UUIDs, then generates ClickHouse SQL using those UUIDs.

The important Postgres tables are:

- `organization_auth_account`: account master, including internal account UUID, platform, currency, and merged-account metadata.
- `insights_v2_metric`: metric master, including metric UUIDs, names, platform, account-specific metrics, formulas, metric type/subtype.
- `creative_module_adlabelconfig`, `creative_module_adlabelgroup`, `creative_module_adlabelv2`: label config and assignments.
- Report/section tables: saved report config, selected metrics, groupings, and filters.

The important ClickHouse tables are:

- `ad_creative`
- `ad_label`
- `{platform}_daily_metric`
- `{platform}_daily_breakdown_metric`

The ClickHouse metric tables are EAV-style. A row is keyed roughly by:

```text
date, account_id UUID, ad_id String, metric_id UUID, value Float64
```

Breakdown tables add:

```text
breakdown_key, breakdown_value
```

So Cube cannot model a measure like `spend` purely from a stable ClickHouse column. The meaning of a metric comes from Postgres `insights_v2_metric.metric_id`, plus account/platform context.
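The two-step resolution this implies can be sketched in Python (names are illustrative, not the real code): first look up the metric UUID in the Postgres metric master, then aggregate matching EAV fact rows.

```python
def resolve_metric_id(metric_rows, metric_name, platform):
    """Find the metric UUID for a platform-level metric by name.

    metric_rows mimics rows from Postgres insights_v2_metric.
    """
    for row in metric_rows:
        if row["metric_name"] == metric_name and row["platform"] == platform:
            return row["metric_id"]
    return None


def sum_metric(fact_rows, metric_id, account_ids):
    """Sum EAV fact values for one metric across the given accounts."""
    return sum(
        r["value"]
        for r in fact_rows
        if r["metric_id"] == metric_id and r["account_id"] in account_ids
    )
```

This is exactly the cross-system hop a single ClickHouse-only query would need to absorb: the metric lookup would become a join against a Postgres-engine-backed table instead of an application-side call.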

For normal accounts, the app loads active metrics from Postgres with global platform metrics plus account-specific metrics:

```text
platform = account.platform
AND status = APPROVED
AND (account_id IS NULL OR account_id = selected_account_id)
```
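The same predicate, sketched as an in-memory filter (field names follow this doc; the literal `"APPROVED"` status value is an assumption):

```python
def active_metrics(metric_rows, account):
    """Mimic the Postgres filter that loads active metrics for a normal
    account: global platform metrics (account_id IS NULL) plus
    account-specific metrics for the selected account."""
    return [
        m for m in metric_rows
        if m["platform"] == account["platform"]
        and m["status"] == "APPROVED"
        and (m["account_id"] is None or m["account_id"] == account["id"])
    ]
```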

Merged accounts are the tricky part. A merged account is not a separate fact-table account in ClickHouse. In Postgres, the parent account has `ad_account_type = 'MERGED'`, and its `ad_account_id` stores a JSON list of child internal `Account.id` UUIDs. The app resolves these with `Account.get_child_accounts()`.

For merged accounts:

- Metric and creative facts are queried using **child account UUIDs**: `account_id IN (<child account ids>)`.
- Labels are filtered using the **merged parent account UUID**, because label config/classification is attached to the merged reporting account.
- In prod there are 26 merged accounts: 22 Facebook parents, 1 Google parent, and 3 TikTok parents.
- Some merged parents are cross-platform, e.g. a Facebook parent can include Facebook, Google, and TikTok child accounts.
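A sketch of the resolution described above, mirroring what `Account.get_child_accounts()` does (the exact JSON handling is an assumption):

```python
import json


def get_fact_account_ids(account):
    """Return the account UUIDs to use against ClickHouse fact tables.

    For a merged parent, ad_account_id stores a JSON list of child
    internal Account.id UUIDs; for a normal account, use its own id.
    Note: label filtering still uses the merged parent UUID, not these.
    """
    if account["ad_account_type"] == "MERGED":
        return json.loads(account["ad_account_id"])
    return [account["id"]]
```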

Metric mapping is also Postgres-driven. There is no single ClickHouse mapping table that says “for account X/platform Y/metric Z, use column W.” Instead:

- `insights_v2_metric.metric_id` is the UUID stored in ClickHouse rows.
- `account_id IS NULL` means a global/platform metric.
- `account_id = <uuid>` means an account-specific metric.
- `metric_name` is used as a semantic identity for some cross-account/cross-platform remapping.
- `metric_type = PRIMARY` means the metric has raw ClickHouse rows.
- `metric_type = CALCULATED` means the metric is formula-derived from primary metrics stored in Postgres JSON formula/dependency fields.

For merged accounts, the selected UI metric may be one target metric UUID, but the ClickHouse query may fetch multiple source metric UUIDs across child accounts/platform tables and remap them back to the selected metric before aggregation. Spend metrics may also need FX conversion for non-USD child accounts.
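The remap-then-aggregate step can be sketched as follows; the source-to-target mapping and the FX-rate lookup shape are assumptions, not the real contract:

```python
def remap_and_aggregate(fact_rows, source_to_target, fx_rates=None):
    """Collapse rows fetched under several source metric UUIDs (across
    child accounts/platform tables) into totals keyed by the selected
    target metric UUID.

    fx_rates maps account_id -> multiplier to USD; missing means 1.0.
    """
    fx_rates = fx_rates or {}
    totals = {}
    for r in fact_rows:
        target = source_to_target.get(r["metric_id"])
        if target is None:
            continue  # row belongs to a metric outside the selection
        rate = fx_rates.get(r["account_id"], 1.0)
        totals[target] = totals.get(target, 0.0) + r["value"] * rate
    return totals
```

The key point for Cube is that this remap must happen before aggregation, so a merged-account measure is one target UUID fed by many source UUIDs.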



**Winning Criteria / Winner Computation**

Winning criteria are account-level config stored in Postgres on `organization_auth_account.additional_info.winning_filters`. They define what makes an ad a “winner” for creative reporting.

In prod today:

- 147 accounts have winning criteria configured.
- 181 total criteria exist.
- Max criteria per account is 4.
- 238 total conditions exist.
- Max conditions in one criterion is 9.
- Condition usage:
  - dynamic + calculated metric: 19
  - dynamic + primary metric: 5
  - static + calculated metric: 143
  - static + primary metric: 71

Config shape:

```json
{
  "target_campaigns": ["@_ALL_@"],
  "criterias": [
    {
      "criteria_name": "Winning Criteria #1",
      "conditions": [
        {
          "metric_id": "uuid",
          "type": "static | dynamic",
          "operator": ">",
          "value": 50000,
          "x": 10
        }
      ]
    }
  ]
}
```

A criterion is evaluated as an **AND** of all its conditions. An ad is winning for that criterion only if every condition passes.

`target_campaigns` scopes the criterion, not the whole query. `["@_ALL_@"]` means all campaigns. Otherwise each entry is matched as a case-insensitive substring against `campaign_name`, e.g. `["iOS", "_FB_"]`.
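That scoping rule can be sketched as (the sentinel string is taken from the config above):

```python
ALL_CAMPAIGNS = "@_ALL_@"


def campaign_in_scope(campaign_name, target_campaigns):
    """Scope one criterion: the sentinel means all campaigns; otherwise
    each entry is a case-insensitive substring match on campaign_name."""
    if target_campaigns == [ALL_CAMPAIGNS]:
        return True
    name = campaign_name.lower()
    return any(t.lower() in name for t in target_campaigns)
```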

Static conditions compare the selected metric against a fixed value:

```text
ad_metric > 1000
```

Dynamic conditions compare the selected metric against a historical benchmark:

```text
ad_metric > avg(metric over historical window)
```

Supported dynamic windows include:

- `avg_previous_month`
- `avg_previous_week`
- `avg_previous_week_from_start_date`
- `avg_previous_week_from_end_date`
- `avg_last_30_days`
- `avg_last_30_days_from_start_date`
- `avg_last_30_days_from_end_date`
- `avg_last_90_days_from_start_date`
- `avg_last_90_days_from_end_date`
- `avg_selected_time_window`
- `avg_last_year`
- `avg_current_year`

The optional `x` multiplier adjusts the threshold. For `>` / `>=`, threshold becomes `threshold * (1 + x/100)`. For `<` / `<=`, threshold becomes `threshold * (1 - x/100)`.
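The multiplier rule, as a small sketch:

```python
def adjust_threshold(threshold, operator, x=None):
    """Apply the optional x multiplier: inflate the threshold for > / >=,
    deflate it for < / <=. No-op when x is absent."""
    if not x:
        return threshold
    if operator in (">", ">="):
        return threshold * (1 + x / 100)
    if operator in ("<", "<="):
        return threshold * (1 - x / 100)
    return threshold
```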

Important: the schema does not directly support “metric A compared to metric B” as a condition. The annoying case is “calculated metric compared to its own calculated historical benchmark,” e.g. `COST_PER_PURCHASE < avg_last_30_days(COST_PER_PURCHASE)`, where `COST_PER_PURCHASE = spend / purchases`.
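The troublesome case can be made concrete: compute the calculated metric from primary-metric totals in both the selected window and the historical window, then compare. The formula wiring below is illustrative, not the real evaluator:

```python
def cost_per_purchase(totals):
    """COST_PER_PURCHASE = spend / purchases, evaluated from primary
    metric totals rather than any raw ClickHouse column."""
    return totals["spend"] / totals["purchases"] if totals["purchases"] else None


def passes_dynamic_condition(selected_totals, window_totals):
    """COST_PER_PURCHASE < avg_last_30_days(COST_PER_PURCHASE): the
    benchmark is the same formula over historical primary totals."""
    current = cost_per_purchase(selected_totals)
    benchmark = cost_per_purchase(window_totals)
    if current is None or benchmark is None:
        return False
    return current < benchmark
```

Note the benchmark is the formula over historical totals, not an average of daily ratios; that distinction is what makes this case awkward to express in a single generated query.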

Examples currently in prod:

- Facebook, campaign scope `["mid_", "mm_"]`:
  - `COMPLETE_REGISTRATION > avg_last_30_days`
  - `SPENDS > avg_last_30_days`
  - `CTR < avg_last_30_days`
  - `CTR` is calculated as `(clicks * 100) / impressions`.

- Facebook, campaign scope `["top"]`:
  - `COST_PER_OMNI_LANDING_PAGE_VIEW < avg_last_30_days`
  - `CPC < avg_last_30_days`
  - `SPENDS > 1000`
  - Both cost metrics are calculated ratios over primary metrics.

- Facebook, all campaigns:
  - `WALLFLOWER VALUE:SPEND RATIO > avg_selected_time_window * 1.10`
  - `SPENDS > 1000`
  - The ratio metric is calculated as `value / spend`.

- TikTok, all campaigns:
  - `COST_PER_PURCHASE_(APP) < avg_last_90_days_from_start_date`
  - `COST_PER_PURCHASE_(SKAN) < avg_previous_week_from_start_date`
  - Both are calculated cost-per-action metrics.

Correct computation requirements for Cube:

1. Winner flags must be computed at the **ad level**, not only at the final grouped row level.
2. Metric facts must first be deduped by latest `updated_at` using `argMax(value, updated_at)` per `ad_id`, `metric_id`, `date`.
3. Primary metric values are aggregated per ad over the selected date range.
4. Calculated metrics must be evaluated from their dependent primary metric totals, not from a raw ClickHouse column.
5. Dynamic thresholds require a separate historical-window aggregation for the same metric definition.
6. For calculated dynamic metrics, the historical benchmark must compute the formula over historical primary metric totals.
7. Winning counts are distinct ad counts:
   - per criterion: number of unique ads where that criterion is true
   - percentage: `winning_ads / total_ads * 100`
   - total winners: unique ads winning at least one criterion, not a double-counted sum across criteria
8. Breakdown queries are especially sensitive: winner status should be based on the correct ad-level or ad+breakdown metric context, otherwise the same ad can be incorrectly marked winning/non-winning across breakdown rows.
9. Winning filters should act as filters on the boolean winner result, e.g. keep winning ads or non-winning ads.
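The counting rules in points 1 and 7 can be sketched as follows (the `ad_flags` shape is an assumption; it represents ad-level, AND-of-conditions results per criterion):

```python
def compute_winner_counts(ad_flags):
    """ad_flags: {ad_id: {criterion_name: bool}}.

    Returns distinct-ad counts per criterion plus a deduped total: an ad
    winning several criteria counts once in total_winners, never as a
    double-counted sum across criteria."""
    per_criterion = {}
    total_winners = set()
    for ad_id, flags in ad_flags.items():
        for name, won in flags.items():
            if won:
                per_criterion.setdefault(name, set()).add(ad_id)
                total_winners.add(ad_id)
    total_ads = len(ad_flags)
    return {
        "per_criterion": {k: len(v) for k, v in per_criterion.items()},
        "total_winners": len(total_winners),
        "winning_pct": len(total_winners) / total_ads * 100 if total_ads else 0.0,
    }
```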

Relevant code paths:

- `organization_auth_account.additional_info.winning_filters`
- `creative_module/services/v2/ch_translator.py`: converts account config into ClickHouse query contract
- `clickhouse_read_module/query_planner.py`: resolves winning columns, dynamic windows, multiplier logic
- `clickhouse_read_module/sql_builder.py`: builds historical-window CTEs, winner flags, winner filters, and winner counts
- `creative_module/services/v2/data_service.py`: shapes winning counts/percentages for API response

The open design question is how Cube should own this semantic layer.
