DocsData PipelinesCommon SQL Queries

Common SQL Queries

Total daily count with deduplication logic and timezone adjustment

Events exported via pipelines (i.e. raw exports) can contain duplicates. Deduplication should be performed using 4 event properties: event_name, time, distinct_id, and insert_id (docs here). This is an example of a total daily count, converted to a specific timezone and deduplicated.

SELECT
  DATE(time, 'America/Los_Angeles') AS event_date,
  COUNT(DISTINCT CONCAT(event_name, time, distinct_id, insert_id)) AS event_count,
FROM
  `<your dataset>.mp_master_event`
WHERE
  DATE(time, 'America/Los_Angeles') >= '2025-08-01'
  AND DATE(time, 'America/Los_Angeles') < '2025-09-16'
GROUP BY
  1
ORDER BY
  1 ASC

Unique user count with user ID resolution

Raw events may contain the original distinct_id associated with the user at the time of the event instead of the final canonical distinct_id for the user after authentication. The mp_identity_mappings_data_view contains mappings of the original distinct_ids to the resolved ones (i.e. canonical distinct_ids). You can use this mapping to make sure that the unique users calculations account for ID management and therefore more accurate.

SELECT
  DATE(time, 'America/Los_Angeles') AS event_date,
  COUNT(DISTINCT resolved_user_id) AS unique_users
FROM (
  SELECT
    time,
    IFNULL(id_mappings.resolved_distinct_id, events.distinct_id) AS resolved_user_id
  FROM
    `<your dataset>.mp_master_event` AS events
  LEFT JOIN
    `<your dataset>.mp_identity_mappings_data_view` AS id_mappings
  ON
    events.distinct_id = id_mappings.distinct_id
  WHERE
    DATE(time, 'America/Los_Angeles') >= '2025-08-01'
    AND DATE(time, 'America/Los_Angeles') < '2025-09-16' )
GROUP BY
  1
ORDER BY
  1 ASC

Top 20 events by volume

SELECT
  event_name,
  COUNT(*) AS event_count
FROM
  `<your dataset>.mp_master_event`
WHERE
  DATE(time, 'America/Los_Angeles') >= '2025-08-01'
  AND DATE(time, 'America/Los_Angeles') < '2025-09-16'
GROUP BY
  1
ORDER BY
  2 DESC
LIMIT
  20

Querying duplicate events

Raw exported events can contain duplicates. You can use these 4 event properties to identify duplicates: event_name, time, distinct_id, and insert_id (docs here). This is an example of a query you can use to identify duplicate events in your raw data.

SELECT
 *,
 COUNT(*) OVER (PARTITION BY event_name, time, distinct_id, insert_id ) AS dup_group_size
FROM
 `<your dataset>.mp_master_event`
WHERE
 DATE(time, 'America/Los_Angeles') >= '2025-08-01'
 AND DATE(time, 'America/Los_Angeles') < '2025-09-16'
QUALIFY
 dup_group_size > 1
ORDER BY
 DATE(time, 'America/Los_Angeles'),
 event_name,
 time

Was this page useful?