Customers on an Enterprise or Growth plan can access Data Pipeline as an add-on package. See our pricing page for more details.
Design
Mixpanel exports data to your database. We first load the data into a single-column raw (VARIANT type) data table. Then, we create a view to expose all properties as columns.IP Restrictions
Mixpanel Data Pipelines supports static IP addresses for Snowflake connections when IP restrictions are configured on your Snowflake instance. If you are using Snowflake Network policy to restrict access to your instance, you might need to add the following IP addresses to the allowed list: USSet Export Permissions
Step 1: Create a Role and Grant Permissions
Create a role (MIXPANEL_EXPORT_ROLE as an example) and grant access on your database, schema, warehouse to the role. Replace <database name>, <schema name>, <warehouse name> with actual names.
Step 2: Create a Storage Integration
Mixpanel stages exported data in a GCS bucket (gcs://mixpanel-export-pipelines-<project-id>) before loading it into your warehouse. This bucket is created and managed by Mixpanel — you do not need to create it yourself. To allow Mixpanel to load data from this bucket into your Snowflake warehouse, create a GCS storage integration and grant it to the role. Replace <project-id> with your Mixpanel project ID.
Step 3: Authenticate the User
Refer to Step 2: Creating the Pipeline to create a data pipeline via the UI. We provide two different authentication methods: password and key-pair. In the example, create a user with either a password or public key and then grant the role to the user. You can find the public key in the Snowflake pipeline creation UI. If you already have a user, change the fields and grant the role. password authenticationPartitioning
The data in the raw tables is clustered based on thetime column in the project’s timezone. To be exact, we use CLUSTER BY (TO_DATE(CONVERT_TIMEZONE('UTC','<TIMEZONE>', TO_TIMESTAMP(DATA:time::NUMBER))) where TIMEZONE is the Mixpanel project’s timezone.
Queries
A query is a request for data results. You can perform actions on the data, such as combine data from different tables; add, change, or delete table data; and perform calculations. Snowflake supports an Object type that can store JSON objects and arrays. Mixpanel exposes array and object top-level properties as Object columns in the view. Here is an example of how you can query the raw table when using one table for all the events.DB_NAME and SCHEMA_NAME should be replaced by your Snowflake database and schema name.
Getting the number of events in each day
You will need this if you suspect the export process is not exporting all the events you want. As the time column in the tables is in UTC timezone, you first need to convert that to your Mixpanel project timezone, and then, get the number of events for each day. The following query will do that for you.PROJECT_TIMEZONE, DB_NAME and SCHEMA_NAME should be replaced by your Mixpanel project timezone and your snowflake database and schema name.
Querying the identity mapping table
When using the ID mappings table, you should use the resolveddistinct_id in place of the non-resolved distinct_id whenever present. If there is no resolved distinct_id, you can then use the distinct_id from the existing people or events table.
Below is an example SQL query that references the ID mapping table to count the number of events in a specific date range for each unique user in San Francisco.