Customers on an Enterprise or Growth plan can access Data Pipeline as an add-on package. See our pricing page for more details.
Design
Mixpanel exports data to the customer’s Databricks workspace using Unity Catalog Managed Volumes. We first load the data into a single-column (VARIANT type) raw data table, then create a view to expose all properties as typed columns.
Supported Features
- Cross-cloud Support: Works with Databricks on AWS, GCP, or Azure
- Date Clustering: Raw tables use liquid clustering on event_date for efficient date-based queries
- Static IP Addresses: Supports IP allowlisting for secure connections
IP Allowlist
Mixpanel Data Pipelines supports static IP addresses for Databricks connections when IP restrictions are configured on your Databricks workspace. If you are using network policies to restrict access to your instance, you might need to add the following IP addresses to the allowed list:
US
Prerequisites
Before setting up the integration, ensure you have:
- A Databricks workspace with Unity Catalog enabled
- A SQL Warehouse (Serverless recommended for best performance and cost)
- Admin permissions in your Databricks workspace to create Service Principals
Set Export Permissions
Step 1: Create a Service Principal (or use an existing one)
A Service Principal is a Databricks identity that Mixpanel will use to access your workspace.
- In your Databricks workspace, navigate to Settings → Identity and access → Service principals
- Click Add service principal
- Click Add new
- Note the Application ID - you’ll need this later
Step 2: Generate OAuth Secret
- Click on the Service Principal you just created
- Navigate to the Secrets tab
- Click Generate secret and enter a lifetime (730 days recommended)
- Copy the Secret value immediately - it won’t be shown again
- Store it securely - you’ll need it for Mixpanel configuration
Step 3: Create a new Catalog (or use an existing one) and Schema
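As a minimal sketch of this step, assuming the example names mixpanel_export and json_pipelines that appear in Step 7 (substitute your own), a user who holds the CREATE CATALOG privilege could run something like the following in the Databricks SQL editor:

```sql
-- Sketch only: the catalog and schema names are the examples from Step 7.
-- Creating the catalog requires the CREATE CATALOG privilege on the metastore.
CREATE CATALOG IF NOT EXISTS mixpanel_export;

-- Schema that will hold the raw table, the view, and the temporary volumes.
CREATE SCHEMA IF NOT EXISTS mixpanel_export.json_pipelines;
```

If your metastore has no default storage location, CREATE CATALOG may also need a MANAGED LOCATION clause. When reusing an existing catalog, only the CREATE SCHEMA statement applies.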
Creating a new catalog requires the CREATE CATALOG privilege, which is a metastore-level permission. Contact your metastore admin or account admin to grant this privilege.
If you are using an existing shared catalog, contact the catalog owner or a metastore admin to grant the following privileges:
Step 4: Grant Permissions to Service Principal
Grant the Service Principal the required permissions to operate within the catalog.
The privilege needed to create views depends on the metastore version:
- Version 1.0: The CREATE TABLE privilege covers both tables and views
- Version 1.1+: A separate CREATE VIEW privilege is required
Why These Permissions?
- USE CATALOG: Required to access the catalog
- USE SCHEMA: Required to access objects in the schema
- CREATE TABLE: Create raw tables to store event data
- CREATE VOLUME: Create temporary volumes for uploading files
- CREATE VIEW: Create views with typed columns (metastore v1.1+ only)
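The grants for this step could look roughly like the sketch below. It assumes the example catalog and schema names from Step 7, and `<application-id>` is a placeholder for the Application ID noted in Step 1; neither is a required value.

```sql
-- Sketch only. Replace <application-id> with the Application ID from Step 1
-- and the catalog/schema names with your own.
GRANT USE CATALOG ON CATALOG mixpanel_export TO `<application-id>`;

GRANT USE SCHEMA, CREATE TABLE, CREATE VOLUME
  ON SCHEMA mixpanel_export.json_pipelines TO `<application-id>`;

-- Per the version note in this step, only on metastore v1.1+
-- (taken from the text above, not verified independently):
-- GRANT CREATE VIEW ON SCHEMA mixpanel_export.json_pipelines TO `<application-id>`;

-- Review what the Service Principal now holds on the schema.
SHOW GRANTS `<application-id>` ON SCHEMA mixpanel_export.json_pipelines;
```

Granting at the schema level keeps the Service Principal out of other schemas in a shared catalog.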
Step 5: Grant SQL Warehouse Access
The Service Principal needs permission to use the SQL Warehouse to execute queries.
- In your Databricks workspace, navigate to SQL Warehouses
- Click on your SQL Warehouse (or create one if needed)
- Go to the Permissions tab
- Click Add or Grant permissions
- Search for your Service Principal by Application ID (from Step 1)
- Select permission level: Can use (minimum required)
- Click Save
Step 6: Get SQL Warehouse Connection Details
- In the same SQL Warehouse, go to the Connection details tab
- Note the following values:
  - Server hostname: e.g., abc123.cloud.databricks.com
  - HTTP Path: e.g., /sql/1.0/warehouses/xyz789
A Serverless SQL Warehouse is recommended because it offers:
- Fast startup (~3 seconds)
- Auto-scaling
- Pay-per-use pricing
- No idle cluster costs
Step 7: Configure Mixpanel Integration
Refer to Step 2: Creating the Pipeline to create the data pipeline via the UI. You’ll need to provide:
- Server Hostname (from Step 6)
- HTTP Path (from Step 6)
- Catalog (from Step 3, e.g., mixpanel_export)
- Schema (from Step 3, e.g., json_pipelines)
- Service Principal ID (Application ID from Step 1)
- Service Principal Secret (from Step 2)
Clustering
Raw tables are clustered by the event_date column, which is computed in your project’s timezone during data load. This clustering significantly improves query performance when filtering by date.
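One way to confirm the clustering key, sketched here with the example catalog and schema names from Step 7 and the raw table name mp_master_event_raw, is to inspect the table details:

```sql
-- Sketch only: the catalog and schema are the example names from Step 7.
-- The clusteringColumns field of the result should list event_date.
DESCRIBE DETAIL mixpanel_export.json_pipelines.mp_master_event_raw;
```

The user running this needs read access to the raw table.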
Data Schema
Mixpanel creates a raw table and a view with typed columns:
Raw Table Columns (mp_master_event_raw):
- DATA (VARIANT) - Contains the complete event JSON
- event_date (DATE) - Computed from event time in your project’s timezone
View Columns (mp_master_event):
- user_id (STRING)
- time (TIMESTAMP)
- properties (VARIANT) - All event properties as semi-structured data
- insert_id (STRING)
- event_name (STRING)
- distinct_id (STRING)
- device_id (STRING)
- event_date (DATE)
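To see this layout in your own workspace, a quick check along these lines may help. The catalog and schema are the example names from Step 7, not fixed values.

```sql
-- List the typed columns exposed by the view.
DESCRIBE TABLE mixpanel_export.json_pipelines.mp_master_event;

-- Peek at a few raw rows: one VARIANT document plus the computed date.
SELECT DATA, event_date
FROM mixpanel_export.json_pipelines.mp_master_event_raw
LIMIT 5;
```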
Queries
Remember to grant the necessary permissions to any user who wants to query the table.
Use the :: syntax to extract and cast properties from VARIANT columns.
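For the permissions reminder above, read access for an analyst might be granted as in this sketch. The principal analyst@example.com is a placeholder, and the catalog and schema are the example names from Step 7.

```sql
-- Sketch only: analyst@example.com is a placeholder principal.
GRANT USE CATALOG ON CATALOG mixpanel_export TO `analyst@example.com`;

-- SELECT granted on the schema is inherited by every table and view in it;
-- grant on individual objects instead if that is too broad.
GRANT USE SCHEMA, SELECT
  ON SCHEMA mixpanel_export.json_pipelines TO `analyst@example.com`;
```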
Basic event query
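A basic query against the view might look like the following sketch. The catalog and schema are the example names from Step 7, the date is arbitrary, and plan_name is a hypothetical event property used only to show the :: cast.

```sql
SELECT
  event_name,
  time,
  distinct_id,
  properties:plan_name::STRING AS plan_name  -- hypothetical property
FROM mixpanel_export.json_pipelines.mp_master_event
WHERE event_date = DATE '2024-01-15'         -- arbitrary example date
LIMIT 100;
```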
Query nested properties
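Nested objects and arrays inside properties can be reached with dot and bracket paths before casting. In this sketch the property names (device, cart, items, sku, total) are hypothetical, and the catalog and schema are the example names from Step 7.

```sql
SELECT
  event_name,
  properties:device.os::STRING         AS device_os,   -- nested object field
  properties:cart.items[0].sku::STRING AS first_sku,   -- first array element
  properties:cart.total::DOUBLE        AS cart_total
FROM mixpanel_export.json_pipelines.mp_master_event
WHERE event_date = DATE '2024-01-15';  -- arbitrary example date
```

Events that lack a given path typically return NULL for that column instead of failing the query.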
Getting the number of events per day
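A daily count can be sketched by grouping on event_date (catalog and schema are the example names from Step 7):

```sql
SELECT
  event_date,
  COUNT(*) AS event_count
FROM mixpanel_export.json_pipelines.mp_master_event
GROUP BY event_date
ORDER BY event_date;
```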
Efficient date filtering
Use the event_date column for best performance:
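As a sketch, a range filter on event_date lets the engine take advantage of the clustering on that column. The dates are arbitrary, and the catalog and schema are the example names from Step 7.

```sql
-- Filter directly on event_date, the clustered column.
SELECT
  event_name,
  COUNT(*) AS event_count
FROM mixpanel_export.json_pipelines.mp_master_event
WHERE event_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31'
GROUP BY event_name
ORDER BY event_count DESC;
```

Filtering on an expression derived from time, such as CAST(time AS DATE), may not benefit from the clustering in the same way. It can also select different rows, because event_date is computed in your project’s timezone.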
Costs
- Delta tables: Billed by your cloud provider (AWS S3, GCP GCS, or Azure ADLS) via Databricks
- Managed volumes: Temporary storage cleaned up after each export
- Compute: SQL Warehouse usage during COPY INTO operations