Docs
Tracking Methods
Integrations
Snowplow

Snowplow

If you already use Snowplow to collect event data, it's easy to route that data to Mixpanel. The value of connecting Snowplow and Mixpanel is to enable fast, self-serve product analytics on the events you already collect. In this guide, we share the recommended ways to get events flowing from Snowplow → Mixpanel.

How Snowplow Works

Snowplow is an open-source, self-hosted platform for collecting and storing event data from your application. Teams use Snowplow's SDKs and platform to exercise full control over event data collection in their own cloud. Snowplow's event-centric approach (opens in a new tab) is fully compatible with Mixpanel; our infrastructure (opens in a new tab) is also purpose-built to ingest, store, and query events.

Below we show the architecture of a Snowplow pipeline from their documentation (opens in a new tab).

image

Integrating with Mixpanel

Snowplow's main responsibility is to collect and validate events before storing the events in either:

  • A streaming system like Amazon Kinesis, Google PubSub, or ElasticSearch.
  • A data lake like Amazon S3 or Google Cloud Storage.

Depending on which of the above destinations you've configured in Snowplow, you can follow our guides for ingesting from that destination into Mixpanel. We provide guides for Amazon S3, Google Cloud Storage, and Google Pub/Sub. The process for Kinesis is very similar. Similar to Snowplow itself, these guides run fully in your own cloud and give you granular control over what exactly what gets sent to Mixpanel.

If you're using Snowplow to ingest events directly into a data warehouse like Snowflake or BigQuery, we recommend leveraging our Reverse ETL integrations with Census (opens in a new tab) and HighTouch (opens in a new tab). Reverse ETL tools let you model events using a SQL query and push them to all the tools in your stack, including Mixpanel.

Finally, Mixpanel's Import API (opens in a new tab) is a simple JSON-over-HTTP API. You can always use any other orchestration tools you have in your stack (Airflow, Dagster, Spark, etc.) to read data produced by Snowplow in your cloud and directly hit our import API to ingest them into Mixpanel.

Was this page useful?