Heap Connect for Snowflake

Heap Connect's Snowflake destination gives you access to retroactive data in your Snowflake instance with no developer resources needed. You access your Heap data directly from your Snowflake account via Snowflake Data Sharing, which enables Heap to act as a provider and share data directly with your consumer accounts.

No data is copied or transferred during this sharing, so the shared data does not take up any storage in your Snowflake account. Accordingly, this integration does not contribute to your monthly data storage charges in Snowflake. Your account will only be charged for the compute resources (i.e. virtual warehouses) used to query the shared data.

Getting Started

To get started, you need to have the following information about your Snowflake account:

  • Your Snowflake account name - You can find your account name in the URL you use to access Snowflake: <account_name>.snowflakecomputing.com.
  • Your account region - We currently support uswest, useast, and eudublin. If you are outside of these regions, please let us know and we will work to add support for you.

After Heap successfully shares the data with your Snowflake account, complete the following steps:

  1. Receive the share by creating a database from the shared data (share_name provided by Heap).
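
A minimal sketch of that command, with placeholder values; the exact provider account and share name come from the snippet Heap provides:

// Run as ACCOUNTADMIN; substitute the values from Heap's snippet
CREATE DATABASE heap_data FROM SHARE <provider_account>.<share_name>;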

Data Sharing for Multiple Environments

You will receive a different share for each environment synced in Heap, so you will need to create a different database to receive the data from each environment.

If you closed the pop-up above before copying the production snippet, you can retrieve it as follows:

a. From the Activate > Connect > Snowflake page, click the refresh icon at the top right of the second left navigation bar.

b. In the pop-up that appears, click the Configuration link.

c. On the Snowflake Configuration pop-up, you will see the Create Database Command. Click the copy icon to retrieve this snippet.

It may take up to 24 hours for us to prepare your account for syncing. While this is in progress, your account appears in the 'sync pending' state.

  2. Query the tables shared to your account. We will share the built-in tables (pageviews, users, sessions, all_events) and any event tables you choose to sync. You can explore the synced tables by querying the information_schema or by viewing the sidebar in Snowflake's UI.
// List the tables and views shared to your account
SELECT * FROM <database_name>.information_schema.tables;

// Query a specific table
SELECT * FROM <database_name>.<schema_name>.<table_name>;

Schema

For each environment, Heap will create the following views in Snowflake:

  1. One users view.
  2. One sessions view.
  3. One pageviews view.
  4. One all_events view.
  5. One view for each event you've defined in the Heap interface and have set up syncing for.

These objects have a dynamic schema that auto-updates to reflect any new events or properties you've tracked.
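
For example, you can check the current columns of any view through the information_schema (the database name below is a placeholder):

// Inspect the current columns of the all_events view
SELECT column_name, data_type
FROM <database_name>.information_schema.columns
WHERE table_name = 'ALL_EVENTS';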

In addition to the objects above, the following metadata tables will be created:

  1. One _sync_info table.
  2. One _dropped_tables table.
  3. One user_migrations table, which lists all instances of merging and migrating users.
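
As a sketch of how you might use user_migrations (assuming it exposes from_user_id and to_user_id columns, which is an assumption about the table's layout), you can look up the user a merged identity was migrated into:

// Find where a merged user's events now live (column names assumed)
SELECT from_user_id, to_user_id
FROM <database_name>.<schema_name>.user_migrations
WHERE from_user_id = <user_id>;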

Timezones

By default, timezones in Snowflake are set to the local timezone of the region in which your cluster is located. However, you can reconfigure the timezone using Snowflake's system settings. To do so, review this Snowflake doc on admin account management.
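
For example, assuming you want all timestamps in UTC, an ACCOUNTADMIN can change the account-level default, or you can override it per session:

// Set the account-level default timezone (requires ACCOUNTADMIN)
ALTER ACCOUNT SET TIMEZONE = 'UTC';

// Or override it for the current session only
ALTER SESSION SET TIMEZONE = 'UTC';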

Limitations

For customers moving from our Redshift integration, there are a few things we do not currently support as part of the Snowflake integration. Please provide feedback if these features are important to you.

  1. Segment support - we currently do not sync segments in Heap to Snowflake.
  2. Defined properties - we currently do not sync defined properties in Heap to Snowflake.

Troubleshooting

If you are getting the error message "SQL access control error: Insufficient privileges to operate on foreign share" when attempting to create a database from the share using the provided query, make sure you are running the query with the ACCOUNTADMIN role. You can switch roles in the Snowflake UI or via SQL, as shown below.
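
A minimal example of switching roles and re-running the command (the database and share names are the same placeholders as above):

// Switch to ACCOUNTADMIN, then re-run the create-database command
USE ROLE ACCOUNTADMIN;
CREATE DATABASE heap_data FROM SHARE <provider_account>.<share_name>;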

Heap-hosted Snowflake

If you do not currently have Snowflake but want to take advantage of this integration, our team can create a Reader Account for you and share Heap data via this account. This has the advantage that query costs are absorbed for you, though it also has the following limitations:

  • There is a monthly credit usage limit on the account, and your querying usage will be automatically capped when you reach that limit.
  • You cannot import data from other data sources into Snowflake for analysis with the Heap dataset.
  • The shared database is read-only, which means it's not compatible with Looker's persistent derived tables (PDTs), which require write-access.

Looker Blocks on Snowflake

Because we use Snowflake Data Sharing, which shares a read-only data set, there are some modifications that must be made to your Looker Blocks before they can be used on data stored in Snowflake. The modification depends on whether you or Heap is hosting the Snowflake account.

Snowflake hosted by you with Looker Blocks

To modify your database for Looker Blocks, complete the following steps:

  1. Create a new database in Snowflake with a scratch schema that the Looker user has write-access to - review the instructions in Looker's Snowflake documentation
  2. Create two connections in Looker: one to the Snowflake database receiving the share from Heap, and one to the database from #1 that contains the scratch schema
  3. Create a project in Looker using the connection to the database with the scratch schema
  4. Install Heap Looker Blocks in this project
  5. Change the name of the connection in the model file to be the name of the connection associated with the database where the scratch schema was created
  6. In every view file of the Looker Block, modify references to Heap tables by prepending the name of the Snowflake database containing the shared data from Heap

Example for #6:

view: session_facts {
  derived_table: {
    sortkeys: ["session_start_time"]
    distribution: "session_unique_id"
    # update trigger value to desired frequency and timezone
    sql_trigger_value: select date(convert_timezone('pst', getdate() - interval '3 hours')) ;;
    sql: SELECT
           all_events.session_id || '-' || all_events.user_id AS session_unique_id,
           user_id,
           row_number() over (partition by user_id order by min(all_events.time)) AS session_sequence_number,
           min(all_events.time) AS session_start_time,
           max(all_events.time) AS session_end_time,
           COUNT(*) AS "all_events.count"
         FROM main_production.all_events AS all_events
         GROUP BY 1, 2
         ;;
  }
}

main_production.all_events changes to <shared database name>.heap.all_events

Heap-hosted Snowflake with Looker Blocks

Heap Looker Blocks are built using Persistent Derived Tables (PDTs), but PDTs require write-access to the schema holding the underlying data set. A workaround for Heap-hosted Snowflake is to use Looker's "ephemeral" derived tables instead. You can do this by modifying the view files in the Looker Block and commenting out every reference to sql_trigger_value. This Looker thread provides guidance on how to modify the PDTs.
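
Continuing the session_facts example above, a sketch of this change; with sql_trigger_value commented out, Looker treats the derived table as ephemeral and computes it at query time, so no write-access is needed:

view: session_facts {
  derived_table: {
    # Commented out so the derived table is ephemeral (no write-access required)
    # sql_trigger_value: select date(convert_timezone('pst', getdate() - interval '3 hours')) ;;
    sql: SELECT ... FROM <shared database name>.heap.all_events AS all_events ;;
  }
}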
