Heap Connect: Overview & Schema

Overview

Heap Connect can export your data to S3 or to three dynamic data warehouses: Redshift, BigQuery, and Snowflake. All exports are completely retroactive, just like the Heap dashboard. As you define new events and modify existing events, we seamlessly update your cluster schema with all historical data, which ensures that Connect always provides a clean and accurate representation of all of the raw data received since you’ve been using Heap.

Data Hierarchy

In Heap, all data is hierarchical: you have many users, who conduct many sessions, and every session has many events. As a result, you can have metadata attached to every level of this hierarchy, which we refer to as user, session and event-level properties. Any information on the user level propagates to the session and event level, and any information on the session-this level propagates to the event level.

For example, any user property (e.g. their email address) is available as a property on the session and event level in Heap, and any session property (e.g. UTM Source) is available as a property on the event level. However, any event-level property (e.g. URL Path) is not available on the session or user level as it pertains to only where the event itself occurred. For a complete list of all of the information we collect out-of-the-box for each type of property, please refer to our doc on Autocaptured Properties.

Heap Connect adopts a similarly hierarchical schema. By default, every environment schema will have a users table (for all user information), a sessions table (for all session information) and a table for every defined event in Heap or logged via our API. For exports to a data warehouse, we’ll also create a table entitled all_events, which includes basic information about every event in the event tables joined together for ease in aggregate analysis. Please refer to Overview and Schema for a detailed description of each table’s schema.

It is important to note that Heap Connect contains all of your defined data in Heap. Out-of-the-box, Heap will automatically collect all user interactions (pageviews, clicks, etc.), which we refer to as raw data. Using our platform, we enable end-users to associate semantic meaning with this raw data. For instance, clicking on a cta.signup might actually mean that a user signed up. Defined events are created when one associates this semantic meaning and given a name such as Sign Up. After a user defines an event, Heap will automatically sync all of the historical data associated with the user interaction in its own table as well as include these events in the all_events table. Learn more about defining events in our Events doc.

Schema

All data in your account is hosted within a single cluster or datasets (BigQuery) under that project.

By default, these are named using the project name, then an underscore, followed by the environment name. For instance, each table in your main production environment will be part of the main_production schema, while the tables in the development environment of a project named "My iOS App" will be in a schema or dataset called my_ios_app_development.

For each environment, Heap will create the following tables (known as views in BigQuery):

  • One users table
  • One sessions table
  • One pageviews table
  • One table for each event you've defined in the Heap interface and have syncing
  • One view for each segment you've defined in the Heap interface and have syncing (Redshift only)
  • One all_events table which contains the builtin pageview event and all defined or custom events synced
  • One _sync_info table
  • One _dropped_tables table (BigQuery only)
  • One user_migrations table which lists all instances of merging and migrating users

Each of the above tables (except for all_events) is backed by a "raw" table with name <view_name>_raw. This means that every environment will have both a users view and users_raw table, for example. The views perform deduplication, as the underlying raw tables may have duplicated data introduced during the sync process. Additionally, the users view filters out users that are the from user in an identify call. For that reason, we recommend querying only against the deduplicated views.

These tables have a dynamic schema which auto-updates to reflect any new events or properties you've tracked.

Users Table

This table is called users and resides under the projectname_environmentname schema. It contains a row for each user in the environment. The column schema is:

Column Name Type Description
user_id BIGINT Unique ID of user, randomly generated by Heap.
identity TEXT User's username or other unique token, passed via heap.identify API. Must be unique.
handle TEXT User's username or other unique token, passed in via heap.addUserProperties API.
email TEXT User's email address, passed in via heap.addUserProperties API.
joindate TIMESTAMP Timestamp without timezone of when the user was first seen.
last_modified TIMESTAMP Timestamp without timezone of when the user's data was last modified.
user properties… TEXT There will be one column for every unique user property you've sent via the heap.addUserProperties API (name, gender, account status, etc). The column type is automatically inferred from the underlying property values.

Sessions Table

For web, a session ends after 30 minutes of inactivity from the user. On mobile, a session ends after your app has entered the background. The sessions table has the following schema:

Column Name Type Library Description
event_id BIGINT All Unique ID of associated session event, randomly generated by Heap used internally by Heap for data syncing.
user_id BIGINT All Unique ID of associated user, randomly generated by Heap.
session_id BIGINT All Unique ID of associated session, randomly generated by Heap.
time TIMESTAMP All Timestamp without timezone of when session started.
library TEXT All Version of Heap library which initiated the session. Can be one of "web" or "iOS".
platform TEXT Web iOS User's operating system.
device_type TEXT Web iOS Device type, which can be one of "Mobile", "Tablet", or "Desktop".
country TEXT Web iOS Country in which user session occurred, based on IP.
region TEXT Web iOS Region in which user session occurred, based on IP.
city TEXT Web iOS City in which user session occurred, based on IP.
IP TEXT Web iOS The IP address for the session, which is used for determining geolocation.
referrer TEXT Web URL that linked to your site and initiated the session. If the user navigated directly to your site, or referral headers were stripped, then this value will appear as direct.
landing_page TEXT Web URL of the first pageview of the session.
browser TEXT Web User's browser.
search_keyword TEXT Web Search term that brought the user to your site.
utm_source TEXT Web GA-based utm_source tag associated with the session's initial pageview.
utm_campaign TEXT Web GA-based utm_campaign tag associated with the session's initial pageview.
utm_medium TEXT Web GA-based utm_medium tag associated with the session's initial pageview.
utm_term TEXT Web GA-based utm_term tag associated with the session's initial pageview.
utm_content TEXT Web GA-based utm_content tag associated with the session's initial pageview.
device TEXT iOS User's device model.
carrier TEXT iOS User's mobile carrier.
app_name TEXT iOS Current name of iOS app, as determined by CFBundleName.
app_version TEXT iOS Current version of iOS app, as determined by CFBundleShortVersionString.

Pageviews Table

Heap will create a table that contains every pageview by default.

Column Name Type Library Description
event_id BIGINT All Unique ID of associated pageview event.
user_id BIGINT All Unique ID of associated user, randomly generated by Heap.
session_id BIGINT All Unique ID of associated session, randomly generated by Heap.
session_time TIMESTAMP All Timestamp when session started. This value is primarily for Heap's internal use and drifts on events. It shouldn't be relied on for analysis.
time TIMESTAMP All Timestamp without timezone of when the pageview occurred.
library TEXT All Version of Heap library which initiated the session. Can be one of "web" or "iOS".
platform TEXT Web iOS User's operating system.
device_type TEXT Web iOS Device type, which can be one of "Mobile", "Tablet", or "Desktop".
country TEXT Web iOS Country in which user session occurred, based on IP.
region TEXT Web iOS Region in which user session occurred, based on IP.
city TEXT Web iOS City in which user session occurred, based on IP.
IP TEXT Web iOS The IP address for the session, which is used for determining geolocation.
referrer TEXT Web URL that linked to your site and initiated the session. If the user navigated directly to your site, or referral headers were stripped, then this value will appear as direct.
landing_page TEXT Web URL of the first pageview of the session.
browser TEXT Web User's browser.
search_keyword TEXT Web Search term that brought the user to your site.
utm_source TEXT Web GA-based utm_source tag associated with the session's initial pageview.
utm_campaign TEXT Web GA-based utm_campaign tag associated with the session's initial pageview.
utm_medium TEXT Web GA-based utm_medium tag associated with the session's initial pageview.
utm_term TEXT Web GA-based utm_term tag associated with the session's initial pageview.
utm_content TEXT Web GA-based utm_content tag associated with the session's initial pageview.
path TEXT Web The path of the pageview.
query TEXT Web The query parameters associated with the pageview.
hash TEXT Web The hash parameters associated with the pageview
title TEXT Web Title of the current page.
device TEXT iOS User's device model.
carrier TEXT iOS User's mobile phone carrier.
app_name TEXT iOS Current name of iOS app, as determined by CFBundleName.
app_version TEXT iOS Current version of iOS app, as determined by CFBundleShortVersionString.
view_controller TEXT iOS Name of the current view controller.
screen_a11y_id TEXT iOS accessibilityIdentifier for the current view controller.
screen_a11y_label TEXT iOS accessibilityLabel for the current view controller.

Event Tables

Heap will create one table (and one corresponding 'raw' table) for every defined event you've created within the Heap interface.

For each event table, the columns are determined by the version of the Heap library that sent the event, along with any custom properties you may have added via Snapshots, the custom API, or defined properties. For instance, if an event was sent by the Heap iOS library, its SQL table will contain iOS-specific columns like app_version or carrier, but not web-specific columns like landing_page or search_keyword. This helps keep your schema as clean as possible.

The name of these tables will be the event_name itself and reside under the projectname_environmentname schema. To produce the event_name portion, we strip the event name of any non-alphanumeric characters and snake-case the result. For example, an event named "Sign Up - Click Link (Any)" becomes sign_up_click_link_any. (Note that this means it's possible for events to collide into the same table. Make sure your event names are sufficiently distinct!)

Each of these event tables will be fully retroactive, meaning it will contain a row for every occurrence of the event since the day you installed Heap. The column schema is:

Column Name Type Libraries Description
event_id BIGINT All Unique ID of associated event, randomly generated by Heap.
user_id BIGINT All Unique ID of associated user, randomly generated by Heap.
session_id BIGINT All Unique ID of associated session, randomly generated by Heap.
time TIMESTAMP All Timestamp without timezone of when event happened.
session_time TIMESTAMP All Timestamp when session started. This value is primarily for Heap's internal use and drifts on events. It shouldn't be relied on for analysis.
type TEXT All For web auto-tracked events, can be any of view page, click, submit, change, with push state events registered as view page events. For iOS auto-tracked events, can be touch, edit field, or a gesture recognizer you've defined. For custom events, this will be the custom event name.
library TEXT All Version of Heap library on which event occurred. Can be one of "web", "iOS", or "server".
platform TEXT Web iOS User's operating system.
device_type TEXT Web iOS Device type, which can be one of "Mobile", "Tablet", or "Desktop".
country TEXT Web iOS Country in which user session occurred, based on IP.
region TEXT Web iOS Region in which user session occurred, based on IP.
city TEXT Web iOS City in which user session occurred, based on IP.
IP TEXT Web iOS The IP address for the session, which is used for determining geolocation.
referrer TEXT Web URL that linked to your site and initiated the session. If the user navigated directly to your site, or referral headers were stripped, then this value will appear as direct.
landing_page TEXT Web URL of the first pageview of the session.
browser TEXT Web User's browser.
search_keyword TEXT Web Search term that brought the user to your site.
utm_source TEXT Web GA-based utm_source tag associated with the session's initial pageview.
utm_campaign TEXT Web GA-based utm_campaign tag associated with the session's initial pageview.
utm_medium TEXT Web GA-based utm_medium tag associated with the session's initial pageview.
utm_term TEXT Web GA-based utm_term tag associated with the session's initial pageview.
utm_content TEXT Web GA-based utm_content tag associated with the session's initial pageview.
domain TEXT Web Domain including subdomain, e.g. blog.heap.io.
path TEXT Web Portion of the current URL following your domain, e.g. /docs for heap.io/docs.
hash TEXT Web Portion of the current URL following the hash sign, e.g. #install for heap.io/docs#install.
query TEXT Web Query params of the page's current URL, e.g. ?utm_id=1234 for heap.io?utm_id=1234.
title TEXT Web Title of the current page.
href TEXT Web href property of link (used for clicks on anchor tags).
device TEXT iOS User's device model.
carrier TEXT iOS User's mobile phone carrier.
app_name TEXT iOS Current name of iOS app, as determined by CFBundleName.
app_version TEXT iOS Current version of iOS app, as determined by CFBundleShortVersionString.
action_method TEXT iOS Name of the action method triggered by this event, e.g. loginButtonWasPressed.
view_controller TEXT iOS Name of the current view controller.
screen_a11y_id TEXT iOS accessibilityIdentifier for the current view controller.
screen_a11y_label TEXT iOS accessibilityLabel for the current view controller.
target_view_class TEXT iOS Underlying class name of an iOS action's target, e.g. UITableCellView.
target_view_name TEXT iOS Instance variable name of an iOS action's target, e.g. loginButtonView.
target_a11y_id TEXT iOS accessibilityIdentifier of an iOS action's target.
target_a11y_label TEXT iOS accessibilityLabel of an iOS action's target.
target_text TEXT Web iOS Button text of the event target.
Event properties… TEXT All There will be one column for every unique event property you've attached, either via the heap.track API, the addEventProperties API, or Snapshots. The column type is automatically inferred from the underlying property values.

Segments Table (Redshift Only)

Heap will create one table for every defined segment you've created within the Heap interface. These tables only contain one column (user_id) that you can use to join against all other user information (users or other event tables).

Column Name Type Description
user_id BIGINT Unique ID of associated user, randomly generated by Heap.

All Events Table

Heap will create a table called all_events, which contains all defined or custom API events, pageviews, and sessions you have synced, as well as the built-in pageview event. The all_events table is defined on top of the other event tables, and includes an extra column event_view_name that denotes the source table of each row. It will have the following columns:

Column Name Type Description
event_id BIGINT Unique ID of associated event, randomly generated by Heap.
user_id BIGINT Unique ID of associated user, randomly generated by Heap.
session_id BIGINT Unique ID of associated session, randomly generated by Heap.
time TIMESTAMP Timestamp without timezone of when event happened.
event_table_name TEXT Name of the table where similar events can be found.

BigQuery Limitations

In most cases, the schema for the all_events view will consist of the union of all columns from all event tables, with nulls filled in where applicable. However, if you are syncing many tables to your project, this can run into BigQuery query length limits, in which case we'll fall back to one of the following reduced column sets:

  1. All built-in Heap columns - i.e., event_view_name and columns contained in the pageviews/sessions tables
  2. Basic columns - event_view_name, user_id, event_id, session_id, time

If your all_events view is created with a reduced column set, you can join it with the underlying event views to access any property columns that are not available in the view.

Sync Info Table

Heap creates a table called _sync_info that contains metadata around the sync process and last updated times.

Column Name Type Description
event_table_name TEXT the event name
sync_started TIMESTAMP a timestamp for when that table began syncing
sync_ended TIMESTAMP a timestamp for when that table completed syncing
synced_to_time TIMESTAMP a timestamp that reflects the most recent occurrence of the event
inserted_row_count BIGINT the number of rows inserted during the most recent sync

Dropped Tables Table

Heap creates a table called _dropped_tables that contains a record of all data tables and views that were dropped as part of an update.

Column Name Type Description
event_table_name STRING name of table or view dropped
dropped_at TIMESTAMP time of drop

User Migrations Table

When a user is identified, a migration occurs to aggregate the data under the new identity. These migrations are recorded in this table.

Note: In BigQuery, identity resolution is not applied retroactively, so queries should follow the instructions in Identity Resolution for the most accurate dataset.

Column Name Type Description
from_user_id BIGINT the migrating user's ID
to_user_id BIGINT the destination user's ID
time TIMESTAMP a timestamp for when the migration occurred

Data Syncing

As you define new events and modify existing events, we seamlessly update your cluster schema. This ensures that the data is always a clean, accurate representation of your raw data.

In particular, the following changes can happen on each sync:

  • When an event is toggled to sync for the first time, we create a new table for that event.
  • When a defined event is modified in the Heap interface, we tear down the existing table for that event, create a new one in its place, and populate the new table.
  • When a custom event property is seen for the first time, we add a new property column to the corresponding event table.
  • When a custom user property is seen for the first time, we add a new user property column to the users table.
  • When user-level properties are updated with heap.addUserProperties, we update corresponding rows in the users table.
  • When an anonymous user gets identified with an existing identity via heap.identify, we "migrate" the user by running an update on the users table and another update on all event tables that reference the anonymous user's user_id.

Table Relationships and Joining with Other Data

Joining Heap Connect Tables

Given the hierarchical nature of the data model, joining user information with event data is simple in Heap Connect. All you need to do is join the user table using Heap’s unique User ID with the table(s) you are analyzing.

main_production.users
	JOIN main_production.[event_name] 
		ON main_production.users.user_id = main_production.[event_name].user_id

The same approach can be taken using a combination of the user and session ID by joining the sessions table with any event table(s) of interest.

Joining With Other Data Sets

Similarly, Heap Connect allows you to easily combine user interaction data with data from other sources such as support, marketing automation, or revenue data. We recommend sending an internal user ID shared between these sources via our Identify API, which is then exposed retroactively on the users table in Heap and ready to join with other sources. In most cases, you’ll need to join the event and users tables first so that the internal ID is available on the event level.

SELECT main_production.users.email, is_won
	FROM main_production.users 
	JOIN main_production.viewed_customer_page 
    		ON main_production.users.user_id=main_production.viewed_costumer_page.user_id 
	JOIN salesforce_.accounts 
    		ON main_production.users.email=salesforce._account.name

Depending on the structure of your data, this query would show the account email of users who had viewed the customer page and whether or not that contract was won. Analyzing these two sources together can reveal whether or not the customers page is correlated with the number of deals won and help you answer questions like, “How can my sales team push more prospects to view this documentation?” or “How can I change my page to make it a more valuable resource?”

Heap Connect: Overview & Schema


Suggested Edits are limited on API Reference Pages

You can only suggest edits to Markdown body content, but not to the API spec.