Our BigQuery destination is currently in Open Beta. As such, the info in this article may change.
Google BigQuery is a fully managed, cloud-based big data analytics web service for processing very large read-only data sets. BigQuery was designed for analyzing data on the order of billions of rows, using a SQL-like syntax.
For more information, check out Google’s BigQuery overview.
In this guide, we'll walk you through:
- BigQuery's pricing model,
- Some high-level limitations (including any incompatible data sources),
- How to spin up a BigQuery data warehouse of your own, and
- How Stitch loads and organizes data in BigQuery.
Pricing
Unlike many other cloud-based data warehouse solutions, BigQuery’s pricing model is based on usage and not a fixed-rate. This means that your bill can vary over time.
Before fully committing yourself to using BigQuery as your data warehouse, we recommend familiarizing yourself with the BigQuery pricing model and how using Stitch may impact your costs.
Learn more about Stitch & BigQuery pricing
Limitations
Every database has its own supported limits and way of handling data, and BigQuery is no different. The table below provides a very high-level look at what BigQuery supports, including any possible incompatibilities with Stitch’s integration offerings.
Incompatibile Sources | Possible incompatibilities. Learn more. | ||
Reserved Words |
BigQuery’s full list; Stitch reserves _rjm , _sdc , and data type suffixes (__bigint )
|
||
Table Name Length | 1,024 characters | Column Name Length | 128 characters |
Max # of Columns | 10,000 | VARCHAR Max Width | None |
Nested Structure Support | Native Support | Case | Case Insensitive |
Comparing Destinations
Not sure if BigQuery is the data warehouse for you? Check out the Choosing a Stitch Destination guide to compare each of Stitch’s destination offerings.
Setup
While the setup process for BigQuery is pretty straightforward, it isn’t as simple as just entering credentials into Stitch. In addition to completing the authorization process inside Stitch, we also require a user that:
- Has access to a Google Cloud Platform project within BigQuery
- Has access to a project where billing is enabled
- Has Google Cloud Storage privileges
Spin up a BigQuery data warehouse
Schema
After you’ve successfully connected your BigQuery data warehouse to Stitch, you can start adding integrations and replicating data.
For each integration that you add to Stitch, a schema specific to that integration will be created in your data warehouse. This is where all the tables for that inegration will be stored.
Stitch will encounter dozens of scenarios when replicating and loading your data. To learn more about how BigQuery handles these scenarios, check out the Data Loading Guide for BigQuery.
Rejected Records Log
Occasionally, Stitch will encounter data that it can’t load into the data warehouse. For example: a table contains more columns than BigQuery’s allowed limit of 10,000 columns per table.
When this happens, the data will be “rejected” and logged in a table called _sdc_rejected
. Every integration schema created by Stitch will include this table as well as the other tables in the integration.
Learn more about the rejected records log
Replication
Stitch replicates data from your sources based on the integration’s Replication Frequency and the Replication Method used by the tables in the integration. In Stitch, you have the ability to control what and how often data is replicated for the majority of integrations.
The current release of Stitch’s BigQuery destination uses Append-Only Incremental Replication.
For SaaS and database tables that use Incremental Replication, Stitch will replicate data into BigQuery in an append-only fashion. This means that data that updates existing an existing row will NOT overwrite the row. Instead, a new row with the new data will be appended to the end of the table.
This means that there can be many different rows in a BigQuery table with the same Primary Key, each representing what the data was at that moment in time.
Querying an append-only table requires a different strategy than you would normally lose. For more info, check out the Querying Append-Only Tables guide.
The time from the sync start to data being loaded into your data warehouse can vary depending on a number of factors, especially for initial historical loads.
To learn more about Stitch’s replication process and how loading time can be affected, check out the Stitch Replication Process guide.
Related | Troubleshooting |
Questions? Feedback?
Did this article help? If you have questions or feedback, please reach out to us.