Because data can come from a variety of integrations and all those integrations may structure or handle data differently, Stitch will encounter numerous scenarios when replicating and loading data into your BigQuery data warehouse.
This doc covers many of the common scenarios Stitch will encounter and how BigQuery specifically handles them.
To make browsing easier, we’ve grouped these scenarios into the following categories:
- New table scenarios - These scenarios apply when Stitch creates a new table in BigQuery.
- Data loading scenarios - These scenarios apply when Stitch loads data into BigQuery.
- Data typing scenarios - These scenarios apply when Stitch encounters specific data types (ex: timezones).
- Schema change scenarios - These scenarios apply when a table undergoes structural changes.
- Nested data scenarios - These scenarios apply when Stitch loads nested data into BigQuery.
- Direct data warehouse changes - These scenarios apply when you make a direct change to your BigQuery data warehouse.
New Table Scenarios
These scenarios apply when Stitch creates a new table in BigQuery.
Data Loading Scenarios
These scenarios apply when Stitch loads data into BigQuery.
Summary | Scenario | Expected Behavior |
---|---|---|
Data for existing table version | Data arrives that updates existing records in the data warehouse (matched on Primary Key). | Data is appended to the end of the table. (A sketch for querying only the latest version of each record follows this table.) |
Data with a new table version | A full set of data arrives for a table set to Full Table Replication. | The table is truncated and the new data is loaded in a single atomic action. |
Replication Key contains NULLs (initial batch) | An initial batch of data arrives where the Replication Key column contains NULLs. | Because BigQuery is append-only, this scenario requires no special handling; the records are loaded as usual. |
Replication Key contains NULLs (subsequent batch) | A subsequent batch of data arrives where the Replication Key column contains NULLs. | Because BigQuery is append-only, this scenario requires no special handling; the records are loaded as usual. |
Timestamps out of range | Data arrives with a date that is out of range for BigQuery to handle. BigQuery's range is Long.MIN_VALUE (292269055-12-02T16:47:04.192-00:00) to Long.MAX_VALUE (292278994-08-17T07:12:55.807-00:00). | If the timestamp falls outside of BigQuery's range, BigQuery will reject the record. |
VARCHAR max width | Data arrives in a text column that exceeds BigQuery's size maximum. VARCHARs are limited to 2 MB per row. | Stitch will stop processing data for that table. |
Decimal range | Data arrives in a decimal column that exceeds BigQuery's limit of 38 digits of precision. | BigQuery accepts the data, but with a loss of precision. All non-integer values are stored as DOUBLE. |
Integer range | Data arrives in an integer column that exceeds BigQuery's size limit. | BigQuery rejects all the data in the batch. |
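Because loading into BigQuery is append-only, updates to existing records arrive as new rows rather than overwriting old ones, so queries against a Stitch-loaded table typically need to pick out the most recent version of each record. The sketch below shows one way to do this with the google-cloud-bigquery Python client; the project, dataset, table, and `id` primary key are hypothetical, while `_sdc_sequence` is the Stitch column referenced later in this doc.

```python
# A minimal sketch: query only the latest version of each record in an
# append-only Stitch table. The table `my-project.stitch_data.orders` and
# its primary key `id` are hypothetical; `_sdc_sequence` is the Stitch
# sequence column mentioned elsewhere in this doc.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT * EXCEPT (row_num)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY id
      ORDER BY _sdc_sequence DESC
    ) AS row_num
  FROM `my-project.stitch_data.orders`
)
WHERE row_num = 1
"""

for row in client.query(sql).result():
    print(dict(row))
```

The same pattern can be wrapped in a view so downstream queries always see deduplicated data.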
Data Typing Scenarios
These scenarios apply when Stitch encounters specific data types (ex: timezones).
Summary | Scenario | Expected Behavior |
---|---|---|
Single column contains mixed data types (same batch) | An initial batch of data contains mixed data types within a single column. For example: a column contains STRING and INTEGER data. | If the combination is INTEGER and FLOAT, Stitch will create a single DOUBLE column. For any other combination, BigQuery will reject the batch. |
Single column contains mixed data types (different batch) | An initial batch of data contains single-data-type columns, but a subsequent batch contains mixed data types in a single column. For example: a column contains STRING and INTEGER data. | If the existing column is FLOAT and the new value is INTEGER, Stitch will allow it. For any other combination, BigQuery will reject the batch. |
VARCHARs (same batch) | Initial data for a VARCHAR column arrives with mixed sizes. Some records contain more than 128 characters. | No widening will occur. BigQuery stores all VARCHAR/TEXT data as TEXT. |
VARCHARs (different batch) | Data for an existing VARCHAR column arrives; some records exceed the maximum size for the existing column. | No widening will occur. BigQuery stores all VARCHAR/TEXT data as TEXT. |
Dates with timezones | Timestamp data arrives with timezone information. | BigQuery has no support for timezones. |
Decimals | Data arrives in decimal format with an arbitrary amount of precision. | BigQuery does not offer fixed precision. The only supported decimal data type is DOUBLE. (See the precision-loss example after this table.) |
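Because BigQuery stores all non-integer values as DOUBLE, decimal data with more precision than a double-precision float can hold (roughly 15-17 significant digits) will lose precision on load. The snippet below is a quick, Stitch-independent illustration of that loss; the value is arbitrary.

```python
# A quick illustration (not Stitch-specific) of the precision loss described
# above: a high-precision decimal value vs. its double-precision representation.
from decimal import Decimal

original = Decimal("1234567890.123456789012345678")  # arbitrary high-precision value
as_double = float(original)                          # what a DOUBLE column can hold

print(original)                       # all 28 digits preserved
print(repr(as_double))                # only ~15-17 significant digits survive
print(Decimal(as_double) - original)  # the rounding error introduced
```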
Schema Changing Scenarios
These scenarios apply when a table undergoes structural changes.
Summary | Scenario | Expected Behavior |
---|---|---|
New column | A new column is added to an existing table | New column is appended to the end of the table. |
Column removed | A column is no longer present in the data arriving for an existing table | Default NULL values are placed into the column. |
Column added to nested table | Data arrives with a nested table that contains a field that did not exist in that nested table before | New field is added to the nested table. |
Nested Data Scenarios
These scenarios apply when Stitch loads nested data into BigQuery.
Summary | Scenario | Expected Behavior |
---|---|---|
Nested JSON objects | Data arrives that is nested with many top-level properties | Nesting is maintained within BigQuery. If BigQuery can’t support the structure, it will be rejected. |
Nested JSON top-level arrays | Data arrives that is nested and amongst the nested data are properties that contain arrays | See above. |
Nested JSON sub-level arrays | Data arrives that is nested and amongst the nested data are properties that contain arrays, which themselves contain properties that are arrays | See above. |
Nested JSON new columns in arrays | The sub-level arrays in nested JSON data have a varying set of columns | See above. |
Nested table name too long | Data arrives for a nested table with a name that exceeds the maximum length for BigQuery. Table names are limited to 1,024 characters. | BigQuery will reject the table. |
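As the scenarios above note, nesting is maintained within BigQuery, which represents nested objects and arrays natively as RECORD and REPEATED fields. The sketch below shows what such a structure looks like when declared with the google-cloud-bigquery Python client; it only illustrates BigQuery's nested-field support, and the table and field names are hypothetical (the exact schemas Stitch generates may differ).

```python
# A rough sketch of BigQuery's native support for nested and repeated data,
# using the google-cloud-bigquery client. Table and field names are
# hypothetical; the schemas Stitch generates for nested JSON may differ.
from google.cloud import bigquery

schema = [
    bigquery.SchemaField("id", "INTEGER"),
    bigquery.SchemaField(
        "customer",        # nested object -> RECORD field
        "RECORD",
        fields=[
            bigquery.SchemaField("name", "STRING"),
            bigquery.SchemaField("email", "STRING"),
        ],
    ),
    bigquery.SchemaField(
        "line_items",      # array of objects -> REPEATED RECORD field
        "RECORD",
        mode="REPEATED",
        fields=[
            bigquery.SchemaField("sku", "STRING"),
            bigquery.SchemaField("quantity", "INTEGER"),
        ],
    ),
]

client = bigquery.Client()
table = bigquery.Table("my-project.stitch_data.orders", schema=schema)
table = client.create_table(table)  # creates a table with nested/repeated fields
```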
Direct Data Warehouse Changes
These scenarios apply when you make a direct change to your BigQuery data warehouse.
Summary | Scenario | Expected Behavior |
---|---|---|
You add columns | You add additional columns to a Stitch-generated table in BigQuery | This is fine, as long as the new columns are nullable. |
You apply indexes | You apply indexes to columns created by Stitch | This is not applicable to BigQuery. |
You apply Sort / Distribution Keys | You apply Sort / Distribution Keys to columns created by Stitch | This is not applicable to BigQuery. |
You remove Stitch (_sdc) columns from a table | You remove one or more of the _sdc columns created by Stitch | Inserting and updating data will break, as Stitch requires the _sdc_table_version and _sdc_sequence columns. (A sketch for checking that these columns are still present follows this table.) |
You remove a data column from the data source | You remove one or more of the data columns from the data source | Default NULL values are placed into the column. |
You remove a data column from the data warehouse | You remove one or more of the data columns created by Stitch from the data warehouse Note: it’s sometimes possible to remove columns from database integrations only. | |
You revoke permissions to schemas and/or tables | You remove Stitch’s access to create schemas/tables or to write to Stitch-created tables | |
You revoke data warehouse permissions | You remove access to the data warehouse as a whole | An error will surface in Stitch, data will fail to load, and you will be notified. |
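Because removing the _sdc columns breaks inserts and updates, it can be useful to confirm that _sdc_table_version and _sdc_sequence still exist on a Stitch-created table. The sketch below checks this through BigQuery's INFORMATION_SCHEMA using the google-cloud-bigquery Python client; the project, dataset, and table names are hypothetical.

```python
# A minimal sketch: verify that the Stitch columns this doc says are required
# (_sdc_table_version, _sdc_sequence) still exist on a table.
# Project, dataset, and table names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT column_name
FROM `my-project.stitch_data.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'orders'
  AND column_name IN ('_sdc_table_version', '_sdc_sequence')
"""

found = {row.column_name for row in client.query(sql).result()}
missing = {"_sdc_table_version", "_sdc_sequence"} - found
if missing:
    print(f"Missing required Stitch columns: {missing}")
else:
    print("Required Stitch columns are present.")
```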
Related: Troubleshooting
Questions? Feedback?
Did this article help? If you have questions or feedback, please reach out to us.