From time to time, Stitch will encounter data that can’t be loaded losslessly into the destination table in your data warehouse. When this happens, Stitch may have to alter the structure of the table in order to successfully load the data.

This can happen for a few reasons:

  • New columns have been added to the table
  • Columns have been removed from the table
  • Multiple data types exist in a single column. Depending on the data types in question and the data warehouse you’re using, there are a few things that can happen. We’ll get into that in a moment.

Note that any of the above will result in a structural change to the table.


Adding & Removing Columns

  • If a new column is added to the source table, that column will be appended to the end of the table in the data warehouse.
  • If a column is removed from the source table, default NULLs will be placed into the column going forward. Stitch does not remove columns from the data warehouse.
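
As an illustration, here’s the kind of DDL a new source column translates into on the warehouse side. The schema, table, and column names below are hypothetical, and the exact statements Stitch issues may vary by destination:

  -- Hypothetical example: a new discount_code column appears in the source.
  -- Stitch appends a matching column to the end of the destination table;
  -- rows loaded before the column existed keep NULL in it.
  ALTER TABLE some_schema.orders ADD COLUMN discount_code VARCHAR(128);

  -- If a column is later removed from the source, nothing is dropped in the
  -- warehouse; Stitch simply loads NULL into that column going forward.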

Widening VARCHAR Columns

Depending on the data warehouse you’re using, VARCHAR columns of varying sizes can result in Stitch taking different actions to accommodate the data.

BigQuery & PostgreSQL Destinations

If you’re using BigQuery or PostgreSQL, no widening will occur because string data is stored in columns without declared length limits (STRING in BigQuery, TEXT in PostgreSQL).

Redshift & Panoply Destinations

If a VARCHAR column increases in size, Stitch will automatically widen the column to accommodate the data. To do this, Stitch must drop and re-create the column in your data warehouse, which will require temporarily dropping dependent views.

Let’s say you have an orders table in an integration called rep_sales. During the structure sync, Stitch will create a schema called rep_sales in your data warehouse and inside this schema, a table called orders.

The table is structured as follows, excluding the _sdc columns:

Column           | Data Type
-----------------+--------------+
id [pk]          | BIGINT
rep_name         | VARCHAR(128)
order_amount     | BIGINT
order_confirmed  | BOOLEAN

During the first sync, the following rows are replicated:

id [pk] | rep_name    | order_amount | order_confirmed
--------+-------------+--------------+----------------+
1       | Marty McFly | 12           | true
2       | Doc Brown   | 10           | true

During the next sync, the following rows are replicated:

id [pk] | rep_name                        | order_amount | order_confirmed
--------+---------------------------------+--------------+----------------+
3       | MartyMcFlyMessedUpTheTime       | 13           | true
        | ContinuumAndEliminatedHisOwn    |              |
        | ExistenceAndNowThereAren’tAny   |              |
        | HoverboardsInAnAwesomeHitSequel |              |
        | WhichIsBothWittyAndFuturistic   |              |
4       | Doc Brown                       | 11           | true

Take a look at the rep_name column for row 3: it contains data that exceeds 128 characters. To accommodate it, Stitch widens the column, altering the table’s structure to this:

Column           | Data Type
-----------------+--------------+
id [pk]          | BIGINT
rep_name         | VARCHAR(256)     // widened column
order_amount     | BIGINT
order_confirmed  | BOOLEAN
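
The drop-and-re-create step is why dependent views matter here. The statements below are only a rough sketch of that kind of column rebuild, not Stitch’s exact DDL, but they show why a view built on rep_name would block the change:

  -- Rough sketch of a widening rebuild (not Stitch's exact statements).
  ALTER TABLE rep_sales.orders ADD COLUMN rep_name__widened VARCHAR(256);
  UPDATE rep_sales.orders SET rep_name__widened = rep_name;
  ALTER TABLE rep_sales.orders DROP COLUMN rep_name;   -- blocked if views depend on rep_name
  ALTER TABLE rep_sales.orders RENAME COLUMN rep_name__widened TO rep_name;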

Columns with Mixed Data Types

Stitch requires that there only be one data type per column to properly type and store your data. If a column contains multiple data types, Stitch will create a new column for each data type and append the data type to the column name.

There are exceptions for BigQuery data warehouses, however - we’ll get into those in a moment.

To see how the splitting works, let’s take a look at the orders table again as an example.

Here’s the starting structure:

Column           | Data Type
-----------------+--------------+
id [pk]          | BIGINT
order_amount     | BIGINT
order_confirmed  | BOOLEAN

During the first sync, the following rows are replicated:

| id [pk] | order_amount | order_confirmed |
|---------+--------------+-----------------|
| 1       | 12           |                 |
| 2       | 10           | true            |

During the next sync, the following rows are replicated:

| id [pk] | order_amount | order_confirmed |
|---------+--------------+-----------------|
| 3       | 13.25        | true            |
| 4       | 11           | yes             |

These rows contain data typed differently from the rows replicated during the initial sync. In this case:

  • order_amount can sometimes be a decimal
  • order_confirmed can sometimes be a string

To accommodate the data, Stitch will “split” these columns and create a new column, one for each data type. That would alter the orders table’s structure to this:

| Column                   | Data Type    |
|--------------------------+--------------|
| id [pk]                  | BIGINT       |
| order_amount__bigint     | BIGINT       |
| order_amount__double     | DOUBLE       |
| order_confirmed__boolean | BOOLEAN      |
| order_confirmed__string  | VARCHAR(128) |      // in Postgres, this would be TEXT
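
If you need a single column for analysis after a split, you can recombine the pieces in a query. The example below is one way to do this against the table above; the true/yes mapping for strings is an assumption about your data, not something Stitch does for you:

  -- One way to recombine split columns (column names from the example above).
  SELECT id,
         COALESCE(order_amount__double, order_amount__bigint) AS order_amount,
         COALESCE(order_confirmed__boolean,
                  CASE WHEN LOWER(order_confirmed__string) IN ('true', 'yes') THEN true
                       WHEN order_confirmed__string IS NOT NULL THEN false
                  END) AS order_confirmed
  FROM rep_sales.orders;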

BigQuery & Mixed Data Types

As previously mentioned, BigQuery is a special exception to the mixed data type rule.

BigQuery only allows mixed data types in a column if the combination is INTEGER and FLOAT, in which case Stitch will create a single DOUBLE column.

For any other combination, BigQuery will reject the batch and add a record to the rejected records table for the integration.

BigQuery’s handling of mixed data types may make some integrations incompatible. For more info, refer to the Destination & Integration Compatibility Guide.


Redshift/Panoply Dependent Views

When a table’s structure has changed and Stitch needs to load data into the new structure, an ALTER command will be issued. To successfully run this command, dependent views must be temporarily ‘dropped’ to allow Stitch to re-create the table. If you receive an error that looks like the following, dependent views may be the root cause:

ERROR: cannot drop table SCHEMA.TABLE column type because other objects depend on it

Hint: Use DROP ... CASCADE to drop the dependent objects too.

While an hour or two is usually sufficient to complete the process, some very large tables may require more time. This troubleshooting guide will walk you through how to locate the dependent views and temporarily drop them.
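
To locate those views yourself, you can query the system catalogs. The query below uses standard PostgreSQL catalogs, which Redshift exposes as well, and the rep_sales.orders example from earlier; treat it as a starting point to adapt rather than a Stitch-provided tool:

  -- Sketch: list views that depend on rep_sales.orders.
  SELECT DISTINCT dependent_ns.nspname AS view_schema,
                  dependent_view.relname AS view_name
  FROM pg_depend
  JOIN pg_rewrite                ON pg_depend.objid = pg_rewrite.oid
  JOIN pg_class dependent_view   ON pg_rewrite.ev_class = dependent_view.oid
  JOIN pg_class source_table     ON pg_depend.refobjid = source_table.oid
  JOIN pg_namespace dependent_ns ON dependent_view.relnamespace = dependent_ns.oid
  JOIN pg_namespace source_ns    ON source_table.relnamespace = source_ns.oid
  WHERE source_ns.nspname = 'rep_sales'
    AND source_table.relname = 'orders'
    AND dependent_view.relname <> source_table.relname;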


Identifying & Resolving Record Rejections

If Stitch is unable to load data into a table, you’ll see missing data in your data warehouse. To troubleshoot, we recommend checking some of the common causes of dropped or missing data:

  • The table name is longer than the data warehouse’s character limit
  • A column name is longer than the data warehouse’s character limit
  • Stitch’s permissions to your data warehouse have been revoked
  • There are column names appended with __string, __bigint, etc. These are reserved for Stitch.
  • There are column names prepended with _sdc or _rjm. These are reserved for Stitch.

To pinpoint the problem, check the rejected records table for the integration or the Data Loading Behavior rollup for your data warehouse. The rollup contains a list of common data loading scenarios and the expected behavior, including why some records may be rejected by the data warehouse.
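
If your source is itself a relational database, you can check the first two causes above directly. The query below flags long column names via information_schema; the schema name and the 63-character threshold are placeholders (PostgreSQL limits identifiers to 63 characters, Redshift to 127):

  -- Sketch: flag column names that may exceed a destination identifier limit.
  SELECT table_name, column_name, LENGTH(column_name) AS name_length
  FROM information_schema.columns
  WHERE table_schema = 'your_source_schema'
    AND LENGTH(column_name) > 63
  ORDER BY name_length DESC;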



Questions? Feedback?

Did this article help? If you have questions or feedback, please reach out to us.