Because data can come from a variety of integrations and all those integrations may structure or handle data differently, Stitch will encounter numerous scenarios when replicating and loading data into your PostgreSQL data warehouse.

This doc covers many of the common scenarios Stitch will encounter and how PostgreSQL specifically handles them.

To make browsing easier, we’ve grouped these scenarios into the following categories:

- New Table Scenarios
- Data Loading Scenarios
- Data Typing Scenarios
- Schema Changing Scenarios
- Nested Data Scenarios
- Direct Data Warehouse Changes

New Table Scenarios

These scenarios apply when Stitch creates a new table in PostgreSQL.

| Summary | Scenario | Expected Behavior |
| --- | --- | --- |
| Single Primary Key | First batch of data arrives with a single Primary Key | Table is created with a Primary Key constraint (`table_name_pkey`); the Primary Key column has a NOT NULL constraint. |
| Multiple Primary Keys | First batch of data arrives with multiple Primary Keys | Table is created without a Primary Key constraint and without NOT NULL columns. Primary Key info is stored as a comment on the table. |
| No Primary Keys | First batch of data arrives without Primary Keys | Table is created without Primary Key and NOT NULL constraints. |
| Replication Key contains NULLs | First batch of data arrives where the Replication Key column contains NULLs | Table is created and rows with NULL Replication Keys are loaded. After the initial sync, rows with NULL Replication Keys will not be replicated if there is at least one non-NULL value in the column. |
| Some empty columns | First batch of data arrives with some columns that are completely empty | Only columns that are populated in at least one record are created. |
| ALL empty columns | First batch of data arrives with columns that are all NULL | Table is created and contains only the Stitch replication (`_sdc`) columns. |
| Table name is too long | Data arrives with a table name that exceeds PostgreSQL’s 63-character limit | Postgres will reject all data for the table. |
| Column name is too long | Data arrives with a column name that exceeds 59 characters | Columns with names longer than 59 characters will be rejected; columns with shorter names will persist to Postgres. The limit is 59 characters (rather than PostgreSQL’s 63) to leave room for the suffixes added when columns are split. |
| Too many columns in table | A table arrives with more columns than PostgreSQL allows (250-1,600, depending on column types) | Postgres will reject all data for the table and surface a “too many columns” error. |
| Duplicate fields | Data arrives with two column names that canonicalize to the same name | Postgres will reject the entire record containing the two columns and a field-collision error will surface. Note that Postgres is case-sensitive. |
| Negative/positive infinity, NaN | Special cases of FLOAT/DOUBLE values | Supported. |
| Mixed case column name | Data arrives with columns named LiKeThIs | Case is maintained. |
| Column name with spaces | Data arrives with column names containing spaces | Spaces are maintained. |
| Unsupported special characters in column name | Data arrives with column names containing special characters that PostgreSQL doesn’t support | Special characters are removed. |
| Column name starts with non-letter | Data arrives with column names that start with numeric characters: 123Column | Postgres handles this without issue. |
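
If you want to confirm how Stitch created a particular table, you can inspect it directly in PostgreSQL. The sketch below assumes a hypothetical Stitch-created table named `orders`; it shows the Primary Key constraint (`table_name_pkey`) that exists for single-Primary-Key tables, and the table comment where Primary Key info is stored when there are multiple Primary Keys.

```sql
-- Hypothetical table name; replace "orders" with one of your Stitch-created tables.

-- Single Primary Key: the constraint is created as <table_name>_pkey.
SELECT conname, pg_get_constraintdef(oid)
FROM pg_constraint
WHERE conrelid = 'orders'::regclass
  AND contype = 'p';

-- Multiple Primary Keys: no constraint is created; the Primary Key info
-- is stored as a comment on the table instead.
SELECT obj_description('orders'::regclass, 'pg_class');
```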



Data Loading Scenarios

These scenarios apply when Stitch loads data into your PostgreSQL data warehouse.

| Summary | Scenario | Expected Behavior |
| --- | --- | --- |
| Data for existing table version | Data arrives that updates existing records in the data warehouse (matched on Primary Key) | The original row is deleted and a new row is created with the updated data. This is also known as an upsert. |
| Data with a new table version | A full set of data arrives for a table set to Full Table Replication | Table is truncated and the new data is loaded in a single atomic action. |
| Replication Key contains NULLs (initial batch) | Initial batch of data arrives where the Replication Key column contains NULLs | Table is created and all rows are loaded. |
| Replication Key contains NULLs (subsequent batch) | Subsequent batch of data arrives where the Replication Key column contains NULLs | Stitch will not replicate rows where the Replication Key is NULL. Rows with values in the Replication Key column will persist to Postgres. |
| Timestamps out of range | Data arrives with a date that is outside the range PostgreSQL can handle (4713 BC to 294276 AD) | Postgres will reject records that fall outside the supported range. |
| VARCHARs max width | Data arrives in a text column that exceeds PostgreSQL’s maximum size | Stitch will stop processing data for that table. |
| Decimal range | Data arrives in a decimal column that exceeds PostgreSQL’s limit of 131,072 digits before the decimal point or 16,383 digits after it | Postgres will reject the row. |
| Integer range | Data arrives in an integer column that exceeds PostgreSQL’s size limit | Postgres will reject records that exceed the size limit. |
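
If you’d like to see how PostgreSQL itself reports these limits, the statements below are a quick sketch you can run in psql. The values are chosen to fall just outside the documented ranges; the exact error text may vary between PostgreSQL versions.

```sql
-- Timestamps outside 4713 BC to 294276 AD are rejected.
SELECT '294277-01-01'::timestamp;      -- ERROR: timestamp out of range

-- Integers outside the BIGINT range are rejected.
SELECT 9223372036854775808::bigint;    -- ERROR: bigint out of range

-- Decimals with more than 131,072 digits before the decimal point
-- (or 16,383 digits after it) are rejected.
SELECT repeat('9', 131073)::numeric;   -- rejected: exceeds NUMERIC's digit limit
```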



Data Typing Scenarios

These scenarios apply when Stitch encounters specific data types (ex: timezones).

| Summary | Scenario | Expected Behavior |
| --- | --- | --- |
| Single column contains mixed data types (same batch) | Initial batch of data contains mixed data types within a single column. For example: a column contains string and INTEGER data. | The column will be “split” and the column name will be appended with the data type. For example: `age_fl`, `age_bi` |
| Single column contains mixed data types (different batch) | Initial batch of data contains single data type columns, but a subsequent batch contains mixed data types in a single column. For example: a column contains string and INTEGER data. | The column will be “split” and the column name will be appended with the data type. For example: `age_fl`, `age_bi` |
| VARCHARs (same batch) | Initial data for a VARCHAR column arrives with mixed sizes; some records contain more than 128 characters. | No widening will occur. Postgres stores all VARCHAR/TEXT data as TEXT. |
| VARCHARs (different batch) | Data for an existing VARCHAR column arrives; some records exceed the max size for the existing column. | No widening will occur. Postgres stores all VARCHAR/TEXT data as TEXT. |
| Dates with timezones | Timestamp data arrives with timezone information | Data type is changed to TIMESTAMP WITHOUT TIME ZONE and values are adjusted to display as UTC. |
| Decimals | Data arrives in decimal format with an arbitrary amount of precision | Postgres stores all decimals without a specified precision. |
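
To check whether a column has been split, or to confirm the data type Stitch used for a column, you can query `information_schema`. The table and column names below are hypothetical; the `age%` pattern is just one way to find a column along with any type-suffixed versions of it.

```sql
-- Hypothetical table "customers": list the "age" column and any
-- type-suffixed columns created when mixed data types were detected.
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'customers'
  AND column_name LIKE 'age%'
ORDER BY column_name;
```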



Schema Changing Scenarios

These scenarios apply when a table undergoes structural changes.

| Summary | Scenario | Expected Behavior |
| --- | --- | --- |
| New column | A new column is added to an existing table | The new column is appended to the end of the table. |
| Column removed | A column is no longer present in incoming data for an existing table | Default NULL values are placed into the column. |
| Column added to nested table | Data arrives with a nested table that contains a field that did not previously exist in that nested table | The new field is added to the nested table. |
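
Because new columns are appended to the end of the table, you can spot recently added columns by their position. A minimal sketch, again using a hypothetical `orders` table:

```sql
-- Columns added after the table was created have the highest ordinal positions.
SELECT column_name, ordinal_position
FROM information_schema.columns
WHERE table_name = 'orders'
ORDER BY ordinal_position;
```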



Nested Data Scenarios

These scenarios apply when Stitch loads nested data into PostgreSQL.

| Summary | Scenario | Expected Behavior |
| --- | --- | --- |
| Nested JSON objects | Data arrives that is nested with many top-level properties | Keys in the object are de-nested and become columns in the table. Column names follow this convention: `top_level_key__secondary__third__etc` |
| Nested JSON top-level arrays | Nested data arrives and some of the nested properties contain arrays | A subtable is created. Subtables follow this naming convention: `top_level_table__array_name` |
| Nested JSON sub-level arrays | Nested data arrives with array properties that themselves contain properties that are arrays | See above. |
| Nested JSON new columns in arrays | The sub-level arrays in nested JSON data have a varying set of columns | See above. |
| Nested table name too long | Data arrives for a nested table with a name that exceeds PostgreSQL’s 63-character limit | Postgres will reject all records in the table. |
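
As a concrete (and entirely hypothetical) illustration of these naming conventions: if an `orders` record arrives with a nested `shipping.address` object and a `line_items` array, the de-nested columns and the subtable could be queried like this. Your actual table, column, and subtable names depend on the source data.

```sql
-- De-nested object keys become double-underscore-separated columns.
SELECT id, shipping__address__street, shipping__address__city
FROM orders;

-- Arrays become subtables named <top_level_table>__<array_name>.
SELECT *
FROM orders__line_items;
```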



Direct Data Warehouse Changes

These scenarios apply when you make a direct change to your PostgreSQL data warehouse.

| Summary | Scenario | Expected Behavior |
| --- | --- | --- |
| You add columns | You add additional columns to a Stitch-generated table in PostgreSQL | No issue, as long as the new columns are nullable. |
| You apply indexes | You apply indexes to columns created by Stitch | No issue; Stitch will not remove them. |
| You apply Sort / Distribution Keys | You apply Sort / Distribution Keys to columns created by Stitch | Not applicable to Postgres. |
| You remove Stitch (`_sdc`) columns from a table | You remove one or more of the `_sdc` columns created by Stitch | Inserting and updating data will break, as Stitch requires the `_sdc_table_version` and `_sdc_sequence` columns. |
| You remove a data column from the data source | You remove one or more of the data columns from the data source | Default NULL values are placed into the column. |
| You remove a data column from the data warehouse | You remove one or more of the Stitch-created data columns from the data warehouse | Note: this is sometimes possible for database integrations only. Removing a column with a data type suffix (ex: `__boolean`) will break the table. If new data is detected for the deleted column, Stitch will recreate it in your data warehouse. |
| You revoke permissions to schemas and/or tables | You remove Stitch’s access to create schemas/tables or to write to Stitch-created tables | An error will surface in Stitch, data replication will stop until sufficient permissions are reinstated, and you will be notified. |
| You revoke data warehouse permissions | You remove access to the data warehouse as a whole | An error will surface in Stitch, data will fail to load, and you will be notified. |
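
If replication stopped because permissions were revoked, re-granting them allows Stitch to resume. The role and schema names below are hypothetical, and the full set of privileges Stitch requires is listed in Stitch’s PostgreSQL setup documentation; this is only a sketch of what reinstating access might look like.

```sql
-- Hypothetical role and schema names; adjust to match your setup.
GRANT USAGE, CREATE ON SCHEMA stitch_schema TO stitch_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA stitch_schema TO stitch_user;
```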




Questions? Feedback?

Did this article help? If you have questions or feedback, please reach out to us.