Removing Database Integration Columns from Redshift
Important: Columns can only be removed from tables created by database integrations. Columns created by SaaS integrations cannot be removed from a data warehouse at this time.
When you rename or no longer want to sync a column, what happens? For both Full Table and Incremental replication, the old column and all historical data will remain in the table even if there aren’t any new values being replicated.
For some Stitch users, retaining these columns is perfectly fine. If you like to keep things tidy, however, you can easily remove the unwanted columns by recreating your realized tables without those columns.
We recommend using pg_dump for this process, which is similar to altering the SORT and DIST keys on your tables.
Retrieving the Table Definition
In this example, we’ll show you how to remove the unwanted columns using pg_dump from the command line. We marked everything you’ll need to define yourself in square brackets [like this].
Note: If any new data is detected for the deleted column, Stitch will recreate the column in your data warehouse.
First, you’ll grab a full definition of your target table and then create the new table structure, removing the unwanted column(s):
The above command will return a response similar to the following:
Retrieving the Table’s Primary Key
Next, you need to retrieve the table’s Primary Key comment. This will be used in the next step to indicate which column(s) are the table’s Primary Keys.
Run the following query:
This will be used in the next step to indicate which column(s) are the table’s Primary Keys.
Important: Redshift doesn’t enforce the use of Primary Keys, but Stitch requires them to replicate data. In the following example, you’ll see COMMENT being used to note the table’s Primary Key. Make sure you include the Primary Key comment in the next step, as missing Primary Keys will cause issues with data replication.
Copying Data into the New Table
Next, you’ll SELECT all the historical data from the unwanted column into the new table. When you run this transaction yourself, you’ll need to change everything inside [the square brackets] as well as the following:
The column names in the table. Be sure to add _rjm or _sdc columns into the new table schema.
In the ALTER TABLE OWNER line, you’ll see [stitch_redshift_user]. This is the username of the Redshift user that Stitch uses to connect to your data warehouse. Failing to enter the Stitch username here will prevent Stitch from replicating data for this table.
Here’s the transaction we ran for our example table. Note where we’ve marked the column we want to remove - remove this when running the transaction yourself:
Verifying the Table Owner
When you perform this process yourself, make sure that the Stitch Redshift user retains ownership of the table.
If Stitch isn’t the owner of the table, issues with data replication will arise.
Verifying the New Schema
Verify your changes by using this command to retrieve the table’s schema:
In our case, the response would look something like this: