Want to improve your query performance? In this article, we’ll walk you through how to use encoding, Sort, and Distribution Keys to streamline query processing.
Overview
Encodings
Encodings, or compression types, are used to reduce the amount of required storage space and the size of data that’s read from storage. This in turn can lead to a reduction in processing time for queries.
Sort Keys
Sort Keys determine the order in which rows in a table are stored. When properly applied, Sort Keys allow large chunks of data to be skipped during query processing. Less data to scan means a shorter processing time, thus improving the query’s performance.
Distribution Keys
Distribution, or DIST Keys determine where data is stored in Redshift. When data is replicated into your data warehouse, it’s stored across the compute nodes that make up the cluster. If data is heavily skewed - meaning a large amount is placed on a single node - query performance will suffer. Even distribution prevents these bottlenecks by ensuring that nodes equally share the processing load.
Things to Think About
Consider the following before diving in:
- Optimizing for every single query isn’t possible. We suggest selecting the most important queries and selecting Sort/Dist Keys that will improve the performance of those queries.
- Columns with few unique values aren’t good Sort Keys. Because Sort Keys store records together based on similar values, selecting a column with few unique values as the Sort key will heavily skew the data. This will lead to an increase in query processing time.
- Tables using Full Table Replication aren’t good candidates for this process Due to the nature of Full Table Replication, encodings, Sort, and Dist Keys in these tables may be overwritten during the replication attempts that follow application.
Applying Encodings & Keys
We’ll use a table called orders
, which is contained in the rep_sales
schema.
Log into your Redshift database using your SQL tool to get started.
Retrieving the Table Schema
Use this command to retrieve the table schema for orders:
This command will produce the table schema. In our case, the result looks like this:
Creating a Table Copy & Redefining the Schema
In this step, you’ll create a copy of the table, redefine its structure to include the DIST
and SORT
Keys, insert/rename the table, and then drop the “old” table.
But first, retrieve the table’s Primary Key using the following query:
This will be used in the next step to indicate which column(s) are the table’s Primary Keys.
COMMENT
being used to note the table’s Primary Key. Make sure you include the Primary Key comment in the next step, as missing Primary Keys will cause issues with data replication.Here’s the transaction we’d use on the rep_sales.orders
table:
Verifying the Table Owner
When you perform this process yourself, make sure that the Stitch Redshift user retains ownership of the table.
If Stitch isn’t the owner of the table, issues with data replication will arise.
Verifying the Encoding & Key Application
To verify that thechanges were applied correctly, retrieve the table’s schema again using this command:
In this example, if the Keys and encodings were applied correctly, the response would look something like this:
Related | Troubleshooting |
Questions? Feedback?
Did this article help? If you have questions or feedback, please reach out to us.