Show your work: Writing complex SQL queries
This week I had to write this monster of an SQL query and I wanted to share some tips that I found useful.
Tip 1: Start with a docstring
My first tip is to start complex SQL queries with a docstring explaining the what and the why of the query.
SQL is code and it should be treated as such. Especially for larger queries like these that can be complex to understand if you see them for the first time, start with a docstring!
/* Create-repopulation-for-default-difficulties.sql Creates the table that is needed for continuous calibration (CC). We create this table because we want to revert all previously continuously calibrated difficulties from ... to ... Revert all events that: 1. Had no event before that date, but had an event after 2. Had a default difficulty event before that date and an event after */
Tip 2: Use CTEs liberally
The second tip is to use Common Table Expressions (CTEs) liberally.
Complex things are just many simple things on top of another. This also holds true in SQL queries, a complex SQL query can often be broken down into multiple easier queries.
CTEs are great for this because CTEs (or as I like to call them “subtables”) are temporary tables that you can use to get to your final query output.
In this big query I use three CTEs that all work towards the final output. The first CTE grabs all the calibrated exercises that we need to look at
with calibrated_exercises as ( ... ),
The second CTE considers all the events in our calibration warehouse prior to a certain date
previous_events as ( ... ),
Then the final CTE has all the events after
after_events as ( ... )
And then we use these CTEs to get to the final result by joining and anti-joining them in whatever way we require.
Tip 3: Use sqlfmt
My third and final tip is to use an autoformatter for your SQL code.
As mentioned before, SQL is code and should be treated as such. Treat your SQL like your Python and give it the attention and care it deserves.
I don’t always like how Python Black looks but consistency and other benefits far outweigh the cons (for me, at least).
A formatter I really like is sqlfmt. It prioritizes vertical space over horizontal space and was originally made for dbt, but I’ve grown to really like the style.
It’s the same with Black, first you hate it, then you love it. Give it some time.
Three tips for better SQL:
- Start with a docstring
- Use CTE’s liberally
- Use sqlfmt