GLC Grid: Telling The Story with CTE's

I can divide the history of my knowledge of SQL in two ages, before and after Common Table Expressions (CTE's). I shake my head in disbelief when I think of all the times I could have used CTE's and used subqueries instead. To think that many years ago I was in a conversation and I said "I don't like subqueries" but didn't know there was a better way. I was still new with SQL and I found a subquery like a hole to another dimension. But the skill of SQL seemed to be measured in the ability to manage multiple nested subqueries so I had to learn the hard way. Having the mind of a developer, I am more interested in being able to make sense of what the query is doing based on its readability.


SELECT TranDate, sum(Amount)
  FROM Sales s
  join (SELECT ID FROM Customers where State in 
           (SELECT State from Regions
             where AvgTemp > 70)
        ) f
   on s.ID = f.ID

What does this do?

I can understand why people might not see much difference between a subquery and a CTE. Aren't there two ways of doing the same? Yes and no. A CTE is defined by RedGate software as "a temporary named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement". To be fair, both approaches are supposedly useful for different purposes, but I still prefer CTEs. Why?

Encapsulation

You can tell me that a subquery is encapsulated and it's correct, although you have to give it an alias and it has to be short, such as "x", unless you want to see long references. With a CTE you can make more declarative. It makes your query easier to read. Not to mention that the CTE is separated from the block of the main query, and you abstract it as if it were a table.

CTE's Tell the Story

Do you still put a lot of comments in your SQL code? Why not let the code structure be self-explanatory? when you can write a function name that describes just enough? In SQL, CTE's act like functions. You can get super descriptive with the name, and when you call the CTE, you can alias it.

A CTE is more readable than a subquery. If I open up an existing view and it has CTE's, then kudos to you. But then there is something else that can make your CTE's go to the next level. Let me give you an example.


WITH CustomersInWarmRegions as
  (SELECT c.ID FROM Customers c
    join Regions r on c.State = r.State
    where r.AvgTemp > 70
   )
SELECT TranDate, sum(Amount)
  FROM Sales s
  join CustomersInWarmRegions c
   on s.ID = c.ID

This tells me a little more.

Naming matters

Supposed that in your new job you need to alter a view for a report because the source table has changed names. You open the view to edit it and see that the previous developer was thoughtful and used CTEs. But you're not too happy when you see that the first CTE is called "pyt". What does that stand for? Should anyone care? Yes, if you want to know what it does. Since there are no comments, you do some analysis. The view is a Join between 5 tables, with a calculated date as a parameter, based on today's date but for one year prior, and it's all grouped. It's possible for me to deduce that after doing some analysis and making an educated guess as to what is the intent.

Imagine another scenario. You open the view, and the CTE is called PreviousYearTotals. You don't need to do analysis and guesswork because someone had the good sense of naming the CTE. The long name is not a problem because you can alias it to a short name when you join to it.

I haven't gotten into the topic of performance, although an advantage with a CTE is that, unlike a subquery, it only makes one trip to get the data. A subquery has to call data each time it repeats.

With the flexibility of CTE's, you can do something that can't be done with subqueries: a more self-explanatory structure.

I hope this was a way to persuade you to use CTEs.

GLC Grid

Monday, July 09, 2018

Telling The Story with CTE's

Encapsulation

CTE's Tell the Story

Naming matters

No comments: