select *
from sport_interests
limit 5| Name | Activity | Number |
|---|---|---|
| Charlie | Dancing | 65.35 |
| Charlie | Basketball | 38.10 |
| Jack | Basketball | 36.19 |
| Bob | Hiking | 53.84 |
| Frank | Running | 74.29 |
At this point it’s likely you’re using a database that supports pivoting and unpivoting but it’s good to know how to do it yourself.
The most basic way to pivot is to use a CASE statement for each column you want to pivot.
select
ts_id,
sum(case when ts_year = 2020 then ts_value end) as ts_2020,
sum(case when ts_year = 2021 then ts_value end) as ts_2021
from yearly_values_long
group by ts_idWhy I like pivoting in SQL is that I can create arbitrary case when statements to control how my data is pivoted:
select
ts_id,
sum(case when ts_year = 2020 and ts_value > 0.5 then ts_value end) as ts_2020,
sum(case when ts_year = 2021 and ts_value > 0.5 then ts_value end) as ts_2021
from yearly_values_long
group by ts_idI wish you don’t ever need to unpivot manually in SQL. A universal way to unpivot in SQL is to take each column of interest and do a UNION ALL.
select
ts_id,
'2020' as ts_year,
[2020] as ts_value
from yearly_values_wide
union all
select
ts_id,
'2021' as ts_year,
[2021] as ts_value
from yearly_values_wide
https://sqlperformance.com/2019/09/t-sql-queries/t-sql-pitfalls-pivoting-unpivoting
Pivoting refers to a transformation where you take two columns: one containing values that will represent the names of your columns and another containing the values that you want those columns to contain. In a sense, we’re making our data wider (dplyr’s R refers to pivoting and unpivoting as pivot_wider and pivot_longer, respectively). Accordingly, unpivoting collects your column names into one column and the contents of those columns into another.
Chances are that your tooling allows you to perform these operations. Unless, of course, you are an unlucky fellow or you prefer to roll your own code.
The data we’re using looks like this:
select *
from sport_interests
limit 5| Name | Activity | Number |
|---|---|---|
| Charlie | Dancing | 65.35 |
| Charlie | Basketball | 38.10 |
| Jack | Basketball | 36.19 |
| Bob | Hiking | 53.84 |
| Frank | Running | 74.29 |
Pivoting in SQL looks like this:
select
name,
sum(case when Activity = 'Running' then Number end) as running,
sum(case when Activity = 'Tennis' then Number end) as tennis
from sport_interests
group by name| Name | running | tennis |
|---|---|---|
| David | NA | NA |
| Jack | NA | NA |
| Isabel | NA | 60.97 |
| Alice | NA | NA |
| Bob | NA | NA |
| Emma | 29.54 | 86.50 |
| Henry | 96.27 | NA |
| Charlie | NA | NA |
| Frank | 101.59 | NA |
| Grace | NA | NA |
If you need to pivot into a lot of columns, you are going to have a bad time. First, you need to specify each column that is not being pivoted into the select statement as well as the group by clause. Second, you need to manually type out each column that appears in the pivot.
Nonetheless, I love pivoting in SQL because you can individually define how each column is pivoted. Maybe you need to average one column, but sum another? Maybe you want to add two conditional statements when you are pivoting Activity = 'Running'? I’ve done my fair share of intricate SQL pivots that would have taken me much more time in other languages.
Unpivoting is a little tricky in SQL. Essentially, we’ll be creating multiple select statements for each column we would like to unpivot, with each select statement containing a different column:
select
name,
'Running' as Activity,
running as Number
from sport_interests_pivot
UNION ALL
select
name,
'Tennis' as Activity,
tennis as Number
from sport_interests_pivot| Name | Activity | Number |
|---|---|---|
| Charlie | Running | NA |
| Jack | Running | NA |
| Henry | Running | 96.27 |
| David | Running | NA |
| Frank | Running | 101.59 |
| Grace | Running | NA |
| Bob | Running | NA |
| Emma | Running | 29.54 |
| Isabel | Running | NA |
| Alice | Running | NA |
I don’t love unpivoting in SQL. When pivoting, a single line of code translates into a column, whereas a single column translates into a whole separate SQL statement. Please use some framework or an engine that supports unpivoting through a function (e.g. Snowflake’s UNPIVOT or dbt’s unpivot macro).