create table events as
select '2024-06-10'::date as d
UNION ALL
select '2024-08-20'::date as d
UNION ALL
select '2024-08-22'::date as d
UNION ALL
select '2024-09-17'::date as d
UNION ALL
select '2024-09-19'::date as d
UNION ALL
select '2024-11-01'::date as d
UNION ALL
select '2024-12-11'::date as d
UNION ALL
select '2024-12-21'::date as d
Qualifying Events Using Recursive CTEs
Erika Pullum recently shared a great SQL brainteaser. Given a list of dates, the first date qualifies as TRUE and all subsequent dates qualify if it’s been more than 90 days since the last one. This requires a recursive CTE to solve because the 90 day gaps depend on the first row in the dataset.
The final solution is at the end of the post. Along the way I’ve learned quite a few bits:
- Any aggregate function will attempt to return a single row;
- You can limit the output of recursive CTEs in MySQL by placing the LIMIT clause inside the recursive CTE. You can also limit the output in DuckDB by placing the LIMIT at the end of the final query;
Creating the dataset
Bad attempt - doing a plain group by
My first attempt looked like this - as you can see, it identifies the rows correctly but then starts printing NULLs. The code would print NULLs indefinitely because aggregate functions such as max()
or min()
always attempt to return a single row. When you run aggregate functions on an empty set, it will return a single row with the value NULL. Recursive CTEs stop only when a loop returns zero rows. So the query would keep running indefinitely - The LIMIT 10 clause ensures it stops running after 10 rows.
with recursive recursive_cte as (
select
min(events.d) as d
from events
union all
SELECT
min(recursive_events.d) as d
from events recursive_events
inner join (select max(d) as d from recursive_cte) latest_date
on recursive_events.d > latest_date.d + 90
)select *
from recursive_cte
LIMIT 10
d |
---|
2024-06-10 |
2024-09-17 |
2024-12-21 |
NA |
NA |
NA |
NA |
NA |
NA |
NA |
OK attempt - ditching aggregate functions
My second attempt worked but it results in an additional subquery inside of a CTEs. Right after writing this I understood that debugging such a query would be very difficult!
with recursive recursive_cte as (
select
min(events.d) as d
from events
union all
select d
from
(SELECT
as d,
recursive_events.d row_number() over (order by recursive_events.d) as rn
from events recursive_events
inner join (select max(d) as d from recursive_cte) latest_date
on recursive_events.d > latest_date.d + 90
) ranked_dateswhere rn = 1
)select *
from recursive_cte
LIMIT 10
d |
---|
2024-06-10 |
2024-09-17 |
2024-12-21 |
Good attempt - aggregates using HAVING
As I was brushing my teeth, I remembered about the HAVING clause! I don’t think I’ve ever used HAVING to filter for non-NULL values. Here’s the final solution:
with recursive recursive_cte as (
select
min(events.d) as d
from events
union all
SELECT
min(recursive_events.d) as d
from events recursive_events
inner join (select max(d) as d from recursive_cte) latest_date
on recursive_events.d > latest_date.d + 90
having min(recursive_events.d) is not null
)select
events.d,
case when recursive_cte.d is not null then true else false end as is_after_cooldown
from events
left join recursive_cte
on events.d = recursive_cte.d
order by events.d
d | is_after_cooldown |
---|---|
2024-06-10 | TRUE |
2024-08-20 | FALSE |
2024-08-22 | FALSE |
2024-09-17 | TRUE |
2024-09-19 | FALSE |
2024-11-01 | FALSE |
2024-12-11 | FALSE |
2024-12-21 | TRUE |