7 Factless Fact Table

8 Factless Fact Tables

If I wanted to show the number of subscriptions that were active for each month in my favourite BI tool, I would be in trouble. In Power BI, I would have to resort to a multi-line DAX formula. In Tableau, the options have 8 steps in them. I couldn’t even find a tutorial for Looker. Luckily, Factless Fact Tables sidestep these solutions altogether so that your BI tool only has to do a simple count.

Kimball defines Factless Fact Tables as a table of dimensional entities coming together at a moment in time. Now that’s one eldritch definition! In human terms, the raison d’être of these tables comes from the need to count whether a thing existed at a point in time. It’s best to show the technique using an example.

Let’s say I have a table of subscriptions that looks like this:

library(duckdb)

Warning: package 'duckdb' was built under R version 4.4.3

Loading required package: DBI

con <- dbConnect(duckdb::duckdb(), ":memory:")

dbExecute(con, "CREATE TABLE subscriptions (id INTEGER, start_date DATE, end_date DATE);")

[1] 0

Whenever we want to count something, it’s best to count it using a fact table. However, subscriptions are a SCD2 type dimension and you want to count whether they existed between the start_date and end_date. Counting becomes trivial if the date range is expanded into their dedicated rows. For example, if I were building a monthly fact table, the first id would be 6 rows and my second id would be 4 rows because they were active subscriptions during those months. In order to build it, it’s best to have a calendar table so that we could perform a cross join using a BETWEEN statement.

8.1 SQL code

8.2 How is this different from snapshot fact tables?

You may have heard about snapshot fact tables - they are VERY similar to factless fact tables. In fact, if you saved snapshots of active subscriptions for each month, you’d end up with the same table! The different comes down to implementation: snapshots are run on a regular basis to create the final dataset, a factless fact table can be recreated from scratch.

Kimball defines Factless Fact Tables as a table of dimensional entities coming together at a moment in time. Now that’s one eldtrich definition! In human terms, the raison d’être of these tables comes from the need to count whether a thing existed at a point in time. And if that definition is too vague, here’s an exercise - try showing how many subscribers you had on a monthly basis. For example, let’s say I had 10 subscriptions that were each active between January and December of last year, how would you show those 10 subscriptions for each month of the year? In Power BI, you need to resort to a multi-line DAX formula. In Tableau, the options have 8 steps in them. Factless fact tables sidestep these solution altogether so that all you need is a simple count.

Factless fact tables can be built by combining your SCD2 type table (i.e. data that has an end date and a start date) with a calendar table (a date dimension). Here’s how to do it in SQL:

in your favourite BI tool

Factless fact tables solve this problem by showing whether a thing existed for a given time point. For example, if we were to build a factless table on a monthly granularity, a Netflix subscriber whose subscription starts at 2025-01-01 and ends at 2025-12-31 would show up as 12 rows, one for each of the months in a year. As a result, if you were to put your months on an x-axis, you could just do a count of rows from this table along the y axis. And it’s not just subscribers - it works on any table that has a start date and an end date¹. Factless fact tables allow you to keep the modeling and KPI layer relatively simple - learning to build and use them is extremely helpful.

8.3 Sample SQL

9 Creating a Factless Fact Table

Note

Hey Paulius, can you quickly create this table that shows our subscriber count over time?

If you’re working with SCD Type 2 data (e.g. subscriptions, marketing campaigns), it’s straightforward to count the number of rows at a single point in time:


select count(*)
from subscriptions
where '2024-08-01' between date_from and date_to

But it’s not if you’d like to count it over multiple points in time, i.e. how do you group by each month of a year and show how many subscriptions were active for a given month? The problem can be solved two ways!

9.1 Without using a calendar table

with d as (
      select validfrom as dte, 1 as inc
      from t
      union all
      select validto, -1
      from t
     )
select dte, sum(sum(inc)) over (order by dte)
from d
group by dte
order by dte;

9.2 Using a calendar table

Essentially, if it’s a SCD2 type dimension.↩︎