Categories
Mastering Development

How to find combination of intersection from many tables?

I have a list of different channels that could potentially bring users to a website (organic, SEO, online marketing, etc.). I would like to find an efficient way to count daily active user that comes from the combination of these channels. Each channel has its own table and track its respective users.

The tables looks like the following,

channel A
date         user_id
2020-08-01   A
2020-08-01   B
2020-08-01   C

channel B
date         user_id
2020-08-01   C
2020-08-01   D
2020-08-01   G

channel C
date         user_id
2020-08-01   A
2020-08-01   C
2020-08-01   F

I want to know the following combinations

  1. Only visit channel A
  2. Only visit channel A & B
  3. Only visit channel B & C
  4. Only visit channel B
  5. etc.

However, when there are a lot of channels (I have around 8 channels) the combination is a lot. What I’ve done roughly is as simple as this (this one includes channel A)

SELECT 
    a.date, 
    COUNT(DISTINCT IF(b.user_id IS NULL AND c.user_id IS NULL, a.user_id, NULL)) AS dau_a,
    COUNT(DISTINCT IF(b.user_id IS NOT NULL AND c.user_id IS NULL, a.user_id, NULL)) AS dau_a_b,
    ...
FROM a LEFT JOIN b ON a.user_id = b.user_id AND a.date = b.date 
LEFT JOIN c ON a.user_id = c.user_id AND a.date = c.date
GROUP BY 1

but extremely tedious when the total channels is 8 (28 variations for 2 combinations, 56 for 3, 70 for 4, and many more).

Any smart ideas to solve this? I was thinking to use FULL OUTER JOIN but can’t seem to get the grasp out of it. Answers really appreciated.

Leave a Reply

Your email address will not be published. Required fields are marked *