Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Bridge table to manage customer multi interests

3 posters

Go down

Bridge table to manage customer multi interests Empty Bridge table to manage customer multi interests

Post  mezzovento Fri May 16, 2014 4:25 am

Hi all,
this is my first post on the Kimball forum, so please be lenient

My question is on a particular use of a bridge table.

In my case, I have an operational system, that has data about customer and registration process. When a new account is created, the customer
fill a multi-value form of interests (something like: shopping on line, travels, book, music, games, cinema etc.).
Then at the end of the registration process, each registered user, has a set of interests with 1 or more interest.
This data is recorded into the DB with a single value that identifies the set itself. The technique used is that of bitmask. To be more clear, this is a portion of a typical row in the operational DB:
Code:
account_id        date      mask_interests
------------------------------------------
7236712534    20140508                  14

in this row, mask_interests=14 mean that the interests selected by the customer are the 2nd, 3rd and 4th interest in the form, indeed the bit conversion of 14 is 00000000000000001110 (we can have 20 distinct value), 1 mean that the interest is "ON".

So the question is: what is the best design solution in this case, if the fact table must contain for each indicator, the calculation for each interest and be able to answer questions like:

  • How many Registered Users have interest "Books"?
  • How many Registered Users have interest "Travels", "Games"?


The grain is: one row for each registered user.
I figured a scenario with:


  • a bridge table, with a group interest (grp_interests) which corresponds to the mask_interests and for each of these, the "base" interest:

Code:
grp_interests    interest_id
----------------------------
           14              2
           14              4
           14              8
          ...            ...

this table is updated every time, a new group appear in the operational system.


  • a dimension table of "base" interests:

Code:
interest_id     description
--------------------------------
    1           shopping on line
    2           travels
    4           book
    8           games
   16           music
   ...          ...


  • a fact table with a fk to interest_id:

Code:
date_id   interest_id       registered_users
--------------------------------------------
20140508            2                    300
20140508            4                    300
20140508            8                    300


In this scenario the ETL, use the bridge table for decoding the mask and load the fact table with the most atomic data.
So, there are other way to model this kind of data?

Thanks in advance,
Davide

mezzovento

Posts : 2
Join date : 2014-05-12

Back to top Go down

Bridge table to manage customer multi interests Empty Re: Bridge table to manage customer multi interests

Post  manickam Fri May 16, 2014 8:23 am

Hope you can change the grain of the fact table by capturing the date, customer_id, individual interest

Handling bridge table is compilcated and also your ETL will be abe to split the rows and put the data into fact table for the multiple interest pertain to a customer.

Anyhow let us wait for our experts comments.

manickam

Posts : 27
Join date : 2013-04-26

Back to top Go down

Bridge table to manage customer multi interests Empty Re: Bridge table to manage customer multi interests

Post  nick_white Fri May 16, 2014 9:19 am

The most flexible approach is to have an Interests Dim with 1 record per Interest and then a Interest Group Bridge table that links the fact with the Interests. This allows any number of interests to be associated to a fact and the model won't break if in the future more than 20 interests are permissible.
However, while this model easily answers questions about single interests it is less easy to answer questions about multiple interests ("who has Travel and Games") and makes it even more difficult to answer questions about lack of interest ("who has Travel and Games but not Books"). If you want to be able to answer these sorts of queries then I'd do the following:

Create an Interest Group Dimension with a column for each interest holding Y/N or True/False values - and relate this directly to the fact. You can then easily answer your queries by filtering on the relevant columns rather than having to run cross-record queries.
This is quite a rigid design but I am suggesting it as a workable solution for 2 reasons:
1. If I have understood you correctly, the source system is already restricted to 20 interests so your business must have accepted this as a restriction already? Presumably any change to this would require source system development and so you wouldn't suddenly get 21 interests coming into the data warehouse which would break your design - if the source system design changed you would change your design in parallel
2. You only have 20 interests. If my maths is correct this is 2^20 = approx 1M records which is not a lot for a dimension and you could even create all these records up front, rather than wait for an example to come through in a transaction, which would speed up your regular ETL. Extending this model by, say, another 10 interest fields would probably still be supportable - so as long as you're reasonably confident that your business is never going to significantly increase the number of interests someone can have then this design is reasonably future-proof.

Also bear in mind that there is nothing to stop you having a fact table related to this Interest Group Dim to answer one type of question and another fact table associated with a single Interest Dim to answer another type of question.

Hope this helps a bit?


nick_white

Posts : 364
Join date : 2014-01-06
Location : London

Back to top Go down

Bridge table to manage customer multi interests Empty Re: Bridge table to manage customer multi interests

Post  mezzovento Fri May 16, 2014 11:13 am

Hi, thanks for all answers.
Nick: yes the interests are 20 and are unchanged from years.
I try your suggestion with the Interest Group Dimension table like:


Code:
CREATE TABLE k_br_interests
(
  grp_interests numeric(10,0) NOT NULL,
  int_0 boolean NOT NULL,
  int_1 boolean NOT NULL,
  int_2 boolean NOT NULL,
  int_3 boolean NOT NULL,
  int_4 boolean NOT NULL,
  int_5 boolean NOT NULL,
...
)



and a fact table like:


Code:
CREATE TABLE ft_cb
(
  date_id numeric(8,0),
  grp_interests numeric(10,0),
  reg_users numeric(20,0)
)



linked via grp_interests to the dim table.

I can do query like:
Code:
SELECT
  a.date_id,
  SUM(a.reg_users)
FROM ft_cb a
INNER JOIN k_br_interests b USING(grp_interests)
WHERE b.int_5 IS TRUE
AND b.int_9 IS TRUE
GROUP BY a.date_id
ORDER BY 1,2;

With my previous approach I did something like:
 
Code:
SELECT
  a.date_id,
  b.interests_id,
  SUM(a.reg_users) AS reg_users
FROM ft_cb a
INNER JOIN br_interests b USING(grp_interests)
GROUP BY
  b.interests_id,
  a.date_id
ORDER BY 1;

Your solution seems to work better.
Thanks again

mezzovento

Posts : 2
Join date : 2014-05-12

Back to top Go down

Bridge table to manage customer multi interests Empty Re: Bridge table to manage customer multi interests

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum