Bridge table to manage customer multi interests
3 posters
Page 1 of 1
Bridge table to manage customer multi interests
Hi all,
this is my first post on the Kimball forum, so please be lenient
My question is on a particular use of a bridge table.
In my case, I have an operational system, that has data about customer and registration process. When a new account is created, the customer
fill a multi-value form of interests (something like: shopping on line, travels, book, music, games, cinema etc.).
Then at the end of the registration process, each registered user, has a set of interests with 1 or more interest.
This data is recorded into the DB with a single value that identifies the set itself. The technique used is that of bitmask. To be more clear, this is a portion of a typical row in the operational DB:
in this row, mask_interests=14 mean that the interests selected by the customer are the 2nd, 3rd and 4th interest in the form, indeed the bit conversion of 14 is 00000000000000001110 (we can have 20 distinct value), 1 mean that the interest is "ON".
So the question is: what is the best design solution in this case, if the fact table must contain for each indicator, the calculation for each interest and be able to answer questions like:
The grain is: one row for each registered user.
I figured a scenario with:
this table is updated every time, a new group appear in the operational system.
In this scenario the ETL, use the bridge table for decoding the mask and load the fact table with the most atomic data.
So, there are other way to model this kind of data?
Thanks in advance,
Davide
this is my first post on the Kimball forum, so please be lenient
My question is on a particular use of a bridge table.
In my case, I have an operational system, that has data about customer and registration process. When a new account is created, the customer
fill a multi-value form of interests (something like: shopping on line, travels, book, music, games, cinema etc.).
Then at the end of the registration process, each registered user, has a set of interests with 1 or more interest.
This data is recorded into the DB with a single value that identifies the set itself. The technique used is that of bitmask. To be more clear, this is a portion of a typical row in the operational DB:
- Code:
account_id date mask_interests
------------------------------------------
7236712534 20140508 14
in this row, mask_interests=14 mean that the interests selected by the customer are the 2nd, 3rd and 4th interest in the form, indeed the bit conversion of 14 is 00000000000000001110 (we can have 20 distinct value), 1 mean that the interest is "ON".
So the question is: what is the best design solution in this case, if the fact table must contain for each indicator, the calculation for each interest and be able to answer questions like:
- How many Registered Users have interest "Books"?
- How many Registered Users have interest "Travels", "Games"?
The grain is: one row for each registered user.
I figured a scenario with:
- a bridge table, with a group interest (grp_interests) which corresponds to the mask_interests and for each of these, the "base" interest:
- Code:
grp_interests interest_id
----------------------------
14 2
14 4
14 8
... ...
this table is updated every time, a new group appear in the operational system.
- a dimension table of "base" interests:
- Code:
interest_id description
--------------------------------
1 shopping on line
2 travels
4 book
8 games
16 music
... ...
- a fact table with a fk to interest_id:
- Code:
date_id interest_id registered_users
--------------------------------------------
20140508 2 300
20140508 4 300
20140508 8 300
In this scenario the ETL, use the bridge table for decoding the mask and load the fact table with the most atomic data.
So, there are other way to model this kind of data?
Thanks in advance,
Davide
mezzovento- Posts : 2
Join date : 2014-05-12
Re: Bridge table to manage customer multi interests
Hope you can change the grain of the fact table by capturing the date, customer_id, individual interest
Handling bridge table is compilcated and also your ETL will be abe to split the rows and put the data into fact table for the multiple interest pertain to a customer.
Anyhow let us wait for our experts comments.
Handling bridge table is compilcated and also your ETL will be abe to split the rows and put the data into fact table for the multiple interest pertain to a customer.
Anyhow let us wait for our experts comments.
manickam- Posts : 27
Join date : 2013-04-26
Re: Bridge table to manage customer multi interests
The most flexible approach is to have an Interests Dim with 1 record per Interest and then a Interest Group Bridge table that links the fact with the Interests. This allows any number of interests to be associated to a fact and the model won't break if in the future more than 20 interests are permissible.
However, while this model easily answers questions about single interests it is less easy to answer questions about multiple interests ("who has Travel and Games") and makes it even more difficult to answer questions about lack of interest ("who has Travel and Games but not Books"). If you want to be able to answer these sorts of queries then I'd do the following:
Create an Interest Group Dimension with a column for each interest holding Y/N or True/False values - and relate this directly to the fact. You can then easily answer your queries by filtering on the relevant columns rather than having to run cross-record queries.
This is quite a rigid design but I am suggesting it as a workable solution for 2 reasons:
1. If I have understood you correctly, the source system is already restricted to 20 interests so your business must have accepted this as a restriction already? Presumably any change to this would require source system development and so you wouldn't suddenly get 21 interests coming into the data warehouse which would break your design - if the source system design changed you would change your design in parallel
2. You only have 20 interests. If my maths is correct this is 2^20 = approx 1M records which is not a lot for a dimension and you could even create all these records up front, rather than wait for an example to come through in a transaction, which would speed up your regular ETL. Extending this model by, say, another 10 interest fields would probably still be supportable - so as long as you're reasonably confident that your business is never going to significantly increase the number of interests someone can have then this design is reasonably future-proof.
Also bear in mind that there is nothing to stop you having a fact table related to this Interest Group Dim to answer one type of question and another fact table associated with a single Interest Dim to answer another type of question.
Hope this helps a bit?
However, while this model easily answers questions about single interests it is less easy to answer questions about multiple interests ("who has Travel and Games") and makes it even more difficult to answer questions about lack of interest ("who has Travel and Games but not Books"). If you want to be able to answer these sorts of queries then I'd do the following:
Create an Interest Group Dimension with a column for each interest holding Y/N or True/False values - and relate this directly to the fact. You can then easily answer your queries by filtering on the relevant columns rather than having to run cross-record queries.
This is quite a rigid design but I am suggesting it as a workable solution for 2 reasons:
1. If I have understood you correctly, the source system is already restricted to 20 interests so your business must have accepted this as a restriction already? Presumably any change to this would require source system development and so you wouldn't suddenly get 21 interests coming into the data warehouse which would break your design - if the source system design changed you would change your design in parallel
2. You only have 20 interests. If my maths is correct this is 2^20 = approx 1M records which is not a lot for a dimension and you could even create all these records up front, rather than wait for an example to come through in a transaction, which would speed up your regular ETL. Extending this model by, say, another 10 interest fields would probably still be supportable - so as long as you're reasonably confident that your business is never going to significantly increase the number of interests someone can have then this design is reasonably future-proof.
Also bear in mind that there is nothing to stop you having a fact table related to this Interest Group Dim to answer one type of question and another fact table associated with a single Interest Dim to answer another type of question.
Hope this helps a bit?
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Bridge table to manage customer multi interests
Hi, thanks for all answers.
Nick: yes the interests are 20 and are unchanged from years.
I try your suggestion with the Interest Group Dimension table like:
and a fact table like:
linked via grp_interests to the dim table.
I can do query like:
With my previous approach I did something like:
Your solution seems to work better.
Thanks again
Nick: yes the interests are 20 and are unchanged from years.
I try your suggestion with the Interest Group Dimension table like:
- Code:
CREATE TABLE k_br_interests
(
grp_interests numeric(10,0) NOT NULL,
int_0 boolean NOT NULL,
int_1 boolean NOT NULL,
int_2 boolean NOT NULL,
int_3 boolean NOT NULL,
int_4 boolean NOT NULL,
int_5 boolean NOT NULL,
...
)
and a fact table like:
- Code:
CREATE TABLE ft_cb
(
date_id numeric(8,0),
grp_interests numeric(10,0),
reg_users numeric(20,0)
)
linked via grp_interests to the dim table.
I can do query like:
- Code:
SELECT
a.date_id,
SUM(a.reg_users)
FROM ft_cb a
INNER JOIN k_br_interests b USING(grp_interests)
WHERE b.int_5 IS TRUE
AND b.int_9 IS TRUE
GROUP BY a.date_id
ORDER BY 1,2;
With my previous approach I did something like:
- Code:
SELECT
a.date_id,
b.interests_id,
SUM(a.reg_users) AS reg_users
FROM ft_cb a
INNER JOIN br_interests b USING(grp_interests)
GROUP BY
b.interests_id,
a.date_id
ORDER BY 1;
Your solution seems to work better.
Thanks again
mezzovento- Posts : 2
Join date : 2014-05-12
Similar topics
» Bridge Table and Customer Hierarchy
» Bridge table - two customer-related dimensions
» bridge table and junk dimension on customer dimension (bank/credit union)
» How to manage nulls in Fact talble
» How to resolve a many to many relationship? Bridge Table or Utilize the same fact table?
» Bridge table - two customer-related dimensions
» bridge table and junk dimension on customer dimension (bank/credit union)
» How to manage nulls in Fact talble
» How to resolve a many to many relationship? Bridge Table or Utilize the same fact table?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum