Bridge tables issue
2 posters
Page 1 of 1
Bridge tables issue
Hello.
I am trying to build a bridge table between my sales fact table and Programme dimension in SSAS 2008.
Essentially any one sales transaction can fall under multiple programmes. I have created a table called DimProgramme and a table called DimProgrammeGroup as such:
DimProgramme: ProgrammeKey int PK, ProgrammeName varchar(150)
DimProgrammeGroup: ProgrammeGroupKey int PK, ProgrammeKey int PK, DollarWeighting float, & a unique key that is a varchar.
The Programmegroup would be the intermediate table (?fact or dimension?), and Programme the reference dimension.
The Programmegroup has a compound key in this because of the M2M, so I cannot use ProgGroup as a regular key because of duplicates ( (1, 1), (1, 2), (2, 1), (2, 3) etc).
Am I working this correctly? I have tried creating a reference relationship, but I assume because I cannot link the ProgGrp to the fact with one surrogate that I am unable to achieve this. I have tried to do a M2M, but it tells me that there is no intermediate fact table or dimension. I am stumped, and have googled this for some time but cannot work out how to do this.
I understand that the M2M dimension relationship is at a lower grain than the fact table - that is the point to this. Can anyone help?
Thanks, Nick.
I am trying to build a bridge table between my sales fact table and Programme dimension in SSAS 2008.
Essentially any one sales transaction can fall under multiple programmes. I have created a table called DimProgramme and a table called DimProgrammeGroup as such:
DimProgramme: ProgrammeKey int PK, ProgrammeName varchar(150)
DimProgrammeGroup: ProgrammeGroupKey int PK, ProgrammeKey int PK, DollarWeighting float, & a unique key that is a varchar.
The Programmegroup would be the intermediate table (?fact or dimension?), and Programme the reference dimension.
The Programmegroup has a compound key in this because of the M2M, so I cannot use ProgGroup as a regular key because of duplicates ( (1, 1), (1, 2), (2, 1), (2, 3) etc).
Am I working this correctly? I have tried creating a reference relationship, but I assume because I cannot link the ProgGrp to the fact with one surrogate that I am unable to achieve this. I have tried to do a M2M, but it tells me that there is no intermediate fact table or dimension. I am stumped, and have googled this for some time but cannot work out how to do this.
I understand that the M2M dimension relationship is at a lower grain than the fact table - that is the point to this. Can anyone help?
Thanks, Nick.
nlarge- Posts : 5
Join date : 2011-07-22
Re: Bridge tables issue
There are two ways to handle multi valued dimensions.
The simplest is to not create groups:
Have a bridge table with a unique key (program group), program key, allocation factor and carry the program group key on the sales fact.
This will create a separate group for every row in sales facts. It will be a large table.
If the number of clusters of programs is much smaller than the number of sales facts, the alternate way is to create a group table which will contain all unique group combinations:
program group key, program group members
The program group members is the 'natural' key to this table and contains a list of concatenated program IDs sorted in some manner. This is a reference table to get group ID numbers and is used when loading, not in queries.
The bridge table would be the same as above (group, program, allocation) and the fact could carry the group key. The difference is the bridge will be much smaller as it only contains entries for unique groups as encountered. However, this will only work if the allocations are the same, based on group, and do not vary from sale to sale.
The simplest is to not create groups:
Have a bridge table with a unique key (program group), program key, allocation factor and carry the program group key on the sales fact.
This will create a separate group for every row in sales facts. It will be a large table.
If the number of clusters of programs is much smaller than the number of sales facts, the alternate way is to create a group table which will contain all unique group combinations:
program group key, program group members
The program group members is the 'natural' key to this table and contains a list of concatenated program IDs sorted in some manner. This is a reference table to get group ID numbers and is used when loading, not in queries.
The bridge table would be the same as above (group, program, allocation) and the fact could carry the group key. The difference is the bridge will be much smaller as it only contains entries for unique groups as encountered. However, this will only work if the allocations are the same, based on group, and do not vary from sale to sale.
Thanks ngalemmo, please clarify
Thanks for your reply ngalemmo, I need to understand your answer a little more because I am still trying to envision the design. Please see the notes below:
"There are two ways to handle multi valued dimensions.
The simplest is to not create groups:
Have a bridge table with a unique key (program group), program key, allocation factor and carry the program group key on the sales fact."
So, using this design I get:
FactSales: ...
ProgramGroupKey
...
DimProgramGroup: ProgramGroupKey (surrogate PK int)
ProgramKey
Dim Program: ProgramKey
ProgramName
So the data reads:
Fact Sales | DimProgramGroup
|
PK ProgramGroup Key | PGKey ProgramKey
--- ----------------- | ------ -----------
1 20 | 20 1
2 35 | 35 1
3 40 | 40 2
In the case of the above example, the Program Group Key on the sales fact is 1:1 with the program group dimension. I will not be able to break up the granularity of the link to 1:M using this method if the individual fact row has 1 program group, and the program group has 1 value because the link between program group and program would be 1:1 using this design so the implementation of program group would be pointless. I am still trying to understand how I can use this method to create a 1:M [Fact - Program] via ProgramGroup.
"This will create a separate group for every row in sales facts. It will be a large table."
"If the number of clusters of programs is much smaller than the number of sales facts, the alternate way is to create a group table which will contain all unique group combinations:
program group key, program group members"
"The program group members is the 'natural' key to this table and contains a list of concatenated program IDs sorted in some manner. This is a reference table to get group ID numbers and is used when loading, not in queries.
The bridge table would be the same as above (group, program, allocation) and the fact could carry the group key. The difference is the bridge will be much smaller as it only contains entries for unique groups as encountered. However, this will only work if the allocations are the same, based on group, and do not vary from sale to sale"
So an example would be:
Fact Sales:
Key PGKey
10 20
11 40
Program Group
Key PGMembers
20 "1,2,3"
40 "2,3,8,4"
Program
Key Program Name
1 Program 1
2 Program 2
3 Program 3
4 Program 4
5 Program 5
6 Program 6
7 Program 7
8 Program 8
So that would be the physical table design, but how would that help the design in SSAS? The program group and program link would not operate at all because the program members column is a string value and also contains > 1 program key so could not be used to link the program group to the program.
Again I thank you for your time in answering my questions and (based on my questions) I need to understand your answers more clearly. Could you please supply some links to help, and possibly some quick examples of the 2 solutions that you have provided?
Thank you, Nick.
"There are two ways to handle multi valued dimensions.
The simplest is to not create groups:
Have a bridge table with a unique key (program group), program key, allocation factor and carry the program group key on the sales fact."
So, using this design I get:
FactSales: ...
ProgramGroupKey
...
DimProgramGroup: ProgramGroupKey (surrogate PK int)
ProgramKey
Dim Program: ProgramKey
ProgramName
So the data reads:
Fact Sales | DimProgramGroup
|
PK ProgramGroup Key | PGKey ProgramKey
--- ----------------- | ------ -----------
1 20 | 20 1
2 35 | 35 1
3 40 | 40 2
In the case of the above example, the Program Group Key on the sales fact is 1:1 with the program group dimension. I will not be able to break up the granularity of the link to 1:M using this method if the individual fact row has 1 program group, and the program group has 1 value because the link between program group and program would be 1:1 using this design so the implementation of program group would be pointless. I am still trying to understand how I can use this method to create a 1:M [Fact - Program] via ProgramGroup.
"This will create a separate group for every row in sales facts. It will be a large table."
"If the number of clusters of programs is much smaller than the number of sales facts, the alternate way is to create a group table which will contain all unique group combinations:
program group key, program group members"
"The program group members is the 'natural' key to this table and contains a list of concatenated program IDs sorted in some manner. This is a reference table to get group ID numbers and is used when loading, not in queries.
The bridge table would be the same as above (group, program, allocation) and the fact could carry the group key. The difference is the bridge will be much smaller as it only contains entries for unique groups as encountered. However, this will only work if the allocations are the same, based on group, and do not vary from sale to sale"
So an example would be:
Fact Sales:
Key PGKey
10 20
11 40
Program Group
Key PGMembers
20 "1,2,3"
40 "2,3,8,4"
Program
Key Program Name
1 Program 1
2 Program 2
3 Program 3
4 Program 4
5 Program 5
6 Program 6
7 Program 7
8 Program 8
So that would be the physical table design, but how would that help the design in SSAS? The program group and program link would not operate at all because the program members column is a string value and also contains > 1 program key so could not be used to link the program group to the program.
Again I thank you for your time in answering my questions and (based on my questions) I need to understand your answers more clearly. Could you please supply some links to help, and possibly some quick examples of the 2 solutions that you have provided?
Thank you, Nick.
nlarge- Posts : 5
Join date : 2011-07-22
Re: Bridge tables issue
My reply "...unique key (program group),..." was misleading. I meant unique to the particular sale row, not a primary key. The primary key to the bridge table (DimProgramGroup) would be the program group key and the program key.
For the alternative, there is still a bridge table. So using your sample data:
Fact Sales:
Key PGKey
10 20
11 40
Program Group
Key PGMembers
20 "1,2,3"
40 "2,3,8,4"
Program
Key Program Name
1 Program 1
2 Program 2
3 Program 3
4 Program 4
5 Program 5
6 Program 6
7 Program 7
8 Program 8
There would be a fourth bridge table that would be used in queries:
ProgramGroupBridge
PGKey ProgramKey Alloc
20 1 .333
20 2 .333
20 3 .334
40 2 .250
40 3 .250
40 4 .250
40 8 .250
You join the fact through the bridge to the Program dimension.
For the alternative, there is still a bridge table. So using your sample data:
Fact Sales:
Key PGKey
10 20
11 40
Program Group
Key PGMembers
20 "1,2,3"
40 "2,3,8,4"
Program
Key Program Name
1 Program 1
2 Program 2
3 Program 3
4 Program 4
5 Program 5
6 Program 6
7 Program 7
8 Program 8
There would be a fourth bridge table that would be used in queries:
ProgramGroupBridge
PGKey ProgramKey Alloc
20 1 .333
20 2 .333
20 3 .334
40 2 .250
40 3 .250
40 4 .250
40 8 .250
You join the fact through the bridge to the Program dimension.
Thanks
Thanks!
However, after reviewing the business requirements and looking at the design I noticed that I could avoid having a bridge dimension after all. Here is what I did.
I kept the bridge table:
"PG Key" (now incremental) | "Fact Compound Key 1" | "Fact Compound Key 2" | "Program Key"
1 | 29000 | 5685438 | 1
2 | 29000 | 5685438 | 3
3 | 29000 | 5685439 | 1
... and so on. (NB. The Fact Compound Keys are actually the keys from the source transaction table that I included in each row for tracking purposes. They are large because we load about 7M rows a week). The PG Key is actually just a surrogate that is used for indexing the compound keys and program key.
I modified the DSV in SSAS:
* Modified the Fact table query (which I will actually modify the view that feeds the fact table in the cube), such that the source data is joined to the bridge table. This then duplicates each sales transaction at the granularity of the program, and included Program Key in the columns at the fact level so that I can join the Program dimension directly to it.
* Joined the Program dimension to it.
* Processed the dimensions and cube.
It worked!
OK, so I know what your thinking ... "but what about the All Members member?".
In escence, if I take the sales gross totals for 1 day for each of the 3 programs as such:
Program 1 $25,000,000
Program 2 $100,000,000
Program 3 $ 70,000,000
The total will be $195M, right? Correct. However, programs do share full rights to a sale, so essentially, if Program 1 and 2 both are active on a sale, eg. $25, then the $25 appears in both, which means that there is double counting because 100% of the sale is allocated out to both programs (200% of a sale).
Although this seems weird, the client understands that and it is what is expected. So, how can I avoid the user seeing excessive values when they total accross all programs? Easy, Hide it. I essentially made the Program Name and Key non aggregatable so that, even though you can drill down on the products/customers/Territory managers over the program semi-additively etc, and still see the totals for those entities; you cannot see the full total amount accross all programs. Well, if you do it appears as blank, which makes sense and resolves issues with members deducing totals if they do not have access to specific programmes. However, I also cannot add hierarchies to this dimension because of this issue, but that is not a requirement as of yet.
Another point, there are also transactions that are not attributed to any programme. So, I created a "default" programme called "{client name} Sales Totals". This is fed with facts at a 1:1 with transactions. Now, when a user wishes to see sales totals for a program versus all sales totals (if they have permission to do so), then they can because they would be given access to both the sales totals and specified program.
Although I would have liked to work on the bridge table process for this, and I will still try to understand how that works for future reference, I now have a model that is acceptable. My only other issue is that during processing the fact table will be considerably bigger than the physical table because of the grain alteration. This does not appear to have affected load time, but we are intending about 350M transactions (* the number of programmes), so we will see how that pan's out. I may have to consider taking the grain of the fact table to week instead of at actual transactional level as it is, but we will see how that works.
Looks like I'm in for some fun when crunch time arrives.
Please feel free to ask me to elaborate on any points, and also I would be delighted for you to pass on feedback on my implementation as I have described it.
Thanks again, Nick.
However, after reviewing the business requirements and looking at the design I noticed that I could avoid having a bridge dimension after all. Here is what I did.
I kept the bridge table:
"PG Key" (now incremental) | "Fact Compound Key 1" | "Fact Compound Key 2" | "Program Key"
1 | 29000 | 5685438 | 1
2 | 29000 | 5685438 | 3
3 | 29000 | 5685439 | 1
... and so on. (NB. The Fact Compound Keys are actually the keys from the source transaction table that I included in each row for tracking purposes. They are large because we load about 7M rows a week). The PG Key is actually just a surrogate that is used for indexing the compound keys and program key.
I modified the DSV in SSAS:
* Modified the Fact table query (which I will actually modify the view that feeds the fact table in the cube), such that the source data is joined to the bridge table. This then duplicates each sales transaction at the granularity of the program, and included Program Key in the columns at the fact level so that I can join the Program dimension directly to it.
* Joined the Program dimension to it.
* Processed the dimensions and cube.
It worked!
OK, so I know what your thinking ... "but what about the All Members member?".
In escence, if I take the sales gross totals for 1 day for each of the 3 programs as such:
Program 1 $25,000,000
Program 2 $100,000,000
Program 3 $ 70,000,000
The total will be $195M, right? Correct. However, programs do share full rights to a sale, so essentially, if Program 1 and 2 both are active on a sale, eg. $25, then the $25 appears in both, which means that there is double counting because 100% of the sale is allocated out to both programs (200% of a sale).
Although this seems weird, the client understands that and it is what is expected. So, how can I avoid the user seeing excessive values when they total accross all programs? Easy, Hide it. I essentially made the Program Name and Key non aggregatable so that, even though you can drill down on the products/customers/Territory managers over the program semi-additively etc, and still see the totals for those entities; you cannot see the full total amount accross all programs. Well, if you do it appears as blank, which makes sense and resolves issues with members deducing totals if they do not have access to specific programmes. However, I also cannot add hierarchies to this dimension because of this issue, but that is not a requirement as of yet.
Another point, there are also transactions that are not attributed to any programme. So, I created a "default" programme called "{client name} Sales Totals". This is fed with facts at a 1:1 with transactions. Now, when a user wishes to see sales totals for a program versus all sales totals (if they have permission to do so), then they can because they would be given access to both the sales totals and specified program.
Although I would have liked to work on the bridge table process for this, and I will still try to understand how that works for future reference, I now have a model that is acceptable. My only other issue is that during processing the fact table will be considerably bigger than the physical table because of the grain alteration. This does not appear to have affected load time, but we are intending about 350M transactions (* the number of programmes), so we will see how that pan's out. I may have to consider taking the grain of the fact table to week instead of at actual transactional level as it is, but we will see how that works.
Looks like I'm in for some fun when crunch time arrives.
Please feel free to ask me to elaborate on any points, and also I would be delighted for you to pass on feedback on my implementation as I have described it.
Thanks again, Nick.
nlarge- Posts : 5
Join date : 2011-07-22
Similar topics
» Bridge table and double counting issue
» Bridge Tables
» BRIDGE TABLES
» Bridge tables
» Oh no, not Bridge tables again!!!
» Bridge Tables
» BRIDGE TABLES
» Bridge tables
» Oh no, not Bridge tables again!!!
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum