Multivalued attribute
2 posters
Page 1 of 1
Multivalued attribute
Hi
I have read a lot about this modeling technique but either I didn't understand what I read either this is not the solution suitable to my needs.
Here it is. I am dealing with sales of products in stores.
Each product can be assigned to a collection of assortment values.
Each store can be assigned to a collection of assortment values.
A product can be sold in a store (and is sent to) if one or more of these assortment values match.
Now we want to analyze the sold amounts, quantity... by time, product, store and assortment value.
Examples of what we want to achieve:
- Display the sold amount for a given store by product displaying the assortment value(s),
- Display the sold amount for a given assortment value,
- Display the sold amount for a given collection of assortment values.
I tried to apply the technique exposed in Managing Helper Tables but I don't see what a bridge table between the fact table and the multivalued dimension table with a weighting factor is any different than adding this dimension to the grain of the fact table and distribute the indicators using the same rules that you would use to determine the weight factor.
I can't see how to model this. Using weighting factors in the bridge table and filtering to one assortment value in the report would not return the correct amount but a fraction of it.
Am I missing something? Any clues?
Thanks for any insight.
Guillaume
PS: English is not my mother tongue so I might have done some grammatical errors.
I have read a lot about this modeling technique but either I didn't understand what I read either this is not the solution suitable to my needs.
Here it is. I am dealing with sales of products in stores.
Each product can be assigned to a collection of assortment values.
Each store can be assigned to a collection of assortment values.
A product can be sold in a store (and is sent to) if one or more of these assortment values match.
Now we want to analyze the sold amounts, quantity... by time, product, store and assortment value.
Examples of what we want to achieve:
- Display the sold amount for a given store by product displaying the assortment value(s),
- Display the sold amount for a given assortment value,
- Display the sold amount for a given collection of assortment values.
I tried to apply the technique exposed in Managing Helper Tables but I don't see what a bridge table between the fact table and the multivalued dimension table with a weighting factor is any different than adding this dimension to the grain of the fact table and distribute the indicators using the same rules that you would use to determine the weight factor.
I can't see how to model this. Using weighting factors in the bridge table and filtering to one assortment value in the report would not return the correct amount but a fraction of it.
Am I missing something? Any clues?
Thanks for any insight.
Guillaume
PS: English is not my mother tongue so I might have done some grammatical errors.
glaurent- Posts : 2
Join date : 2010-07-22
Re: Multivalued attribute
The purpose of the bridge is to avoid messing up the fact table with detail that may otherwise result in bad results using queries that don't use the multi dimensional values. A query showing quantities sold by product would be a mess if the fact contained extra rows for each potential assortment values.
As far as allocation values in the bridge go, they may be useful for some queries, but you do not need to use them in every query. It all depends on what you are trying to do.
As far as allocation values in the bridge go, they may be useful for some queries, but you do not need to use them in every query. It all depends on what you are trying to do.
Re: Multivalued attribute
First, thanks for your insight, I'm trying to find out the best way to do things and this concern just seems unresolvable for the moment.
But I guess there are some case you can't aggregate or some other case I can't see right now.
That being said, I think I will try to explain my self a little more clearly using a known (I think) example extracted from a 2008 Ralph Kimball article called Help for Dimensional Modeling - Helper tables let you design and manage multivalued dimensions successfully:
(I couldn't find any rules about quoting but if forbidden, I will edit my post.)
In this example, what if there is a billing record with $200 billed joined to a four-diagnosis group containing two "Not Contagious" diagnosis, one "Contagious" diagnosis and a final "Unknown contagious status" (just to keep with this example) diagnosis. Let's say each diagnosis have the same weighting factor of 0.25 (1/4).
Then on report level if you constrain the Contagious Indicator with "Contagious" and "Not Contagious" you don't get the right total since it would only display $150, 3/4 of the bill the other quarter not being taken into account because it got filtered.
To keep up with this example, what I'm trying to do is not having a report grouping bills amongst an attribute of the diagnosis dimension. I would like to display the billed_amount indicator with the new Diagnosis group dimension whose name should be created at loading time (by concatening diagnosis names for example or whatever canonical name asked by users like "C2NC1" meaning "two contagious and one non contagious diagnoses").
Where I am stuck is that I can't find a way to model things allowing reports to get the bills (and group diagnosis dimension) which are related to, at least, a given diagnosis (and even better a diagnosis within a given list).
I can see how to do this using an exists SQL clause but such a clause can't be generated by the reporting tool.
Anyway to do this?
Thanks for reading
Guillaume
OK. I can't see why the results using queries queries that don't use the multivalued dimension table(s) should be bad if one takes good to allocate indicators value correctly amongst each value in the fact table: if the multivalued dimension table(s) is(are) not selected, indicators should be aggregated at report level and it should been seen as if there weren't no multivalued dimension(s). I can see how it could be less time efficient, especially if this(these) dimension(s) are not frequently used, but again one could create an aggregate to speed up things.ngalemmo wrote:The purpose of the bridge is to avoid messing up the fact table with detail that may otherwise result in bad results using queries that don't use the multi dimensional values. A query showing quantities sold by product would be a mess if the fact contained extra rows for each potential assortment values.
But I guess there are some case you can't aggregate or some other case I can't see right now.
I can see that: one could want to display the whole values of indicators for each value of a multivalued dimension - I've read these are called "impact reports".ngalemmo wrote:As far as allocation values in the bridge go, they may be useful for some queries, but you do not need to use them in every query. It all depends on what you are trying to do.
That being said, I think I will try to explain my self a little more clearly using a known (I think) example extracted from a 2008 Ralph Kimball article called Help for Dimensional Modeling - Helper tables let you design and manage multivalued dimensions successfully:
(I couldn't find any rules about quoting but if forbidden, I will edit my post.)
Ok I like this a lot, the group becoming kind of a new dimension containing several components.R. Kimball wrote:
FIG 2
If you really insist on modeling this multivalued situation, then a "helper" table placed between the Diagnosis dimension and the fact table is the best solution. (See Figure 2.) The Diagnosis key in the fact table is changed to be a Diagnosis Group key. The helper table in the middle is the Diagnosis Group table. It has one record for each diagnosis in a group of diagnoses. If I walk into the doctor's office with three diagnoses, then I need a Diagnosis Group with three records in it. It is up to the modeler to build either these Diagnosis Groups for each individual or a library of "known" Diagnosis Groups. Perhaps my three diagnoses would be called "Kimball's Syndrome."
So far so good, that is exactly what is expected even though I still think it can be done just allocating the billed_amount indicator amongst each diagnosis (until I see the right case I guess) but let's continue.R. Kimball wrote:The Diagnosis Group table [the helper - or bridge - table] is joined to the original Diagnosis dimension [the multivalued dimension] on the Diagnosis key. The Diagnosis Group table in Figure 2 contains a very important numeric attribute: the weighting factor. The weighting factor allows reports to be created that don't double count the Billed Amount in the fact table.
Ok but that's not it...R. Kimball wrote:For instance, if you constrain some attribute in the Diagnosis dimension such as "Contagious Indicator" with the values Contagious and Not Contagious, then you can group by the Contagious Indicator and produce a report with the correct totals. To get the correct totals, we must multiply the Billed Amount by the associated weighting factor. This is a correctly weighted report.
In this example, what if there is a billing record with $200 billed joined to a four-diagnosis group containing two "Not Contagious" diagnosis, one "Contagious" diagnosis and a final "Unknown contagious status" (just to keep with this example) diagnosis. Let's say each diagnosis have the same weighting factor of 0.25 (1/4).
Then on report level if you constrain the Contagious Indicator with "Contagious" and "Not Contagious" you don't get the right total since it would only display $150, 3/4 of the bill the other quarter not being taken into account because it got filtered.
To keep up with this example, what I'm trying to do is not having a report grouping bills amongst an attribute of the diagnosis dimension. I would like to display the billed_amount indicator with the new Diagnosis group dimension whose name should be created at loading time (by concatening diagnosis names for example or whatever canonical name asked by users like "C2NC1" meaning "two contagious and one non contagious diagnoses").
Where I am stuck is that I can't find a way to model things allowing reports to get the bills (and group diagnosis dimension) which are related to, at least, a given diagnosis (and even better a diagnosis within a given list).
I can see how to do this using an exists SQL clause but such a clause can't be generated by the reporting tool.
Anyway to do this?
Thanks for reading
Guillaume
glaurent- Posts : 2
Join date : 2010-07-22
Re: Multivalued attribute
There is a discussion on how to query using bridge tables in
http://forum.kimballgroup.com/dimensional-modeling-and-data-architecture-f6/what-to-do-when-the-weighting-factor-of-a-bridge-table-no-longer-seems-relevant-t441.htm
You don't necessarily need to use an EXSITS, and trying to implement one in a typical BI tool is difficult, if not impossible. But, when you think about it, the bridge is logically equivilent to a fact table with a higher grain and preallocated measures. The same query issues exist either way.
The idea of having some derived attributes at the group level is not a bad one (if it makes sense to the business) but I would not implement such attributes in the group table, which can be quite large, instead, I would place them in a much smaller junk dimension and add a FK to the fact.
http://forum.kimballgroup.com/dimensional-modeling-and-data-architecture-f6/what-to-do-when-the-weighting-factor-of-a-bridge-table-no-longer-seems-relevant-t441.htm
You don't necessarily need to use an EXSITS, and trying to implement one in a typical BI tool is difficult, if not impossible. But, when you think about it, the bridge is logically equivilent to a fact table with a higher grain and preallocated measures. The same query issues exist either way.
The idea of having some derived attributes at the group level is not a bad one (if it makes sense to the business) but I would not implement such attributes in the group table, which can be quite large, instead, I would place them in a much smaller junk dimension and add a FK to the fact.
Similar topics
» Multivalued attribute on Dimension
» Single Attribute Dimensions - why not instead write textual attribute into Fact table?
» Dimension Attribute or Fact Attribute
» Healthcare, diagnosis and Dimension model
» Multivalued attributes for dimension
» Single Attribute Dimensions - why not instead write textual attribute into Fact table?
» Dimension Attribute or Fact Attribute
» Healthcare, diagnosis and Dimension model
» Multivalued attributes for dimension
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum