Multivalued attribute

glaurent Mon Jul 26, 2010 10:10 am

Hi

I have read a lot about this modeling technique but either I didn't understand what I read either this is not the solution suitable to my needs.

Here it is. I am dealing with sales of products in stores.
Each product can be assigned to a collection of assortment values.
Each store can be assigned to a collection of assortment values.
A product can be sold in a store (and is sent to) if one or more of these assortment values match.
Now we want to analyze the sold amounts, quantity... by time, product, store and assortment value.

Examples of what we want to achieve:
- Display the sold amount for a given store by product displaying the assortment value(s),
- Display the sold amount for a given assortment value,
- Display the sold amount for a given collection of assortment values.

I tried to apply the technique exposed in Managing Helper Tables but I don't see what a bridge table between the fact table and the multivalued dimension table with a weighting factor is any different than adding this dimension to the grain of the fact table and distribute the indicators using the same rules that you would use to determine the weight factor.

I can't see how to model this. Using weighting factors in the bridge table and filtering to one assortment value in the report would not return the correct amount but a fraction of it.

Am I missing something? Any clues?

Thanks for any insight.
Guillaume

PS: English is not my mother tongue so I might have done some grammatical errors.

ngalemmo Mon Jul 26, 2010 1:04 pm

The purpose of the bridge is to avoid messing up the fact table with detail that may otherwise result in bad results using queries that don't use the multi dimensional values. A query showing quantities sold by product would be a mess if the fact contained extra rows for each potential assortment values.

As far as allocation values in the bridge go, they may be useful for some queries, but you do not need to use them in every query. It all depends on what you are trying to do.

glaurent Tue Jul 27, 2010 9:04 am

First, thanks for your insight, I'm trying to find out the best way to do things and this concern just seems unresolvable for the moment.

ngalemmo wrote:The purpose of the bridge is to avoid messing up the fact table with detail that may otherwise result in bad results using queries that don't use the multi dimensional values. A query showing quantities sold by product would be a mess if the fact contained extra rows for each potential assortment values.

OK. I can't see why the results using queries queries that don't use the multivalued dimension table(s) should be bad if one takes good to allocate indicators value correctly amongst each value in the fact table: if the multivalued dimension table(s) is(are) not selected, indicators should be aggregated at report level and it should been seen as if there weren't no multivalued dimension(s). I can see how it could be less time efficient, especially if this(these) dimension(s) are not frequently used, but again one could create an aggregate to speed up things.
But I guess there are some case you can't aggregate or some other case I can't see right now.

ngalemmo wrote:As far as allocation values in the bridge go, they may be useful for some queries, but you do not need to use them in every query. It all depends on what you are trying to do.

I can see that: one could want to display the whole values of indicators for each value of a multivalued dimension - I've read these are called "impact reports".

That being said, I think I will try to explain my self a little more clearly using a known (I think) example extracted from a 2008 Ralph Kimball article called Help for Dimensional Modeling - Helper tables let you design and manage multivalued dimensions successfully:
(I couldn't find any rules about quoting but if forbidden, I will edit my post.)

R. Kimball wrote:
FIG 2

If you really insist on modeling this multivalued situation, then a "helper" table placed between the Diagnosis dimension and the fact table is the best solution. (See Figure 2.) The Diagnosis key in the fact table is changed to be a Diagnosis Group key. The helper table in the middle is the Diagnosis Group table. It has one record for each diagnosis in a group of diagnoses. If I walk into the doctor's office with three diagnoses, then I need a Diagnosis Group with three records in it. It is up to the modeler to build either these Diagnosis Groups for each individual or a library of "known" Diagnosis Groups. Perhaps my three diagnoses would be called "Kimball's Syndrome."

Ok I like this a lot, the group becoming kind of a new dimension containing several components.

R. Kimball wrote:The Diagnosis Group table [the helper - or bridge - table] is joined to the original Diagnosis dimension [the multivalued dimension] on the Diagnosis key. The Diagnosis Group table in Figure 2 contains a very important numeric attribute: the weighting factor. The weighting factor allows reports to be created that don't double count the Billed Amount in the fact table.

So far so good, that is exactly what is expected even though I still think it can be done just allocating the billed_amount indicator amongst each diagnosis (until I see the right case I guess) but let's continue.

R. Kimball wrote:For instance, if you constrain some attribute in the Diagnosis dimension such as "Contagious Indicator" with the values Contagious and Not Contagious, then you can group by the Contagious Indicator and produce a report with the correct totals. To get the correct totals, we must multiply the Billed Amount by the associated weighting factor. This is a correctly weighted report.

Ok but that's not it...
In this example, what if there is a billing record with $200 billed joined to a four-diagnosis group containing two "Not Contagious" diagnosis, one "Contagious" diagnosis and a final "Unknown contagious status" (just to keep with this example) diagnosis. Let's say each diagnosis have the same weighting factor of 0.25 (1/4).
Then on report level if you constrain the Contagious Indicator with "Contagious" and "Not Contagious" you don't get the right total since it would only display $150, 3/4 of the bill the other quarter not being taken into account because it got filtered.

To keep up with this example, what I'm trying to do is not having a report grouping bills amongst an attribute of the diagnosis dimension. I would like to display the billed_amount indicator with the new Diagnosis group dimension whose name should be created at loading time (by concatening diagnosis names for example or whatever canonical name asked by users like "C2NC1" meaning "two contagious and one non contagious diagnoses").
Where I am stuck is that I can't find a way to model things allowing reports to get the bills (and group diagnosis dimension) which are related to, at least, a given diagnosis (and even better a diagnosis within a given list).
I can see how to do this using an exists SQL clause but such a clause can't be generated by the reporting tool.

Anyway to do this?

Thanks for reading
Guillaume

ngalemmo Tue Jul 27, 2010 11:48 am

There is a discussion on how to query using bridge tables in
http://forum.kimballgroup.com/dimensional-modeling-and-data-architecture-f6/what-to-do-when-the-weighting-factor-of-a-bridge-table-no-longer-seems-relevant-t441.htm

You don't necessarily need to use an EXSITS, and trying to implement one in a typical BI tool is difficult, if not impossible. But, when you think about it, the bridge is logically equivilent to a fact table with a higher grain and preallocated measures. The same query issues exist either way.

The idea of having some derived attributes at the group level is not a bad one (if it makes sense to the business) but I would not implement such attributes in the group table, which can be quite large, instead, I would place them in a much smaller junk dimension and add a FK to the fact.

Sponsored content

Multivalued attribute

Multivalued attribute

Re: Multivalued attribute

Re: Multivalued attribute

Re: Multivalued attribute

Re: Multivalued attribute