Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

"Shared Dimensions" ... Complicates Usage of Data Warehouse??

5 posters

Go down

"Shared Dimensions" ... Complicates Usage of Data Warehouse?? Empty "Shared Dimensions" ... Complicates Usage of Data Warehouse??

Post  ian.coetzer Wed Jun 09, 2010 3:15 am

Question about 'Shared' dimensions.

I have the following shared dimensions:

Dim_Shared_Status & Dim_Shared_StatusType

Dim_Shared_Status contains: [Id], [StatusTypeId], [Name]
Dim_Shared_StatusType contains: [Id], [Name]

Now Dim_Shared_Status is referenced by several Fact Tables.

Within Dim_Shared_Status there is thus several lists of statusses,
which I group via the Status Type.

Question, is this ok? why I ask is that for a business-user to at the end of the day want to group facts on say several of these status types they will have to pull Dim_Shared_Status in twice into their reporting software / powerpivot, qlikview etc.

For Example:

Let's say a user wants to group and filter facts based on two status types, they will have to retrieve the Dim_Shared_Status twice - once for each status type, this means they will have to query the data warehouse for example:

1. Dim_Shared_Status >> retrieve all where StatusTypeId = 2 then call this ApplicationStatus
(SELECT [Id], [Name] FROM Dim_Shared_Status INNER JOIN Dim_Shared_StatusType ON Dim_Shared_Status.StatusTypeId = Dim_Shared_StatusType.Id WHERE Dim_Shared_StatusType.[Name] = 'ApplicationStatus')

2. Dim_Shared_Status >> retrieve all where StatusTypeId = 5 then call this LogStatus
(SELECT [Id], [Name] FROM Dim_Shared_Status INNER JOIN Dim_Shared_StatusType ON Dim_Shared_Status.StatusTypeId = Dim_Shared_StatusType.Id WHERE Dim_Shared_StatusType.[Name] = 'LogStatus')

WOULD IT NOT BE EASIER TO HAVE STATUS DIMENSIONS containing only ONE list of status types each??

Dim_ApplicationStatus
Dim_LogStatus
ian.coetzer
ian.coetzer

Posts : 57
Join date : 2010-05-31
Age : 43
Location : South Africa

Back to top Go down

"Shared Dimensions" ... Complicates Usage of Data Warehouse?? Empty Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??

Post  BoxesAndLines Wed Jun 09, 2010 9:26 am

Yes it would. This would also enable grouping hierarchies for your statuses.
BoxesAndLines
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

Back to top Go down

"Shared Dimensions" ... Complicates Usage of Data Warehouse?? Empty Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??

Post  ian.coetzer Wed Jun 09, 2010 10:00 am

Hi,

Okay so instead of having two dimensions (one for statusses and one for status types) to store all my status lists.
I should rather have say 25 dimensions! one dimension table for each status?

What is best practice?

I would rather simplify the design a bit and have the SSAS developer / PowerPivot user write specific queries to retrieve a list of statusses that they want?

I'm not sure which direction to take here?

Thanks,
Ian
ian.coetzer
ian.coetzer

Posts : 57
Join date : 2010-05-31
Age : 43
Location : South Africa

Back to top Go down

"Shared Dimensions" ... Complicates Usage of Data Warehouse?? Empty Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??

Post  ngalemmo Wed Jun 09, 2010 11:58 am

Let me re-phrase what you described. You have a bunch of different status codes for different things throughout the system. You decided to put them all in one table with a type, to differentiate statuses, a status code and a desription of the status. Status type has a description as well. Correct?

OK... If you decided to put all the statuses in one table, why do you have another table for the description of the type? And are you using surrogate keys?

Your status dimension should look like this:

status_key (surrogate primary key)
status_code
status_description
status_type_code
status_type_description

It's not clear, other than for documentation, what the status type description provides you because your foreign key to the dimension should represent the role of the status. Users would not be selecting by status type, that would be an ETL function to get the correct surrogate key. Because you are using a surrogate key the key, the tyoe is impiled by the foreign key as it will only reference one status of one type.

It is true, if the fact table has references to statuses of differnt types, you need to alias the dimension table for each type, but how else can it be done? Anyway, most BI tools that provide a meta layer allow you to handle this easily by defining the different status as separate fields tied to a particular alias and join condition.

If you don't have a tool with a meta layer, you can always define a view or synonyms for each type in the database.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

"Shared Dimensions" ... Complicates Usage of Data Warehouse?? Empty Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??

Post  ian.coetzer Wed Jun 09, 2010 2:06 pm

Hi

Thank you for the detailed reply, so you agree with having a 'shared' dimension to store all the statusses?

I just do not agree with duplicating all the status types?

Using your example of a shared status dimension this is what it could contain:

1 APP_STATUS Application Status 01 Captured
2 APP_STATUS Application Status 02 Reviewing
3 APP_STATUS Application Status 03 Manager Approval
4 APP_STATUS Application Status 04 Contract Created
5 LOG_STATUS Log Status 01 Log Created
6 LOG_STATUS Log Status 02 Log Opened
7 LOG_STATUS Log Status 03 Log Closed
ian.coetzer
ian.coetzer

Posts : 57
Join date : 2010-05-31
Age : 43
Location : South Africa

Back to top Go down

"Shared Dimensions" ... Complicates Usage of Data Warehouse?? Empty Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??

Post  ngalemmo Wed Jun 09, 2010 2:53 pm

I don't agree or disagree. It is one way to do it. I don't know if I would have status as a dimension to begin with. I would try to incorporate it into other dimensions if I could.

As far as eliminating the status type table, what would be the gain in keeping it? As I mentioned, other than for documentation, I don't see any purpose for it. If there is no intent to include the status type description in queries, then its own table is fine. Otherwise, if the intent is to provide this description to the users in some manner, why complicate things with another table?
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

"Shared Dimensions" ... Complicates Usage of Data Warehouse?? Empty Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??

Post  sgudavalli Thu Jun 10, 2010 9:31 am

Ian,

In your example Dimension Status is always your shared dimension. And Status Type is only used to further define your Dimension Status.
If so, then why dont you simply have status type in the status table.

Is Dim Status Table large? Do you see any benefit in keeping the flakes?? If not, I will suggest go with one shared dim table.

Regards
Shiv

sgudavalli

Posts : 29
Join date : 2010-06-10
Age : 39
Location : Pune, India

Back to top Go down

"Shared Dimensions" ... Complicates Usage of Data Warehouse?? Empty Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??

Post  ian.coetzer Thu Jun 10, 2010 1:20 pm

Hi,

No the shared status dimension won't every contain 100,000 records.
So I think I'm starting to agree that I should have only ONE shared status dimension with types etc. in there, even though type names and codes will be duplicated - it will simplify the design and use of the table.

BUT

back to the users perspective (who will be using this ...)

How will they?

Let's say they want to pull in two facts into their client tools (powerpivot, qlikview etc. etc.)

If they have:

Fact_Table_01
Fact_Table_02
Dim_Shared_Status

and they want to group on two different types of status types how can they do that?

since they can select the status name / code from the shared dimension and then maybe filter by status type.
but they may not be able to filter and group as they would have it there was two different status dimensions that related to each of the fact tables? or am i still not seeing the big picture on this topic?

thanks,
ian
ian.coetzer
ian.coetzer

Posts : 57
Join date : 2010-05-31
Age : 43
Location : South Africa

Back to top Go down

"Shared Dimensions" ... Complicates Usage of Data Warehouse?? Empty Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??

Post  BoxesAndLines Thu Jun 10, 2010 2:45 pm

If you have 10 statuses, you need 10 FK's to the shared status dim.
BoxesAndLines
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

Back to top Go down

"Shared Dimensions" ... Complicates Usage of Data Warehouse?? Empty Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??

Post  ian.coetzer Thu Jun 10, 2010 3:12 pm

Hi,

okay, let's say a single fact can be associated to 10 status types. (of which each status type may consist of 20 individual statuses each)
then you could do this.
but still, how would this be easy to analyze?

Let's say the user wants to see the total facts grrouped by a specific status type.
Let's say that status type "A" could have 3 statuses "A.1", "A.2", "A.3"
and the user also wants to split those totals by another status type say status type "B" of which it has "B.1", "B.2" as statuses.


[Status A] "A.1" "A.2" "A.3"

[Status B]
"B.1" 5 2 1

"B.2" 3 1 0

So in the fact table these could be the records (not using FK id's but text values for simplicity)

Fact_Table
[Id] [StatusA_FK] [StatusB_FK]
1 "A.1" "B.1"
2 "A.1" "B.1"
3 "A.1" "B.1"
4 "A.1" "B.1"
5 "A.1" "B.1"
6 "A.1" "B.2"
7 "A.1" "B.2"
8 "A.1" "B.2"
9 "A.2" "B.1"
10 "A.2" "B.1"
11 "A.2" "B.2
12 "A.3" "B.1"
ian.coetzer
ian.coetzer

Posts : 57
Join date : 2010-05-31
Age : 43
Location : South Africa

Back to top Go down

"Shared Dimensions" ... Complicates Usage of Data Warehouse?? Empty Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??

Post  ngalemmo Thu Jun 10, 2010 4:08 pm

Fact Table

statusA_Key
statusB_Key
...


SELECT SA.CODE, SB.CODE, fact measures
FROM FACT, STATUS SA, STATUS SB
WHERE STATUSA_KEY = SA.KEY
AND STATUSB_KEY = SB.KEY
GROUP BY...
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

"Shared Dimensions" ... Complicates Usage of Data Warehouse?? Empty Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??

Post  hang Fri Jun 11, 2010 8:03 am

Have you thought about using junk dimension. If they don't belong to other major dimensions and the total number of possible combinations of these statuses is order of magnitude smaller than the fact tables, then you can denormalise the statuses into single or groups of junk dimensions so that you would have minimal dimension entries in your fact tables. You can use SELECT DISTINCT StatusA, StatusB ... FROM Fact to estimate how big the junk dimension could be.

In ngalemmo's illustration, the STATUS dimension is like, maybe not quite the same as a role playing dimension in DW. The Status dimension looks to me like a generic lookup table sectioned for different types of statuses. If those types don't share common status values, it's not a true role playing dimension per se. They could belong to other dimensions or can be grouped into junk dimensions, or even standalone dimensions. If you don't feel comfortable about creating too many small tables, then use ngalemmo's idea to treat it like a role playing dimension. However, Kimall suggested creating views for role playing dimension for the purpose of clarity to the dimension users and I think the point may be more relevant in this case.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

"Shared Dimensions" ... Complicates Usage of Data Warehouse?? Empty Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum