"Shared Dimensions" ... Complicates Usage of Data Warehouse??
5 posters
Page 1 of 1
"Shared Dimensions" ... Complicates Usage of Data Warehouse??
Question about 'Shared' dimensions.
I have the following shared dimensions:
Dim_Shared_Status & Dim_Shared_StatusType
Dim_Shared_Status contains: [Id], [StatusTypeId], [Name]
Dim_Shared_StatusType contains: [Id], [Name]
Now Dim_Shared_Status is referenced by several Fact Tables.
Within Dim_Shared_Status there is thus several lists of statusses,
which I group via the Status Type.
Question, is this ok? why I ask is that for a business-user to at the end of the day want to group facts on say several of these status types they will have to pull Dim_Shared_Status in twice into their reporting software / powerpivot, qlikview etc.
For Example:
Let's say a user wants to group and filter facts based on two status types, they will have to retrieve the Dim_Shared_Status twice - once for each status type, this means they will have to query the data warehouse for example:
1. Dim_Shared_Status >> retrieve all where StatusTypeId = 2 then call this ApplicationStatus
(SELECT [Id], [Name] FROM Dim_Shared_Status INNER JOIN Dim_Shared_StatusType ON Dim_Shared_Status.StatusTypeId = Dim_Shared_StatusType.Id WHERE Dim_Shared_StatusType.[Name] = 'ApplicationStatus')
2. Dim_Shared_Status >> retrieve all where StatusTypeId = 5 then call this LogStatus
(SELECT [Id], [Name] FROM Dim_Shared_Status INNER JOIN Dim_Shared_StatusType ON Dim_Shared_Status.StatusTypeId = Dim_Shared_StatusType.Id WHERE Dim_Shared_StatusType.[Name] = 'LogStatus')
WOULD IT NOT BE EASIER TO HAVE STATUS DIMENSIONS containing only ONE list of status types each??
Dim_ApplicationStatus
Dim_LogStatus
I have the following shared dimensions:
Dim_Shared_Status & Dim_Shared_StatusType
Dim_Shared_Status contains: [Id], [StatusTypeId], [Name]
Dim_Shared_StatusType contains: [Id], [Name]
Now Dim_Shared_Status is referenced by several Fact Tables.
Within Dim_Shared_Status there is thus several lists of statusses,
which I group via the Status Type.
Question, is this ok? why I ask is that for a business-user to at the end of the day want to group facts on say several of these status types they will have to pull Dim_Shared_Status in twice into their reporting software / powerpivot, qlikview etc.
For Example:
Let's say a user wants to group and filter facts based on two status types, they will have to retrieve the Dim_Shared_Status twice - once for each status type, this means they will have to query the data warehouse for example:
1. Dim_Shared_Status >> retrieve all where StatusTypeId = 2 then call this ApplicationStatus
(SELECT [Id], [Name] FROM Dim_Shared_Status INNER JOIN Dim_Shared_StatusType ON Dim_Shared_Status.StatusTypeId = Dim_Shared_StatusType.Id WHERE Dim_Shared_StatusType.[Name] = 'ApplicationStatus')
2. Dim_Shared_Status >> retrieve all where StatusTypeId = 5 then call this LogStatus
(SELECT [Id], [Name] FROM Dim_Shared_Status INNER JOIN Dim_Shared_StatusType ON Dim_Shared_Status.StatusTypeId = Dim_Shared_StatusType.Id WHERE Dim_Shared_StatusType.[Name] = 'LogStatus')
WOULD IT NOT BE EASIER TO HAVE STATUS DIMENSIONS containing only ONE list of status types each??
Dim_ApplicationStatus
Dim_LogStatus
ian.coetzer- Posts : 57
Join date : 2010-05-31
Age : 44
Location : South Africa
Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??
Yes it would. This would also enable grouping hierarchies for your statuses.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??
Hi,
Okay so instead of having two dimensions (one for statusses and one for status types) to store all my status lists.
I should rather have say 25 dimensions! one dimension table for each status?
What is best practice?
I would rather simplify the design a bit and have the SSAS developer / PowerPivot user write specific queries to retrieve a list of statusses that they want?
I'm not sure which direction to take here?
Thanks,
Ian
Okay so instead of having two dimensions (one for statusses and one for status types) to store all my status lists.
I should rather have say 25 dimensions! one dimension table for each status?
What is best practice?
I would rather simplify the design a bit and have the SSAS developer / PowerPivot user write specific queries to retrieve a list of statusses that they want?
I'm not sure which direction to take here?
Thanks,
Ian
ian.coetzer- Posts : 57
Join date : 2010-05-31
Age : 44
Location : South Africa
Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??
Let me re-phrase what you described. You have a bunch of different status codes for different things throughout the system. You decided to put them all in one table with a type, to differentiate statuses, a status code and a desription of the status. Status type has a description as well. Correct?
OK... If you decided to put all the statuses in one table, why do you have another table for the description of the type? And are you using surrogate keys?
Your status dimension should look like this:
status_key (surrogate primary key)
status_code
status_description
status_type_code
status_type_description
It's not clear, other than for documentation, what the status type description provides you because your foreign key to the dimension should represent the role of the status. Users would not be selecting by status type, that would be an ETL function to get the correct surrogate key. Because you are using a surrogate key the key, the tyoe is impiled by the foreign key as it will only reference one status of one type.
It is true, if the fact table has references to statuses of differnt types, you need to alias the dimension table for each type, but how else can it be done? Anyway, most BI tools that provide a meta layer allow you to handle this easily by defining the different status as separate fields tied to a particular alias and join condition.
If you don't have a tool with a meta layer, you can always define a view or synonyms for each type in the database.
OK... If you decided to put all the statuses in one table, why do you have another table for the description of the type? And are you using surrogate keys?
Your status dimension should look like this:
status_key (surrogate primary key)
status_code
status_description
status_type_code
status_type_description
It's not clear, other than for documentation, what the status type description provides you because your foreign key to the dimension should represent the role of the status. Users would not be selecting by status type, that would be an ETL function to get the correct surrogate key. Because you are using a surrogate key the key, the tyoe is impiled by the foreign key as it will only reference one status of one type.
It is true, if the fact table has references to statuses of differnt types, you need to alias the dimension table for each type, but how else can it be done? Anyway, most BI tools that provide a meta layer allow you to handle this easily by defining the different status as separate fields tied to a particular alias and join condition.
If you don't have a tool with a meta layer, you can always define a view or synonyms for each type in the database.
Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??
Hi
Thank you for the detailed reply, so you agree with having a 'shared' dimension to store all the statusses?
I just do not agree with duplicating all the status types?
Using your example of a shared status dimension this is what it could contain:
Thank you for the detailed reply, so you agree with having a 'shared' dimension to store all the statusses?
I just do not agree with duplicating all the status types?
Using your example of a shared status dimension this is what it could contain:
1 | APP_STATUS | Application Status | 01 | Captured | |
2 | APP_STATUS | Application Status | 02 | Reviewing | |
3 | APP_STATUS | Application Status | 03 | Manager Approval | |
4 | APP_STATUS | Application Status | 04 | Contract Created | |
5 | LOG_STATUS | Log Status | 01 | Log Created | |
6 | LOG_STATUS | Log Status | 02 | Log Opened | |
7 | LOG_STATUS | Log Status | 03 | Log Closed |
ian.coetzer- Posts : 57
Join date : 2010-05-31
Age : 44
Location : South Africa
Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??
I don't agree or disagree. It is one way to do it. I don't know if I would have status as a dimension to begin with. I would try to incorporate it into other dimensions if I could.
As far as eliminating the status type table, what would be the gain in keeping it? As I mentioned, other than for documentation, I don't see any purpose for it. If there is no intent to include the status type description in queries, then its own table is fine. Otherwise, if the intent is to provide this description to the users in some manner, why complicate things with another table?
As far as eliminating the status type table, what would be the gain in keeping it? As I mentioned, other than for documentation, I don't see any purpose for it. If there is no intent to include the status type description in queries, then its own table is fine. Otherwise, if the intent is to provide this description to the users in some manner, why complicate things with another table?
Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??
Ian,
In your example Dimension Status is always your shared dimension. And Status Type is only used to further define your Dimension Status.
If so, then why dont you simply have status type in the status table.
Is Dim Status Table large? Do you see any benefit in keeping the flakes?? If not, I will suggest go with one shared dim table.
Regards
Shiv
In your example Dimension Status is always your shared dimension. And Status Type is only used to further define your Dimension Status.
If so, then why dont you simply have status type in the status table.
Is Dim Status Table large? Do you see any benefit in keeping the flakes?? If not, I will suggest go with one shared dim table.
Regards
Shiv
sgudavalli- Posts : 29
Join date : 2010-06-10
Age : 40
Location : Pune, India
Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??
Hi,
No the shared status dimension won't every contain 100,000 records.
So I think I'm starting to agree that I should have only ONE shared status dimension with types etc. in there, even though type names and codes will be duplicated - it will simplify the design and use of the table.
BUT
back to the users perspective (who will be using this ...)
How will they?
Let's say they want to pull in two facts into their client tools (powerpivot, qlikview etc. etc.)
If they have:
Fact_Table_01
Fact_Table_02
Dim_Shared_Status
and they want to group on two different types of status types how can they do that?
since they can select the status name / code from the shared dimension and then maybe filter by status type.
but they may not be able to filter and group as they would have it there was two different status dimensions that related to each of the fact tables? or am i still not seeing the big picture on this topic?
thanks,
ian
No the shared status dimension won't every contain 100,000 records.
So I think I'm starting to agree that I should have only ONE shared status dimension with types etc. in there, even though type names and codes will be duplicated - it will simplify the design and use of the table.
BUT
back to the users perspective (who will be using this ...)
How will they?
Let's say they want to pull in two facts into their client tools (powerpivot, qlikview etc. etc.)
If they have:
Fact_Table_01
Fact_Table_02
Dim_Shared_Status
and they want to group on two different types of status types how can they do that?
since they can select the status name / code from the shared dimension and then maybe filter by status type.
but they may not be able to filter and group as they would have it there was two different status dimensions that related to each of the fact tables? or am i still not seeing the big picture on this topic?
thanks,
ian
ian.coetzer- Posts : 57
Join date : 2010-05-31
Age : 44
Location : South Africa
Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??
If you have 10 statuses, you need 10 FK's to the shared status dim.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??
Hi,
okay, let's say a single fact can be associated to 10 status types. (of which each status type may consist of 20 individual statuses each)
then you could do this.
but still, how would this be easy to analyze?
Let's say the user wants to see the total facts grrouped by a specific status type.
Let's say that status type "A" could have 3 statuses "A.1", "A.2", "A.3"
and the user also wants to split those totals by another status type say status type "B" of which it has "B.1", "B.2" as statuses.
[Status A] "A.1" "A.2" "A.3"
[Status B]
"B.1" 5 2 1
"B.2" 3 1 0
So in the fact table these could be the records (not using FK id's but text values for simplicity)
Fact_Table
[Id] [StatusA_FK] [StatusB_FK]
1 "A.1" "B.1"
2 "A.1" "B.1"
3 "A.1" "B.1"
4 "A.1" "B.1"
5 "A.1" "B.1"
6 "A.1" "B.2"
7 "A.1" "B.2"
8 "A.1" "B.2"
9 "A.2" "B.1"
10 "A.2" "B.1"
11 "A.2" "B.2
12 "A.3" "B.1"
okay, let's say a single fact can be associated to 10 status types. (of which each status type may consist of 20 individual statuses each)
then you could do this.
but still, how would this be easy to analyze?
Let's say the user wants to see the total facts grrouped by a specific status type.
Let's say that status type "A" could have 3 statuses "A.1", "A.2", "A.3"
and the user also wants to split those totals by another status type say status type "B" of which it has "B.1", "B.2" as statuses.
[Status A] "A.1" "A.2" "A.3"
[Status B]
"B.1" 5 2 1
"B.2" 3 1 0
So in the fact table these could be the records (not using FK id's but text values for simplicity)
Fact_Table
[Id] [StatusA_FK] [StatusB_FK]
1 "A.1" "B.1"
2 "A.1" "B.1"
3 "A.1" "B.1"
4 "A.1" "B.1"
5 "A.1" "B.1"
6 "A.1" "B.2"
7 "A.1" "B.2"
8 "A.1" "B.2"
9 "A.2" "B.1"
10 "A.2" "B.1"
11 "A.2" "B.2
12 "A.3" "B.1"
ian.coetzer- Posts : 57
Join date : 2010-05-31
Age : 44
Location : South Africa
Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??
Fact Table
statusA_Key
statusB_Key
...
SELECT SA.CODE, SB.CODE, fact measures
FROM FACT, STATUS SA, STATUS SB
WHERE STATUSA_KEY = SA.KEY
AND STATUSB_KEY = SB.KEY
GROUP BY...
statusA_Key
statusB_Key
...
SELECT SA.CODE, SB.CODE, fact measures
FROM FACT, STATUS SA, STATUS SB
WHERE STATUSA_KEY = SA.KEY
AND STATUSB_KEY = SB.KEY
GROUP BY...
Re: "Shared Dimensions" ... Complicates Usage of Data Warehouse??
Have you thought about using junk dimension. If they don't belong to other major dimensions and the total number of possible combinations of these statuses is order of magnitude smaller than the fact tables, then you can denormalise the statuses into single or groups of junk dimensions so that you would have minimal dimension entries in your fact tables. You can use SELECT DISTINCT StatusA, StatusB ... FROM Fact to estimate how big the junk dimension could be.
In ngalemmo's illustration, the STATUS dimension is like, maybe not quite the same as a role playing dimension in DW. The Status dimension looks to me like a generic lookup table sectioned for different types of statuses. If those types don't share common status values, it's not a true role playing dimension per se. They could belong to other dimensions or can be grouped into junk dimensions, or even standalone dimensions. If you don't feel comfortable about creating too many small tables, then use ngalemmo's idea to treat it like a role playing dimension. However, Kimall suggested creating views for role playing dimension for the purpose of clarity to the dimension users and I think the point may be more relevant in this case.
In ngalemmo's illustration, the STATUS dimension is like, maybe not quite the same as a role playing dimension in DW. The Status dimension looks to me like a generic lookup table sectioned for different types of statuses. If those types don't share common status values, it's not a true role playing dimension per se. They could belong to other dimensions or can be grouped into junk dimensions, or even standalone dimensions. If you don't feel comfortable about creating too many small tables, then use ngalemmo's idea to treat it like a role playing dimension. However, Kimall suggested creating views for role playing dimension for the purpose of clarity to the dimension users and I think the point may be more relevant in this case.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Similar topics
» difference between data mart and data warehouse at logical/physical level
» Data Marts, Conformed dimensions and Data Warehouse
» Is it a best practice that Data warehouse follows the source system data type?
» data warehouse or not ? when is it okay to use OLAP without a data warehouse database
» data warehouse and data warehouse system
» Data Marts, Conformed dimensions and Data Warehouse
» Is it a best practice that Data warehouse follows the source system data type?
» data warehouse or not ? when is it okay to use OLAP without a data warehouse database
» data warehouse and data warehouse system
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum