Dimension with two hierarchies
+2
Jeff Smith
neo.helios
6 posters
Page 1 of 1
Dimension with two hierarchies
I need to design a dimension with two hierarchies
1. Location -> Building -> Room -> Cabin
2. Location ->Building
I need the hierarchy 1 to be referenced in one FACT table and hierarchy 2 in other FACT. But my ETL tool allows only the dimension key of the lowest level to be mapped to the FACT table and generates a -ve Dimension_Key for the second hierarchy
Is it proper approach to create both the hierarchies in the same dimension and referencing different FACT tables
1. Location -> Building -> Room -> Cabin
2. Location ->Building
I need the hierarchy 1 to be referenced in one FACT table and hierarchy 2 in other FACT. But my ETL tool allows only the dimension key of the lowest level to be mapped to the FACT table and generates a -ve Dimension_Key for the second hierarchy
Is it proper approach to create both the hierarchies in the same dimension and referencing different FACT tables
neo.helios- Posts : 11
Join date : 2010-11-02
Re: Dimension with two hierarchies
I would build 1 dimension. I would build the dimension in 2 steps. I would create a staging dimension with Location and Building, creating a surrogate key (Location, Building, Loc_bldg_Skey). I would create a second dimension build with Location, Building, Loc_bldg_Skey, Room, Cabin, Loc_bldg_Rm_cbn_skey using the staging dimension as the source for the Location, Building, Loc_bldg_Skey columns.
I would create a view with distinct Location, Building, Loc_bldg_Skey as a mini dimension but I think an argument could be made to go ahead and make that initial staging table as a dimension table - my ETL allows for multiple surrogate keys in a dimension table.
By the way, it's one hierarchy with a rollup point (a mini dimension). Multiple hierachies in a dimension is when the lowest level can roll up along 2 or more different paths.
I would create a view with distinct Location, Building, Loc_bldg_Skey as a mini dimension but I think an argument could be made to go ahead and make that initial staging table as a dimension table - my ETL allows for multiple surrogate keys in a dimension table.
By the way, it's one hierarchy with a rollup point (a mini dimension). Multiple hierachies in a dimension is when the lowest level can roll up along 2 or more different paths.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Dimension with two hierarchies
Is the Location/Building relationship different in #2? If not, how are these two hierarchies different?
Or is the issue that the facts are at different grains?
And what ETL tool are you using?
Or is the issue that the facts are at different grains?
And what ETL tool are you using?
Re: Dimension with two hierarchies
Thanks for the quick reply Jeff.
With the approach that you suggest, if I have more levels in the dimension and if I have to use each level as a hierarchy ref. in different fact tables then I would end up creating many staging and mini dimensions. Please suggest
With the approach that you suggest, if I have more levels in the dimension and if I have to use each level as a hierarchy ref. in different fact tables then I would end up creating many staging and mini dimensions. Please suggest
neo.helios- Posts : 11
Join date : 2010-11-02
Re: Dimension with two hierarchies
the fact tables are at different grains. And I am using Oracle Warehouse Builder as the ETL tool
neo.helios- Posts : 11
Join date : 2010-11-02
Re: Dimension with two hierarchies
ngalemmo wrote:
And what ETL tool are you using?
Why does the ETL tool matter? Is there one in particular that manages the hierarchies better than others?
dellsters- Posts : 39
Join date : 2009-02-11
Re: Dimension with two hierarchies
If you have mutliple levels in a hierarchy and you plan to create aggregate tables at the various levels, then yes, I would create surrogates at each level. If you plan to only rollup the data in cubes, then I wouldn't bother creating additional surrogate keys.
Let's say your detailed data was at the City level. You wanted to create aggregate tables for the county, state, and region level. I would have 1 dimension table with 4 surrogate keys - 1 for each level, city, county, state, region. I would create 3 views from the dimension table with County/State/Region, State/Region, and Region.
If I was planning to aggregate the data only in Cubes, then I would not create Surrogates for each level.
But, when in doubt, create the additinal surrogates. Easier to do it from the get go then after the fact.
Let's say your detailed data was at the City level. You wanted to create aggregate tables for the county, state, and region level. I would have 1 dimension table with 4 surrogate keys - 1 for each level, city, county, state, region. I would create 3 views from the dimension table with County/State/Region, State/Region, and Region.
If I was planning to aggregate the data only in Cubes, then I would not create Surrogates for each level.
But, when in doubt, create the additinal surrogates. Easier to do it from the get go then after the fact.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Dimension with two hierarchies
dellsters wrote:ngalemmo wrote:
And what ETL tool are you using?
Why does the ETL tool matter? Is there one in particular that manages the hierarchies better than others?
I think the older the version of the ETL tool, the less functionality it was with regards to dimension building.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Dimension with two hierarchies
dellsters wrote:ngalemmo wrote:
And what ETL tool are you using?
Why does the ETL tool matter? Is there one in particular that manages the hierarchies better than others?
I asked because of the comment:
But my ETL tool allows only the dimension key of the lowest level to be mapped to the FACT table and generates a -ve Dimension_Key for the second hierarchy
Most tools are flexible enough to do whatever you need to do...
Re: Dimension with two hierarchies
For high level ETL architecture, don't you populate the dimensions first and when it comes to populating the fact, just source all the dimensions and look up the keys and load the fact table?
dellsters- Posts : 39
Join date : 2009-02-11
Re: Dimension with two hierarchies
So don't use ETL tool if it does not do the job. Once you understand the process of ETL, the best tool is SQL. Vendor specific ETL tools are pretty for their look and having skills on them somehow makes you a serious ETL professional on the market. In reality, a lot of hard core stuff need to be scripted out by SQL, and believe me, properly scripted native SQL always outperforms visual tools.aluri.prabhakar wrote:But my ETL tool allows only the dimension key of the lowest level to be mapped to the FACT table and generates a -ve Dimension_Key for the second hierarchy
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Dimension with two hierarchies
hang wrote:So don't use ETL tool if it does not do the job. Once you understand the process of ETL, the best tool is SQL. Vendor specific ETL tools are pretty for their look and having skills on them somehow makes you a serious ETL professional on the market. In reality, a lot of hard core stuff need to be scripted out by SQL, and believe me, properly scripted native SQL always outperforms visual tools.aluri.prabhakar wrote:But my ETL tool allows only the dimension key of the lowest level to be mapped to the FACT table and generates a -ve Dimension_Key for the second hierarchy
Not exactly true. Cognos DecisionStream did a very good job with dimensions and was fast, at the time, when it came to loading fact tables and aggregates. But, it was a little limiting. I think either SSIS or SSAS does a real nice job - it's very fast.
I like SQL and am comfortable with it but I don't think it's the fastest thing in the world. At least not for SQL Server databases.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Dimension with two hierarchies
True, they both do good job in their respective field. But we are talking about ETL, so SSAS is irrelevant, and it is irreplaceable by SQL when a cube is needed.Jeff Smith wrote:I think either SSIS or SSAS does a real nice job - it's very fast.
I like SSIS too when doing ETL, but I use it for its special features that can hardly be implemented in SQL, like connecting to alian database, connection string config, file management, email, cube processig and fauzzy matching etc. However some of the core ETL functionality like Surrogate Key generation and lookup, SCD implementation can be achieved in stored proc more efficiently.
What I am saying is, combine SQL scripts with the ETL tools. When a ETL tool does not do the job, use SQL instead of counting on tools to do everything for you. And also, if tools are too slow, try SQL and I bet the latter will be likely faster.
Jeff, have you compared the performance of using SSIS SCD2 component with stored proc in SQL for million records dimension. If you have you may not want to do everything in ETL by SSIS. My experience is 10 times difference sometimes.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Dimension with two hierarchies
I'll be the first to admit my strength is not ETL. I was just sharing some of the comments I heard among the developers. We have a raging battle going on between 2 camps - one that swears by Stored Proc the other that thinks SQL Server's ETL tool is the way to go.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Dimension with two hierarchies
As I said, SSIS has got pretty face and is impressive to show the picture to other people which is probably a good selling point. On the other hand, stored proc can do all the hard core stuff that SSIS can't, but with ugly old looking SQL code. However the value should only be measured by ..., you know what I mean. My advice is use both to their best strength.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Dimension with two hierarchies
ngalemmo wrote:Is the Location/Building relationship different in #2? If not, how are these two hierarchies different?
Or is the issue that the facts are at different grains?
And what ETL tool are you using?
I'm trying to model a similar scenario where the facts are at different grains. Normally I would snow flake at this point but never really like doing that so I'm intrigued about this concept of using multiple surrogate keys. So each dimension row we are going to store 2 surrogate keys, one for each grain? I would prefer not to snow flake but I'm not exactly thrilled about having more than one surrogate key either, which way would you go?
robber- Posts : 41
Join date : 2009-02-28
Location : Canada
Re: Dimension with two hierarchies
What's wrong with having multiple surrogate keys in a dimension table representing different levels of the hierarchy? Say you geographic dimension starting multi-state regions drilling down to census block groups. You want to create an aggregate table at the county level. You would need a surrogate key for each county. Create a view from the geographic dimension selecting distinct rows from the County Surrogate Key on up.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Similar topics
» One dimension different hierarchies
» Hierarchies in the customer dimension.
» Purpose of Hierarchies in a Dimension
» How to store multiple hierarchies within a dimension
» Location Dimension(s) with multiple hierarchies
» Hierarchies in the customer dimension.
» Purpose of Hierarchies in a Dimension
» How to store multiple hierarchies within a dimension
» Location Dimension(s) with multiple hierarchies
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|