Enterprise Date Warehouse/Bus Architecture Physical Implementation
+3
caderoux
robber
anandraj
7 posters
Page 1 of 1
Enterprise Date Warehouse/Bus Architecture Physical Implementation
What is the best database design for EDW/Bus Architecture?
creating all the dimensions (confirmed) in one oracle schema/sql server database and
all the facts in another oracle schema/sql server database? OR
creating all dimensions and facts in one oracle schema/sql server database?
Please comment on these two different design approaches
Is there document or book which talks about EDW/Bus Architecture physical database implementation and best practices?
Thanks in advance.
creating all the dimensions (confirmed) in one oracle schema/sql server database and
all the facts in another oracle schema/sql server database? OR
creating all dimensions and facts in one oracle schema/sql server database?
Please comment on these two different design approaches
Is there document or book which talks about EDW/Bus Architecture physical database implementation and best practices?
Thanks in advance.
anandraj- Posts : 1
Join date : 2009-07-16
Re: Enterprise Date Warehouse/Bus Architecture Physical Implementation
I would keep dims and facts in one schema, I would separate dims and facts and data and indexes into separate tablespaces(oracle) or filegroups(sql server). You can then manage the physical file placement (separate disks) to optimise performance.
robber- Posts : 41
Join date : 2009-02-28
Location : Canada
Re: Enterprise Date Warehouse/Bus Architecture Physical Implementation
In addition, dims and facts really need to be within the same database (in SQL Server terminology) if you want to enforce your referential integrity on the dimension IDs which join the facts to the dimensions with a foreign-key contraint.
caderoux- Posts : 8
Join date : 2009-02-03
Re: Enterprise Date Warehouse/Bus Architecture Physical Implementation
caderoux wrote:In addition, dims and facts really need to be within the same database (in SQL Server terminology) if you want to enforce your referential integrity on the dimension IDs which join the facts to the dimensions with a foreign-key contraint.
If you are assigning surrogate keys, there really isn't a need to enforce RI at the database level. Enforcing FK restraints is a waste of time.
Re: Enterprise Date Warehouse/Bus Architecture Physical Implementation
ngalemmo wrote:If you are assigning surrogate keys, there really isn't a need to enforce RI at the database level. Enforcing FK restraints is a waste of time.
I believe this statement to be true, however, I cannot find documentation anywhere stating that this really is true. Can you please provide some type of documentation/references/internet links that say this? My problem is that I have some DBAs that are insisting that FKs exist and be enforced and in my limited experience that makes managing conformed dimensions and fact tables that sit on top of them a PITA.
Please, anyone, provide references to this quoted assertion.
smckee6452- Posts : 3
Join date : 2009-08-10
Re: Enterprise Date Warehouse/Bus Architecture Physical Implementation
smckee6452 wrote:ngalemmo wrote:If you are assigning surrogate keys, there really isn't a need to enforce RI at the database level. Enforcing FK restraints is a waste of time.
I believe this statement to be true, however, I cannot find documentation anywhere stating that this really is true. Can you please provide some type of documentation/references/internet links that say this? My problem is that I have some DBAs that are insisting that FKs exist and be enforced and in my limited experience that makes managing conformed dimensions and fact tables that sit on top of them a PITA.
Please, anyone, provide references to this quoted assertion.
The impact is on the fact table loading process. I think it's more of a logical argument than something you'll find in a whitepaper. You've already used your ETL process to look up the surrogate keys and make sure you have a valid surrogate/business key relationship. If you have DRI between your fact table and your dimension tables you will slow down the loading process. With DRI, each row inserted into the fact table will need to do a lookup against each dimensional table. Why take that that performance hit twice (during ETL and then again during fact loading)? If you're loading a small number of facts each time, probably no big deal. With lots of fact rows to load and lots of dimensions to check, it can be a big deal.
Omaha- Posts : 6
Join date : 2009-08-07
Re: Enterprise Date Warehouse/Bus Architecture Physical Implementation
Please bear with me as I'm not an ETL or BI expert...
For my situation, it's more than impact on load time. We have several conformed dimensions which are truncated and reloaded every night. We also have many fact tables that sit on these dimensions and we are creating new fact tables all the time. So my problem is from a design perspective (although, our load times are getting insane so this FK stuff may creep that way as well). Having enabled FK constraints on fact tables requires the packages that manage the conformed dimensions (loaded first in the ETL process) to know about and keep track of FKs that reside on fact tables that are keyed (surrogate) to these dimensions... That becomes hard to manage and gets very "spaghetti" very quickly.
I guess my problem is that my DBAs are still living in a transactional world, and this BI/ETL stuff is not that sort of beast to me. I could be way off base, I'm new to this stuff, but that's how it seems. I'm finding it very hard to convince them that enabled FK constraints are doing more harm than good: they are redundant, they decrease maintainability, they increase load time, etc. But they insist that FKs will exist and will be enabled until I can provide them evidence to the contrary... What they want is a BI/ETL best practices document that says "do not enable FK constraints and here's why" that usurps the generally held best practice in the transactional world of "make sure you have FKs and here's why".
Am I just missing something here?
For my situation, it's more than impact on load time. We have several conformed dimensions which are truncated and reloaded every night. We also have many fact tables that sit on these dimensions and we are creating new fact tables all the time. So my problem is from a design perspective (although, our load times are getting insane so this FK stuff may creep that way as well). Having enabled FK constraints on fact tables requires the packages that manage the conformed dimensions (loaded first in the ETL process) to know about and keep track of FKs that reside on fact tables that are keyed (surrogate) to these dimensions... That becomes hard to manage and gets very "spaghetti" very quickly.
I guess my problem is that my DBAs are still living in a transactional world, and this BI/ETL stuff is not that sort of beast to me. I could be way off base, I'm new to this stuff, but that's how it seems. I'm finding it very hard to convince them that enabled FK constraints are doing more harm than good: they are redundant, they decrease maintainability, they increase load time, etc. But they insist that FKs will exist and will be enabled until I can provide them evidence to the contrary... What they want is a BI/ETL best practices document that says "do not enable FK constraints and here's why" that usurps the generally held best practice in the transactional world of "make sure you have FKs and here's why".
Am I just missing something here?
Last edited by smckee6452 on Mon Aug 10, 2009 12:55 pm; edited 1 time in total (Reason for editing : typo)
smckee6452- Posts : 3
Join date : 2009-08-10
Re: Enterprise Date Warehouse/Bus Architecture Physical Implementation
I've never used database enforced RI in any data warehouse I have built... and it has never been a problem. When a DBA balks, I sit down with him/her and explain how the ETL process works, how a surrogate key can only be assigned if there is a row in the dimension table, and how much overhead is reduced, particularly with large data loads.
I don't know if anyone has published a formal white paper (usually you see white papers from someone is trying to sell you something), but it has been a generally accepted practice for as long as I can remember. The key difference between a data warehouse and a transactional system is the update processes for a DW are controlled and structured, rather than random and uncontrolled. The rigor of the ETL process avoids the issues database RI is meant to prevent.
I don't know if anyone has published a formal white paper (usually you see white papers from someone is trying to sell you something), but it has been a generally accepted practice for as long as I can remember. The key difference between a data warehouse and a transactional system is the update processes for a DW are controlled and structured, rather than random and uncontrolled. The rigor of the ETL process avoids the issues database RI is meant to prevent.
Re: Enterprise Date Warehouse/Bus Architecture Physical Implementation
smckee6452 wrote:
...
We have several conformed dimensions which are truncated and reloaded every night. We also have many fact tables that sit on these dimensions and we are creating new fact tables all the time.
...
How do you do that with DRI without deleting all the facts also? Are you completely reloading the DW every day? Seems pretty processing intensive.
Omaha- Posts : 6
Join date : 2009-08-07
Re: Enterprise Date Warehouse/Bus Architecture Physical Implementation
Omaha wrote:smckee6452 wrote:
...
We have several conformed dimensions which are truncated and reloaded every night. We also have many fact tables that sit on these dimensions and we are creating new fact tables all the time.
...
How do you do that with DRI without deleting all the facts also? Are you completely reloading the DW every day? Seems pretty processing intensive.
Agreed... First off, you NEVER delete anything from a dimension table. It is critical to preserve the surrogate primary key so the facts reference something that exists. Second, you should be loading atomic level facts (Rule #1 in Kimball U's 10 essential rules - http://www.intelligententerprise.com/showArticle.jhtml?articleID=217700810), which implies incremental loads. RI is the least of your worries...
Re: Enterprise Date Warehouse/Bus Architecture Physical Implementation
Omaha wrote:
How do you do that with DRI without deleting all the facts also? Are you completely reloading the DW every day? Seems pretty processing intensive.
In the cases where the dimensions are being truncated, any fact tables joined to the dimensions are also being truncated 99% of the time. This is because our source data may not have audit dates which can be trusted... You're right, it's processing intensive.
smckee6452- Posts : 3
Join date : 2009-08-10
Re: Enterprise Date Warehouse/Bus Architecture Physical Implementation
smckee6452 wrote: I believe this statement to be true, however, I cannot find documentation anywhere stating that this really is true. Can you please provide some type of documentation/references/internet links that say this? My problem is that I have some DBAs that are insisting that FKs exist and be enforced and in my limited experience that makes managing conformed dimensions and fact tables that sit on top of them a PITA.
Please, anyone, provide references to this quoted assertion.
If you need something 'official' to give to the DBA's (and give me a chance to do a little self promotion - Heck, I get a $1 for every book sold), it is covered in pages 299-300 of "Mastering Data Warehouse Design - Relational and Dimensional Techniques" (Wiley). It goes into a little more depth (or maybe just more verbose) but essentially the same as my comments in the earlier posts.
Re: Enterprise Date Warehouse/Bus Architecture Physical Implementation
$1?? That's a lot of work for $1 per book!
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Enterprise Date Warehouse/Bus Architecture Physical Implementation
BoxesAndLines wrote:$1?? That's a lot of work for $1 per book!
Tell me about it...
It's actually a little over $3, but it gets split 3 ways. From the royalties, I should be able to retire some time around 2298.
Similar topics
» Physical Implementation of Data Marts Using Conformed Dimensions
» Initial date of effective date column for SCD 2 implementation
» Adding a fact table with start date and end date to a (SSAS) multidimensional cube
» LA, L0, L1, L2 Data Warehouse Architecture
» Multi - tenant or SaaS Warehouse architecture
» Initial date of effective date column for SCD 2 implementation
» Adding a fact table with start date and end date to a (SSAS) multidimensional cube
» LA, L0, L1, L2 Data Warehouse Architecture
» Multi - tenant or SaaS Warehouse architecture
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum