Data Warehouse spread across 4 separate databases
4 posters
Page 1 of 1
Data Warehouse spread across 4 separate databases
Hello,
I started with a new BI team. Our data warehouse was developed by consultants. I find the design very strange, but they insist that it uses BI best practices. I can't find any best practices supporting this set-up. Can anyone help me out?
Our design is as follows (Note we are using SQL Server 2008):
We have 3 different databases containing our fact tables (these 3 databases looks exactly the same - just the amount of data stored in the fact tables differ). None of these databases contains any dimensions.
The strategic database contains all information, tactical just for the current month and tactical month-end just for the previous month.
Then we have a fourth database containing our dimensions and again all the fact tables the others contain (except that this database contains only today's information).
On a daily basis the dimensions and fact tables in the fourth database gets populated - this information is then used to populate the fact tables in the other 3 databases. Then cubes used for reporting gets build based on all 4 of these databases.
My big concerns regarding this design are:
-The same fact information is available in more than one place, which complicates balancing.
-Duplicating the same information chows up our space.
-The ETL processes needs to load the same data more than once.
-The dimensions and fact tables sit in different databases - any query you do will therefore always straddle 2 databases.
-You have 4 versions of the exact same cube (just with different amounts of data in it). Our users then needs to be decide which cube is the correct cube for the query they want to do.
Thanks,
Maretha
I started with a new BI team. Our data warehouse was developed by consultants. I find the design very strange, but they insist that it uses BI best practices. I can't find any best practices supporting this set-up. Can anyone help me out?
Our design is as follows (Note we are using SQL Server 2008):
We have 3 different databases containing our fact tables (these 3 databases looks exactly the same - just the amount of data stored in the fact tables differ). None of these databases contains any dimensions.
The strategic database contains all information, tactical just for the current month and tactical month-end just for the previous month.
Then we have a fourth database containing our dimensions and again all the fact tables the others contain (except that this database contains only today's information).
On a daily basis the dimensions and fact tables in the fourth database gets populated - this information is then used to populate the fact tables in the other 3 databases. Then cubes used for reporting gets build based on all 4 of these databases.
My big concerns regarding this design are:
-The same fact information is available in more than one place, which complicates balancing.
-Duplicating the same information chows up our space.
-The ETL processes needs to load the same data more than once.
-The dimensions and fact tables sit in different databases - any query you do will therefore always straddle 2 databases.
-You have 4 versions of the exact same cube (just with different amounts of data in it). Our users then needs to be decide which cube is the correct cube for the query they want to do.
Thanks,
Maretha
Maretha- Posts : 2
Join date : 2011-01-13
Re: Data Warehouse spread across 4 separate databases
If these 'databases' are all on the same physical server (i.e. just different schema), then it is not much of an issue from a performance standpoint... it is just a logical separation. It is not something I would normally do, and it isn't clear why it was done.
As far as the cubes... are they truly 'identical', or do they vary slightly in terms of which dimensions they provide? Cubes have capacity issues and, depending on the measures (i.e. semi-additive) may have conflicts beween certain combinations of dimensions. Also, is the content mutually exclusive? They could have broken things out for performance reasons or maybe long lost business requirments (or a misinterpretation of those requirements).
Anyway, its hard to judge wither what was done makes sense without proper review and analysis.
As far as the cubes... are they truly 'identical', or do they vary slightly in terms of which dimensions they provide? Cubes have capacity issues and, depending on the measures (i.e. semi-additive) may have conflicts beween certain combinations of dimensions. Also, is the content mutually exclusive? They could have broken things out for performance reasons or maybe long lost business requirments (or a misinterpretation of those requirements).
Anyway, its hard to judge wither what was done makes sense without proper review and analysis.
Re: Data Warehouse spread across 4 separate databases
You see this design in SQL Server since partitioning tables is an arduous process. Instead of partitioning the fact table correctly, a new fact table is built to support current day. If that's the case, you're right and they're wrong.
Other reasons I've seen to build this structure is to partition data from different operating companies. That doesn't sound like the case here. You are right to be suspicious.
Other reasons I've seen to build this structure is to partition data from different operating companies. That doesn't sound like the case here. You are right to be suspicious.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Data Warehouse spread across 4 separate databases
Hello,
Thank you for your feedback. It seems that at least I'm not the only one who hasn’t heard of this as a BI best practice. To answer some of your questions.
The different databases are on the same server, but they are different databases entirely. I think the same would be achievable, by putting the different tables in the same database, but just in different user schemas or with different names. I'm not sure about SQL server functionality across different databases - would star schema optimization work when the fact tables and dimensions aren't in the same database?
The cubes are indeed identical - it uses the same conformed dimensions. As far as the data goes only the data coming from the fact table is different between the different cubes - The Tactical Cube contains current month information, The Tactical Month End Cube contains previous month information and the Strategic cube contains all information (i.e. including the current and previous month information).
We got new consultants - so the old ones aren't around to question anymore, I believe the main reason they sited was performance. But the new consultants did apply partitioning on the cube side which wasn't implemented before. On the database side there were no indexes and no partitioning, but on the other hand there are also no reports feeding directly from the database. The new team improved our cube build performance to an acceptable level by doing the cube partitioning and adding a few indexes on the database.
Thank you for your feedback. It seems that at least I'm not the only one who hasn’t heard of this as a BI best practice. To answer some of your questions.
The different databases are on the same server, but they are different databases entirely. I think the same would be achievable, by putting the different tables in the same database, but just in different user schemas or with different names. I'm not sure about SQL server functionality across different databases - would star schema optimization work when the fact tables and dimensions aren't in the same database?
The cubes are indeed identical - it uses the same conformed dimensions. As far as the data goes only the data coming from the fact table is different between the different cubes - The Tactical Cube contains current month information, The Tactical Month End Cube contains previous month information and the Strategic cube contains all information (i.e. including the current and previous month information).
We got new consultants - so the old ones aren't around to question anymore, I believe the main reason they sited was performance. But the new consultants did apply partitioning on the cube side which wasn't implemented before. On the database side there were no indexes and no partitioning, but on the other hand there are also no reports feeding directly from the database. The new team improved our cube build performance to an acceptable level by doing the cube partitioning and adding a few indexes on the database.
Maretha- Posts : 2
Join date : 2011-01-13
Re: Data Warehouse spread across 4 separate databases
I think there is a benefit of partitioning database tables that isn't addressed very well for SQL Server 2008. The discussions around partitioning focuses on performance, but I think they come in handy if data ever needs to be retired. It's much, much easier to remove or move data from a table by partition than one that isn't. If the table is partitioned by year and you want to remove or move 2004 data from the table, it's a snap with a partition. Without partitions, you have to rename the table, clone the design, load the new table from the old table excluding the data that you want to retire. It's a mess.
After thinking about it, I can see having different databases on the same instance to facilitate maintenance. It's easier to backup and restore smaller databases getting loaded on different schedules. But I think it can also be done with by putting the data loaded on different schedules into different file groups.
A problem with having different dimension tables on 1 database and fact tables in a different database is that it makes it harder to set up some of the query tools. To make it easier for the query tools, you need to create views or synonyms of the dimension tables in the databases with the fact tables.
After thinking about it, I can see having different databases on the same instance to facilitate maintenance. It's easier to backup and restore smaller databases getting loaded on different schedules. But I think it can also be done with by putting the data loaded on different schedules into different file groups.
A problem with having different dimension tables on 1 database and fact tables in a different database is that it makes it harder to set up some of the query tools. To make it easier for the query tools, you need to create views or synonyms of the dimension tables in the databases with the fact tables.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Similar topics
» Data Stores on Separate Databases within DW.
» Ebook The Data Warehouse Lifecycle Toolkit, 2nd Edition: Practical Techniques for Building Data Warehouse and Business Intelligence Systems
» data warehouse or not ? when is it okay to use OLAP without a data warehouse database
» data warehouse and data warehouse system
» difference between data mart and data warehouse at logical/physical level
» Ebook The Data Warehouse Lifecycle Toolkit, 2nd Edition: Practical Techniques for Building Data Warehouse and Business Intelligence Systems
» data warehouse or not ? when is it okay to use OLAP without a data warehouse database
» data warehouse and data warehouse system
» difference between data mart and data warehouse at logical/physical level
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum