Conformed DW on single or multiple SQL Server DB?
+3
carryonjutta
Joy
tembree
7 posters
Page 1 of 1
Conformed DW on single or multiple SQL Server DB?
We are moving towards an integrated Enterprise Data Warehouse database architecture approach, where we wish to implement conformed dimensions across multiple subject areas. This entails distinct Staging, Integration, and Presentation layers, in a new proposed EDW architecture roadmap.
We would like to get input, however, on the advisability of implementing this Presentation layer within a single physical database in a SQL Server environment, and whether there are practical limitations around doing so, and if so, what those are, and how to address these while still achieving our goals.
While the single Presentation database approach seems clearly to be preferable for several reasons, there may be some general considerations related to this that we’d like to think through, such as database size and maintenance windows, as well as how to isolate user access to subject areas that have been updated while still loading others, if these are within a single database environment. Have others come across any of these sorts of limitations (or others) within an enterprise data warehouse implemented on SQL Server that we should be aware of?
As a way to mitigate those possible limitations if they exist, we’re also interested in learning about implementing conformed dimensions across multiple databases to achieve our goals of an integrated enterprise data warehouse. How would that work from an RI and user standpoint?
Any input, specific SQL Server Data Warehouse considerations we should be aware of, or implementation options, particularly around how to make the conformed dimensions work as intended if we need to have multiple databases, are much appreciated. We are a State University and so with state budget situation, have no training or consulting budget.
We would like to get input, however, on the advisability of implementing this Presentation layer within a single physical database in a SQL Server environment, and whether there are practical limitations around doing so, and if so, what those are, and how to address these while still achieving our goals.
While the single Presentation database approach seems clearly to be preferable for several reasons, there may be some general considerations related to this that we’d like to think through, such as database size and maintenance windows, as well as how to isolate user access to subject areas that have been updated while still loading others, if these are within a single database environment. Have others come across any of these sorts of limitations (or others) within an enterprise data warehouse implemented on SQL Server that we should be aware of?
As a way to mitigate those possible limitations if they exist, we’re also interested in learning about implementing conformed dimensions across multiple databases to achieve our goals of an integrated enterprise data warehouse. How would that work from an RI and user standpoint?
Any input, specific SQL Server Data Warehouse considerations we should be aware of, or implementation options, particularly around how to make the conformed dimensions work as intended if we need to have multiple databases, are much appreciated. We are a State University and so with state budget situation, have no training or consulting budget.
tembree- Posts : 1
Join date : 2009-03-30
Re: Conformed DW on single or multiple SQL Server DB?
The most common solution is to have a monolithic presentation database: a single database that contains the enterprise data warehouse, all conformed dimensions, and the fact tables that hook into them. This is true for SQL Server as for other RDBMSs. It's the most popular architecture because it's (usually) simplest, and because it enables easy connection between subect areas.
It sounds like you may be concerned about scalability. I have seen quite a few terabyte+ databases implemented on SQL Server, which in my experience should meet the needs of a university DW. I have seen complex, multi-subject-areas DWs, and I've seen DWs with many users doing all sorts of crazy things. Combine those different dimensions of "scale," and complexity increases substantially. In other words, if you're trying to solve a hard, large-scale problem, don't expect to be able to get there simply by reading Books Online (ha!). But it is possible to get there on SQL Server.
That said, there's nothing about the Kimball Method that requires a monolithic data warehose database. It is quite feasible to "chunk out" pieces of the DW to multiple servers. Of course, that's quite common with very large systems. One common approach is to identify one server as the Dimension Manager; the final step of dimension processing is to replicate the dimension wherever it's used (using replication or simply copying). With most dimension tables, this task performs fast enough (especially if you do not enforce RI between facts and dimensions in the database).
In the SQL Server world, it's also common to use Analysis Services as the presentation database. Analysis Services does make it easy to do some of the things you mentioned, such as allowing users to query a consistent copy of the database during processing.
It sounds like you may be concerned about scalability. I have seen quite a few terabyte+ databases implemented on SQL Server, which in my experience should meet the needs of a university DW. I have seen complex, multi-subject-areas DWs, and I've seen DWs with many users doing all sorts of crazy things. Combine those different dimensions of "scale," and complexity increases substantially. In other words, if you're trying to solve a hard, large-scale problem, don't expect to be able to get there simply by reading Books Online (ha!). But it is possible to get there on SQL Server.
That said, there's nothing about the Kimball Method that requires a monolithic data warehose database. It is quite feasible to "chunk out" pieces of the DW to multiple servers. Of course, that's quite common with very large systems. One common approach is to identify one server as the Dimension Manager; the final step of dimension processing is to replicate the dimension wherever it's used (using replication or simply copying). With most dimension tables, this task performs fast enough (especially if you do not enforce RI between facts and dimensions in the database).
In the SQL Server world, it's also common to use Analysis Services as the presentation database. Analysis Services does make it easy to do some of the things you mentioned, such as allowing users to query a consistent copy of the database during processing.
Re: Conformed DW on single or multiple SQL Server DB?
As a way to mitigate those possible limitations if they exist, we’re also interested in learning about implementing conformed dimensions across multiple databases to achieve our goals of an integrated enterprise data warehouse. How would that work from an RI and user standpoint?
carryonjutta- Posts : 1
Join date : 2015-01-05
Re: Conformed DW on single or multiple SQL Server DB?
Given that any option other than implementing a single DB is going to add considerably complexity/effort to your solution both in terms of build and maintenance, I'm wondering why you are spending time thinking about it when you don't know if you are going to hit any potential limitations?carryonjutta wrote:As a way to mitigate those possible limitations if they exist..
Have you sized your DW and do you think you will hit the published limitations of SQL Server? If so, is this for specific tables or the overall DB?
To answer your specific questions:
- RI: there should be no impact as your DW as either SQL Server supports RI across DBs or it doesn't. If it doesn't then splitting your DW across DBs is probably not a solution
- Users: how you physically implement and maintain a DB should be transparent to your users. I would suggest that if your solution is not transparent to your users then it is probably not an appropriate solution
Regards,
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Conformed DW on single or multiple SQL Server DB?
If you go the multiple DB server route, it basically boils down replicating dimension tables on the different servers. You want to avoid having to join across servers at all costs. But, as Nick pointed out, why worry about it?
Re: Conformed DW on single or multiple SQL Server DB?
In my experience the most difficult part of trying to accomplish an EDW is not so much a technology as it is a people problem. I believe your real challenge is going to get people agreeing on what the conformed dimensions are going to be along with their attributes. While their are strategies to handle attributes that pertain to different business units, it will still pose a challenge up front getting all the business units to decide on the list of leaf level members. EDW from a technology perspective is no problem, it's getting stakeholders to agree that is the big problem. Then again you're a University, so maybe you will be spared of the goal hindering politics that stand in the way of an EDW.
jdobrzen- Posts : 2
Join date : 2015-01-27
Re: Conformed DW on single or multiple SQL Server DB?
I used to sell SQL Server for Microsoft, then its MPP version known at that time as Parallel Data Warehouse (PDW).
SQL Server will comfortably handle data warehouses of 2-3 terabytes without any special administrative action. Once you go over that number you start needing the attention of a DBA, with more performance management required as the database grows.
The biggest SQL Server databases are now over 100Tb. These are exceptions, but I've had direct contact with satisfied customers up to 30Tb.
There is no issue that would prevent you running multiple, independent fact tables against conformed dimensions, all in the one database.
Neither should there be any issue running queries across tables in multiple databases, as long as they were in the same DBMS instance.
For simplicity, I think I'd prefer to have my DW in the one database unless there was a compelling reason to do otherwise.
SQL Server will comfortably handle data warehouses of 2-3 terabytes without any special administrative action. Once you go over that number you start needing the attention of a DBA, with more performance management required as the database grows.
The biggest SQL Server databases are now over 100Tb. These are exceptions, but I've had direct contact with satisfied customers up to 30Tb.
There is no issue that would prevent you running multiple, independent fact tables against conformed dimensions, all in the one database.
Neither should there be any issue running queries across tables in multiple databases, as long as they were in the same DBMS instance.
For simplicity, I think I'd prefer to have my DW in the one database unless there was a compelling reason to do otherwise.
Last edited by ron.dunn on Wed Jan 28, 2015 4:42 am; edited 1 time in total (Reason for editing : Added comment about multiple databases)
Similar topics
» Single or Multiple Fact : Single or Multiple Dimension
» multiple hierarchy : single dimension vs multiple
» Role Playing vs single generic Conformed (for drill-across) time dimension
» What to do when a single attribute rolls up to multiple categories?
» Multiple dimensions Vs. Single dimension and hierarchy
» multiple hierarchy : single dimension vs multiple
» Role Playing vs single generic Conformed (for drill-across) time dimension
» What to do when a single attribute rolls up to multiple categories?
» Multiple dimensions Vs. Single dimension and hierarchy
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum