DW Architecture for a new setup
5 posters
Page 1 of 1
DW Architecture for a new setup
Ok I'm a newby to DW. I am tasked with setting up a data warehouse this year. There are a couple of things which I feel make it unique, I'll outline the plan I have come up with to accomplish this as of right now.
We have 2 OLTP systems, one was from a company we acquired and is denormalized and uninterpretable for the most part... a cobol application runs off it. So they do strange things like storing 4 entries in one record, the result is having to loop through each record. There are no date fields or number fields... it is pretty much all string based. I found a product which 'decodes' (normalizes) the DB. The normalized version will be on a separate server and essentially is replicated from the production server.
The other system we have is very easy to get information out of. I am very familiar with its structure. So I am not worried about that.
Both systems are maintaining sales data, this is what we initially want to analyze. They both have the same items, however they are not the same Item Names or even descriptions. I plan to make a 'master style' listing and maintain this table so it will allow for "merging" of the two systems' sales/inventory data.
Non-Normalized System -> Normalized Version -> Merged Data (DW)
Normalized System -> Replicated version -> Merged Data (DW)
This will start out small, initial cubes will be for analyzing sales and inventory information, so we can have one view of both systems.
So some of the questions I have are related to the actual architecture of the database servers. I will have essentially 4 DB servers. 2 are the OLTP production. Those 2 will replicate to 1 DB server, which will be used to load the final DB server. I am doing this because i don't want to affect performance on the OLTP databases, is this a wise choice? I'm trying to avoid any locking on the production servers...
We have 2 OLTP systems, one was from a company we acquired and is denormalized and uninterpretable for the most part... a cobol application runs off it. So they do strange things like storing 4 entries in one record, the result is having to loop through each record. There are no date fields or number fields... it is pretty much all string based. I found a product which 'decodes' (normalizes) the DB. The normalized version will be on a separate server and essentially is replicated from the production server.
The other system we have is very easy to get information out of. I am very familiar with its structure. So I am not worried about that.
Both systems are maintaining sales data, this is what we initially want to analyze. They both have the same items, however they are not the same Item Names or even descriptions. I plan to make a 'master style' listing and maintain this table so it will allow for "merging" of the two systems' sales/inventory data.
Non-Normalized System -> Normalized Version -> Merged Data (DW)
Normalized System -> Replicated version -> Merged Data (DW)
This will start out small, initial cubes will be for analyzing sales and inventory information, so we can have one view of both systems.
So some of the questions I have are related to the actual architecture of the database servers. I will have essentially 4 DB servers. 2 are the OLTP production. Those 2 will replicate to 1 DB server, which will be used to load the final DB server. I am doing this because i don't want to affect performance on the OLTP databases, is this a wise choice? I'm trying to avoid any locking on the production servers...
esh- Posts : 4
Join date : 2009-02-03
Re: DW Architecture for a new setup
I'm not familiar with all the details but here's what I'd do (at a high level):
To address the disparate source systems I'd create level 1 and level 2 fact tables. Source system A and source system B would both have their own fact tables (living at level 1). I'd then consolidate these up into a single table that contains all the ubiquitous fields between both level 1 tables (assuming each source system's fact data is not structured identically). Your cubes (or summary tables) could aggregate up from your level 2 fact tables.
The item disparity can be handled in a mapping table during the ETL process (which I think is what you're saying here).
If you can batch extract from your source systems I'd do that after hours (for minimal impact) and then dump the data into a staging area. Your lowest level fact tables can be populated from there. Your dimensions can be consolidated and conformed in the staging area and then loaded into your warehouse dimension tables from there as well. It looks like you already have a server at your disposal for this. A method for determining net change during this extraction is crucial if your tables are large.
So to answer your final question, I think your idea to hit your production systems only once per extraction period is a good one.
Hopefully this helps. If not, shoot me some more details over and we can review it again.
To address the disparate source systems I'd create level 1 and level 2 fact tables. Source system A and source system B would both have their own fact tables (living at level 1). I'd then consolidate these up into a single table that contains all the ubiquitous fields between both level 1 tables (assuming each source system's fact data is not structured identically). Your cubes (or summary tables) could aggregate up from your level 2 fact tables.
The item disparity can be handled in a mapping table during the ETL process (which I think is what you're saying here).
If you can batch extract from your source systems I'd do that after hours (for minimal impact) and then dump the data into a staging area. Your lowest level fact tables can be populated from there. Your dimensions can be consolidated and conformed in the staging area and then loaded into your warehouse dimension tables from there as well. It looks like you already have a server at your disposal for this. A method for determining net change during this extraction is crucial if your tables are large.
So to answer your final question, I think your idea to hit your production systems only once per extraction period is a good one.
Hopefully this helps. If not, shoot me some more details over and we can review it again.
BrianJarrett- Posts : 61
Join date : 2009-02-03
Age : 50
Location : St. Louis, MO
Data Profile!
Hi esh,
Personally, I think your approach is pretty good. You have an integration/staging server and a server dedicated for the data warehouse.
Before you get much further though, I think you can help yourself out a ton by doing the following:
(1) Perform a thorough data profiling of both systems. You'll spend more time on the "Non-Normalized System".
(2) From your data profiling, conform all data attributes and metrics. This means that you should jot down any datatype conversions, necessary decodings, numeric conversions, or mappings.
(3) Extract the raw data into your staging server and perform all necessary transformations to conform the data.
(4) From this conformed set of mostly relational data, load your dimensional model.
Let me know if you'd like some more details. That's where the devil lives!
-Tod
Personally, I think your approach is pretty good. You have an integration/staging server and a server dedicated for the data warehouse.
Before you get much further though, I think you can help yourself out a ton by doing the following:
(1) Perform a thorough data profiling of both systems. You'll spend more time on the "Non-Normalized System".
(2) From your data profiling, conform all data attributes and metrics. This means that you should jot down any datatype conversions, necessary decodings, numeric conversions, or mappings.
(3) Extract the raw data into your staging server and perform all necessary transformations to conform the data.
(4) From this conformed set of mostly relational data, load your dimensional model.
Let me know if you'd like some more details. That's where the devil lives!
-Tod
Re: DW Architecture for a new setup
The architecture will depend on Performance SLA of the source systems. If it is a Global system, then batch processing at any time will affect someone. On the other hand if there is a quiet period, you could use it to perform extracts to a Staging area. Note that as your Data warehouse grows, and more subject areas are captured, the available time window for data extraction will shrink. You may want to explore the option of creating an ODS, which will be composed of replicated tables and batch populated tables.esh wrote:So some of the questions I have are related to the actual architecture of the database servers. I will have essentially 4 DB servers. 2 are the OLTP production. Those 2 will replicate to 1 DB server, which will be used to load the final DB server. I am doing this because i don't want to affect performance on the OLTP databases, is this a wise choice? I'm trying to avoid any locking on the production servers...
Edwin
...who thinks every company is unique, and their solutions are unique.
Re: DW Architecture for a new setup
Thanks for all the info/help!
- There is a short window of only about 4 hours every night, however it shortens during holiday (retail) to only about 1 hour. So i really think we will have to do the staging server to pull data from. I will probably utilize the staging server as an adhoc analysis server as well, so it can serve 2 purposes, staging for the DW and Ad-hoc for all other types of reports people always want that may not be in the DW necessarily. Will look into the ODS concept as well.
- What good data profiling tools are available? These are mostly all SQL 2005 servers except for one which is 2000, working on converting it to 2005.
- Tod - 2,3,4 also seem to make good sense, will add this to my documentation for the project.
- Brian, I'll have to read more about the level 1 & 2 fact table differences. Not too familiar with the differences between the two yet.
- Brian, yes the mapping would occur during the ETL process. (that's my plan at least).
esh- Posts : 4
Join date : 2009-02-03
Re: DW Architecture for a new setup
also the reporting/analysis tool I have been testing out is LogiXML's suite. Not sure if anyone is familiar with it? I tried some of the open source ones, and looked at 2 other commercial ones. This one seemed to me very easy to use and get into. The others I tried definitely were not as user friendly... and well that is important as I have many other things to do here (I do pretty much everything, dba/network eng/vmware admin/support & maintain multiple OLTP systems, as well as regular mgmt duties.). So I definitely want tools that are easy to use, so if anyone has any more suggestions please let me know.
esh- Posts : 4
Join date : 2009-02-03
Re: DW Architecture for a new setup
"I have many other things to do here (I do pretty much everything, dba/network eng/vmware admin/support & maintain multiple OLTP systems, as well as regular mgmt duties.). "
And you are on the hook to build a data warehouse??? Dude... get some help! Last time I checked, a day is only 24 hours and they are not getting any longer.
And you are on the hook to build a data warehouse??? Dude... get some help! Last time I checked, a day is only 24 hours and they are not getting any longer.
Re: DW Architecture for a new setup
Budgets are tight. Luckily one of the products I purchased is almost out of the box in terms of being built specifically for that application and taking the data into a DW and also tying it in with a hyperion reporting solution. That project is 90% complete now, I will soon start working on merging that data with our other systems into a central DW to get the entire company wide data combined.ngalemmo wrote:
And you are on the hook to build a data warehouse??? Dude... get some help! Last time I checked, a day is only 24 hours and they are not getting any longer.
I'm hoping next year to hire a full time DBA to offload a lot of the db activities.
esh- Posts : 4
Join date : 2009-02-03
Similar topics
» 4 tier architecture and BI
» ETL Architecture
» SLA: Architecture
» Hybrid Architecture
» ODS Data Architecture
» ETL Architecture
» SLA: Architecture
» Hybrid Architecture
» ODS Data Architecture
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum