Source or Data Mart?
4 posters
Page 1 of 1
Source or Data Mart?
I've seen other discussions about using the Data Mart as a source to other systems but what about as a source to itself? I was taught that the general rule of thumb if you're building a new fact table then you should pull the data from the same source as the other tables, however there's a disagreement on our team that the Data Mart has the data this is just another way of looking at it so therefore, we should use the Data Mart's views to pull the data.
I can think of several reasons why it's a bad idea. When I pose this question to other data warehouse people outside our team they agree, yes it's a standard practice that you should pull from the identified sources and not from other Fact tables.
The major argument I'm getting is but this data has be able to reconcile with the data in the other Fact Tables. There are several ways of taking that and to me that sounds more like a data quality issue than anything.
Someone asked me if there was a specific source that states it's a standard practice to pull your data from the source instead of the Data Mart. I can't find any reference that states that but everyone seems to think it is.
So I'd like pose this question to others and see what their thoughts are.
Thank you
I can think of several reasons why it's a bad idea. When I pose this question to other data warehouse people outside our team they agree, yes it's a standard practice that you should pull from the identified sources and not from other Fact tables.
The major argument I'm getting is but this data has be able to reconcile with the data in the other Fact Tables. There are several ways of taking that and to me that sounds more like a data quality issue than anything.
Someone asked me if there was a specific source that states it's a standard practice to pull your data from the source instead of the Data Mart. I can't find any reference that states that but everyone seems to think it is.
So I'd like pose this question to others and see what their thoughts are.
Thank you
cajenner- Posts : 3
Join date : 2011-07-28
Re: Source or Data Mart?
I'm a bit confused by you use of the word 'source' and 'data mart'. Usually source refers to a data source outside the data warehouse, an operational system for example. Correct practice is to load sources into atomic level star schema (which can be referred to as data marts, but the term is very vague). The collection of star schema make up the dimensional data warehouse.
It is not unusual to feed like data from different sources into the same star schema (sales transactions, for example, from different sales systems). If you need to create aggregations (summaries or combinations of facts) these are usually created from the atomic fact tables, not re-extracted from the source system.
If this is new data being introduced from a source system, then, depending on the nature of the data, is either placed into a new star (or stars) or integrated into existing stars or both.
It is not unusual to feed like data from different sources into the same star schema (sales transactions, for example, from different sales systems). If you need to create aggregations (summaries or combinations of facts) these are usually created from the atomic fact tables, not re-extracted from the source system.
If this is new data being introduced from a source system, then, depending on the nature of the data, is either placed into a new star (or stars) or integrated into existing stars or both.
Re: Source or Data Mart?
In this case when I refer to source I mean a data source outside the dimensional data warehouse. The crux of our issue is that part of the team feels that the dimensional data warehouse has the data (in both fact and dimensions) scattered so we should populate this new fact with that data. This new fact is not an aggregate fact but at the same atomic level as the other facts but with new measures.
Some of us on the team are saying No, we should pull from the same outside data source we used to populate the Facts. Some are going with performance, meaning I can load the table with data without having to wait for the other Fact tables to load.
If I understood your response, it six to one half dozen, it's just a matter of preference.
Some of us on the team are saying No, we should pull from the same outside data source we used to populate the Facts. Some are going with performance, meaning I can load the table with data without having to wait for the other Fact tables to load.
If I understood your response, it six to one half dozen, it's just a matter of preference.
cajenner- Posts : 3
Join date : 2011-07-28
Re: Source or Data Mart?
I would have no problem using data from a fact table already in the Data Mart to populate another fact table. As already mentioned, this is commonly done when building aggregations. Why "bother" the source system again if you already extracted the required source data?
Because you are wanting to add an additional fact table to present the same data that is already in the Data Mart, did you review your dimensional model to see if there was a way to satisfy all the requirements with a single fact table?
Because you are wanting to add an additional fact table to present the same data that is already in the Data Mart, did you review your dimensional model to see if there was a way to satisfy all the requirements with a single fact table?
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: Source or Data Mart?
I've been asking since the start of our project, are we building a report or a solution. To me since the data resides in the Data Mart we're just building a report so we should look at Aggregate views, however performance is a problem. The table where the majority of the data is located can take several minutes to return data.
cajenner- Posts : 3
Join date : 2011-07-28
Re: Source or Data Mart?
I am getting the sense that the data marts that were created were the result of narrowly scoped projects to address specific reporting requirements. From your desription of the new requirements I am guessing they are just additional measures from the same source the other mart was created from.
There is a fundimental conflict between building a data warehouse (i.e. not a bunch of marts) and a project oriented approach to development. Many 'failed' data warehouses were not really data warehouses at all, but rather a cobbled collection of report oriented marts that eventially collapse under the cost of maintenance. Going back to the source and building yet another mart like that is heading towards the same fate.
Build a proper star schema from the source that is driven by the data avilable in the source, not on short term report requirements. Use it to replace the existing mart.
The performance issues may be due to design, platform, physical implementation or all of the above. Building high performance, general use star schema can and is done all the time. Get outside help if you need to. Your organization will be better off in the long run.
There is a fundimental conflict between building a data warehouse (i.e. not a bunch of marts) and a project oriented approach to development. Many 'failed' data warehouses were not really data warehouses at all, but rather a cobbled collection of report oriented marts that eventially collapse under the cost of maintenance. Going back to the source and building yet another mart like that is heading towards the same fate.
Build a proper star schema from the source that is driven by the data avilable in the source, not on short term report requirements. Use it to replace the existing mart.
The performance issues may be due to design, platform, physical implementation or all of the above. Building high performance, general use star schema can and is done all the time. Get outside help if you need to. Your organization will be better off in the long run.
Re: Source or Data Mart?
Déjà vu! Looks like another case of bad practice as in the report table approach in another topic: http://forum.kimballgroup.com/t1263-reporting-table-data-repository-vs-dimensional-data-store .
My concern with extracting the data that already exists in DW is the repeated extracting logic in ETL. If ETL is complex and time dependent, maintenance is the major issue which cause the data inconsistency after bug fixing, code change and reloading. There is an important concept in dimensional modeling called Conformance. I would suggest to go back to the drawing board with Kimball's Bus Matrix for your DW and see how you can share your dimension attributes and use conformed dimensions to share common measures.
My concern with extracting the data that already exists in DW is the repeated extracting logic in ETL. If ETL is complex and time dependent, maintenance is the major issue which cause the data inconsistency after bug fixing, code change and reloading. There is an important concept in dimensional modeling called Conformance. I would suggest to go back to the drawing board with Kimball's Bus Matrix for your DW and see how you can share your dimension attributes and use conformed dimensions to share common measures.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Similar topics
» Using the Dimensional Data Warehouse as source data for the OLTP process
» difference between data mart and data warehouse at logical/physical level
» Data Mart/Dimensional Data Store Definition.
» integrating otfer data into Data mart
» Data Mart Does Not Equal Data Warehouse
» difference between data mart and data warehouse at logical/physical level
» Data Mart/Dimensional Data Store Definition.
» integrating otfer data into Data mart
» Data Mart Does Not Equal Data Warehouse
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum