Data Marts Built Upon Data Marts
5 posters
Page 1 of 1
Data Marts Built Upon Data Marts
This is a theoretical question as opposed to a specific, real example.
I often hear cautions that building a Data Mart upon a Data Mart is a bad idea. For the purpose of this discussion, let's assume the first Data Mart is derived from an underlying Data Warehouse.
What are the cons to sourcing another Data Mart from the first one? What kind of best practices does this conflict with? Are there any situations where this could be considered an acceptable solution?
Thanks for any guidance you can provide.
I often hear cautions that building a Data Mart upon a Data Mart is a bad idea. For the purpose of this discussion, let's assume the first Data Mart is derived from an underlying Data Warehouse.
What are the cons to sourcing another Data Mart from the first one? What kind of best practices does this conflict with? Are there any situations where this could be considered an acceptable solution?
Thanks for any guidance you can provide.
trbarnes182- Posts : 1
Join date : 2014-03-28
Re: Data Marts Built Upon Data Marts
The main "con" is time. You have to wait for the first ETL cycle to complete before you can start the second cycle. If the first cycle crashes, everything dependent on it will not load. The biggest boost in ETL processing is setting up the workflows to process data in parallel.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Data Marts Built Upon Data Marts
Assuming you are following the Kimball methodology and have an Enterprise Data warehouse then a Data Mart is just a logical collection of tables grouped into a subject area; a data mart has no physical representation - the fact tables could exist in multiple data marts and the dimension tables almost certainly will (as they are conformed dimensions).
Therefore the idea of building a Data Mart on top of another Data Mart makes no sense (to me). You can build dimensions from existing dimensions (derive a month dimension from a date dimension) and facts from facts (create aggregates or join data in multiple facts by their conformed dimensions to create a single new fact table).
There is nothing wrong with doing this and it is often the best approach. For example, if you have already built a "quotes" fact table and a "sales" fact table and then your users ask for a report that needs both quote and sales data but is not easily satisfied by querying across both tables ("show me the average time between a quote being created and a sale being closed") then you might want to build a quote-sales fact table. It's much easier to do this by using the two fact tables as your source (assuming that between them they hold all the information you need) rather than going back to your staging layer, or even your source systems, to get the data you need.
Hope this helps?
Therefore the idea of building a Data Mart on top of another Data Mart makes no sense (to me). You can build dimensions from existing dimensions (derive a month dimension from a date dimension) and facts from facts (create aggregates or join data in multiple facts by their conformed dimensions to create a single new fact table).
There is nothing wrong with doing this and it is often the best approach. For example, if you have already built a "quotes" fact table and a "sales" fact table and then your users ask for a report that needs both quote and sales data but is not easily satisfied by querying across both tables ("show me the average time between a quote being created and a sale being closed") then you might want to build a quote-sales fact table. It's much easier to do this by using the two fact tables as your source (assuming that between them they hold all the information you need) rather than going back to your staging layer, or even your source systems, to get the data you need.
Hope this helps?
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Data Marts Built Upon Data Marts
There is a big problem with the term 'Data Mart' as it has so many different interpretations.
In a dimensional model, the term is used to refer to a single star schema: one fact table and its dimensions. Collectively these are the data warehouse.
In a properly implemented dimensional data warehouse the fact tables should be atomic, at the lowest level of detail available. For performance reasons it may be necessary to summarize or combine (aggregate) facts, as such you are creating a data mart from another data mart or marts.
As B&L points out, you don't want to go too far with it as the dependencies and timing may introduce operational issues.
In a dimensional model, the term is used to refer to a single star schema: one fact table and its dimensions. Collectively these are the data warehouse.
In a properly implemented dimensional data warehouse the fact tables should be atomic, at the lowest level of detail available. For performance reasons it may be necessary to summarize or combine (aggregate) facts, as such you are creating a data mart from another data mart or marts.
As B&L points out, you don't want to go too far with it as the dependencies and timing may introduce operational issues.
Re: Data Marts Built Upon Data Marts
But going with the concept of a Mart being a single Fact and Dimensions, then conceptually an aggregated fact table would be a data mart that was built off of other data marts.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Data Marts Built Upon Data Marts
Jeff Smith wrote:But going with the concept of a Mart being a single Fact and Dimensions, then conceptually an aggregated fact table would be a data mart that was built off of other data marts.
Yes. There is nothing wrong with that. The issue is the operational complexities if you go too far. There is also a matter of diminishing returns. Would you gain very much if you built an aggregate of an aggregate, for example.
Re: Data Marts Built Upon Data Marts
trbarnes182 wrote:For the purpose of this discussion, let's assume the first Data Mart is derived from an underlying Data Warehouse.
This comment leads me to think this is more an Inmon style implementation. Building dimensional marts from a normalized DW is not a trivial task (not really difficult either, but not as simple as when the DW is dimensional to begin with).
Assuming there has been some investment in time and effort to work out what went into the mart and work out the transformations necessary to convert it to a dimensional model and apply whatever business rules that pertain to it, it can make sense to leverage this previous work to build a further aggregation of this data.
Wither this is the correct thing to do depends on why it is being done. If it is solely to provide a higher level of summarization, does it result in a significant improvement in performance for a substantial number of queries?
Other reasons to do this may be to localize a subset of data for a particular audience, to distribute query loads, for security, or to obfuscate or remove sensitive data that certain users should not see.
There are no hard rules on this. It is a matter of balancing a need versus operational impact and alternate methods of accomplishing the result.
Similar topics
» Data Marts, Conformed dimensions and Data Warehouse
» Designing data marts from an EAV data source
» Dimension Attributes and Fact attributes storing same data in multiple data marts??
» Modelling Problem of Data Marts
» cross process(data marts) ratios
» Designing data marts from an EAV data source
» Dimension Attributes and Fact attributes storing same data in multiple data marts??
» Modelling Problem of Data Marts
» cross process(data marts) ratios
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum