very large/wide fact table considerations?
I am involved with a scenario where a customer is considering a very large, wide, and generic fact table that will house the transactional activity of multiple financial subledger source systems. The BI requirements for this customer to date have revolved around very simple transaction-grain reporting, with approximately 3-4 dimensions and simple summary-to-detail transaction reports; there is limited slicing, dicing, and analytics on the data. At the same time, the data volumes are quite high: each subledger system can generate approximately 20+ million fact records a DAY.
The idea for this consolidated fact table is a generic fact row that carries "transaction amount", 3-4 actual FK fields joining to the common dimensions (mainly Day, Account, etc.), and a slew of "placeholder" fields such as VARCHAR1..VARCHAR50, NUMBER1..NUMBER50, etc. All in all, the table will have 250+ fields, and since it will hold facts from multiple subledger systems, many fields will be NULL, so it will be sparsely populated. Additionally, to accommodate the volume, the Oracle DW table may be range partitioned/subpartitioned and also sharded horizontally by country, etc.
In essence, this scenario deviates from Kimball best practices, but the reasoning is that the business does not need any advanced analysis of the data beyond transactional reporting.
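For concreteness, here is roughly the kind of DDL being discussed; the table name, column names, and partition boundaries below are my own illustrative placeholders, not the customer's actual design, and the placeholder columns are heavily truncated:

-- Illustrative sketch only: an abbreviated version of the proposed
-- generic fact table (the real proposal has 250+ columns).
CREATE TABLE generic_subledger_fact (
    day_key            NUMBER        NOT NULL,  -- FK to the Day dimension
    account_key        NUMBER        NOT NULL,  -- FK to the Account dimension
    country_code       VARCHAR2(2)   NOT NULL,  -- also the subpartitioning key
    subledger_source   VARCHAR2(30)  NOT NULL,  -- which source system the row came from
    transaction_amount NUMBER(18,2),
    varchar1           VARCHAR2(240),           -- meaning differs per subledger
    -- ... placeholder VARCHAR columns continue up to 50 ...
    number1            NUMBER                   -- meaning differs per subledger
    -- ... placeholder NUMBER columns continue up to 50, plus date placeholders, etc.
)
PARTITION BY RANGE (day_key)
SUBPARTITION BY LIST (country_code)
( PARTITION p_2010_10 VALUES LESS THAN (20101101)
    ( SUBPARTITION p_2010_10_us    VALUES ('US'),
      SUBPARTITION p_2010_10_gb    VALUES ('GB'),
      SUBPARTITION p_2010_10_other VALUES (DEFAULT) ) );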
My main concerns are the sparsity of data in such a wide table and the bad practice of putting multiple business processes (even though they may share a generic "grain") into one fact table. The customer's goal is to simplify the DW build by having a single consolidated target fact table. From a reporting standpoint, it is expected that business users will only query in a predetermined number of ways (e.g., transactions by country and account number, etc.). Since the table is sharded, it can essentially scale out regardless of the billions and billions of fact records.
My questions are as follows:
1. Aside from the concerns listed above, what other DW-specific considerations apply to this idea? Could it theoretically be sustainable if the business does, indeed, not need any analysis capability for each business process beyond what it already knows it needs?
2. Is the fact that there is a generic grain common to all subledgers sufficient to warrant a single large table like this (versus a separate fact table for each business process)?
3. Can an Oracle table that is partitioned/sharded and this large (suppose 200+ million records/day) still present a volume concern for the DBAs, or does partitioning/sharding really allow for unlimited growth?
4. What are the performance implications of such a large, wide, sparsely populated fact table? Suppose it is determined that the end users will only selectively query per country and subledger account, etc., and will not "slice and dice" any other way; does this justify the approach? (See the query sketch just below this list for the kind of access pattern I mean.)
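To make question 4 concrete, here is roughly the behaviour I would expect, using the hypothetical table and column names from the sketch above (again, not the customer's actual design). The "approved" query pattern lets Oracle prune to a single partition/subpartition; anything that drops the partitioning keys has to scan the whole multi-billion-row table, and the generic placeholder columns give the optimizer little to work with.

-- Fits the predetermined pattern: the predicates on day_key and
-- country_code let Oracle prune to a single range partition and
-- list subpartition, so only that slice is scanned.
SELECT account_key, SUM(transaction_amount)
FROM   generic_subledger_fact
WHERE  day_key BETWEEN 20101001 AND 20101031
AND    country_code = 'US'
GROUP BY account_key;

-- Outside the pattern: no partitioning key in the predicate, so every
-- partition has to be scanned, and a filter on a generic placeholder
-- column is close to useless for the optimizer.
SELECT COUNT(*)
FROM   generic_subledger_fact
WHERE  varchar1 = 'SOME-REFERENCE-VALUE';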
Thoughts are appreciated.
obiapps
Re: very large/wide fact table considerations?
It certainly doesn't sound like a dimensional model. Generic columns, textual data in the fact table, and 250+ columns in the fact table are all red flags. Throw enough hardware at it and maybe it will work, as long as you never go across partitions.
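For contrast, the per-process alternative raised in question 2 would look more like the sketch below (names invented purely for illustration): narrow, well-named fact tables, one per subledger business process, all joining to the same conformed Day and Account dimensions.

-- Hypothetical contrast: one narrow fact table per business process,
-- each with real column names, sharing the conformed dimensions.
CREATE TABLE ap_invoice_fact (
    day_key        NUMBER       NOT NULL,  -- conformed Day dimension
    account_key    NUMBER       NOT NULL,  -- conformed Account dimension
    supplier_key   NUMBER       NOT NULL,
    invoice_number VARCHAR2(30) NOT NULL,  -- degenerate dimension
    invoice_amount NUMBER(18,2)
);

CREATE TABLE ar_receipt_fact (
    day_key        NUMBER       NOT NULL,
    account_key    NUMBER       NOT NULL,
    customer_key   NUMBER       NOT NULL,
    receipt_number VARCHAR2(30) NOT NULL,
    receipt_amount NUMBER(18,2)
);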
BoxesAndLines
Re: very large/wide fact table considerations?
Sounds like somebody wants to create the mother of all spreadsheets.
Run away. Whoever is behind this hasn't a clue.
Re: very large/wide fact table considerations?
I agree with both of you. Sadly, as much as I would like to run away, I need to advise the client. Aside from the items I mentioned in the original post, are there any other specific reasons this is a horrible idea? Any detail on why this is a bad idea, given the limited business analysis needed, would be great.
obiapps
Re: very large/wide fact table considerations?
I totally agree with the other posters: RUN!!
You mentioned "The goal of the customer is to simplify the DW build". I think this design would achieve the opposite. Every task performed by the DW builders, testers, and report authors will require constant reference to some master document that tells them what each row and column combination actually means - the proposed database schema itself will be essentially meaningless.
This runs contrary to the way those people will have been trained and all their prior experience, and contrary to the way all their tools are designed and intended to be used.
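To make that concrete: with the generic design, just giving the placeholder columns readable names would require something like one decoding view per subledger (the mappings and names below are hypothetical), and every report author would have to know which view, and therefore which column meanings, apply to the rows they are looking at.

-- Hypothetical decoding layer: one view per subledger just to restore
-- the meaningful column names a conventional fact table would have had.
CREATE OR REPLACE VIEW ap_invoice_fact_v AS
SELECT day_key,
       account_key,
       transaction_amount AS invoice_amount,
       varchar1           AS invoice_number,
       number1            AS tax_amount
FROM   generic_subledger_fact
WHERE  subledger_source = 'AP';

CREATE OR REPLACE VIEW ar_receipt_fact_v AS
SELECT day_key,
       account_key,
       transaction_amount AS receipt_amount,
       varchar1           AS receipt_number,  -- same physical column, different meaning
       number1            AS exchange_rate
FROM   generic_subledger_fact
WHERE  subledger_source = 'AR';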
Good luck!
Re: very large/wide fact table considerations?
No one has been successful at implementing a fact table like the one you propose.
From what I can gather from your description, it is the mythical 'big flat file', the perfect solution to anybody's data needs. It has everything spread out in one big table, and we, the business, will have complete control over what everything means.
This is not a bad design.
This is a cry for help.
I am going to go out on a limb here and make some guesses: there is a dysfunctional BI environment, and a lot of the actual work is being done with spreadsheets (or personal databases, such as Access) downloaded from the operational systems. The fundamental issue is that the business wants to gain control over its own data. From the business point of view, the world of data is best viewed as a flattened landscape with all kinds of attributes and measures to choose from, and a tool that does analysis with simple drag and drop. (After all, that's the view BI tools try to present.) They are asking for something reasonable; the problem is that the specification is being taken literally, or is being developed by IT in frustration.
Dig deeper and find out what they really want. Is the IT process too cumbersome? Too expensive? Is the environment wrong? Do they not have the tools available to do their job? You need to provide the functionality they deserve, but at the same time coach them on the nature of a data warehouse and why things are done the way they are done. Get them involved in defining the BI environment so they can choose the tools they use for reporting and analysis.
The solution is most likely a political one.