Reporting table data repository vs. Dimensional data store
+3
BoxesAndLines
John Simon
hang
7 posters
Page 1 of 1
Reporting table data repository vs. Dimensional data store
I am facing this business situation. Our current fact-dimension based dimensional store is under scrutiny which has found out some of the key fact and dimension tables have issues in their implementations. The impact of the issues has been highlighted to justify replacing the current architecture with another methodology, the reporting table based data repository, for our BI reporting services.
Now we have implemented a quite good effective date based data store that can provide all the dimensional and fact data at any given point of time for a few years history. Some argument points out why we need another redundant data store in form of fact and dimension tables when we can simply load data to fully de-normalised reporting tables that suffice all the required information for the front-end. And the argument goes on, If we need historical data, we then load reporting data snapshots into the reporting tables. BTW, OLAP Cubing is nice to have but not imperative at the moment.
In the proposed approach, we could eliminate one extra data layer, and the ETL becomes very straightforward without worrying about all the dimensional modelling complexities, such as SCD, bridge and junk dimension structure. Our reporting data are always much smaller subsets of original facts and dimensions in dimensional model. So the reporting tables may not go too deep even with a few snapshots, although very wide.
Now the justification seems convincing and the new approach will have some repercussion on national services once adopted. I know this simplistic approach goes against my BI/DW understanding and practice in many years, I really need some expert advice that will pin point the flaw and consequence, if there is one, of this reporting table practice.
Now we have implemented a quite good effective date based data store that can provide all the dimensional and fact data at any given point of time for a few years history. Some argument points out why we need another redundant data store in form of fact and dimension tables when we can simply load data to fully de-normalised reporting tables that suffice all the required information for the front-end. And the argument goes on, If we need historical data, we then load reporting data snapshots into the reporting tables. BTW, OLAP Cubing is nice to have but not imperative at the moment.
In the proposed approach, we could eliminate one extra data layer, and the ETL becomes very straightforward without worrying about all the dimensional modelling complexities, such as SCD, bridge and junk dimension structure. Our reporting data are always much smaller subsets of original facts and dimensions in dimensional model. So the reporting tables may not go too deep even with a few snapshots, although very wide.
Now the justification seems convincing and the new approach will have some repercussion on national services once adopted. I know this simplistic approach goes against my BI/DW understanding and practice in many years, I really need some expert advice that will pin point the flaw and consequence, if there is one, of this reporting table practice.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Reporting table data repository vs. Dimensional data store
Ngalemmo, Warrent. any comment on this?
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Reporting table data repository vs. Dimensional data store
Firstly, performance. Fully denormalised tables do not improve performance.
Secondly, you don't build reports from your database. You don't build your database for your reports or you will lock yourself in to an inflexible arrangement where you need to recreate tables for each report.
Thirdly, data quality and reuse. You can easily drill across tables in a DDS. If you need to create a new report then your development times are dramatically increased due to loading, testing each report table.
I did some consulting at a bank that tried the above approach and they had a lot of issues with having the same measure being incorrectly calculated and other data issues each time a report is required.
Secondly, you don't build reports from your database. You don't build your database for your reports or you will lock yourself in to an inflexible arrangement where you need to recreate tables for each report.
Thirdly, data quality and reuse. You can easily drill across tables in a DDS. If you need to create a new report then your development times are dramatically increased due to loading, testing each report table.
I did some consulting at a bank that tried the above approach and they had a lot of issues with having the same measure being incorrectly calculated and other data issues each time a report is required.
Re: Reporting table data repository vs. Dimensional data store
I think you'll run into trouble when you have the wide and deep table. Most of today's DBMS' return the whole row of data from disk even though you only have a few columns in the select statement. OTOH, if you're running Netezza, Teradata, or a columnar database, this may work just fine.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Reporting table data repository vs. Dimensional data store
It depends on what you are trying to do.
One cardinal rule I live by is you don't design a data warehouse to produce a set of reports. You look at the big picture and build to address the business.
With that said, if you aready have a data warehouse, building a publication layer to produce 'report files' or other extracts is not out of the question provided there is some benefit in doing so.
But, trying to 'fix' a data warehouse by replacing it with a collection of report oriented tables (or cubes) is a dead end. You wind up with the mess that prompted the creation of data warehouses in the first place.
One cardinal rule I live by is you don't design a data warehouse to produce a set of reports. You look at the big picture and build to address the business.
With that said, if you aready have a data warehouse, building a publication layer to produce 'report files' or other extracts is not out of the question provided there is some benefit in doing so.
But, trying to 'fix' a data warehouse by replacing it with a collection of report oriented tables (or cubes) is a dead end. You wind up with the mess that prompted the creation of data warehouses in the first place.
Re: Reporting table data repository vs. Dimensional data store
BoxesAndLines wrote:OTOH, if you're running Netezza, Teradata, or a columnar database, this may work just fine.
I've been working a lot with Netezza lately. It tends to work better with large thin tables than large wide ones. You generally get significantly (orders of magnitude) better performance with star schema and its joins than trying to query a one-table equivalent.
Re: Reporting table data repository vs. Dimensional data store
Interesting. What is a shame is I just attended the Netezza conference in Boston a couple of weeks ago and this type of information was severely lacking. I learned exactly nothing about the appropriate database designs for Netezza.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Reporting table data repository vs. Dimensional data store
I can't agree more! However in many people's sight, the delivery of reports is the quick and tangible benchmark of a successful BI system.John Simon wrote:you don't build reports from your database. You don't build your database for your reports or you will lock yourself in to an inflexible arrangement where you need to recreate tables for each report.
It appears we could head in that direction, I wish our decision makers could take your comment seriously. However, the opponent of the dimension modeling has been quite successful in producing reports by using effective date based datasets for almost everything, facts and dimensions. The approach has also be claimed more resilient to changes as it does not rely on surrogate keys.ngalemmo wrote:But, trying to 'fix' a data warehouse by replacing it with a collection of report oriented tables (or cubes) is a dead end.
Ngalemmo, I know you have mentioned, a few times in the relevant posts, about the effective date pattern for fact table, and it's almost like applying SCD2 on the fact tables, but without dedicated SK and NK as in the dimension table. So the question is, is it feasible to use this generic approach to produce equivalent results which would otherwise be achieved by periodic snapshots, or even accumulating snapshots. I can imagine every thing has to be date ranged in the absence of SK on dimensions.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Reporting table data repository vs. Dimensional data store
However, the opponent of the dimension modeling has been quite successful in producing reports by using effective date based datasets for almost everything, facts and dimensions. The approach has also be claimed more resilient to changes as it does not rely on surrogate keys.
Sure. Its quick an easy. Oh, and you rarely change a report file... if you need new data you just create a new one. Let's see how its going in 3-4 years...
Re: Reporting table data repository vs. Dimensional data store
Nick's spot on. On a recent engagement, I was converting SSRS reports to a dimensional model. The SSRS reporting guru showed up with a smug gring and a huge binder full with over 500 "reports". Two fact tables and 30 some odd dimensions later, the whole stack was running on the dimensional model.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Reporting table data repository vs. Dimensional data store
From my experience, the biggest negative against the Reporting table approach is the exponential duplication of code it produces. Consider boxesandlines scenario of 500 "reports". The code for many of the business rules (expressed as joins, lookups, calculated columns etc) would be repeated in literally hundreds of different sections of code. Surely no-one with any IT training or experience can consider this a good solution?
Even with the best of efforts and resources, it is impossible to maintain consistency across such a code base - consider that each rule implementation for each specific report will be in a subtly different context, to meet the requirements of each report. Duplicated inconsistent code then leads to inconsistent results, and implies huge code maintenance and testing overheads for any changes.
The dimensional datamart optimises this challenge by implementing each business rule in a single piece of code and materialising the result in a single piece of data, which can then be easily queried by report authors and testers.
Hang I think your best counter-attack strategy would be to insist on a peer code review of the code produced so far for the supposedly successful Reporting table approach. Odds are you will be able to point to many duplicated business rules, a very high "Lines of code" count and maybe even some inconsistencies creeping in already.
Good luck (and keep the faith)!
Mike
Even with the best of efforts and resources, it is impossible to maintain consistency across such a code base - consider that each rule implementation for each specific report will be in a subtly different context, to meet the requirements of each report. Duplicated inconsistent code then leads to inconsistent results, and implies huge code maintenance and testing overheads for any changes.
The dimensional datamart optimises this challenge by implementing each business rule in a single piece of code and materialising the result in a single piece of data, which can then be easily queried by report authors and testers.
Hang I think your best counter-attack strategy would be to insist on a peer code review of the code produced so far for the supposedly successful Reporting table approach. Odds are you will be able to point to many duplicated business rules, a very high "Lines of code" count and maybe even some inconsistencies creeping in already.
Good luck (and keep the faith)!
Mike
Re: Reporting table data repository vs. Dimensional data store
Sounds like quick and dirty. But anyway, it’s out of my control. B&L, I see your point, as now we started this simplistic approach, our SSRS developers can take over the tasks and stuff data together without any sense of data modelling.ngalemmo wrote:Sure. Its quick an easy. Oh, and you rarely change a report file... if you need new data you just create a new one. Let's see how its going in 3-4 years...
Last edited by hang on Thu Aug 04, 2011 12:34 am; edited 1 time in total
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Reporting table data repository vs. Dimensional data store
Mike, thanks for the advice and it seems to be an effective one. You are right, there are heaps of ugly copy-paste duplications all over the place.Mike Honey wrote:Hang I think your best counter-attack strategy would be to insist on a peer code review of the code produced so far for the supposedly successful Reporting table approach. Odds are you will be able to point to many duplicated business rules, a very high "Lines of code" count and maybe even some inconsistencies creeping in already.
Last edited by hang on Thu Aug 04, 2011 12:35 am; edited 1 time in total
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Reporting table data repository vs. Dimensional data store
Here's a list of issues:
I cannot think of a single good reason to go down that path.
Please let us know how it goes.
John
- Performance
- Little Re-use of existing calculations, leading to:
- Increased development time
- Highly prone to bugs in business rules since they are re-done for each table
- High risk of ending up with hundreds or thousands of reports and table leading to maintenance and support nightmares
I cannot think of a single good reason to go down that path.
Please let us know how it goes.
John
Re: Reporting table data repository vs. Dimensional data store
Good points John, I will keep you updated.
Last edited by hang on Thu Aug 04, 2011 12:37 am; edited 1 time in total
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Reporting table data repository vs. Dimensional data store
hang wrote:Sounds like quick and dirty. It’s a shame that I have to wait that long to see its failure. However as saying goes, if it’s doomed why doing it in first place. But anyway, it’s out of my control. B&L, I see your point, as now we started this simplistic approach, our SSRS developers can take over the tasks and stuff data together without any sense of data modelling.ngalemmo wrote:Sure. Its quick an easy. Oh, and you rarely change a report file... if you need new data you just create a new one. Let's see how its going in 3-4 years...
The real problem is it is kind of like crack. Really addictive and you lose sight of the real cost of doing it. Outside observers will see a disjoint mess while those on the inside wil think they are humming along just fine. People will still be running reports off files that were obsolete years ago because they don't know there was a replacement. IT will still be producing those files because 'the business still uses it' and nobody has the time to check to see what that file really is nor does anyone have time to convert those reports to the new file. Hardware will be upgraded because there isn't enough time or space to generate all the files on the old machine (but that's normal isn't it?).
Eventually, an intervention is needed.
Re: Reporting table data repository vs. Dimensional data store
Since Hang is in Australia, I volunteer.ngalemmo wrote:...Eventually, an intervention is needed.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Reporting table data repository vs. Dimensional data store
B&L, I appreciate your offer, and will give you a call when the number of reporting tables have reached a few hundreds.BoxesAndLines wrote:Since Hang is in Australia, I volunteer.ngalemmo wrote:...Eventually, an intervention is needed.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Reporting table data repository vs. Dimensional data store
As a consultant I've been asked in the past to put forward cost estimates for migrating 1000 reports, and more recently 2,500 reports.
Ridiculous!
Ridiculous!
Re: Reporting table data repository vs. Dimensional data store
Hi Hang,
If I was tackling this scenario I would document some concrete scenarios with real data applicable to your business and then highlight to pros and cons of each method and give a score card value on how important each aspect is to your business now and in the future. Don't forget to include the cost of developing and maintaining each approach (I'm thinking labour costs and available skill sets here), what client tools will need connect to data repository, will they need custom interfaces building (labour costs again), time spent developing reports with each approach. The Cost of having more than one version of the truth in your solution. The cost of supporting hardware and software as it scales.
The biggest thing that sticks out to me in your post is the 'OLAP Cubing is a nice to have at the moment' comment. Well if you go down the reporting table route only, the OLAP option goes out of the window along with power of the queries you can create in MDX versus the potentially slower and complex SQL equivalent. I would think this is a significant part of your scenario based evaluation above.
I know your question was purely a data architecture related, however this kind of decision has implications on the technologies you may want to use now and in the future which in turn impacts the bottom line of the business.
Simon
If I was tackling this scenario I would document some concrete scenarios with real data applicable to your business and then highlight to pros and cons of each method and give a score card value on how important each aspect is to your business now and in the future. Don't forget to include the cost of developing and maintaining each approach (I'm thinking labour costs and available skill sets here), what client tools will need connect to data repository, will they need custom interfaces building (labour costs again), time spent developing reports with each approach. The Cost of having more than one version of the truth in your solution. The cost of supporting hardware and software as it scales.
The biggest thing that sticks out to me in your post is the 'OLAP Cubing is a nice to have at the moment' comment. Well if you go down the reporting table route only, the OLAP option goes out of the window along with power of the queries you can create in MDX versus the potentially slower and complex SQL equivalent. I would think this is a significant part of your scenario based evaluation above.
I know your question was purely a data architecture related, however this kind of decision has implications on the technologies you may want to use now and in the future which in turn impacts the bottom line of the business.
Simon
Ham09- Posts : 8
Join date : 2011-07-26
Location : United Kingdom
Re: Reporting table data repository vs. Dimensional data store
Simon, you hit the nail on the head! I really love your comments. That's almost what I told one of the important stakeholders. In my view, if the model is not cube friendly, then there must be some flaw in it. Obviously, reporting table approach will require a lot of work to be able to feed OLAP cube, if ever possible. I think it is the most powerful and convincing argument point. Thanks for further strengthening my confidence.Ham09 wrote:Well if you go down the reporting table route only, the OLAP option goes out of the window along with power of the queries you can create in MDX versus the potentially slower and complex SQL equivalent. I would think this is a significant part of your scenario based evaluation above.
Last edited by hang on Thu Aug 04, 2011 12:39 am; edited 1 time in total
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Reporting table data repository vs. Dimensional data store
Hang,
We had a same issue with one of a telecom client where in they have an architect who with lots of OLTP exp suggested to have the analytical reports in reports table approach.
In the beginning when the tables were small less than 300-400K, it worked fine, but later on as the table size increased, the performance issue started to creep in. This resulted in lots of business users shouting as now they had to wait for over a minute to get their reports. Instead of getting to nip this in the bud by adopting the dimensional approach, the management went with the another brilliant idea of same architect by introducing the aggregate report tables on top the reports table where ever require to increase performance...Now imagine the developer pain when he has to first introduce the logic of populating these aggregate tables with the base data and then changing each report to refer to them in all there were 60 reports.
Here comes the real problem, one fine day business users asked to introduce another aggregation levels (from cities to states to regions to country). Now it bowled everyone as the current architecture does not support this as the reports tables are static and changing them would result in corruption of the reports. thus forced the development team to create another set of 180 tables and reports( 60 * 3 new levels) with sql written to aggregate at region level. you can imagine the maintenance problem introduce in it, like changing a label for a type of report requires change in reports for each level.
So reporting table/reports bound the scalability of reporting environment which dimensional model provide.
We had a same issue with one of a telecom client where in they have an architect who with lots of OLTP exp suggested to have the analytical reports in reports table approach.
In the beginning when the tables were small less than 300-400K, it worked fine, but later on as the table size increased, the performance issue started to creep in. This resulted in lots of business users shouting as now they had to wait for over a minute to get their reports. Instead of getting to nip this in the bud by adopting the dimensional approach, the management went with the another brilliant idea of same architect by introducing the aggregate report tables on top the reports table where ever require to increase performance...Now imagine the developer pain when he has to first introduce the logic of populating these aggregate tables with the base data and then changing each report to refer to them in all there were 60 reports.
Here comes the real problem, one fine day business users asked to introduce another aggregation levels (from cities to states to regions to country). Now it bowled everyone as the current architecture does not support this as the reports tables are static and changing them would result in corruption of the reports. thus forced the development team to create another set of 180 tables and reports( 60 * 3 new levels) with sql written to aggregate at region level. you can imagine the maintenance problem introduce in it, like changing a label for a type of report requires change in reports for each level.
So reporting table/reports bound the scalability of reporting environment which dimensional model provide.
hellovineet- Posts : 2
Join date : 2011-07-28
Re: Reporting table data repository vs. Dimensional data store
A very likely same requirement change in our case, as the flexible location navigation is such a valuable part of our reporting environment.hellovineet wrote:Here comes the real problem, one fine day business users asked to introduce another aggregation levels (from cities to states to regions to country).
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Similar topics
» Data Mart/Dimensional Data Store Definition.
» Store Aggregated data in dimension
» Data Vault v's Dimensional Model
» Is it a good idea to have a single STATUS table if we have to store multiple status in the data mart?
» Medical Data - Building a reporting Data Warehouse
» Store Aggregated data in dimension
» Data Vault v's Dimensional Model
» Is it a good idea to have a single STATUS table if we have to store multiple status in the data mart?
» Medical Data - Building a reporting Data Warehouse
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum