star data model and reporting with different dimension groupings
4 posters
Page 1 of 1
star data model and reporting with different dimension groupings
Dear community,
I would like to ask you for helping me with a star data model / reporting problem.
The simplified data model is:
FACT_SALES (DATE_ID, PRODUCT_ID, CUSTOMER_ID, PLANT_ID, SALES_AMOUNT)
DIM_PRODUCT (PRODUCT_ID, PRODUCT_NR, NAME, etc.)
DIM_CUSTOMER (CUSTOMER_ID, CUSTOMER_NR, NAME, etc.)
DIM_PLANT (PLANT_ID, PLANT_NR, NAME, etc.)
DIM_DATE (DATE_ID, DAY, MONTH, etc.)
Business users want to create reports (lists and cross-tabs) where they want to show the SALES_AMOUNT fact grouped by
different dimension combinations in a single row in a report with a "grouping name" attached.
Business users define these dimension groupings in the following way (example):
"Grouping A" | Products A, B, X | all Customers | Plant XX, XY
"Grouping A" | Products D, E | Customer 1, 2, 3 | Plant XX
"Grouping A" | all Products | Customer 4, 5 | Plant XY
"Grouping A" | all Products | all Customers | Plant ZZ
"Grouping B" | Products A, B | all Customers | Plant ZZ
"Grouping B" | all Products | Customer 2, 5 | Plant XX, XY
... | ... | ... | ...
So each of these groupings can (and usually is) defined via differnt dimension combinations.
The list-report should look like (example)
Grouping A: Sales amount of USD 100,--
Grouping B: Sales amount of USD 200,--
...
As we do not want to create these groupings in the report itself (lots of these groupings need to be used in many different reports),
we are looking for a way to solve the issue with a data structure on the database (so that these groupings are maintained
once in the database and can be reused in lots of different reports).
As you can see from the above example, the problem is that lots of definitions use "all Products", "all Customers", "all Plants", etc.
Therefore when creating a table with these definitions in the database which joins directly to the fact table, we
have to create for "all Customers" a row with every single possible customer (the same for "all products", "all xyz").
For the first defintion, this table would look like:
"Grouping A" | Product A | Customer 1 | Plant XX
"Grouping A" | Product A | Customer 2 | Plant XX
"Grouping A" | Product A | Customer ... | Plant XX
"Grouping A" | Prodcut A | Customer 1 | Plant XY
"Grouping A" | Prodcut A | Customer 2 | Plant XY
"Grouping A" | Prodcut A | Customer ... | Plant XY
"Grouping A" | Product B | Customer 1 | Plant XX
"Grouping A" | Product B | Customer 2 | Plant XX
"Grouping A" | Product B | Customer ... | Plant XX
"Grouping A" | Prodcut B | Customer 1 | Plant XY
"Grouping A" | Prodcut B | Customer 2 | Plant XY
"Grouping A" | Prodcut B | Customer ... | Plant XY
"Grouping A" | Product X | Customer 1 | Plant XX
"Grouping A" | Product X | Customer 2 | Plant XX
"Grouping A" | Product X | Customer ... | Plant XX
"Grouping A" | Prodcut X | Customer 1 | Plant XY
"Grouping A" | Prodcut X | Customer 2 | Plant XY
"Grouping A" | Prodcut X | Customer ... | Plant XY
The problem with that solution is, that this table gets huge - it's several times bigger than the fact table itself
and it impacts performance dramatically.
But without this table, the joins to the fact table would not work.
In our real situation there are about 7 dimensions involved in these group-definitions.
The biggest dimensions are DIM_CUSTOMER with about 30.000 customers and DIM_PRODUCT with about 2.000 products.
Did anyone of you had a similiar problem?
I would appreciate any help from you.
I would like to ask you for helping me with a star data model / reporting problem.
The simplified data model is:
FACT_SALES (DATE_ID, PRODUCT_ID, CUSTOMER_ID, PLANT_ID, SALES_AMOUNT)
DIM_PRODUCT (PRODUCT_ID, PRODUCT_NR, NAME, etc.)
DIM_CUSTOMER (CUSTOMER_ID, CUSTOMER_NR, NAME, etc.)
DIM_PLANT (PLANT_ID, PLANT_NR, NAME, etc.)
DIM_DATE (DATE_ID, DAY, MONTH, etc.)
Business users want to create reports (lists and cross-tabs) where they want to show the SALES_AMOUNT fact grouped by
different dimension combinations in a single row in a report with a "grouping name" attached.
Business users define these dimension groupings in the following way (example):
"Grouping A" | Products A, B, X | all Customers | Plant XX, XY
"Grouping A" | Products D, E | Customer 1, 2, 3 | Plant XX
"Grouping A" | all Products | Customer 4, 5 | Plant XY
"Grouping A" | all Products | all Customers | Plant ZZ
"Grouping B" | Products A, B | all Customers | Plant ZZ
"Grouping B" | all Products | Customer 2, 5 | Plant XX, XY
... | ... | ... | ...
So each of these groupings can (and usually is) defined via differnt dimension combinations.
The list-report should look like (example)
Grouping A: Sales amount of USD 100,--
Grouping B: Sales amount of USD 200,--
...
As we do not want to create these groupings in the report itself (lots of these groupings need to be used in many different reports),
we are looking for a way to solve the issue with a data structure on the database (so that these groupings are maintained
once in the database and can be reused in lots of different reports).
As you can see from the above example, the problem is that lots of definitions use "all Products", "all Customers", "all Plants", etc.
Therefore when creating a table with these definitions in the database which joins directly to the fact table, we
have to create for "all Customers" a row with every single possible customer (the same for "all products", "all xyz").
For the first defintion, this table would look like:
"Grouping A" | Product A | Customer 1 | Plant XX
"Grouping A" | Product A | Customer 2 | Plant XX
"Grouping A" | Product A | Customer ... | Plant XX
"Grouping A" | Prodcut A | Customer 1 | Plant XY
"Grouping A" | Prodcut A | Customer 2 | Plant XY
"Grouping A" | Prodcut A | Customer ... | Plant XY
"Grouping A" | Product B | Customer 1 | Plant XX
"Grouping A" | Product B | Customer 2 | Plant XX
"Grouping A" | Product B | Customer ... | Plant XX
"Grouping A" | Prodcut B | Customer 1 | Plant XY
"Grouping A" | Prodcut B | Customer 2 | Plant XY
"Grouping A" | Prodcut B | Customer ... | Plant XY
"Grouping A" | Product X | Customer 1 | Plant XX
"Grouping A" | Product X | Customer 2 | Plant XX
"Grouping A" | Product X | Customer ... | Plant XX
"Grouping A" | Prodcut X | Customer 1 | Plant XY
"Grouping A" | Prodcut X | Customer 2 | Plant XY
"Grouping A" | Prodcut X | Customer ... | Plant XY
The problem with that solution is, that this table gets huge - it's several times bigger than the fact table itself
and it impacts performance dramatically.
But without this table, the joins to the fact table would not work.
In our real situation there are about 7 dimensions involved in these group-definitions.
The biggest dimensions are DIM_CUSTOMER with about 30.000 customers and DIM_PRODUCT with about 2.000 products.
Did anyone of you had a similiar problem?
I would appreciate any help from you.
z00t- Posts : 2
Join date : 2010-09-17
Re: star data model and reporting with different dimension groupings
You need to talk this through with the business a bit more. I mean, sure, you can build data structures to support this kind of thing, but things can get very complex very quickly. You can easily wind up with something that no one can or would want to maintain.
The thing is, these groupings mean something to somebody... there is a business reason behind them. You need to talk to whoever dreamed these up to understand the reasons behind them. There are probably categories of customers, product, etc... that have significance. It is possible that all of this can be reduced to simply adding categorization attributes to the dimensions (that the business maintains) that can serve to simplify selecting and classifying data for reports.
The other thing is, what you describe is very similar to what is typically done in financial statements and other GL reports. It is quite possible that there could be a correlation between what they are asking for and how sales are being booked.
The thing is, these groupings mean something to somebody... there is a business reason behind them. You need to talk to whoever dreamed these up to understand the reasons behind them. There are probably categories of customers, product, etc... that have significance. It is possible that all of this can be reduced to simply adding categorization attributes to the dimensions (that the business maintains) that can serve to simplify selecting and classifying data for reports.
The other thing is, what you describe is very similar to what is typically done in financial statements and other GL reports. It is quite possible that there could be a correlation between what they are asking for and how sales are being booked.
Re: star data model and reporting with different dimension groupings
I think that Nick is definitely on the right track here. To be a little more specific, you should probably see if these groupings correlate to any financial dimensions that exist or could be created. If so, then you just apply the business rules to put those keys in the table and let that drive your ETL.
The next step from there would be allowing users to drill-through from the financial reports into the detail that is in the fact tables you are currently working on.
The next step from there would be allowing users to drill-through from the financial reports into the detail that is in the fact tables you are currently working on.
bigjonroberts- Posts : 6
Join date : 2009-07-09
Re: star data model and reporting with different dimension groupings
Hi,
I also agreed with the previous two posts.
Storing such groupings are specific to a special business need.
If they were product groups / channels then you could have added a field to the product dimension called ProductCategory etc. and repeat the group names, however your scenario is more complex.
A technical solution could be to create a 'junk' dimensions with all the possible grouping combinations and try and reference the correct junk dimension record from your fact table. however this is a technical workaround that 'may' work but once you start doing this - what stops you from doing so again and again - which will result in a very messy DW with many junk dimensions storing a myriad of different report specific groupings - which is not the purpose of a DW system.
I also agreed with the previous two posts.
Storing such groupings are specific to a special business need.
If they were product groups / channels then you could have added a field to the product dimension called ProductCategory etc. and repeat the group names, however your scenario is more complex.
A technical solution could be to create a 'junk' dimensions with all the possible grouping combinations and try and reference the correct junk dimension record from your fact table. however this is a technical workaround that 'may' work but once you start doing this - what stops you from doing so again and again - which will result in a very messy DW with many junk dimensions storing a myriad of different report specific groupings - which is not the purpose of a DW system.
ian.coetzer- Posts : 57
Join date : 2010-05-31
Age : 44
Location : South Africa
Similar topics
» Data model for Reporting needs - Event based or fact based
» How to Handle Updates to Dimension Attributes (SCD 1) that have a Cardinality > 1 (Context is Data Warehouse Star Schema)
» How to dimension model this operational data
» Model available time per day in timesheet data warehouse: fact or dimension or neither?
» How to model dimension data including dynamic fields from the OLTP system?
» How to Handle Updates to Dimension Attributes (SCD 1) that have a Cardinality > 1 (Context is Data Warehouse Star Schema)
» How to dimension model this operational data
» Model available time per day in timesheet data warehouse: fact or dimension or neither?
» How to model dimension data including dynamic fields from the OLTP system?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum