Pros and cons of consolidated dimension table Vs. many dimension table ?
4 posters
Page 1 of 1
Pros and cons of consolidated dimension table Vs. many dimension table ?
Say to create sales star schema => with sale(IT), refund(RF), exchange(EX) … as few of its attributes (item codes).
Pros and cons of consolidated dimension table Vs. many dimension table ?
Should I create different dimension tables with each item code:
Or, Should I include "item Code" column with in the fact table.
Consider a receipt with a number of line item say, 3 IT and 1 EX on each of 3 different receipt. I need to report number of EX and IT with in that particular day. But in my fact table, for uniqueness of the receipt i have included transaction Date, location Id, register Id and receipt Id only. Do i need to include unique column for each line item within that receipt.
I am very new to data warehousing .. your help is much appreciated. Thank you .
Pros and cons of consolidated dimension table Vs. many dimension table ?
Should I create different dimension tables with each item code:
Or, Should I include "item Code" column with in the fact table.
Consider a receipt with a number of line item say, 3 IT and 1 EX on each of 3 different receipt. I need to report number of EX and IT with in that particular day. But in my fact table, for uniqueness of the receipt i have included transaction Date, location Id, register Id and receipt Id only. Do i need to include unique column for each line item within that receipt.
I am very new to data warehousing .. your help is much appreciated. Thank you .
peace1aparna- Posts : 7
Join date : 2011-02-21
Re: Pros and cons of consolidated dimension table Vs. many dimension table ?
You would have an Item dimension containing all items. The table (as well as all dimension tables) would have a surrogate primary key which serves as the foreign key on the fact. You would have other dimensions as well, such as a transaction type dimension, location dimension, date dimension and so on. Things like receipt ID and line number can be stored on the fact as degenerate dimensions. You do not need a unique key on the fact (usually).
As far as measures go, it is common practice to store values as they relate to sales. So things like returns would have negative values in their measures as they reduce sales. Exchanges would typically be stored as two rows, one for the returned item and the other for the exchange item, with offsetting values as appropriate in the measures.
As far as measures go, it is common practice to store values as they relate to sales. So things like returns would have negative values in their measures as they reduce sales. Exchanges would typically be stored as two rows, one for the returned item and the other for the exchange item, with offsetting values as appropriate in the measures.
Re: Pros and cons of consolidated dimension table Vs. many dimension table ?
It would be a Junk Dimension, in case you wanted to read up on it.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Pros and cons of consolidated dimension table Vs. many dimension table ?
Still confused, as in my POS system the Sales and Return Receipt are different
Sale receipt have following information :
Header Information (employee, location, receipt number, date…)
Sales item
Price_hold_status (paid in half or full)
Item -1 sold
Item -2 sold
Item -3 sold
.
.
Discount (if applied)
Subtotal
Tax
Total
Payment Type
Return receipt have information like :
Header Information (employee, location, receipt number, date…)
Item-1 returned
Price_hold_status (paid in half or full)
Orignal purchase date
Original store
Original purchase receipt number :
Number of days since purchase
Item-2 returned
Price_hold_status (paid in half or full)
Orignal purchase date
Original store
Original purchase receipt number :
Number of days since purchase
.
.
.
Discount (if applied)
Subtotal
Tax
Total
Payment Type
So far I have created a single fact table with following measures
1. Employee_pk
2. Customer_pk
3. Product_pk
4. Date_pk
5. Location_pk
6. Price_modifier_pk (like promotion)
7. Transaction_type (Sale, Refund, Exchange) (where to include measure for Count of transaction_type? Like how many count of Refund per day, per employee, per location or per workstation)
8. Payment_type_pk
9. Workstation_pk (it is register number, should I make it DD or create dimension table?)
10. Receipt_id (DD) (Collection of all items in a single transaction)
11. Line_item_id(DD) (for each item sale /return item)
12. Quantity
13. Unique price
14. Tax
15. Total_before_tax
16. Total_after_tax
Again for all refund reporting, I need to report additional information like
Price_hold_status (paid in half or full)
Original purchase date
Original store
Original purchase receipt number:
Number of days since purchase
Report Criteria => Show all refunds with number of days since purchase is greater than 5 days ?
Also, should be feasible to track original purchase receipt from refund receipt ?
How and where do I include following information. Do I need to create new dimension/fact?
Help appreciated.
Thank you .
Sale receipt have following information :
Header Information (employee, location, receipt number, date…)
Sales item
Price_hold_status (paid in half or full)
Item -1 sold
Item -2 sold
Item -3 sold
.
.
Discount (if applied)
Subtotal
Tax
Total
Payment Type
Return receipt have information like :
Header Information (employee, location, receipt number, date…)
Item-1 returned
Price_hold_status (paid in half or full)
Orignal purchase date
Original store
Original purchase receipt number :
Number of days since purchase
Item-2 returned
Price_hold_status (paid in half or full)
Orignal purchase date
Original store
Original purchase receipt number :
Number of days since purchase
.
.
.
Discount (if applied)
Subtotal
Tax
Total
Payment Type
So far I have created a single fact table with following measures
1. Employee_pk
2. Customer_pk
3. Product_pk
4. Date_pk
5. Location_pk
6. Price_modifier_pk (like promotion)
7. Transaction_type (Sale, Refund, Exchange) (where to include measure for Count of transaction_type? Like how many count of Refund per day, per employee, per location or per workstation)
8. Payment_type_pk
9. Workstation_pk (it is register number, should I make it DD or create dimension table?)
10. Receipt_id (DD) (Collection of all items in a single transaction)
11. Line_item_id(DD) (for each item sale /return item)
12. Quantity
13. Unique price
14. Tax
15. Total_before_tax
16. Total_after_tax
Again for all refund reporting, I need to report additional information like
Price_hold_status (paid in half or full)
Original purchase date
Original store
Original purchase receipt number:
Number of days since purchase
Report Criteria => Show all refunds with number of days since purchase is greater than 5 days ?
Also, should be feasible to track original purchase receipt from refund receipt ?
How and where do I include following information. Do I need to create new dimension/fact?
Help appreciated.
Thank you .
peace1aparna- Posts : 7
Join date : 2011-02-21
Re: Pros and cons of consolidated dimension table Vs. many dimension table ?
It looks like you might need a separate fact table for refund as it has different dimensionality from sale fact. The purchase receipt number (DD) would be the connection between the two facts. if the fact for exchange is just a process of returns and sales, you may not need a separate fact table for exchange. I would combine Transaction type, Payment type and Workstation into a junk dimension, therefore the workstation should not be a DD as it is of low cardinality. You could add a physical measure and default it to 1 for count in the fact tables.
All other dimension keys look fine in the fact table. The additive count measure will give you the desired count by constraining on any relevant dimensions.
All other dimension keys look fine in the fact table. The additive count measure will give you the desired count by constraining on any relevant dimensions.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Pros and cons of consolidated dimension table Vs. many dimension table ?
If you go with a separate fact table for refunds AND refunds need to be included in sales numbers, then I would record the refund in BOTH fact tables.
Otherwise, I would just add the extra FKs to the sales fact (ie Original purchase date, Original store, Original purchase receipt number (a degenerate probably)) and populate with the same values as date, store and receipt when it is a sale.
Otherwise, I would just add the extra FKs to the sales fact (ie Original purchase date, Original store, Original purchase receipt number (a degenerate probably)) and populate with the same values as date, store and receipt when it is a sale.
Re: Pros and cons of consolidated dimension table Vs. many dimension table ?
A.Modelling in single fact table: (have included sample of refund/ sale receipt below)
SALEandRETURN_FACT ( 1.Employee_pk, 2. Customer_pk, 3. Product_pk, 4. Date_pk, 5. Location_pk, 6. Price_modifier_pk (like promotion), 7. Transaction_type (Sale, Refund, Exchange), 8. Receipt_id (DD), 9. Line_item_id(DD),10. Quantity, 11. Junk_key (transaction_type, payment_type, work_station, exchange_flag_if_any),12.Tax, 13. Total_before_tax, 14. Total_after_tax, 15. Original_date_key, 16. Original_store_key, 17. Original_purchase_receipt, 18. Days_since_purchase (how to compute this)
So when I do this will introduce 3 DD columns (15. Original_store_key, 16. Original_store_key, 17. Original_purchase_receipt).
Will this solve my problem of tracking original receipt from refund receipt. But again how will I compute Days_since_purchase.
Theoritically Days_since_purchase = (Original_purchase_receipt – refund date from date_pk). Still confused how to get there??
B.Modelling in different fact table:
SALE_FACT_TABLE(day_key, product_key, customer_key, sale_transaction_ID, sale_line_item_ID, Exchange_flag,quantity, amount, tax, total)
REFUND_FACT_TABLE(day_key_sale, day_key_return, refund_transaction_ID, return_junk_key, refund_line_item_ID, refund_quantity, refund_amount, tax,refund_ total, return_without_receipt_flag, sale_transaction_ID, sale_line_item_ID )
DAY_DIMENSION(day_Key, day, month, year)
I came to this structure with the help of google but here in REFUND_FACT_TABLE why should the day_key_sale ,sale_transaction_ID, sale_line_item_ID be included in REFUND_FACT_TABLE and what data should be load ??
And what if i have to compute Return as % of item Sales ????
Which track should I Follow(A or B). Sorry Its long and lots of confusing, hope I explained my confusion well. Help appreciated.
Following is the sample of purchase receipt and Return Receipt.
Purchase Receipt :
Receipt # XYZ
Register # 1
Store# NY
Cashier Adam
47 - transaction duration (sec)
Friday 02/25 5:12:13 PM PDT
510 510022330 - LIC KIDDIE STEP STOOL 5.00
350 350048083 - BRNT THBCK PANEL NAVY 63 12.00
350 350048083 - BRNT THBCK PANEL NAVY 63 12.00
520 520015380 - 9.5in ENVELOPES 0.94
Subtotal 29.94
Tax 2.92
Total 32.86
DBIT 32.86
Return receipt :
Receipt # ABCr
Register # 2
Store# NY
Cashier Betty
11 - transaction duration (sec)
Monday 03/14 1:05:24 PM PDT
17 - days since purchase
350 350048083 - BRNT THBCK PANEL NAVY 63 -12.00
Original Date: 2011-02-25 00:00:00
Original Store: NY
Original Register: 1
Original Transaction: XYZ
17 - days since purchase
350 350048083 - BRNT THBCK PANEL NAVY 63 -12.00
Original Date: 2011-02-25 00:00:00
Original Store: NY
Original Reg: 1
Original Trans: XYZ
Subtotal -24.00
Tax -2.34
Total -26.34
CASH -26.34
SALEandRETURN_FACT ( 1.Employee_pk, 2. Customer_pk, 3. Product_pk, 4. Date_pk, 5. Location_pk, 6. Price_modifier_pk (like promotion), 7. Transaction_type (Sale, Refund, Exchange), 8. Receipt_id (DD), 9. Line_item_id(DD),10. Quantity, 11. Junk_key (transaction_type, payment_type, work_station, exchange_flag_if_any),12.Tax, 13. Total_before_tax, 14. Total_after_tax, 15. Original_date_key, 16. Original_store_key, 17. Original_purchase_receipt, 18. Days_since_purchase (how to compute this)
So when I do this will introduce 3 DD columns (15. Original_store_key, 16. Original_store_key, 17. Original_purchase_receipt).
Will this solve my problem of tracking original receipt from refund receipt. But again how will I compute Days_since_purchase.
Theoritically Days_since_purchase = (Original_purchase_receipt – refund date from date_pk). Still confused how to get there??
B.Modelling in different fact table:
SALE_FACT_TABLE(day_key, product_key, customer_key, sale_transaction_ID, sale_line_item_ID, Exchange_flag,quantity, amount, tax, total)
REFUND_FACT_TABLE(day_key_sale, day_key_return, refund_transaction_ID, return_junk_key, refund_line_item_ID, refund_quantity, refund_amount, tax,refund_ total, return_without_receipt_flag, sale_transaction_ID, sale_line_item_ID )
DAY_DIMENSION(day_Key, day, month, year)
I came to this structure with the help of google but here in REFUND_FACT_TABLE why should the day_key_sale ,sale_transaction_ID, sale_line_item_ID be included in REFUND_FACT_TABLE and what data should be load ??
And what if i have to compute Return as % of item Sales ????
Which track should I Follow(A or B). Sorry Its long and lots of confusing, hope I explained my confusion well. Help appreciated.
Following is the sample of purchase receipt and Return Receipt.
Purchase Receipt :
Receipt # XYZ
Register # 1
Store# NY
Cashier Adam
47 - transaction duration (sec)
Friday 02/25 5:12:13 PM PDT
510 510022330 - LIC KIDDIE STEP STOOL 5.00
350 350048083 - BRNT THBCK PANEL NAVY 63 12.00
350 350048083 - BRNT THBCK PANEL NAVY 63 12.00
520 520015380 - 9.5in ENVELOPES 0.94
Subtotal 29.94
Tax 2.92
Total 32.86
DBIT 32.86
Return receipt :
Receipt # ABCr
Register # 2
Store# NY
Cashier Betty
11 - transaction duration (sec)
Monday 03/14 1:05:24 PM PDT
17 - days since purchase
350 350048083 - BRNT THBCK PANEL NAVY 63 -12.00
Original Date: 2011-02-25 00:00:00
Original Store: NY
Original Register: 1
Original Transaction: XYZ
17 - days since purchase
350 350048083 - BRNT THBCK PANEL NAVY 63 -12.00
Original Date: 2011-02-25 00:00:00
Original Store: NY
Original Reg: 1
Original Trans: XYZ
Subtotal -24.00
Tax -2.34
Total -26.34
CASH -26.34
peace1aparna- Posts : 7
Join date : 2011-02-21
Re: Pros and cons of consolidated dimension table Vs. many dimension table ?
I think the case can be made for 2 fact tables or even 3 fact tables.
Let's say you have attributes that only apply to a sale and others that only apply to a refund. In some cases, it might be possible to combine the attributes into a single dimension table with columns that only apply to the sale and others that only apply to the refund. In this case, you could put everything into 1 fact table.
There may be columns in the fact table that apply only to a refund - such as original store. But even then, it's possible to make the original store and refund store (or current store) the same on a sale. However, you may find that adding a bunch of columns to the sales fact and to the refund fact to make them fit into one fact table may dramatically increase the size of the fact table and it might be better to split the facts in to 2 fact tables.
But I think it's a total of 3 fact tables. The Sale and refund facts are at the item level. You could create a third fact table that is technically an aggregate of the refund and sale transactions. The measure might be total items in the transaction.
This third table would make it easy for users to see the number of transactions by store or by cashier. If the transactions had account level information, it would make it easy for people to see the geographic reach of the store.
Let's say you have attributes that only apply to a sale and others that only apply to a refund. In some cases, it might be possible to combine the attributes into a single dimension table with columns that only apply to the sale and others that only apply to the refund. In this case, you could put everything into 1 fact table.
There may be columns in the fact table that apply only to a refund - such as original store. But even then, it's possible to make the original store and refund store (or current store) the same on a sale. However, you may find that adding a bunch of columns to the sales fact and to the refund fact to make them fit into one fact table may dramatically increase the size of the fact table and it might be better to split the facts in to 2 fact tables.
But I think it's a total of 3 fact tables. The Sale and refund facts are at the item level. You could create a third fact table that is technically an aggregate of the refund and sale transactions. The measure might be total items in the transaction.
This third table would make it easy for users to see the number of transactions by store or by cashier. If the transactions had account level information, it would make it easy for people to see the geographic reach of the store.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Pros and cons of consolidated dimension table Vs. many dimension table ?
Jeff Smith wrote:.
But I think it's a total of 3 fact tables. The Sale and refund facts are at the item level. You could create a third fact table that is technically an aggregate of the refund and sale transactions. The measure might be total items in the transaction.
This third table would make it easy for users to see the number of transactions by store or by cashier. If the transactions had account level information, it would make it easy for people to see the geographic reach of the store.
Is it possiblt to create 1 aggregate table using 2 existing fact table (refund, sales) ?
AGGREGATE_FACT(weekly_dat_key,location_key, employee_key, product_key,sale_count, sale_amount, refund_count, refund_amount) .. can I do this but this table have columns that only apply to the sale AND columns that only apply to the refund. What are the pros and cons of this table with following columns ??
Also, How have create 2 models, how to attach in this forum ??
Thank you .
peace1aparna- Posts : 7
Join date : 2011-02-21
Re: Pros and cons of consolidated dimension table Vs. many dimension table ?
New to dataware housing so can you please explain this further ..ngalemmo wrote:If you go with a separate fact table for refunds AND refunds need to be included in sales numbers, then I would record the refund in BOTH fact tables.
Also I am confused with a similar sample model .. (extracted from The complete refrence to Star schema by - Christopher Adams, pg:70, fig 4-8)
ORDER_FACTS (day_key. product_key, customer_key, ordered_quantity, ordered_amount, order_id, order_line_num)
SHIPMENT_FACTS (day_key_order, day_key_shipment, product_key, customer_key, shiped_quantity, shipped_amount, revenue_dollar, order_id, order_line_num, shipment_id, shipment_line_number )
Following Confusion:
1. why is day_key , day_key_order, day_key_shipment all reference to day_key dimension table (shouldnot day_key_order, day_key_shipment be different)
2. whys is order_id, order_line_num attribute are also included in SHIPMENT_FACTS and what data goes in here, assuming order and shipment invoice have different IDs.
Thank you .
peace1aparna- Posts : 7
Join date : 2011-02-21
Similar topics
» Fedrated DW pros and cons and what to watch out for
» Can a dimension table directly link to another dimension table?
» Wide fact tables
» Too many bridge tables dimensional modeling - pros/cons
» Consolidated fact table or separate facts?
» Can a dimension table directly link to another dimension table?
» Wide fact tables
» Too many bridge tables dimensional modeling - pros/cons
» Consolidated fact table or separate facts?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum