Detalization of fact table
5 posters
Page 1 of 1
Detalization of fact table
Hi,
There is a scenario. We have a fact table containing Vacation Request duration with the following attributes:
RequestKey (surrogate key)
RequestType
Employee
Total Duration (measure to analyze)
Start_Date
End_Date
This fact table keeps total vac duration. But business users would like to analyze vacation by months. If a vacation started on January and finished on February we should keep duration for each month.
Which scenario looks better?
1. Create a separate fact table with the following columns and configure reference to existing table by RequestKey?
- RequestKey
- Vacation Month
- Month Duration
2. Create two additional columns: Vacation Month and Duration in existing table and keep both duration?
Any thoughts would be very helpful! Thanks in advance!
There is a scenario. We have a fact table containing Vacation Request duration with the following attributes:
RequestKey (surrogate key)
RequestType
Employee
Total Duration (measure to analyze)
Start_Date
End_Date
This fact table keeps total vac duration. But business users would like to analyze vacation by months. If a vacation started on January and finished on February we should keep duration for each month.
Which scenario looks better?
1. Create a separate fact table with the following columns and configure reference to existing table by RequestKey?
- RequestKey
- Vacation Month
- Month Duration
2. Create two additional columns: Vacation Month and Duration in existing table and keep both duration?
Any thoughts would be very helpful! Thanks in advance!
Last edited by Aleksandr on Mon Nov 23, 2015 12:36 pm; edited 2 times in total
Aleksandr- Posts : 10
Join date : 2015-11-11
RE: Detalization of fact table
If you add Vacation start date and vacation end date in the fact table, then it is going to work for monthly analysis. You don't have to store "Duration", since it can be calculated using those dates. Sometimes a customer checkout early or extend their visit, so this is another reason not to store "Duration" in the fact table.
zoom- Posts : 97
Join date : 2010-08-23
Location : Chicago
Re: Detalization of fact table
hi zoom,
Thanks for your reply. Now we are planning to load data to Vacation Request fact table from scratch every day, because count of rows is not great. So we can calculate duration by month in ETL process. But I doubt, is it a good idea mix Month Duration and Total duration in the same fact table? Total duration is calculated alreday in source and we could just extract it rather than calculate sum by months.
Thanks for your reply. Now we are planning to load data to Vacation Request fact table from scratch every day, because count of rows is not great. So we can calculate duration by month in ETL process. But I doubt, is it a good idea mix Month Duration and Total duration in the same fact table? Total duration is calculated alreday in source and we could just extract it rather than calculate sum by months.
Aleksandr- Posts : 10
Join date : 2015-11-11
Re: Detalization of fact table
What is you grain on the fact table? If you are loading daily transactions, then "monthly" or "Total" should not be kept in the same fact table. Using that daily transaction fact table, you can either create a monthly snapshot table or your reporting team can sum totals in the report.
zoom- Posts : 97
Join date : 2010-08-23
Location : Chicago
Re: Detalization of fact table
Let's look at business process in some details. An employee uses a web-application to request a vacation from October 30, 2015 to November 02, 2015. This application calculates duration (in hours) of vac and writes a row into underlying DB.
Our ETL process extract this vacation info from appication and load data to fact table. For our example it looks like:
RequestKey Employee Start_Dt End_Dt Duration
1000 100 20151030 20151102 24
But business users want to analyze duration of vacation by month and see Total too: October, 2015 - 8 hours, November - 16 hours, Total - 24. So, we should modify our fact table.
Is good idea to store data in such form:
RequestKey Employee Start_Dt End_Dt Month Month Duration Total Duration
1000 100 20151030 20151102 201510 8 24
1000 100 20151030 20151102 201511 16 24
Or Total Duration should be removed?
Thanks.
Our ETL process extract this vacation info from appication and load data to fact table. For our example it looks like:
RequestKey Employee Start_Dt End_Dt Duration
1000 100 20151030 20151102 24
But business users want to analyze duration of vacation by month and see Total too: October, 2015 - 8 hours, November - 16 hours, Total - 24. So, we should modify our fact table.
Is good idea to store data in such form:
RequestKey Employee Start_Dt End_Dt Month Month Duration Total Duration
1000 100 20151030 20151102 201510 8 24
1000 100 20151030 20151102 201511 16 24
Or Total Duration should be removed?
Thanks.
Aleksandr- Posts : 10
Join date : 2015-11-11
Re: Detalization of fact table
To zoom's point, what if the users later decide to do a weekly report?
Given your volumes are not large, a daily grain would allow you to handle anything.
Given your volumes are not large, a daily grain would allow you to handle anything.
Re: Detalization of fact table
Hi, ngalemmo
Keeping such data in daily grain will increase the number of rows is about 30 times - it's significantly. If store the data in monthly grain, is bad practice to mix Month and Total duration in the same table?
Thank you!
Keeping such data in daily grain will increase the number of rows is about 30 times - it's significantly. If store the data in monthly grain, is bad practice to mix Month and Total duration in the same table?
Thank you!
Aleksandr- Posts : 10
Join date : 2015-11-11
Re: Detalization of fact table
What about to keep two tables: the first (described on top the topic, let's name it Main table) stores total duration, the second (called VacationByMonth) stores detailed data? I've two suggestions regarding the second table structure:
1. With columns RequestKey, Month, Month Duration Duration. Configure foreign key to reference to the first table by RequestKey columns. But if I understand correctly Kimball's The Data Warehouse Toolkit doesn't recommend such method.
2. With columns Start_Dt, End_Dt, Employee, RequestType, Month, Month Duration.
At that moment we have a couple of reports. And only one of them needs duration by Months, so it will use VacationByMonth table. Other reports consume Total Duration and they will use Main table.
Is the second structure of VacationByMonth acceptable?
Thanks.
1. With columns RequestKey, Month, Month Duration Duration. Configure foreign key to reference to the first table by RequestKey columns. But if I understand correctly Kimball's The Data Warehouse Toolkit doesn't recommend such method.
2. With columns Start_Dt, End_Dt, Employee, RequestType, Month, Month Duration.
At that moment we have a couple of reports. And only one of them needs duration by Months, so it will use VacationByMonth table. Other reports consume Total Duration and they will use Main table.
Is the second structure of VacationByMonth acceptable?
Thanks.
Aleksandr- Posts : 10
Join date : 2015-11-11
Re: Detalization of fact table
How many employees do you have? Unless you're the world's biggest company, you don't have enough data to worry about the size of your fact table.
If it were my choice, I'd store each day of leave as a separate transaction. Something like:
fact_leave (employee_id,date_id,leave_type_id,hours_taken)
I'd use a type-2 slowly changing dimension for Employee, including their current department.
My Date dimension would include year, month, day, etc. to cater for all the roll-up scenarios you might want.
Further, I could now handle scenarios I don't think you've thought of yet, including:
- Employee who takes 1/2 day leave to attend a private appointment
- Employee who takes 10 days leave, but has two days credited back because they were called into work for an emergency
- Employee who took a 5 day break which included both personal leave and bereavement leave
Considering data volume, how many employees do you have? Let's go wild, and say 20,000. How many days leave do they get each year? Let's be a little generous, and say 20 days. That means your table might contain 400,000 rows, plus a small percentage of rows for people who have multiple entries for the one day. That should only take a few seconds to load on any contemporary DBMS.
What problems do you think might not be solved by this approach?
If it were my choice, I'd store each day of leave as a separate transaction. Something like:
fact_leave (employee_id,date_id,leave_type_id,hours_taken)
I'd use a type-2 slowly changing dimension for Employee, including their current department.
My Date dimension would include year, month, day, etc. to cater for all the roll-up scenarios you might want.
Further, I could now handle scenarios I don't think you've thought of yet, including:
- Employee who takes 1/2 day leave to attend a private appointment
- Employee who takes 10 days leave, but has two days credited back because they were called into work for an emergency
- Employee who took a 5 day break which included both personal leave and bereavement leave
Considering data volume, how many employees do you have? Let's go wild, and say 20,000. How many days leave do they get each year? Let's be a little generous, and say 20 days. That means your table might contain 400,000 rows, plus a small percentage of rows for people who have multiple entries for the one day. That should only take a few seconds to load on any contemporary DBMS.
What problems do you think might not be solved by this approach?
Re: Detalization of fact table
Hi ron.dunn,
Thanks for you reply. We have about 15 000 employees. Each employee takes about 25 vacation days (regular, ill, etc.). So, table might contain 400 000 rows per year -you guessed:). I think we will not be faced with the performance problem. But the whole process will look a little weird:
1. We extract from data source already calculated Total duration.
2. Split total duration into set of daily values.
3. Aggregate daily values to get total duration for report.
I agree with you, such method handles a lot of scenarios, but new scenarios are not expected. What about keeping two tables? Why it looks bad?
Thank you!
Thanks for you reply. We have about 15 000 employees. Each employee takes about 25 vacation days (regular, ill, etc.). So, table might contain 400 000 rows per year -you guessed:). I think we will not be faced with the performance problem. But the whole process will look a little weird:
1. We extract from data source already calculated Total duration.
2. Split total duration into set of daily values.
3. Aggregate daily values to get total duration for report.
I agree with you, such method handles a lot of scenarios, but new scenarios are not expected. What about keeping two tables? Why it looks bad?
Thank you!
Aleksandr- Posts : 10
Join date : 2015-11-11
Re: Detalization of fact table
A well designed data warehouse should be able to support any report that a user can come up with, for a particular topic and, where it doesn't extending your model to support new queries should not break existing queries. It does this by holding the lowest possible level of detail available - as from this it should be possible to fulfill any query; if it can't then it means the data is not actually available, which is a different issue. From this granular fact data it is then possible to build up additional, aggregate, fact tables if there are performance issues with querying the granular data (either due to the volume of the data or the complexity of the query).
What you are trying to do is design a fact table to support just a specific report. While superficially this may seem like an attractive idea, it is normally a very bad idea - you end up with a fact table per report, every new reporting requirement results in the creation of a new fact table and you end up with a mess of of a DW that is basically unsupportable.
You say that there will be no new requirements - and you may be right and therefore your design will work - but in 99% of DWs implemented you'll find that users will decide they want to report on the data in a slightly different way. With your design approach, you'll either have to redesign the fact table(s) - which may well break your existing reports and require them to be rebuilt as well - or you create a new, slightly different fact table which means you have duplicated data, more complex ETL and a design your users will never understand: if they want to create a new report how are they meant to know which of the many Vacation data tables are they meant to query?
What you are trying to do is design a fact table to support just a specific report. While superficially this may seem like an attractive idea, it is normally a very bad idea - you end up with a fact table per report, every new reporting requirement results in the creation of a new fact table and you end up with a mess of of a DW that is basically unsupportable.
You say that there will be no new requirements - and you may be right and therefore your design will work - but in 99% of DWs implemented you'll find that users will decide they want to report on the data in a slightly different way. With your design approach, you'll either have to redesign the fact table(s) - which may well break your existing reports and require them to be rebuilt as well - or you create a new, slightly different fact table which means you have duplicated data, more complex ETL and a design your users will never understand: if they want to create a new report how are they meant to know which of the many Vacation data tables are they meant to query?
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Detalization of fact table
Hi,
We've discussed this issue and decided to store vacation data in daily grain. You are really right: such method handles all scenarios. Thanks you all for advice, you patiently explained to me how to solve my question. Your advice is really valuable
We've discussed this issue and decided to store vacation data in daily grain. You are really right: such method handles all scenarios. Thanks you all for advice, you patiently explained to me how to solve my question. Your advice is really valuable
Aleksandr- Posts : 10
Join date : 2015-11-11
Similar topics
» 'Routing' the grain of the fact table to multpile members of multiple dimensions causes the fact table to 'explode'
» Integrating new fact table which has one to many relationship with the main fact table in existing star schema
» How to create fact table with measures derived from comparing two fact table rows
» Healthcare Hospital Stay fact table - transaction fact table - a row for In and a row for Out?
» Design a snapshot Fact table in data mart from compliance-enable fact table in enterprise data warehouse
» Integrating new fact table which has one to many relationship with the main fact table in existing star schema
» How to create fact table with measures derived from comparing two fact table rows
» Healthcare Hospital Stay fact table - transaction fact table - a row for In and a row for Out?
» Design a snapshot Fact table in data mart from compliance-enable fact table in enterprise data warehouse
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum