Best way to implement date ranges in schemas (fact 1 valid "from" date1 "to" date 2)
3 posters
Page 1 of 1
Best way to implement date ranges in schemas (fact 1 valid "from" date1 "to" date 2)
Greeting,
I am roaming around the web and im not very successful in my research so i will ask here, hoping that someone smarter than me will be able to help!
Basicly, I need to build a cube (and the underlying start schema) that will allow the user to count a number of employees based on many criterions, including time. I was thinking about this kind of design for my fact table.
fk Contract_type_id
fk Office_id
fk Date_id
fk employee_id
[Number of employees] (should always be 1, or even a distinct count at cube definition level on employee_id to avoid duplicates)
With that, for exemple, we need to be able to tell how many employees where in office 1 for at least a day in 2008 and how many employees where in that same office exactly on 12/31/2008.
My problem here is that each combo [employee-contract-office] cannot be tagged with a single date like a products sales could. Each combos are valid "from" date1 "to" date2, with date2 null(or very large) for the current assignment.
The first idea was to store 365rows/years/employees but our biggest database have 200 000 employees in it. That would be way too many rows right? or not?
The second option I tought about is adding 2 dates dimensions for my "from" and "to" dates but i fear we would either have to hack pentaho front end analysis tool for it to generate correct MDX querries with correct date constraint, or create static reports... We dont want this... In fact, we fear having to do this...
Am I on a special business case here ? How is this normaly managed ?
Best regards, and sorry for my english
Manuel
I am roaming around the web and im not very successful in my research so i will ask here, hoping that someone smarter than me will be able to help!
Basicly, I need to build a cube (and the underlying start schema) that will allow the user to count a number of employees based on many criterions, including time. I was thinking about this kind of design for my fact table.
fk Contract_type_id
fk Office_id
fk Date_id
fk employee_id
[Number of employees] (should always be 1, or even a distinct count at cube definition level on employee_id to avoid duplicates)
With that, for exemple, we need to be able to tell how many employees where in office 1 for at least a day in 2008 and how many employees where in that same office exactly on 12/31/2008.
My problem here is that each combo [employee-contract-office] cannot be tagged with a single date like a products sales could. Each combos are valid "from" date1 "to" date2, with date2 null(or very large) for the current assignment.
The first idea was to store 365rows/years/employees but our biggest database have 200 000 employees in it. That would be way too many rows right? or not?
The second option I tought about is adding 2 dates dimensions for my "from" and "to" dates but i fear we would either have to hack pentaho front end analysis tool for it to generate correct MDX querries with correct date constraint, or create static reports... We dont want this... In fact, we fear having to do this...
Am I on a special business case here ? How is this normaly managed ?
Best regards, and sorry for my english
Manuel
mbruneau- Posts : 1
Join date : 2010-01-21
Re: Best way to implement date ranges in schemas (fact 1 valid "from" date1 "to" date 2)
Not sure whether I understood your query correctly... I am assuming that there is a 1-1 cardinality between CONTRACT and EMP entities.
With that in mind, could you not use the FACT table design as:
CONTRACT_SK
EMP_SK
OFFICE_SK
FROM_DT_SK
TO_DT_SK
EMP_COUNT
That ways, the FACT table should be able to answer queries such as count of distinct employees between a date range. Both attributes, FROM_DT_SK and TO_DT_SK refer to the same DATE_DIMENSION.
With that in mind, could you not use the FACT table design as:
CONTRACT_SK
EMP_SK
OFFICE_SK
FROM_DT_SK
TO_DT_SK
EMP_COUNT
That ways, the FACT table should be able to answer queries such as count of distinct employees between a date range. Both attributes, FROM_DT_SK and TO_DT_SK refer to the same DATE_DIMENSION.
dwbi_rb- Posts : 17
Join date : 2009-02-19
Re: Best way to implement date ranges in schemas (fact 1 valid "from" date1 "to" date 2)
In a relational data model you would have effective and expiration date ranges to handle this situation. In a cube (i.e. a multidimensional database such as Pentaho) you are essentially building aggregations. A membership aggregate is typically implemented as a snapshot. In your case there are two: How many unique employee were in an office in during a year, and how many employees were in an office at a point in time (usually the last day of the month or year, depending on your need). With an underlying relational fact table, you should be able to easily generate the aggregates for publishing into a cube.
Similar topics
» Actual Date vs Date Key in Fact table
» SCD type 2 - Valid To Date Value? 12/31/9999 or Null?
» Date instead of date key in fact tables
» Adding a fact table with start date and end date to a (SSAS) multidimensional cube
» DATE OR DATE KEYS IN FACT TABLES
» SCD type 2 - Valid To Date Value? 12/31/9999 or Null?
» Date instead of date key in fact tables
» Adding a fact table with start date and end date to a (SSAS) multidimensional cube
» DATE OR DATE KEYS IN FACT TABLES
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum