Best way to implement date ranges in schemas (fact 1 valid "from" date1 "to" date 2)

mbruneau Thu Jan 21, 2010 8:12 pm

Greeting,

I am roaming around the web and im not very successful in my research so i will ask here, hoping that someone smarter than me will be able to help!

Basicly, I need to build a cube (and the underlying start schema) that will allow the user to count a number of employees based on many criterions, including time. I was thinking about this kind of design for my fact table.

fk Contract_type_id
fk Office_id
fk Date_id
fk employee_id
[Number of employees] (should always be 1, or even a distinct count at cube definition level on employee_id to avoid duplicates)

With that, for exemple, we need to be able to tell how many employees where in office 1 for at least a day in 2008 and how many employees where in that same office exactly on 12/31/2008.

My problem here is that each combo [employee-contract-office] cannot be tagged with a single date like a products sales could. Each combos are valid "from" date1 "to" date2, with date2 null(or very large) for the current assignment.
The first idea was to store 365rows/years/employees but our biggest database have 200 000 employees in it. That would be way too many rows right? or not?

The second option I tought about is adding 2 dates dimensions for my "from" and "to" dates but i fear we would either have to hack pentaho front end analysis tool for it to generate correct MDX querries with correct date constraint, or create static reports... We dont want this... In fact, we fear having to do this...

Am I on a special business case here ? How is this normaly managed ?

Best regards, and sorry for my english
Manuel

dwbi_rb Fri Jan 22, 2010 12:49 pm

Not sure whether I understood your query correctly... I am assuming that there is a 1-1 cardinality between CONTRACT and EMP entities.

With that in mind, could you not use the FACT table design as:
CONTRACT_SK
EMP_SK
OFFICE_SK
FROM_DT_SK
TO_DT_SK
EMP_COUNT

That ways, the FACT table should be able to answer queries such as count of distinct employees between a date range. Both attributes, FROM_DT_SK and TO_DT_SK refer to the same DATE_DIMENSION.

ngalemmo Fri Jan 22, 2010 1:36 pm

In a relational data model you would have effective and expiration date ranges to handle this situation. In a cube (i.e. a multidimensional database such as Pentaho) you are essentially building aggregations. A membership aggregate is typically implemented as a snapshot. In your case there are two: How many unique employee were in an office in during a year, and how many employees were in an office at a point in time (usually the last day of the month or year, depending on your need). With an underlying relational fact table, you should be able to easily generate the aggregates for publishing into a cube.

Sponsored content

Best way to implement date ranges in schemas (fact 1 valid "from" date1 "to" date 2)

Best way to implement date ranges in schemas (fact 1 valid "from" date1 "to" date 2)

Re: Best way to implement date ranges in schemas (fact 1 valid "from" date1 "to" date 2)

Re: Best way to implement date ranges in schemas (fact 1 valid "from" date1 "to" date 2)

Re: Best way to implement date ranges in schemas (fact 1 valid "from" date1 "to" date 2)