SCD Type 2 dimension and fact table being the same table

View previous topic View next topic Go down

SCD Type 2 dimension and fact table being the same table

Post  emonchen on Thu Feb 11, 2010 3:14 am

I am designing a dimensional model for employee data in a data warehouse. Right now I only register the basic information about the employee, like job title, department, language, FTE, etc. I have chosen to use an SCD Type 2 dimension since I want to keep historical data.

In the end, I need to be able to answer the following questions:

- How many employees did I have at <<< any date >>> ?
- How many FTE did I have at <<< any date >>> ?
- What was my inflow in << any month of any year >>> ?
- Who changed from which to which department between <<< any date >> and <<< any date >>> ?

If I'm not mistaken, when I use the SCD table as a fact table AND as a dimension table to retrieve this information, I would be able to answer such questions. So basically with only that table I would be able to extract a lot of information already.

My question is, would this be a good approach to dimensional modeling, using the same SCD table as a dimension and as facts?

Thanks!

emonchen

Posts: 12
Join date: 2010-02-11
Age: 36
Location: Delft, The Netherlands

View user profile

Back to top Go down

Re: SCD Type 2 dimension and fact table being the same table

Post  ngalemmo on Wed Feb 17, 2010 9:54 am

No. You want to track employee movements in its own fact table. You would have an employee, organization unit, job, position dimensions and create a fact entry whenever any of this changes, based on a feed from your HR system. Include effective dates in the fact. You may also want to include previous org, job, position keys as well should HR wish to do analysis as to how employees move through the organization.

ngalemmo

Posts: 2570
Join date: 2009-05-15
Location: Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: SCD Type 2 dimension and fact table being the same table

Post  emonchen on Wed Feb 17, 2010 11:53 pm

Thanks for your reply.
I am not sure yet which information to put in the dimension table and what in the fact table. For example, my SCD dimension table looks like this:

EmployeeKey (PK)EmployeeIDNameLocationPositionSCD_StartTimeSCD_EndTime
11000John SmithNew YorkGeneral Manager2009-01-01NULL
21193Jane DoeLos AngelesSupport Manager2009-02-012009-10-01
21193Jane DoeChicagoSupport Manager2009-10-01NULL
31038Peter WalkerChicagoSales2009-04-01NULL

From this table, based on the start- and enddates of the SCD, I can see exactly for example how many employees did I have in Chicago on Spetember 30th? Because of the second line, Jane Doe is not included in that because on Sept 30th she was still in LA.

In other words, I would put things like FTE and vacation balances in the fact table on a periodic snapshot.

If I would add a line to the fact table whenever something changes, there will be a line added to the dimension table and also one line in the fact table. These two tables would grow the same way.

emonchen

Posts: 12
Join date: 2010-02-11
Age: 36
Location: Delft, The Netherlands

View user profile

Back to top Go down

Re: SCD Type 2 dimension and fact table being the same table

Post  ngalemmo on Thu Feb 18, 2010 12:58 pm

The problem boils down to what is a dimension and what is a fact. A fact table should be used to track business events and states. Where an employee works at a particular point in time is an example of a business state. The purpose of a dimension table is to provide context for a fact. With that in mind, you are trying to mix purposes into a single dimension table.

Employee, position, location and such should all be separate dimensions. Employee should not contain position and location information unless you are storing the current value and doing it for convience when loading facts. Facts that require position and location as part of its context should do so through foreign keys to those dimensions, NOT by trying to piggy-back that information through the employee dimension. Where an employee is (location, postition, etc) should be recorded as a fact with appropriate dimensions and effective dates.

Doing so greatly simplifies your model and cleans up the employee dimension. For example, you mention vacation balances as one of the measures you wish to track. A vacation balance belongs to an employee regardless of where they work, having an employee dimension that also represents his/her job history muddies the model and makes creating a snapshot problematic.

ngalemmo

Posts: 2570
Join date: 2009-05-15
Location: Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: SCD Type 2 dimension and fact table being the same table

Post  emonchen on Fri Feb 19, 2010 12:22 am

Let me give the same example that I am facing with the customer data. Right now, I have the following table for my customer dimension:

Customer DimensionKey (PK)
Customer ID (Business Key)
Customer Type
Customer Status
Customer Name
Customer Address
Customer Classification
Customer Size (# of employees)
...more geographical data fields
SCD_StartTime
SCD_EndTime

The fact table, which would be a monthly snapshot, contains the following fields:

SnapShotDate
Customer DimensionKey (FK to customer dimension)
Revenue (fact - the revenue for the customer for the snapshot month)
Number of orders (fact - the number of orders that this customer placed)

With this setup, I can always retrieve information from the data warehouse that answers questions like:

- How many customers did we have in September 2008 in Europe?
- How many customers in the IT industry (classification) do we currently have?
- How many orders were there placed in December 2009 in the US?
- How much revenue did customer X generate for 2009?
- How much revenue did we have in 2009 in California?

The last two facts are related to finances, so this could be included in the transactional data of the financial tables though.

My previous model was a monthly snapshot of the current situation. This also makes it possible to answer such questions, but on the other hand, every month ALL customers are loaded in the fact table then, also the customers where nothing has changed since the previous month. That makes the fact table explode in size quite fast!

emonchen

Posts: 12
Join date: 2010-02-11
Age: 36
Location: Delft, The Netherlands

View user profile

Back to top Go down

Another technique

Post  nash on Thu Mar 11, 2010 8:38 pm

There is another technique to handle an SCD type 2 dimension with same number of rows as the fact table. You will need to add the fact attributes in the dimension and track them there. I think this is also discussed in Kimball's DW Toolkit Book (check HR chapter).

Ciao

nash

Posts: 18
Join date: 2010-03-11

View user profile

Back to top Go down

Simplify for ease of loading and querying

Post  kt_raj1 on Tue Apr 24, 2012 1:30 am

Hi,

I believe that simplifying is a better option. Create employee,location,position dimensions separately having separate ids to identify in fact for tracking and answering the questions. That is my point of view. What do you feel?

We have to answer the business questions and looking from the standpoint of technology implementation and farsightedness in the long run.

Thanks and regards,
Raja

kt_raj1

Posts: 2
Join date: 2012-04-24

View user profile

Back to top Go down

View previous topic View next topic Back to top

- Similar topics

Permissions in this forum:
You cannot reply to topics in this forum