Event data DW Model DIM or Fact
2 posters
Page 1 of 1
Event data DW Model DIM or Fact
Hi,
I have a requirement to capture meter event information. the raw data looks like this:
meterid,StartTime, EndTime, NumberEvents,EventName,EventMessage
1 2012-04-30T00:00:00.000-04:00, 2012-04-30T15:24:33.000-04:00,2 ,LAST_GASP,NIC power fail
the question is do I create a factlessfac table to capture this information? or a Dimension with the above information
with fk relation to Fact? how would you tackle this in a design?
Thanks,
I have a requirement to capture meter event information. the raw data looks like this:
meterid,StartTime, EndTime, NumberEvents,EventName,EventMessage
1 2012-04-30T00:00:00.000-04:00, 2012-04-30T15:24:33.000-04:00,2 ,LAST_GASP,NIC power fail
the question is do I create a factlessfac table to capture this information? or a Dimension with the above information
with fk relation to Fact? how would you tackle this in a design?
Thanks,
dim67- Posts : 15
Join date : 2012-05-05
Re: Event data DW Model DIM or Fact
I assume the NumberEvents is a measure of some sort. If not then I would combine it with EventName and Eventmessage in the Meter_Event dimension. Create standard dimensions for Date and Time and use them to role play for the State and End Times. I assume that the Meter_ID links to a source table that has info about the meter such as model, location, etc.
Fact Table
Meter_Skey
Start_Date_sKey
State_Time_skey
End_Date_sKey
End_Time_skey
Meter_Event_skey
NumberEvents
Meter_Event_dimension
Meter_Event_skey
EventName
Eventmessage
Fact Table
Meter_Skey
Start_Date_sKey
State_Time_skey
End_Date_sKey
End_Time_skey
Meter_Event_skey
NumberEvents
Meter_Event_dimension
Meter_Event_skey
EventName
Eventmessage
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Event data DW Model DIM or Fact
yes the NumberEvents is a meassure, in the example provided is = 2. two different events were recorded for that particular meter in that time.
the file is provided every 6 hours and an event may or may not occur for that meter. yes meter id is a source xml file
that links to a meter dimension table.
thanks
the file is provided every 6 hours and an event may or may not occur for that meter. yes meter id is a source xml file
that links to a meter dimension table.
thanks
dim67- Posts : 15
Join date : 2012-05-05
Re: Event data DW Model DIM or Fact
even though the NumberEvents is a meassure, it does not tell me how many events of a particular type occured for a meter.
all it says is x number of events occurs during this time frame. what I need to capture is the event_type and detail and the number of occurrences for that
particular detail. so in other words I am thinking about putting the EventName in the fact table. Does it make sense?
thanks,
all it says is x number of events occurs during this time frame. what I need to capture is the event_type and detail and the number of occurrences for that
particular detail. so in other words I am thinking about putting the EventName in the fact table. Does it make sense?
thanks,
dim67- Posts : 15
Join date : 2012-05-05
Re: Event data DW Model DIM or Fact
The EventName doesn't belong in the Fact table. You get the same information by keeping the EventName in the Event_Dimension.
Everything that is in a dimension is logically in the fact table as long as the fact table contains the dimension key and it will query faster.
Everything that is in a dimension is logically in the fact table as long as the fact table contains the dimension key and it will query faster.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Event data DW Model DIM or Fact
I see your point. It is truly a facless fact table. thanks for the help.
one last question, why query will be faster?.
thanks in advance
one last question, why query will be faster?.
thanks in advance
dim67- Posts : 15
Join date : 2012-05-05
Re: Event data DW Model DIM or Fact
It will run faster because there's less data. When you filter on the EventName in the dimension table, it's going to scan through a table with many fewer rows than if it were in the fact table. Once it finds the rows that have the info you are filtering on, it's going to look for rows in the fact table with the corresponding skey values. The skey should be an integer (faster performance) and the skey value on the fact table could defined as foreign key (faster performance), and it could be indexed on the fact table (faster).
Plus, less info is being sent to the server. Lets say your filter results in 5 rows from the dimnesion table but 10 million in the fact table. Lets say the data you want 1 field from the dimension table and it's a varchar(30). Essentially, you are pulling 170 bytes of data, give or take (30 bytes for the field and 4 bytes for the skey) from the dimension table and the 10 million * 4 bytes (for the skey on the fact table). But if the 1 field that you need is on the fact table, then it's 10 million * 30. I'm overly simplifying of course.
Also, let's say you are using something like Cognos Query Studio to run an ad hoc report and you want to filter on EventName. Cognos is going to do a complete table scan to pull all of the potential values for EventName and then offer them up in a drop down box. Scanning through the entire fact table will take much, much longer than scanning through the entire dimension table. And yes, you can improve performance by creating an index on the EventName if it were on the fact table but that solution would use more disk space.
Plus, less info is being sent to the server. Lets say your filter results in 5 rows from the dimnesion table but 10 million in the fact table. Lets say the data you want 1 field from the dimension table and it's a varchar(30). Essentially, you are pulling 170 bytes of data, give or take (30 bytes for the field and 4 bytes for the skey) from the dimension table and the 10 million * 4 bytes (for the skey on the fact table). But if the 1 field that you need is on the fact table, then it's 10 million * 30. I'm overly simplifying of course.
Also, let's say you are using something like Cognos Query Studio to run an ad hoc report and you want to filter on EventName. Cognos is going to do a complete table scan to pull all of the potential values for EventName and then offer them up in a drop down box. Scanning through the entire fact table will take much, much longer than scanning through the entire dimension table. And yes, you can improve performance by creating an index on the EventName if it were on the fact table but that solution would use more disk space.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Similar topics
» Data model for Reporting needs - Event based or fact based
» Cash Targets Data Model Fact
» data model for 2 fact tables (Header / Detail scenario)
» Model available time per day in timesheet data warehouse: fact or dimension or neither?
» Accumulating Snapshot Fact Table Data Model (Order Management)
» Cash Targets Data Model Fact
» data model for 2 fact tables (Header / Detail scenario)
» Model available time per day in timesheet data warehouse: fact or dimension or neither?
» Accumulating Snapshot Fact Table Data Model (Order Management)
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum