how to model an event log
I am trying to model the following event data:
Field1 = event date, field2 = event code, field3 = location code, field4 = customer_id; fields 5 to 8 are event-code specific, i.e. variable.
Field1 Field2 Field3 Field4 Field5 Field6 Field7 Field8
08/07/2009 2:21:47 PM jr 119 90252969 320 951 1 0
08/07/2009 2:21:54 PM jx 119 90252969 998 125873
08/07/2009 2:22:40 PM ai 119 90252969 4026
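To make the layout concrete, here is a minimal parsing sketch in Python. The field names, the tab delimiter, and the timestamp format are assumptions, since the actual file format isn't specified in the post.

```python
# Hypothetical parser sketch: delimiter, field names, and timestamp
# format are assumptions based on the sample rows above.
from datetime import datetime

def parse_line(line: str) -> dict:
    """Split a log line into the 4 common fields plus a variable tail."""
    parts = line.rstrip("\n").split("\t")
    return {
        "event_ts": datetime.strptime(parts[0], "%m/%d/%Y %I:%M:%S %p"),
        "event_code": parts[1],
        "location_code": parts[2],
        "customer_id": parts[3],
        # Fields 5-8 vary by event code; keep whatever is present.
        "extras": parts[4:],
    }

sample = "08/07/2009 2:21:47 PM\tjr\t119\t90252969\t320\t951\t1\t0"
rec = parse_line(sample)
print(rec["event_code"], rec["extras"])  # jr ['320', '951', '1', '0']
```

Note that the (jx) and (ai) rows would simply yield shorter `extras` lists, which is exactly the variability the modelling question is about.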
First 4 columns are common and will map nicely to our dimension tables.
Log files are about 5 GB and 17 million rows per day; extrapolated over a year, that's roughly 6 billion rows and 2 TB.
My modelling dilemma is that the values in fields 5 to 8 vary by event_code.
My initial thought was to parse the file into different tables based on the event code, but one of the requirements is the ability to analyze the sequence of events.
For example, an event (jr) happened at 2:21:47, followed by two more events, (jx) and (ai), with each code having different attributes in fields 5 to 8.
I've always been a Kimball star schema sort of guy, but I wonder if I should deviate in this situation and break the fact table out into a multi-table design: a main fact table containing the first four common attributes, plus related tables for each event type and its associated attributes.
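One way to sketch that multi-table idea, using SQLite purely for illustration: a narrow main fact table holds the four common attributes, and per-event-code extension tables share its key. All table and column names here are hypothetical. The point is that sequence analysis (ordering events per customer) only ever touches the main table.

```python
# Sketch of "main fact table + per-event-type extension tables".
# SQLite is used only for illustration; names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Common attributes in one fact table, so event-sequence
    -- analysis (ORDER BY event_ts per customer) needs no join.
    CREATE TABLE fact_event (
        event_key     INTEGER PRIMARY KEY,
        event_ts      TEXT NOT NULL,
        event_code    TEXT NOT NULL,
        location_key  INTEGER NOT NULL,
        customer_key  INTEGER NOT NULL
    );
    -- One extension table per event code for the variable fields 5-8.
    CREATE TABLE fact_event_jr (
        event_key INTEGER PRIMARY KEY REFERENCES fact_event(event_key),
        attr1 INTEGER, attr2 INTEGER, attr3 INTEGER, attr4 INTEGER
    );
    CREATE TABLE fact_event_jx (
        event_key INTEGER PRIMARY KEY REFERENCES fact_event(event_key),
        attr1 INTEGER, attr2 INTEGER
    );
""")

conn.execute("INSERT INTO fact_event VALUES (1, '2009-08-07 14:21:47', 'jr', 119, 90252969)")
conn.execute("INSERT INTO fact_event VALUES (2, '2009-08-07 14:21:54', 'jx', 119, 90252969)")
conn.execute("INSERT INTO fact_event_jr VALUES (1, 320, 951, 1, 0)")
conn.execute("INSERT INTO fact_event_jx VALUES (2, 998, 125873)")

# Sequence query touches only the narrow main table.
seq = conn.execute(
    "SELECT event_code FROM fact_event WHERE customer_key = 90252969 "
    "ORDER BY event_ts"
).fetchall()
print(seq)  # [('jr',), ('jx',)]
```

Event-specific attributes are then reached by a one-to-one join on event_key only when a query actually needs them, which keeps the main table narrow at the stated billions-of-rows scale.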
Any thoughts on this approach would be most appreciated.
robber
Posts: 41
Join date: 2009-02-28
Location: Canada