how to model an event log
I am trying to model the following event data:
Field1 = event date, field2 = event code, field3 = location code, field4 = customer_id; fields 5 to 8 are event-code specific, i.e. variable.
Field1 Field2 Field3 Field4 Field5 Field6 Field7 Field8
08/07/2009 2:21:47 PM jr 119 90252969 320 951 1 0
08/07/2009 2:21:54 PM jx 119 90252969 998 125873
08/07/2009 2:22:40 PM ai 119 90252969 4026
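To make the layout concrete, here is a minimal parsing sketch in Python. The field names, the tab delimiter, and the timestamp format are assumptions, since the actual file format isn't specified in the post.

```python
# Hypothetical parser sketch: delimiter, field names, and timestamp
# format are assumptions based on the sample rows above.
from datetime import datetime

def parse_line(line: str) -> dict:
    """Split a log line into the 4 common fields plus a variable tail."""
    parts = line.rstrip("\n").split("\t")
    return {
        "event_ts": datetime.strptime(parts[0], "%m/%d/%Y %I:%M:%S %p"),
        "event_code": parts[1],
        "location_code": parts[2],
        "customer_id": parts[3],
        # Fields 5-8 vary by event code; keep whatever is present.
        "extras": parts[4:],
    }

sample = "08/07/2009 2:21:47 PM\tjr\t119\t90252969\t320\t951\t1\t0"
rec = parse_line(sample)
print(rec["event_code"], rec["extras"])  # jr ['320', '951', '1', '0']
```

Note that the (jx) and (ai) rows would simply yield shorter `extras` lists, which is exactly the variability the modelling question is about.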
First 4 columns are common and will map nicely to our dimension tables.
Log files are about 5 GB and 17 million rows per day; extrapolated over a year, that's roughly 6 billion rows and 2 TB.
My modelling dilemma is that the values in fields 5 to 8 vary by event_code.
My initial thought was to parse the file into different tables based on the event code, but one of the requirements is the ability to analyze the sequence of events.
For example, an event (jr) happened at 2:21:47, followed by two more events, (jx) and (ai), with each code having different attributes in fields 5 to 8.
I've always been a Kimball star schema sort of guy, but I wonder if I should deviate in this situation and break the fact table out into a multi-table design: a main fact table containing the first four common attributes, plus related tables for each event type and its associated attributes.
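One way to sketch that multi-table idea, using SQLite purely for illustration: a narrow main fact table holds the four common attributes, and per-event-code extension tables share its key. All table and column names here are hypothetical. The point is that sequence analysis (ordering events per customer) only ever touches the main table.

```python
# Sketch of "main fact table + per-event-type extension tables".
# SQLite is used only for illustration; names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Common attributes in one fact table, so event-sequence
    -- analysis (ORDER BY event_ts per customer) needs no join.
    CREATE TABLE fact_event (
        event_key     INTEGER PRIMARY KEY,
        event_ts      TEXT NOT NULL,
        event_code    TEXT NOT NULL,
        location_key  INTEGER NOT NULL,
        customer_key  INTEGER NOT NULL
    );
    -- One extension table per event code for the variable fields 5-8.
    CREATE TABLE fact_event_jr (
        event_key INTEGER PRIMARY KEY REFERENCES fact_event(event_key),
        attr1 INTEGER, attr2 INTEGER, attr3 INTEGER, attr4 INTEGER
    );
    CREATE TABLE fact_event_jx (
        event_key INTEGER PRIMARY KEY REFERENCES fact_event(event_key),
        attr1 INTEGER, attr2 INTEGER
    );
""")

conn.execute("INSERT INTO fact_event VALUES (1, '2009-08-07 14:21:47', 'jr', 119, 90252969)")
conn.execute("INSERT INTO fact_event VALUES (2, '2009-08-07 14:21:54', 'jx', 119, 90252969)")
conn.execute("INSERT INTO fact_event_jr VALUES (1, 320, 951, 1, 0)")
conn.execute("INSERT INTO fact_event_jx VALUES (2, 998, 125873)")

# Sequence query touches only the narrow main table.
seq = conn.execute(
    "SELECT event_code FROM fact_event WHERE customer_key = 90252969 "
    "ORDER BY event_ts"
).fetchall()
print(seq)  # [('jr',), ('jx',)]
```

Event-specific attributes are then reached by a one-to-one join on event_key only when a query actually needs them, which keeps the main table narrow at the stated billions-of-rows scale.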
Any thoughts on this approach would be most appreciated.
robber
Posts: 41
Join date: 2009-02-28
Location: Canada