Two-Part Fact Design Question
4 posters
Page 1 of 1
Two-Part Fact Design Question
OK. The question isn't two-part...the Fact is.
I have a source "Event-Log" table that clearly qualifies as a Transaction Fact, containing some junk, regular, and degenerate Dimension attributes: Event-ID, Event-Date, Event-Type, Action-Code, Cycle#, Department-ID, Employer-ID, etc.
However, there is a corresponding "Message" that is optionally associated with Event. A "Message" is essentially a text string that was optionally sent to the Employer in question, as part of the "Event". Each Message is stored in a different source table and they are grouped into "Message Definition Type" categories. Each "Message Type" is essentially a text-block template, containing one or more embedded positional parameters ({0}, {1}, {2}...), like you would see in a word processing mail-merge template. Each Message also has 0..N "Message Value" children entries, which are the actual values that were substituted into the template to form the completed message.
Since a Message cannot correspond to more than one Event, and vice-versa, they at the same grain. I need to keep track of the fully-formed Message (not the template), as well as the Event. I can do this by replacing the 0..N Message values into the template text.
Problem is, there are 70M Events and about 95% of them have a corresponding Message. My instinct was to create a Message Dimension for the Event Log Fact, containing the substituted Message Text, but that's a 1:1 Dim:Fact with ~70M records in each. Not exactly a good Dim:Fact situation.
Since the Message is essentially just a [Message#, Message-Type, and Message-Text], it doesn't really lend itself to being a fact.
To be honest, Message feels more like Operational Data, not Dimensional. But since I'll have an Event Log Fact, I need to find a way to design in the Message data with it.
I'm hoping the business decides that they don't really need to see the Message text, and only needs to track the Message-Type of each Event Log Fact. I'm working on this but may not convince them of it.
What's the general approach to this situation, when you have a large volume 1:1 Dim:Fact?
What is the approach when you have to account for Dimensional data that has corresponding Operational Data?
I have a source "Event-Log" table that clearly qualifies as a Transaction Fact, containing some junk, regular, and degenerate Dimension attributes: Event-ID, Event-Date, Event-Type, Action-Code, Cycle#, Department-ID, Employer-ID, etc.
However, there is a corresponding "Message" that is optionally associated with Event. A "Message" is essentially a text string that was optionally sent to the Employer in question, as part of the "Event". Each Message is stored in a different source table and they are grouped into "Message Definition Type" categories. Each "Message Type" is essentially a text-block template, containing one or more embedded positional parameters ({0}, {1}, {2}...), like you would see in a word processing mail-merge template. Each Message also has 0..N "Message Value" children entries, which are the actual values that were substituted into the template to form the completed message.
Since a Message cannot correspond to more than one Event, and vice-versa, they at the same grain. I need to keep track of the fully-formed Message (not the template), as well as the Event. I can do this by replacing the 0..N Message values into the template text.
Problem is, there are 70M Events and about 95% of them have a corresponding Message. My instinct was to create a Message Dimension for the Event Log Fact, containing the substituted Message Text, but that's a 1:1 Dim:Fact with ~70M records in each. Not exactly a good Dim:Fact situation.
Since the Message is essentially just a [Message#, Message-Type, and Message-Text], it doesn't really lend itself to being a fact.
To be honest, Message feels more like Operational Data, not Dimensional. But since I'll have an Event Log Fact, I need to find a way to design in the Message data with it.
I'm hoping the business decides that they don't really need to see the Message text, and only needs to track the Message-Type of each Event Log Fact. I'm working on this but may not convince them of it.
What's the general approach to this situation, when you have a large volume 1:1 Dim:Fact?
What is the approach when you have to account for Dimensional data that has corresponding Operational Data?
ThomVF- Posts : 3
Join date : 2014-06-26
Two Part Fact Design
Hi Thom.
You could store the message with the fact, but I would tend to either 1) create a factless fact or 2) create it as a dimension. If you created it as a dimension, the users could use it to filter by message type, etc.
I'm interested to know what you decide :-)
You could store the message with the fact, but I would tend to either 1) create a factless fact or 2) create it as a dimension. If you created it as a dimension, the users could use it to filter by message type, etc.
I'm interested to know what you decide :-)
PLYounger- Posts : 8
Join date : 2012-03-16
Location : Washington DC
Re: Two-Part Fact Design Question
If the message is 1:1 with event, you could just store it in the event dimension. The message value data could be a bridge based on event or you could incorporate the values in the stored message, however if these values are derived based on the specific message target and are actually attributes of the target, I wouldn't store them at all… at best just store the metadata related to the value.
Re: Two-Part Fact Design Question
I assume that you are not using the message data for analysis purposes, you just want to be able to display it in a report along with the event it relates to? In which case just put it in a table with an FK to the Fact table PK and just join to it in your reporting tool when you need to display it - effectively you aren't treating it as part of a dimensional model, it is just transactional data.
Kimball discusses this in his book (search for "textual comments") and his recommendation is either use a Comments Dim (if cardinality of comments is less than the grain of the fact) or treat it as an attribute in the transaction dimension but says this second option is likely to give you a performance hit - hence my preference for taking out of the dimensional model entirely. In your reporting tool you'd run a subquery to select all the events you want to report on (using your dimensional model) and then join this dataset to your standalone message table
Kimball discusses this in his book (search for "textual comments") and his recommendation is either use a Comments Dim (if cardinality of comments is less than the grain of the fact) or treat it as an attribute in the transaction dimension but says this second option is likely to give you a performance hit - hence my preference for taking out of the dimensional model entirely. In your reporting tool you'd run a subquery to select all the events you want to report on (using your dimensional model) and then join this dataset to your standalone message table
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Similar topics
» FACT Design Question - How to design FACT Table when computation is required on the degenerate dimension?
» Question on Multiple fact table design
» Design Question - Multiple Fact Tables at the same Grain
» Accumulating Snapshot and Transaction Fact tables : question to design and use them together
» Dimension design question
» Question on Multiple fact table design
» Design Question - Multiple Fact Tables at the same Grain
» Accumulating Snapshot and Transaction Fact tables : question to design and use them together
» Dimension design question
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum