Modelling two independent business processes related to a single event

sphenodontidan Mon Sep 28, 2015 3:44 am

Hi all

I have modelled the following business processes and am seeking opinions on whether or not this conforms to dimensional modelling practice, or if there is another way to model this in a dimensional context.

The process is:

An "Event" occurs, for which two independent processes optionally record: (a) any "Participants" and the "Role" each one played in the event, and (b) a means of "Categorising" the event.

In theory this seems relatively straightforward, I could create a fact table for EventParticipantRoles and EventCategories, with attributes of the Event as dimensions for each table, e.g. location, date, time, person, etc. Unfortunately the process is a little complex because each Event may have zero to many Participants and/or Categories.

The business need to be able to identify (for example):
- the number of specified Categories by location over time
- changes in the number of Events for any Category of interest over time
- the total number of Participants
- the total number of Participants by Role
- details of Participants by Role played
- etc.

So far, so good. Each of the above requirements could be met by modelling two fact tables. However, the information needs to be combined for the following requirements, which is where it gets tricky:

- details of Participants involved in specified Categories
- Categories that involve specific Roles
- counts of Events for specified location, Category and Participant Roles
- etc.

If all of our users were proficient with SQL then I could just model two fact tables, each with the unique Event identifier, and the users could join the tables using that. However, this isn't an acceptable solution because that is creating a fact-to-fact join, which is not permitted in dimensional modelling. In any case, our users do not use SQL - we use web-based reporting tools.

So how to model the Category to Participant and Role data? The solution that has been implemented is:

A fact table called EventCategoryParticipant. This has one row for each unique combination of Event, Participant, Role and Category. If either one of the Category or Participant is not yet recorded in the source system then we populate the relevant columns with 'UNK'. Note that Events that do not have either are not included in the table at all (we report on measures at the Event-level with a more straightforward EventFact table).

Due to the fact that each Event can have zero-to-many Categories or Participants and Roles, the EventCategoryParticipant table contains multiple records for each Event, for all combinations of Category, Participant and Role. I have added count columns for CategoryCount, ParticipantCount and ParticipantRoleCount to the table and this allows me to count the number of distinct Categories, Participants and Participant Roles for each unique Event.

The table has been designed as a transactional fact table, logically updating records when new data is received from the source system during overnight acquisition and ETL processing. For example, if an Event has been Categorised on day 1, but does not have a Participant recorded until day 2, then on day 1 a single record will exist in the table, with "UNK" values for all Participant-related columns. On day 2, the single row will be updated so the "UNK" values are replaced with the relevant Participant values. If on day 3 a second Category is added to the Event then a second row will be added to the table for that Event, with new Category data, but the Participant data will be duplicated for all relevant columns. This can be extended for multiple Participants as well, which would result in rows for the "Number of Categories" multiplied by the "Number of Participants and Roles" for each Event.

This is a little complex but it is working. However, the design has been challenged during peer review, so the issue is being put to a wider audience to seek feedback/ideas.

It will be interested to see if anyone has encountered similar problems with many-to-many fact data and how many different ways it has been resolved by the community...

A search of the forums reveals that perhaps we have modelled our processes as per the final sentence in the response to this post.

Regards
Christopher

zoom Mon Sep 28, 2015 10:34 am

A transaction table should have a lowest level of detail. Once a row is inserted in the Fact table, then it is never updated. So a trans table should only have inserts. When you said participant information is repeated in the fact table, do you mean to say that you keep text values for a participant in the Fact table? If you do keep text value in the fact table then it means you did not create a participant Dim. If you had a participant Dim then only participant Dim Ids would repeat in the Fact table, which is acceptable.

I do not see an issue with your design, other than keeping text in the fact table. You need 1 fact table.

Based on your description, I would have following Dims:

1) Participants dim--- people or non-living things
2) Location dim
3) Event dim
4) Role playing dim for Category and Role
5) Date dim

sphenodontidan Mon Sep 28, 2015 1:57 pm

Hi Zoom, thanks for the feedback

The repeated information in the table are participant and role id's, or category id's. There is a dim for the both the Participants and for the Category, and also Role, Location, Event (based on a view of the EventFact to dimensionalise some attributes of the Event), Date and Time.

As for updating the Fact, we insert a row for every change of detail. We could leave in the rows that appear on day 1 and then just insert additional rows when a Category is added to a record that only previously had Participants (or vice-versa), but then we'd have to filter out the "day 1" rows when reporting. It is easier for us to not have to do that so we logically replace the "UNK" values with the id's for either Categories or Participants/Roles instead. In ETL we physically delete the row and add a new one, but logically it's just an update.

One further note in case anyone is wondering: we do actually use surrogate keys for everything.

Sponsored content

Modelling two independent business processes related to a single event

How would you model this?

Modelling two independent business processes related to a single event

Modelling two independent business processes related to a single event

Re: Modelling two independent business processes related to a single event

Re: Modelling two independent business processes related to a single event