Measuring transit-times between varying start- & end-states

rburcksen Thu Oct 21, 2010 11:57 am

Please find below a business case outlining the user requirements for an ad-hoc reporting environment and for which I would like to invite your comments on the solution I have come up with. I’m new to data-modelling and I value any suggestions you may have to improve my model.

Background
This ad-hoc reporting environment will be used to report on civil court cases. During the lifecycle of a court case the status of a court case changes multiple times until the final resolution..
In order to mark a current status for a court case as ‘complete’, the transactional system requires that the user selects a new status. Only then will the current status be marked as complete.

A sample of typical court cases and their statuses (A/B/C etc):
Note: The statuses are listed in their chronological order. Each combination of case & status is associated with a date and a completion flag.

Case 1: A > B > C > D > E > F > G > H > I> J
Case 2: A > D > F > G > J
Case 3: A > B > C > F > B > C> H > I

As you can see above, the number of statuses for each court case can be different. Not all court cases go through the same statuses. ( Case 2 is never assigned status B or C).
Also the order in which the statuses are applied can vary (Case 3 is ‘reset’ to status B after reaching status F)

User requirements
The data-model needs to support the following two requirements:
1. Users want to have the flexibility to choose a begin- and endstatus between wich the report should calculate the (average) transit-time.
Example: If a user wants to measure the transit time between status A & D, then case 1 & 2 need to be reported.
2. Users want to be able to report on the number of court cases that have reached a specific status.
Example: If a user wants to report how many court cases are pending status I then only case 3 should be reported.

Data model
The fact-table that I have come up with has the following structure:

ROW, CASENR, BEGIN_STATE, BEGIN_DATE, END_STATE, END_DATE, COMPLETE_FLAG
1, 1, A, begindate, B, enddate,Y
2, 1, B, begindate, C, enddate,Y
3, 1, C, begindate, D, enddate,Y
4, 1, D, begindate, E, enddate,Y
5, 1, E, begindate, F, enddate,Y
6, 1, F, begindate, G, enddate,Y
7, 1, G, begindate, H, enddate,Y
8, 1, H, begindate, I, enddate,Y
9, 1, I, begindate, J, enddate,N
10, 2, A, begindate, D, enddate,Y
11, 2, D, begindate, F, enddate,Y
12, 2, F, begindate, G, enddate,Y
13, 2, G, begindate, J, enddate,N
14, 3, A, begindate, B, enddate,Y
15, 3, B, begindate, C, enddate,Y
16, 3, C, begindate, F, enddate,Y
17, 3, F, begindate, B, enddate,Y
18, 3, B, begindate, C, enddate,Y
19, 3, C, begindate, H, enddate,Y
20, 3, H, begindate, I, enddate,N
In this model I’m effectively splitting the lifecycle of a court case into individual tracks, each
with its appropriate begin- and enddate & completion flag.

This will allow me to meet requirement #2 (select all rows with END_STATE = ‘I’ and COMPLET_FLAG = ‘N’ will report case 3 (row 20)).
This model will not meet requirement #1 (select all rows with BEGIN_STATE = ‘A’ and END_STATE = ‘D’ will only report case 2 (row 10).

In order to also meet requirement #1 I could also use the same structure but instead of populating the table with only the chronological tracks, I can populate the table for each possible chronological combination of begin- and end state, like so:
…
66, 2, A, begindate, D, enddate,Y
67, 2, A, begindate, F, enddate,Y
68, 2, A, begindate, G, enddate,Y
69, 2, A, begindate, J, enddate, N
70, 2, D, begindate, F, enddate,Y
71, 2, D, begindate, G, enddate,Y
72, 2, D, begindate, J, enddate, N
73, 2, F, begindate, G, enddate,Y
74, 2, F, begindate, J, enddate, N
75, 2, G, begindate, J, enddate,N
…

I’ve only created the example for case 2 since by creating every possible chronological combination of begin- and end state, the number of records grows significantly (for case 2 from 4 to 10 rows, for case 1 from 9 to 45 rows…).
This solution will require more time to repopulate the table during each production-run. It will generate records for combinations of statuses which from a business perspective make no sense. Also running a report will take additional time while the query is ploughing through the massive table (estimated at 100 million records).

I’m pretty sure there must be a better/smarter data model to support these requirements. I welcome any suggestions you may have!

Thanks!

ngalemmo Thu Oct 21, 2010 12:31 pm

You can answer #1 with the structure you have. You just need to do a self-join on the fact table...

select
a.CASENR,
a.BEGIN_STATE,
a.BEGIN_DATE,
b.BEGIN_STATE,
b.END_DATE
from case_fact a, case_fact b
where a.CASENR = b.CASENR
and a.BEGIN_STATE = (start state)
and b.BEGIN_STATE = (end state)
and b.COMPLETE_FLAG = true (or whatever filtering you need to do)

rburcksen Tue Oct 26, 2010 10:41 am

Nick, thanks a lot for your input!

I was succesful in meeting my business requirements.

Cheers,

Randall

Sponsored content

Measuring transit-times between varying start- & end-states

Measuring transit-times between varying start- & end-states

Re: Measuring transit-times between varying start- & end-states

Re: Measuring transit-times between varying start- & end-states

Re: Measuring transit-times between varying start- & end-states