Expand dimension or introduce new dimension?

revdpoel Wed Aug 08, 2012 8:54 am

I have a dimension which contains information about an ATM (Automatic Teller Machine).
Let’s call it DIM-ATM. (A part of the) attributes it contains are:
1. Atm-key (surrogate-key)
2. Atm-id
3. Atm-type

Of these attributes history has to be kept
There is a separate history table where starting-date, up-until-date and recent-ind are used for keeping history. DIM-ATM-history:
1. Atm-history-key (surrogate-key)
2. Starting-date
3. Up-until-date
4. Recent-ind
5. Atm-key (surrogate-key)
6. Atm-id
7. Atm-type

Above information we get from source A.

Now source B is going to send us month ultimo some more information about an ATM. Information when an ATM will be filled with money. We get these attributes:
1. frequency-ind (1 = once a week, 2 = once every two weeks, 3 = once every three weeks, 4 = once every four weeks)
2. Monday (value X if Monday is a serviceday, otherwise space)
3. Tuesday (value X if Tuesday is a serviceday, otherwise space)
4. Wednesday (value X if Wednesday is a serviceday, otherwise space)
5. Thursday (value X if Thursday is a serviceday, otherwise space)
6. Friday (value X if Friday is a serviceday, otherwise space)
7. start-date of this frequency (can change every month)

Little explanation.
When frequency-ind = 1 and Monday and Thursday contain an ‘X’, it means that every week on Monday and Thursday the ATM is filled.
When frequency-ind = 2, it means that every other week on Monday and Thursday the ATM is filled.
The start-date tells when the frequency starts. Assume the frequency (with frequency-ind = 1) started on 05-01-2012, then in the month of August 2012 the ATM will be filled 9 times. Assume the very first frequency with frequency-ind =1 will start on starting-date 17-08-2012, then in the month of August 2012 the ATM will be filled 4 times.

The user wants to know for each month how many times the ATM had to be filled (Which, in our example, doesn’t say it is actually filled 9 times. For instance the 4th time they might have seen there is still so much money in the machine that they can skip the 5t time.) and they also want to know on which dates (so for August this will be 7,9,14,16,21,23,28,30 August).

How am I going to model this?
Store the 7 attributes from source B in DIM-ATM and DIM-ATM-history? I presume starting-date in DIM-ATM-history is just going to be filled with system-date (as happens now) and not with start-date from source B? Or do I create a new table DIM-ATM-frequency so starting-date of DIM-ATM-history won’t interfere with start-date from source B? Attributes:
1. ATM-frequency-key (surrogate)
2. ATM-key
3. Starting-date
4. Frequency-ind
5. Monday
6. Tuesday
7. Wednesday
8. Thursday
9. Friday

ATM-key and starting-date are logical primary key in this table.

And then introduce a fact-table FAC-ATM, containing
1. Day-key
2. Month-key
3. ATM-history-key
4. ATM-frequency-key (only when attributes are separately stored in DIM-ATM-frequency)
5. Number (filled with 1 for each row)?

For the above example 9 rows will then be added to FAC-ATM.

Please advice.

ngalemmo Wed Aug 08, 2012 8:25 pm

It's not a lot of attributes. There does not appear to be any reason why you couldn't just add them to the existing dimension. Having another table only has disadvantages... you can't use the attributes with other facts that use the existing dimension, and it adds an extra FK to the facts.

revdpoel Thu Aug 09, 2012 4:22 am

Thanks

Ok, I can store the seven new attributes in DIM-ATM and DIM-ATM-history.
But don't I get a problem then with starting-date in the history table interfering with the start-date from source B
The history tablë:

1. Atm-history-key (surrogate-key)
2. Starting-date
3. Up-until-date
4. Recent-ind
5. Atm-key (surrogate-key)
6. Atm-id
7. Atm-type
8. start-date source B
9 and so on. other attributes form source B

Suppose in the past, on 2012-01-01 I recieved a record form source B with start-date 2012-01-16
This row is then inserted in history table ( i don't mention all the attributes)

1. 8989
2. 2012-01-01
3. 9999-12-31
4. Y
5. 12
7. AAAA
8. 2012-01-16

Suppose then on 2012-08-01 the frequency changes and i get a record from source B having start-date 2012-08-14. Also source A gives me a changed ATM-type on 2012-08-01.
Then my history will look like this

row 1
1. 8989
2. 2012-01-01
3. 2012-07-31
4. N
5. 12
7. AAAA
8. 2012-01-16

row 2
1. 8989
2. 2012-08-01
3. 9999-12-31
4. Y
5. 12
7. BCBC
8. 2012-08-14

Now the question on 2012-10-01 is:
What was the value of start-date and the other attributes of source B on 2012-08-07? And what was the value of ATM-type?
Then you are used to look at starting-date and up-until-date in the history table. If you do so, the answer will be that the value of ATM-type was BCBC, which is ok.
The value of start-date from source B was 2012-08-14. But that is wrong, since the changes frrm source B start on 2012-08-14 and not 2012-08-01. In this perspective the starting-date 2012-08-01 is the moment I received the mutation.

So for source A the starting-date (attribute 2) is the date the change is valid. For source B attribute 2 indicates when I received the mutation, attribute 8 tells me when it becomes valid.

In my opinion this makes things very difficult. When only for instance only attributes of source A change, I get multiple rows where start-date attribute source B is equal.
How to solve?

Please advoce
Thanks Ron

ngalemmo Thu Aug 09, 2012 6:16 pm

The purpose of start and end dates in a type 2 dimension is to reflect the effective period of all attributes in the row. While you may have other columns that contain dates relating to specific sources, those are not the ones to use to find a particular row.

You may have logic in your update process to perform an update in place if the second source updates the same row on the same day (to avoid a lot of extra type 2 rows) but it has nothing to do with the structure of the table.

revdpoel Thu Aug 09, 2012 6:28 pm

OK
but the start-date of source B tells me when a new frequency gets valid.
What frequency was valid on 2012-08-07 and how do I find that one, how do I query?

ngalemmo Thu Aug 09, 2012 8:47 pm

You always use the row's effective period. The source doesn't matter. The row effective period tells you what the value of the attributes were during a particular period of time.

If source A updates, you create a new row with a new effective period. The source B attributes haven't changed, but so what? They are in both current and history with proper effective periods. The only issue is if source A and B both change during the same day. One of two things will happen, you create a new history row with the old one having a start and end on the same day, which would be ignored in any query, or you check the dates in the update process and do an update in place rather than creating a history row.

If you are concerned about fact foreign keys to the dimension that were assigned before the source B updates were applied in the same day, then I would suggest you use the modified update process and do an update in place of the source B attributes when they occur in the same day. Prior facts would see the updated attributes. You use the same logic in source A and B updates so it won't matter which ones were applied first in the same day.

revdpoel Fri Aug 10, 2012 3:17 am

Thanks very much
I understand your answer
the problem in this issue is that source B might send me on the first day of the month a start-date which will be valid in the future.

What frequency was valid on 2012-08-07 and how do I find that one, how do I query?
Was it the frequency referred to in history row 2?
On base of the starting-date 2012-08-01 one would say Yes. But the answer is No, because the start-date of the frequency is 2012-08-14
Was it the frequency referred to in history row 1?
On base of the end-date 2012-07-31 one would say No. But the answer is Yes, because the start-date of the frequency is 2012-01-16 and is still valid

Do you understand the problem?

ngalemmo Fri Aug 10, 2012 2:49 pm

Ok, that's different. I would use another dimension table in situations like that. Trying to coordinate different effective periods in type 2 dimensions is cumbersome. Basically it would involve queueing the update and applying it to the dimension when it goes into effect.

revdpoel Fri Aug 10, 2012 4:42 pm

thnx for all the help
really appreciate it

Sponsored content

Expand dimension or introduce new dimension?

Expand dimension or introduce new dimension?

Re: Expand dimension or introduce new dimension?

Re: Expand dimension or introduce new dimension?

Re: Expand dimension or introduce new dimension?

Re: Expand dimension or introduce new dimension?

Re: Expand dimension or introduce new dimension?

Re: Expand dimension or introduce new dimension?

Re: Expand dimension or introduce new dimension?

Re: Expand dimension or introduce new dimension?

Re: Expand dimension or introduce new dimension?