Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Detecting Changes

3 posters

Go down

Detecting Changes Empty Detecting Changes

Post  SCAI_Andre Mon Jan 25, 2010 6:27 am

Hi,

I have a question regarding the pros and cons of the several approaches of detecting changes in dimensional data.

The ETL Toolkit mentions several techniques, but I'm not so much interested in the technical side of the issue (log scraping vs. daily dumps) but the logical view on the data:
Image an operational system storing the customer data in a table called t_customer. This table has a "last_changed_timestamp" column, set by a trigger.

I "could" extract all the customers from this table being changed in a given time frame.

But what I do not get from this design is ALL the changes that occurred in that time frame. I only get the current state of the customers being changed recently (if the timeframe ends on sysdate).

Isn't it bad design to be left with missing changes on dimensional information?

Regardless which load window I choose, I will always end up with the chance of loosing versions of the customer dimension which were present in the operational system, but will not be there in my DWH.

That alone would not be critical, but if the customer has caused transactions based on a version of its customer record, that would be a hit to the dwh, if I would assign that transaction to a different version of my customer dimension row.

E.g.:
Customers can have an attribute "Customer Value", containing the value of the customer to the company. It is measured from an algorithm in the operational system, since it causes the operational system to behave differently (granting different discounts on list prices, or something). I surely would try to reimplement the algorithm in the dwh, for consistency reasons on the algorithm with the operational system.

Customer A makes 10 transactions (orders) from 00:00h AM to 12:00h AM with CustomerValue='C'.
On 12:00h AM the algorithm updates CustomerValue='B'
From 12:00h AM to 04:00 PM Customer A performs 25 more transaction (CustomerValue='B').
On 04:00h AM the algorithm updates CustomerValue='A'
From 04:00h AM to 12:00 PM Customer A performs 5 more transaction (CustomerValue='A').

If I would upload the data to the dwh I would only see the last change that occurred on 04:00h PM, given that I have a daily load interval. Comparing to the old version of the customer dimension row, it would be a change from CustomerValue 'C' to 'A'.

I would loose the information that the 25 transactions between 12:00h AM and 04:00h PM have been made in the context of CustomerValue='B'.

If I would compare now the paid prices in the sales transactions with the products' list prices while considering CustomerValue discounts, I would notice that the transaction prices do not match ListPrice*CustomerValueDiscount, since CustomerValueDiscount is from the wrong CustomerValue-row ('A' not 'B', as would be correct).

Additionally I would see a change from CustomerValue from 'C' to 'A', which might not be possible as defined by the algorithm - and has not happened as such in the operational system.

Lots of talking, you might say my assumptions are not realistic, but the approach using the "LastChangedTimestamp" lets you end up using "intra-load-interval changes". And loosing dimensional changes will give you incorrect data. Right?

If you do not agree on this, please share your thoughts on this.

In either case: What do you see in the real world? Is this an issue, good designers take into consideration when design ETL interfaces?

What makes me nervous is that the ETL toolkit fails to leave a statement on this issue. And in my projects I seem to cause irritations with this requirement. Am I missing something?

Thanks in advance for your thoughts.

Regards,
Andre

SCAI_Andre

Posts : 7
Join date : 2010-01-25
Location : Leverkusen, Germany

Back to top Go down

Detecting Changes Empty Re: Detecting Changes

Post  ngalemmo Mon Jan 25, 2010 12:37 pm

The bottom line question is: Does it matter?

Probably 99.94% of the time it doesn't. Getting the state at the end of the business day is just fine. It is something for the business to decide. It is technically feasable to capture change moment by moment at the cost of a much more complex ETL environment and larger tables. If there is business value in doing so, and that value exceeds the costs, then it is justifiable.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Detecting Changes Empty Re: Detecting Changes

Post  SCAI_Andre Mon Jan 25, 2010 1:03 pm

Well, the business always decides but it is consulted by the technicians Currently, the tech's are screaming loadly at the business side, that they have to think that they don't need it.

What are you doing then? Are you at least asking for every change in the source, or is the approach to get the end-of-business-day status of a dimensional value the only thing you do, if there is no explicit demand for something different?

SCAI_Andre

Posts : 7
Join date : 2010-01-25
Location : Leverkusen, Germany

Back to top Go down

Detecting Changes Empty Re: Detecting Changes

Post  ngalemmo Mon Jan 25, 2010 2:59 pm

Every time it comes up, the business is more than happy with end-of-day as far as the data warehouse goes. Intra-day variations are simply insignificant in the greater scheme of things. Also, most OLTP systems capture important customer characteristics with the transaction, so significant attributes are usually captured at the time of the transaction.

In cases where an ODS is also being implemented, the ODS usually receives a near real-time feed. While the ODS itself is almost always a current snapshot, it can be used for change capture to feed intra-day changes to the data warehouse if such data is desired.

I'm a bit confused with your comment about who want's what. Is the tech side saying they should capture intra-day changes or is it the business side. If it is the business side, the best approach is to outline to them what it would cost (time, money, etc...) and let them make an informed decision. The tech's job is to advise, communicate the issues and provide options.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Detecting Changes Empty Re: Detecting Changes

Post  SCAI_Andre Tue Jan 26, 2010 7:07 am

I'm perfectly aware of the fact, that this "getting each change from the operational system"-thing is an expensive and complicated one.

But: Business users new to BI do not know what to demand and what to expect, which is the situation I'm currently in.

The internal IT of the company in this project just says "You don't need it! We never do it."

I, as an external consultant, think, "Check the cost first.", which is exactly what you suggest.

My question is now, what does everybody else outside do? Does every BI consultant simply assume, that
a) it is ok to loose intra-day changes and just design the etl process without checking back with the business,
b) first make check, if the business is aware of the implications and ask them, if the cost for getting each change should be checked?

SCAI_Andre

Posts : 7
Join date : 2010-01-25
Location : Leverkusen, Germany

Back to top Go down

Detecting Changes Empty Re: Detecting Changes

Post  BoxesAndLines Tue Jan 26, 2010 9:20 am

It depends. Put on your business hat and see if there is value in the intra day states. Your case occurs very infrequently though. I can't recall a transaction occurring in an OLTP application where they don't store the appropriate relationships for a period of time. Type 1 dimensions change all the time independent of the underlying facts and most of my dimensions are type 1. In nearly all cases, the business folks want current view.
BoxesAndLines
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

Back to top Go down

Detecting Changes Empty Re: Detecting Changes

Post  ngalemmo Tue Jan 26, 2010 12:17 pm

What does everyone else do? If you go with 'A', which is the classic 'We are IT and we know what you want better than you' stance, the project is doomed to failure... not because of functionality, but because of mistrust and lack of communication between the two groups.

I agree with B&L that most of the time businesses are only interested in the current state. But, not all the time. So, you need to find out what is most important to the business. I suggest you go back to requirements definition and start anew, and frame the requirements in terms of what kinds of analysis needs to be done. Rather than frame requirements in technical terms, such as needing to capture every little change, frame them in terms of what they ultimately want to be able to do. What are the key metrics, what do they want to know about the business, their customers, the performance of their operating units. Build to meet those requirements, not some theoretical concept someone latched onto because they read an article in a trade magazine.

Also, put together some scenarios with the different approaches tied to the metrics they want. For example, take a metric and demonstrate how it would be different if there was intra-day change capture versus end-of-day change capture (or no change capture at all... ie type 1). Is there a difference? If there is, is it so significant would it have an effect on decisions they make from the data?

--Addendum--

What you describe is a tough situation. As the consultant, you are in the middle of it. Ultimately what happens is the the decision of the project sponsor, its his/her budget after all. Do what you can, in the most positive manner possible, to present the options and let the powers to be make an informed decision. Then do what they say.


Last edited by ngalemmo on Tue Jan 26, 2010 4:53 pm; edited 1 time in total (Reason for editing : Afterthough)
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Detecting Changes Empty Re: Detecting Changes

Post  SCAI_Andre Wed Jan 27, 2010 6:46 am

Thanks for your input!

I do work that way (EDIT) and thank you for posting this approach as a best practice for others and for reassuring me, that I'm on track here.

But maybe have not asked clearly enough what I really want to know from you and all the others out there on this specific issue:
What do YOU usually do? Do you always ask the business if it wants intra-day changes and evaluate the cost, or do you simply assume that the business does not need if it does not explicitly ask for it in a requirement?

I want to find out what other consultants do usually to get a metric how unusual my approach on this is.

SCAI_Andre

Posts : 7
Join date : 2010-01-25
Location : Leverkusen, Germany

Back to top Go down

Detecting Changes Empty Re: Detecting Changes

Post  SCAI_Andre Wed Jan 27, 2010 7:01 am

please note that I edited my post.

Thank you!

SCAI_Andre

Posts : 7
Join date : 2010-01-25
Location : Leverkusen, Germany

Back to top Go down

Detecting Changes Empty Re: Detecting Changes

Post  ngalemmo Wed Jan 27, 2010 12:56 pm

SCAI_Andre wrote:What do YOU usually do? Do you always ask the business if it wants intra-day changes and evaluate the cost, or do you simply assume that the business does not need if it does not explicitly ask for it in a requirement?

What I would do, as a consultant, is to work within the stated requirements. That is, essentially, your contract. If the issue comes up, its a change in scope and handle it as a change order with appropriate estimates as to how it changes the timeline, cost and so on. The client will decide if they want to go ahead with it or not. You need to stay neutral, and give your best advice when asked... and document everything.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Detecting Changes Empty Re: Detecting Changes

Post  SCAI_Andre Thu Jan 28, 2010 3:37 am

"work within the stated requirements"
As mentioned above, nothing stated, nothing mentioned in the contract.

Let me repeat my question:
What do YOU usually do? Do you always ask the business if it wants intra-day changes and then evaluate the cost, or do you simply assume that the business does not need it if it does not explicitly ask for it in a requirement?

Other participants welcome here.

SCAI_Andre

Posts : 7
Join date : 2010-01-25
Location : Leverkusen, Germany

Back to top Go down

Detecting Changes Empty Re: Detecting Changes

Post  ngalemmo Thu Jan 28, 2010 1:20 pm

It depends on why I was engaged. If I am asked to design the DW, then of course, I would bring up the issue of change capture and how/when it would be done. If I was engaged to lead a development and deployment project for a DW that was already designed, I would assume the issue has been settled.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Detecting Changes Empty Re: Detecting Changes

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top


 
Permissions in this forum:
You cannot reply to topics in this forum