Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Where should the ETL be ?

4 posters

Go down

Where should the ETL be ? Empty Where should the ETL be ?

Post  Manoj Panda Sun Feb 06, 2011 1:35 pm

As I understand in organisational IT architecture there are 3 tiers.

1st tier(Data storage layer) : All the databases .
2nd tier(Business Tier ): All the middleware applications that include ETLs, messaging middleware , object oriented middleware etc
3rd tier(Client layer/Presentation layer) : Front end applications .

My question is why should ETL remain in the second tier instead of the first tier .

thanks for your help.
Manoj

Manoj Panda

Posts : 1
Join date : 2011-02-06

Back to top Go down

Where should the ETL be ? Empty Re: Where should the ETL be ?

Post  Jeff Smith Mon Feb 07, 2011 2:00 pm

Going in reverse order -

My reporting software interfaces with the database via SQL. If I need to, I can create views in the interface that the user sees as a table. I would only do this for simple things for short term purposes. Upon the next release of the database, I would try to push such calculations into the database if appropriate. But, I might want to keep some calculations in the frontend - for example, if I same an aggregate table that has total dollars and total items, I might put a calculation in the front end for the average.

In the middle tier, which I assume is the presentation database, I may have modeled scores. It may be easier to model the scores on data in the middle tier. If the presentation area is a star schema, and therefore easier to use, then it may be easier to create the modelled scores from the star schema than from a 3nf database.

But more basic, if I have aggregate tables in the middle tier, I'll want them to be created from the detailed data in the middle tier.

Jeff Smith

Posts : 471
Join date : 2009-02-03

Back to top Go down

Where should the ETL be ? Empty Re: Where should the ETL be ?

Post  ngalemmo Mon Feb 07, 2011 11:59 pm

I don't quite get the middle tier as the all encompassing 'business tier'.

I tend to look at things as back end (data warehouse and everything associated with supporting it, including ETL and operational interfaces) and the front end, the end user environment. Business definitions and rules play a role in both. I try to incorporate the more complex transformations and rules in the back end where you have better control over the development and testing of such logic. Doing so makes the front end much easier to deploy and use. Those business rules that do make it to the front end should be accomodated in the metalayer provided by the tool being used so that information is not left open to 'creative interpretation'.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Where should the ETL be ? Empty Re: Where should the ETL be ?

Post  Jeff Smith Wed Feb 23, 2011 2:42 pm

After rereading the original post, I want to change my suggestion.

If at all possible, transformations should occur during the database load process. The Reporting interface can have some ETL but I would keep it to a minimum. I would limit it to things that will eventually be pushed to the database load process. I can see some calculations in the reporting interface or concatenations that should have been done in the dimension table, etc. But nothing heavy.

I think reports can have some manipulations of the data. I don't think that standard reports should always be limited to aggregations or filters. I think there are times when a report can contain extensive SQL. I tend to do this when the information is report specific as opposed to being information that users need to query in an ad hc fashion. But this is the exception rather than the rule.

For example, I have indicators in the dimension tables. I wish that in addition to the indicators, I had a descriptive fields that can be included in a reports so that the report said "In Network" and "out of Network" instead of Y/N. I can create this in the Reporting interface. The next time that dimension comes up for tweaking, I will put the descriptions in the dimension table.

I have a report that projects cost savings. The user can enter certain variables including revenue. The report apportions the cost savings to various buckets based on history (time period is also a variable). This report requires a lot of code to be written. I can't figure out how to change the database to make creating the report less complex and even if I could, as soon as I made the database changes, the users would request changes to the report.

Jeff Smith

Posts : 471
Join date : 2009-02-03

Back to top Go down

Where should the ETL be ? Empty Re: Where should the ETL be ?

Post  hang Fri Mar 04, 2011 8:43 pm

In my view, generally there are two categories in ETL process. One has a lot of business logics and rules involved and the other has all the common DW related processes, things like SCD handling, sorting out surrogate key pipeline etc.

Either side of balance could by far outweigh the other depending on the business and modelling complexity. However having these two types of complexities in separate layer will make it easier to sharpen you focus on distinctively different logics.

In this model, the valuable resources can be best leveraged to their respective strength. Some people with great SQL skills and business knowledge may struggle to comprehend the fundamental concept of ETL and dimensional modeling, while others may be DW experts but overwhelmed by the sheer complexity in the business.

If business is very complex, I tend to have an ODS layer that holds business related stored proc used by ETL and people coming from OLTP system will feel at home in this area as the schema is in 3NF form. If the structure in dimensional data store (DDS) is overwhelming, I would store all the ETL stored procs in DDS and use synonyms to reference db objects in other data store (eg. ODS). In the second layer, I would only focus on how to handle SCD, resolving surrogate key conversion for fact loading, populating junk dimension and bridge etc.

There could be a third layer for ETL, but I can hardly call it ETL in terms of complexity and difficulty in the other two layers. However, as Jeff suggested, the third layer should be very straight forward for aggregation, KPI calculation, any reporting requirements, otherwise shift the complexity to the upstream layers or leverage the power of OLAP cube. To my understanding, that’s one of the main goals of DW/BI system, making reporting very easy while leaving hard and dirty work to ETL.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Where should the ETL be ? Empty Re: Where should the ETL be ?

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top


 
Permissions in this forum:
You cannot reply to topics in this forum