Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Where to apply ETL Looping, Archiving and Partitioning

2 posters

Go down

Where to apply ETL Looping, Archiving and Partitioning Empty Where to apply ETL Looping, Archiving and Partitioning

Post  rimages Mon Jan 09, 2012 11:08 am

Hi,

Our core financial system produces separate extracts (flat files) per country. Currently, there are 52 extracts per country and we operate in 2 countries, therefore we have 104 extracts all together. Note, the structure of the files are identical per country.

Our current DW Vendor (developed the whole DW from scratch) currently extracts all files to the staging area (files are then archived and partisioned on a daily basis) and then consolidates them in another DW layer.

As a new owner of the DW, I am questioning why they took this approach...Here are my issues with this approach. Could someone please correct me (and explain why) if I my approach is wrong.

Issues:
1. The more countries we operate in (expanding to 2 more countries) , the more extracts to store in the staging area.
- in order to minimize the number of tables in the staging area, why not apply the loop (metadata stored in the database for entity_id(country) and file_path) during the extract process. Therefore, I will have 52 tables in my staging area, with entity_id as the country identity. In addition, some countries are not critical, therefore my metadata table will also contain a flag if the extract process should fail or continue if specific country file is missing. Currently, the extract process fails on everything.
- the vendor rejected this as this tranforming the source data, but I strongly disagree with their opinion.....so, I am looking for your commnets.

2. Daily Partisioning of staging data.
- Our datawarehouse developer (vendor) is partisioning staging tables on a daily basis. I am questioning the need for this as the tables are backed up on daily basis and the source extract files are archived as well. Could someone please explain why the daily partisioning is needed.
- the vendor commented that this is the best practice, and it is recommended solution if we need to reload data. I strongly disagree with their opinion as we have data backed up and available in the file share archive if reload is necessary.

I really feel that the Vendor is creating more work that is not necessary. They created sort of a monopoly at the bank, since up until I joined the company, nobody had any data warehousing experience, therefore their estimates/work was never challaged, hence more money is spent on the development and maintenance of DW.

Can't wait to hear your opinion.

Regards,

RKJR

rimages

Posts : 6
Join date : 2012-01-09

Back to top Go down

Where to apply ETL Looping, Archiving and Partitioning Empty Re: Where to apply ETL Looping, Archiving and Partitioning

Post  BoxesAndLines Mon Jan 09, 2012 2:44 pm

Hard to say without more information whether issue 1 is a bad design on your vendor's part. I may have come to the same conclusion but based on different criteria. It seems your vendor is opposed to doing anything to a file when loading to a staging database based on some notion of transforming data is not allowed in this stage of the ETL lifecycle. I would be more focused on volumetrics, parallel ETL processing, recovery, and performance, etc, to reach the same decision.

For issue 2, partitioning does provide all sorts of benefits for managing data. I'm still amazed to find daily snapshot fact tables that are not partitioned daily. In this case, I'm more inclined with the vendor since partitioning adds more benefits than drawbacks, especially on an Oracle platform where it is simple to manage.
BoxesAndLines
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

Back to top Go down

Where to apply ETL Looping, Archiving and Partitioning Empty Re: Where to apply ETL Looping, Archiving and Partitioning

Post  rimages Mon Jan 09, 2012 3:48 pm

thank you for the feedback....

Issue 1:
I am just coming from the experience where we had to pull data from MS Dynamics where each entity had a its own database (same structure). So, imagine about 30+ source tables times 150 databases (per entity)... if I were to follow the vendor's design, the staging area would grow significantly in terms of objects in the upcoming years...





rimages

Posts : 6
Join date : 2012-01-09

Back to top Go down

Where to apply ETL Looping, Archiving and Partitioning Empty Re: Where to apply ETL Looping, Archiving and Partitioning

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum