Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Why extract?

Go down

Why extract? Empty Why extract?

Post  indignatz Wed Apr 18, 2012 10:29 am

Hi all,
I question for you which will hopefully generate some discussion.

Most of the source systems we extract from are Oracle. There are mirrors set up for the production databases that are currently refreshed nightly, but can be refreshed on a more frequent schedule as it takes a matter of minutes to do the refresh. It's been suggested that instead of extracting the data to our staging database, we could pull directly from the mirrors and bypass the extract step altogether. I'm wondering what every thinks would be the pros and cons of this approach?

I think it would be more difficult to audit where the data came from, what you are loading each night etc, as well as maintaining a copy of history, but beyond that I'm having a hard time seeing what the downside(s) would be. It would certainly speed up the loading process if we didn't have to extract data every night.

What are your thoughts?

Thanks,
-Mark

indignatz

Posts : 4
Join date : 2009-03-13

Back to top Go down

Why extract? Empty Re: Why extract?

Post  ngalemmo Wed Apr 18, 2012 11:43 am

bypass the extract step altogether

Do you mean staging? After all, you are still extracting are you not?

If you have a true mirror available, and the system admins let you do it, it shouldn't hurt pulling from the mirror instead of the live system. However, admins usually don't like the idea because the mirror is usually the failover system and they don't want any accessing it so it is ready to go should the need arise.

But, regardless of where the data comes from, staging still has a purpose.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Why extract? Empty Re: Why extract?

Post  indignatz Wed Apr 18, 2012 11:51 am

Sorry, yes, the extract would still occur, but instead of dropping the data in a staging database, we would transform and load it directly from the mirror.

What are the potential gotchas of not staging the data?

indignatz

Posts : 4
Join date : 2009-03-13

Back to top Go down

Why extract? Empty Re: Why extract?

Post  ngalemmo Wed Apr 18, 2012 1:20 pm

One reason to stage is to isolate a unit of work. It provides clarity to a process and a recovery point. Also, if you are dealing with multiple sources for the same data, it allows for soft transformations into a common staging view, allowing a single load process regardless of source.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Why extract? Empty Re: Why extract?

Post  Jeff Smith Wed Apr 18, 2012 2:24 pm

You're still staging it. The difference is that you will be "pulling" data instead of having the data "pushed" to you.

Pros and cons to both. A lot of times, the people creating the pushed data know the source system really well and the people involved in pulling the data know the DW really well. If you pull the data, your knowledge of the source system has to go up. Imagine pulling data from multiple source systems in multiple types of database platforms.

Jeff Smith

Posts : 471
Join date : 2009-02-03

Back to top Go down

Why extract? Empty Re: Why extract?

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top


 
Permissions in this forum:
You cannot reply to topics in this forum