Why extract?
3 posters
Page 1 of 1
Why extract?
Hi all,
I question for you which will hopefully generate some discussion.
Most of the source systems we extract from are Oracle. There are mirrors set up for the production databases that are currently refreshed nightly, but can be refreshed on a more frequent schedule as it takes a matter of minutes to do the refresh. It's been suggested that instead of extracting the data to our staging database, we could pull directly from the mirrors and bypass the extract step altogether. I'm wondering what every thinks would be the pros and cons of this approach?
I think it would be more difficult to audit where the data came from, what you are loading each night etc, as well as maintaining a copy of history, but beyond that I'm having a hard time seeing what the downside(s) would be. It would certainly speed up the loading process if we didn't have to extract data every night.
What are your thoughts?
Thanks,
-Mark
I question for you which will hopefully generate some discussion.
Most of the source systems we extract from are Oracle. There are mirrors set up for the production databases that are currently refreshed nightly, but can be refreshed on a more frequent schedule as it takes a matter of minutes to do the refresh. It's been suggested that instead of extracting the data to our staging database, we could pull directly from the mirrors and bypass the extract step altogether. I'm wondering what every thinks would be the pros and cons of this approach?
I think it would be more difficult to audit where the data came from, what you are loading each night etc, as well as maintaining a copy of history, but beyond that I'm having a hard time seeing what the downside(s) would be. It would certainly speed up the loading process if we didn't have to extract data every night.
What are your thoughts?
Thanks,
-Mark
indignatz- Posts : 4
Join date : 2009-03-13
Re: Why extract?
bypass the extract step altogether
Do you mean staging? After all, you are still extracting are you not?
If you have a true mirror available, and the system admins let you do it, it shouldn't hurt pulling from the mirror instead of the live system. However, admins usually don't like the idea because the mirror is usually the failover system and they don't want any accessing it so it is ready to go should the need arise.
But, regardless of where the data comes from, staging still has a purpose.
Re: Why extract?
Sorry, yes, the extract would still occur, but instead of dropping the data in a staging database, we would transform and load it directly from the mirror.
What are the potential gotchas of not staging the data?
What are the potential gotchas of not staging the data?
indignatz- Posts : 4
Join date : 2009-03-13
Re: Why extract?
One reason to stage is to isolate a unit of work. It provides clarity to a process and a recovery point. Also, if you are dealing with multiple sources for the same data, it allows for soft transformations into a common staging view, allowing a single load process regardless of source.
Re: Why extract?
You're still staging it. The difference is that you will be "pulling" data instead of having the data "pushed" to you.
Pros and cons to both. A lot of times, the people creating the pushed data know the source system really well and the people involved in pulling the data know the DW really well. If you pull the data, your knowledge of the source system has to go up. Imagine pulling data from multiple source systems in multiple types of database platforms.
Pros and cons to both. A lot of times, the people creating the pushed data know the source system really well and the people involved in pulling the data know the DW really well. If you pull the data, your knowledge of the source system has to go up. Imagine pulling data from multiple source systems in multiple types of database platforms.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Similar topics
» Question asked about ETL (little explanation needed)
» LEFT OUTER join in source extract SQL versus downstream lookup?
» Extract to staging: use constraints or not?
» Kimball SCD vs Extract from trxn System
» Multiple sources for the same data - which one to extract from?
» LEFT OUTER join in source extract SQL versus downstream lookup?
» Extract to staging: use constraints or not?
» Kimball SCD vs Extract from trxn System
» Multiple sources for the same data - which one to extract from?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum