I have normally extracted straight from production databases for incremental loads in the ETL system. With the new team I am working with, they back up the production databases nightly and restore them to use them as the source for ETL which is not efficient at all. The team doesn't feel comfortable extracting directly from the production databases. All production databases are MS SQL server 2008 R2 and the data warehouse is also SQL Server. One idea they are throwing around is to use SQL Server Replication to replicate the prod databases to the ETL server and extract from that. The other idea is to use the new SQL server feature Always On when we upgrade to 2012 (which is similar to replication in that sense). Is it common to extract directly from live databases? What are some pros and cons? What are some efficient setups?
- Posts : 23
Join date : 2011-10-18
Location : Vancouver, Canada
Usually you don't query all the data from the source database in each ETL load. We have extracted data from production databases directly, no problem at all. We use insert/update timestamps and extract only data which have been inserted or modified in the previous 40 days. And this sliding window is the data we extract from the source database every night. I don't see a problem here.
- Posts : 5
Join date : 2011-03-01
Permissions in this forum:You cannot reply to topics in this forum