ETL from Oracle to SQL Server 2008 Data Warehouse
+5
ngalemmo
John Simon
hang
BoxesAndLines
sman
9 posters
Page 1 of 1
ETL from Oracle to SQL Server 2008 Data Warehouse
Hello,
Our data warehouse runs on SQL 2008 servers. One of the transactions systems we have runs on an Oracle database and managed by a separate team.
What are the recommended methods to extract source data from an Oracle database and loading to a SQL Server 2008 data warehouse?
One option I can think of is to export data from Oracle to a set of flat files (CSVs) and deliver them to the Data warehouse team. The data warehouse team can use an ETL tool like SSIS to load these CSVs to SQL Server 2008. Is this practical?
Any other methods?
Thank you
Our data warehouse runs on SQL 2008 servers. One of the transactions systems we have runs on an Oracle database and managed by a separate team.
What are the recommended methods to extract source data from an Oracle database and loading to a SQL Server 2008 data warehouse?
One option I can think of is to export data from Oracle to a set of flat files (CSVs) and deliver them to the Data warehouse team. The data warehouse team can use an ETL tool like SSIS to load these CSVs to SQL Server 2008. Is this practical?
Any other methods?
Thank you
sman- Posts : 22
Join date : 2011-01-30
Re: ETL from Oracle to SQL Server 2008 Data Warehouse
Use a tool like Informatica. Different databases or servers are not an issue.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: ETL from Oracle to SQL Server 2008 Data Warehouse
You could also use OLE DB connection in SSIS to extract data from Oracle database directly. A typical approach is to load all the relevant tables into staging area in SQL Server and let ETL process carry out all the necessary down-stream activities either by stored procedures or by tools.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: ETL from Oracle to SQL Server 2008 Data Warehouse
You can use SSIS to load directly into the Data Warehouse via an OLEDB connection. However the issue with this is that if there is a problem on either end, you may lose the data at that point. I think it's safer to get an extract as a csv file and load with SSIS.
Boxes and Lines, why use Informatica when you get SSIS for free?
Boxes and Lines, why use Informatica when you get SSIS for free?
Re: ETL from Oracle to SQL Server 2008 Data Warehouse
Thanks John Simon for the reply.
How about getting a back up of Oracle database, restore in an staging Oracle environment in Data Warehouse team and use in ETL tool to extract data and load into SQL Server data warehouse? Obviously, this relieves the work load on Oracle DBA team (Transaction system), but adds more work for the ETL team in the Data warehouse team. Any thoughts on this approach?
How about getting a back up of Oracle database, restore in an staging Oracle environment in Data Warehouse team and use in ETL tool to extract data and load into SQL Server data warehouse? Obviously, this relieves the work load on Oracle DBA team (Transaction system), but adds more work for the ETL team in the Data warehouse team. Any thoughts on this approach?
sman- Posts : 22
Join date : 2011-01-30
Re: ETL from Oracle to SQL Server 2008 Data Warehouse
You could use the SQL Server's data replication facility to move data from Oracle to SQL Server for initial load and then incremental load by cvs files.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Change Data Capture
Thanks Hang.
Do you know whether types of Change Data Capture stratergies are available in Oracle?
I have come across various CDC methods like relying on created/updated datatime columns in transaction system tables, triggers, transaction logs, hash values etc.
This is what I tend to think as a possible solution. Adding a hash column to every table in data-warehouse that is calculated based on all the columns of the table so that updated records can be identified by comparing hash columns. inserted and deleted records can be identified by left and right outer joins respectively between transaction-extract and existing-data-warehouse.
Does this sound sensible / practical?
Do you know whether types of Change Data Capture stratergies are available in Oracle?
I have come across various CDC methods like relying on created/updated datatime columns in transaction system tables, triggers, transaction logs, hash values etc.
This is what I tend to think as a possible solution. Adding a hash column to every table in data-warehouse that is calculated based on all the columns of the table so that updated records can be identified by comparing hash columns. inserted and deleted records can be identified by left and right outer joins respectively between transaction-extract and existing-data-warehouse.
Does this sound sensible / practical?
Last edited by sman on Mon Jan 31, 2011 12:44 am; edited 1 time in total (Reason for editing : fixed a typo)
sman- Posts : 22
Join date : 2011-01-30
Re: ETL from Oracle to SQL Server 2008 Data Warehouse
There is a relevant topic on the forum: http://forum.kimballgroup.com/t519-fact-table-incremental-load#3546
The built in CDC feature in SQL Server 2008 would be only suitable when OLTP is in SQL Server 2008. Hopefully the post would give you some clue about loading data stored in Oracle. I think the issue is to identify and export only delta data into csv or staging area for incremental load.
The built in CDC feature in SQL Server 2008 would be only suitable when OLTP is in SQL Server 2008. Hopefully the post would give you some clue about loading data stored in Oracle. I think the issue is to identify and export only delta data into csv or staging area for incremental load.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: ETL from Oracle to SQL Server 2008 Data Warehouse
Keep it simple.
The transaction system would have dates in it for new/updated records.
You can store the date of your last process in a control table, the use SSIS to pull data from Oracle where the insert_update date in the OLTP system is greater than in your control table. The update the control table to the latest date.
This way you only pull the deltas.
The transaction system would have dates in it for new/updated records.
You can store the date of your last process in a control table, the use SSIS to pull data from Oracle where the insert_update date in the OLTP system is greater than in your control table. The update the control table to the latest date.
This way you only pull the deltas.
Re: ETL from Oracle to SQL Server 2008 Data Warehouse
John Simon,
In this approach, how are the deleted records identified?
In this approach, how are the deleted records identified?
sman- Posts : 22
Join date : 2011-01-30
Re: ETL from Oracle to SQL Server 2008 Data Warehouse
Why would you delete anything? You want to keep a history. What kind of record are you thinking of?
If a reference record is deleted, then it can remain valid in your SCDType2 and it doesn't matter. Why would the source system be doing physical deletes?
If a reference record is deleted, then it can remain valid in your SCDType2 and it doesn't matter. Why would the source system be doing physical deletes?
Re: ETL from Oracle to SQL Server 2008 Data Warehouse
Transaction system deletes records that are no longer valid. It does a physical delete rather than a soft delete.
When this happens, data warehouse has to mark the record as deleted by setting warehouse_end_date column. The data warehouse stages Transaction system data with time banding (ie. every table has warehouse_start_date, warehouse_end_date columns).
Eg.
Active records have '9999-12-31' in warehouse_end_date column. When a record is deleted in the Transaction system, the deleted date is entered into 'warehouse_end_date' column.
When this happens, data warehouse has to mark the record as deleted by setting warehouse_end_date column. The data warehouse stages Transaction system data with time banding (ie. every table has warehouse_start_date, warehouse_end_date columns).
Eg.
Active records have '9999-12-31' in warehouse_end_date column. When a record is deleted in the Transaction system, the deleted date is entered into 'warehouse_end_date' column.
Last edited by sman on Mon Jan 31, 2011 7:59 pm; edited 1 time in total (Reason for editing : typo)
sman- Posts : 22
Join date : 2011-01-30
Re: ETL from Oracle to SQL Server 2008 Data Warehouse
You could run a piece of SQL that does a comparison between your extract and your dimension e.g.
UPDATE a
SET a.EndDate = GETDATE() -1
FROM DimProduct a
WHERE NOT EXISTS(SELECT 1 FROM SrcProduct b WHERE a.ProductCode = b.ProductCode)
UPDATE a
SET a.EndDate = GETDATE() -1
FROM DimProduct a
WHERE NOT EXISTS(SELECT 1 FROM SrcProduct b WHERE a.ProductCode = b.ProductCode)
Re: ETL from Oracle to SQL Server 2008 Data Warehouse
Oracle maintains transaction logs and there are ETL tools that can read these logs to capture change information. SSIS is not one of them.
Alternately you can modify the Oracle database and add a DELETE trigger on the tables of interest and write code to capture the key information of the row being deleted into another table.
If none of those options are available, all you have left is the old fashioned way... extract all rows from the source and compare them to the rows in the DW to find ones that are no longer in the source.
Alternately you can modify the Oracle database and add a DELETE trigger on the tables of interest and write code to capture the key information of the row being deleted into another table.
If none of those options are available, all you have left is the old fashioned way... extract all rows from the source and compare them to the rows in the DW to find ones that are no longer in the source.
Re: ETL from Oracle to SQL Server 2008 Data Warehouse
Thanks everyone for the replies.
sman- Posts : 22
Join date : 2011-01-30
Re: ETL from Oracle to SQL Server 2008 Data Warehouse
Our Data Warehouse is being hosted on our Servers. The only problem is it is off site. I have no problems with the database. I have issues with backup. I find it difficult to query strings on our domain controlled environment. I am not sure if this issue is localized on my end or on the domain itself. Need to find answers
Herman Reid
http://www.blenderreviewer.com
Herman Reid
http://www.blenderreviewer.com
Oracle to SqlServer - CDC Options
Feel free to contact me if you want detail. I have made I think pretty much every mistake (although always looking for new ones) that can be made and willing to share experience.
I would recommend thinking this through for your situation and not putting too much weight on one specific proposed solution, or mentioned lack of solution, in this thread. Questions like "why buy a tool if SSIS is free?" is a valid question only in certain circumstances as SSIS isn't AbInitio or Informatica or Datastage. There are more considerations than just cost. Think about your larger strategy (hopefully you have one) in terms of what you're building and its longevity and support levels. There are many options for CDC, depending on your source system and objectives so unfortunately it is one of those classic "it depends". And some of the "depends" is how much experience one has in trying different options in general and then specifically with the combination of specific ETL tool and source/target database types.
Re. some of the replies...Log scraping (cdc tools) can work but there are very significant caveats. No, SSIS/SqlServer's built in CDC doesn't work for Oracle source, but Attunity has a CDC add-in to SSIS that does. Oracle's CDC may be an option but only for certain circumstance (a SqlServer DW not being one of them). All log scraping tools require supplemental logging on Oracle and the ability to make backups wait for the log scraper to get the redo/archived redo before its compressed or moved. No you don't have to extract to flat files. No, don't just add triggers to the source tables to do custom CDC. These are all options but again, it depends. And it's these and other "depends" are what keeps you from pooping your CDC pants, as it were. Sorry couldn't resist.
Considerations that will make or break your CDC solution:
This is not a comprehensive list but should get you a good start in making sound judgement on next steps. And they should not be difficult questions to answer. If they are, then warning signs should go off and I would recommend getting sufficiently confident in the answers before making final decisions on approach and tools.
Gary
I would recommend thinking this through for your situation and not putting too much weight on one specific proposed solution, or mentioned lack of solution, in this thread. Questions like "why buy a tool if SSIS is free?" is a valid question only in certain circumstances as SSIS isn't AbInitio or Informatica or Datastage. There are more considerations than just cost. Think about your larger strategy (hopefully you have one) in terms of what you're building and its longevity and support levels. There are many options for CDC, depending on your source system and objectives so unfortunately it is one of those classic "it depends". And some of the "depends" is how much experience one has in trying different options in general and then specifically with the combination of specific ETL tool and source/target database types.
Re. some of the replies...Log scraping (cdc tools) can work but there are very significant caveats. No, SSIS/SqlServer's built in CDC doesn't work for Oracle source, but Attunity has a CDC add-in to SSIS that does. Oracle's CDC may be an option but only for certain circumstance (a SqlServer DW not being one of them). All log scraping tools require supplemental logging on Oracle and the ability to make backups wait for the log scraper to get the redo/archived redo before its compressed or moved. No you don't have to extract to flat files. No, don't just add triggers to the source tables to do custom CDC. These are all options but again, it depends. And it's these and other "depends" are what keeps you from pooping your CDC pants, as it were. Sorry couldn't resist.
Considerations that will make or break your CDC solution:
- What is the total volume of data in your source system?
- Is the source system COTS application or custom - what are your options for adding triggers to the source tables?
- Are there "update timestamps" on your larger source tables and are they 100% reliable?
- Does every source table have a primary key? (some CDC tools require them)
- Do you need every change to rows or just state of data at specific points in time?
- What frequency of update do you need for your DW (and are you hoping to build an active DW?)?
- How many developers will be working with your ETL tool? (SSIS has no built-in checkin/checkout like informatica does, SSIS requires remote desktop to develop/debug using server service, etc.)
- What's your budget for ETL tools? How important is saving money on the tool?
- What's the cost/availability of training in the ETL tools under consideration?
- Does your shop already have an ETL tool and expertise in that tool?
- What's your metadata strategy? (Technical metadata is heavily dependent on ETL tools these days)
- Does your source system have Blobs/Clobs or other hard to handle datatypes like SDO_GEOMETRY? Nearly all CDC and ETL tools do not support them)
- Are you building a datamart or an enterprise-wide solution?
This is not a comprehensive list but should get you a good start in making sound judgement on next steps. And they should not be difficult questions to answer. If they are, then warning signs should go off and I would recommend getting sufficiently confident in the answers before making final decisions on approach and tools.
Gary
GMELHAFF- Posts : 3
Join date : 2011-03-01
Location : Olympia, WA
Re: ETL from Oracle to SQL Server 2008 Data Warehouse
My preferred method is to extract using SSIS, transform in the SSIS Dataflow and load directly into your target SQL tables. Intermediate CSVs or staging tables will slow the process dramatically and add to the testing and maintenance challenges IMO. Almost all scenarios where you might consider intermediate storage can nowadays be resolved in the SSIS Control or Data Flow.
I prefer to connect to Oracle using the Oracle Provider for OLE DB which comes with the 11G Oracle Client install in either x32 or x64 flavours. This only falls down on Oracle BLOB columns, where I revert to an ADO Net Source (.Net Providers\OracleClient Data Provider).
I retire old SCD type 2 rows using a similar UPDATE statement to the one above.
The big gotcha with Oracle-to-SQL ETL is often the extreme date issue. Oracle DATE datatype can handle more extreme dates than SQL datetime (e.g. 1/1/1693), with ugly results. I recommend using SQL datetime2.
Good luck!
Mike
I prefer to connect to Oracle using the Oracle Provider for OLE DB which comes with the 11G Oracle Client install in either x32 or x64 flavours. This only falls down on Oracle BLOB columns, where I revert to an ADO Net Source (.Net Providers\OracleClient Data Provider).
I retire old SCD type 2 rows using a similar UPDATE statement to the one above.
The big gotcha with Oracle-to-SQL ETL is often the extreme date issue. Oracle DATE datatype can handle more extreme dates than SQL datetime (e.g. 1/1/1693), with ugly results. I recommend using SQL datetime2.
Good luck!
Mike
Re: ETL from Oracle to SQL Server 2008 Data Warehouse
Garry and Mike,
Thank you very much for your detailed and informative insights.
Thank you very much for your detailed and informative insights.
sman- Posts : 22
Join date : 2011-01-30
Similar topics
» Data replication of 500+ tables from Oracle to SQL Server
» Nulls and SQL Server 2008
» SQL Server 2008 Date data type as dimension key
» SK generation in SQL Server 2005/2008
» Sql Server Integration Services (2008) - Sort Data Flow Task / Advanced OLE Source .....
» Nulls and SQL Server 2008
» SQL Server 2008 Date data type as dimension key
» SK generation in SQL Server 2005/2008
» Sql Server Integration Services (2008) - Sort Data Flow Task / Advanced OLE Source .....
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum