Data replication of 500+ tables from Oracle to SQL Server
Hello,
Our organisation has over 500 tables in an Oracle database, about 10 GB in size.
A requirement has arisen that the data in these tables has to be replicated to a SQL Server 2008 instance periodically (roughly once a month). Change data capture is not required.
Truncating destination tables and re-loading all the data is enough.
What options do I have?
1) The simplest approach I can think of is to use SSIS and create a "Data Flow Source + Data Flow Destination" pair for each table. Repeating this 500+ times sounds tedious! I can't think of an easy way to generate the SSIS package automatically from metadata.
2) Another approach I thought of is to create a stored procedure in the Oracle server to export all the records in a given table to an XML document, use SSIS to pass it to the SQL Server, and create a stored procedure in SQL Server to shred the XML document and insert the data into the appropriate table. To me, this approach looks more generic and extensible. IIRC, the XML functionality of SQL Server/SSIS has a limitation that the maximum size of an XML document it can handle is 2 GB, but I think that won't be a problem for us.
As for generating the DDL statements for initially creating the matching tables in SQL Server, I intend to generate them in Oracle and use regular expressions to convert them to SQL Server syntax and data types.
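For what it's worth, a rough sketch of that regex-based type conversion might look like this (the type mappings below are illustrative assumptions, not a complete Oracle-to-SQL Server mapping; defaults, constraints and precision edge cases would need more care):

```python
import re

# Hypothetical sketch: convert a handful of common Oracle column types to
# SQL Server 2008 equivalents with regular expressions. A real conversion
# needs a fuller mapping (TIMESTAMP precision, RAW, constraints, etc.).
TYPE_MAP = [
    (re.compile(r"VARCHAR2\((\d+)(?:\s+(?:CHAR|BYTE))?\)", re.I), r"NVARCHAR(\1)"),
    # Crude: assumes scale-0 NUMBER values fit in INT; large precisions need BIGINT/DECIMAL.
    (re.compile(r"NUMBER\((\d+),\s*0\)", re.I), r"INT"),
    (re.compile(r"NUMBER\((\d+),\s*(\d+)\)", re.I), r"DECIMAL(\1,\2)"),
    (re.compile(r"\bDATE\b", re.I), "DATETIME"),
    (re.compile(r"\bCLOB\b", re.I), "NVARCHAR(MAX)"),
]

def oracle_to_mssql(ddl: str) -> str:
    """Apply each type-mapping regex in order to a piece of Oracle DDL."""
    for pattern, replacement in TYPE_MAP:
        ddl = pattern.sub(replacement, ddl)
    return ddl
```

Ordering matters: the scale-0 NUMBER pattern has to run before the general NUMBER(p,s) one.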
Thanks for your insights.
sman- Posts : 22
Join date : 2011-01-30
Re: Data replication of 500+ tables from Oracle to SQL Server
Don't use XML - it's painfully slow to load, and the XML files will be huge.
I would suggest using BCP. You will need to create 500 format files, but only one package to loop through them. You may be able to automate the format file creation.
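Automating the format file creation could be as simple as generating the non-XML format file text from column metadata. A minimal sketch, assuming plain character data with no prefix bytes (a real generator would pull the column list from the catalog views and handle collations):

```python
def make_format_file(columns, delimiter="|"):
    """Build a minimal non-XML BCP format file (10.0 = SQL Server 2008).

    Assumes every field is character data (SQLCHAR) with no length prefix;
    the last column is terminated by the row terminator \r\n instead of
    the field delimiter.
    """
    lines = ["10.0", str(len(columns))]
    for i, name in enumerate(columns, start=1):
        term = delimiter if i < len(columns) else r"\r\n"
        # host-field#  datatype  prefix-len  max-len  terminator  col#  col-name  collation
        lines.append(f'{i} SQLCHAR 0 0 "{term}" {i} {name} ""')
    return "\n".join(lines) + "\n"
```

Run once per table over the column metadata and you have your 500 format files without hand-editing.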
Re: Data replication of 500+ tables from Oracle to SQL Server
John,
IIRC, BCP does not fully support CSV. If the CSV files contain strings with CR+LF, or single and double quotes, then BCP cannot handle them.
sman- Posts : 22
Join date : 2011-01-30
Re: Data replication of 500+ tables from Oracle to SQL Server
John,
There are two problems with the CSV+BCP approach.
1. Jamie Thomson showed that BCP treats delimiting quotes (e.g. "hello","world") as data and inserts the quotes into the columns.
http://consultingblogs.emc.com/jamiethomson/archive/2005/06/13/SSIS_3A00_-Using-Bulk-Insert-Task-with-csv-files.aspx
This problem can be partly fixed by using a format file for BCP and specifying combined delimiters (comma and quotes) for the appropriate columns.
2. There is a problem distinguishing empty vs. NULL values in CSV.
Empty string in the 2nd column:
- Code:
"hello","","world"
NULL in the 2nd column:
- Code:
"hello",,"world"
A BCP format file does not work in this situation when a column has NULLs.
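Incidentally, this ambiguity isn't specific to BCP; a generic CSV parser can't tell the two rows apart either. A quick check with Python's csv module:

```python
import csv

# Sketch: a quoted empty string and a missing value parse to the same
# thing, so the NULL-vs-empty distinction is lost once the file is
# written in plain delimited form.
empty_row = next(csv.reader(['"hello","","world"']))
null_row = next(csv.reader(['"hello",,"world"']))

print(empty_row == null_row)  # True: both parse as ['hello', '', 'world']
```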
sman- Posts : 22
Join date : 2011-01-30
Re: Data replication of 500+ tables from Oracle to SQL Server
If you have control of the Oracle database, why is this a problem?
Who is writing the extracts?
Re: Data replication of 500+ tables from Oracle to SQL Server
John,
John Simon wrote:If you have control of the Oracle database, why is this a problem?
Who is writing the extracts?
I didn't quite understand your question. I can get Oracle DBA to produce CSVs.
The problem is with the limitations of BCP utility in SQL Server in handling CSVs.
sman- Posts : 22
Join date : 2011-01-30
Re: Data replication of 500+ tables from Oracle to SQL Server
Why does it need to be a csv? Why can't you use a pipe-delimited file?
Re: Data replication of 500+ tables from Oracle to SQL Server
John,
John Simon wrote:Why does it need to be a csv? Why can't you use a pipe-delimited file?
The delimiter could be any character, but the problem remains the same. The pipe itself could appear in the data as valid content (albeit less frequently than the comma).
sman- Posts : 22
Join date : 2011-01-30
Re: Data replication of 500+ tables from Oracle to SQL Server
What answer do you want? There must be some control character you can use.
It is entirely possible to use BULK INSERT with csv, pipe delimited or any other character you wish to use as a column break. I've worked on a large project where we loaded many files using this method of one package with many format files quite successfully.
Re: Data replication of 500+ tables from Oracle to SQL Server
John Simon wrote:I've worked on a large project where we loaded many files using this method of one package with many format files quite successfully
Suppose some columns in the source tables have NULLs and others have empty strings.
How do you differentiate these in comma (or any other character) delimited files?
Then, how do you get BCP to recognise that difference?
I have worked in a DW environment where daily extracts came in delimited files. We mostly used non-printable control characters as delimiters; however, some columns still contained them as data. Because of these messy problems, we specified a unique delimiter for each table. (I can't remember how we handled the NULL vs empty string situation at the time.) In another situation, I wrote a C# app to read the CSVs and load the tables. This was considerably slower than BCP; however, .NET has APIs to gracefully handle complex CSVs.
I would like to see how others use BCP and overcome this problem (i.e. NULL vs empty string).
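One common workaround (not BCP-specific) is to have the extract write an agreed sentinel token for NULL and translate it on load. A minimal sketch, assuming `\N` as the sentinel (any token guaranteed absent from the data would do):

```python
import csv
import io

NULL_TOKEN = r"\N"  # assumed sentinel; must never occur as real data


def encode(value):
    """Write NULLs as the sentinel so they survive the round trip."""
    return NULL_TOKEN if value is None else value


def decode(field):
    """Turn the sentinel back into None (NULL); '' stays an empty string."""
    return None if field == NULL_TOKEN else field


buf = io.StringIO()
csv.writer(buf).writerow([encode(v) for v in ["hello", None, ""]])
row = [decode(f) for f in next(csv.reader([buf.getvalue()]))]
print(row)  # ['hello', None, ''] - NULL and empty string are now distinct
```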
sman- Posts : 22
Join date : 2011-01-30
Re: Data replication of 500+ tables from Oracle to SQL Server
There is no concept in Oracle of an empty string - it's simply a NULL. So it's a non-issue.
Re: Data replication of 500+ tables from Oracle to SQL Server
John Simon wrote:There is no concept in Oracle of an empty string - it's simply a NULL. So it's a non-issue.
Thanks John for the clarification.
sman- Posts : 22
Join date : 2011-01-30
Re: Data replication of 500+ tables from Oracle to SQL Server
sman wrote:...I can't think of an easy way to automatically generate the SSIS package based on meta data....
I don't think there is an easy way, but in theory at least you could create an SSIS package for each table programmatically based on table-definition metadata.
http://blogs.msdn.com/b/mattm/archive/2008/12/30/samples-for-creating-ssis-packages-programmatically.aspx
http://lakshmik.blogspot.com/2005/05/how-to-programmatically-create-ssis.html
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: Data replication of 500+ tables from Oracle to SQL Server
I can see two options: SQL Server's built-in Replication facility and the Import Data option (right-click the database, Tasks > Import Data...). Both can move data from Oracle to SQL Server.
With the SQL Server Replication facility, you have three options: Transactional, Merge and Snapshot. Snapshot replication is the simplest to set up and is normally used for one-off operations; however, if the frequency is once a month, a snapshot may suffice. You can google how to use the replication facility.
The Import Data wizard can walk you through the setup and lets you save the configuration as an SSIS package, so you can further fine-tune the package to deal with the tricky issues mentioned above.
The good thing about these options is that SQL Server can automatically resolve most of the transformation logic for you, so you don't have to spend too much time going through each table.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia