Data replication of 500+ tables from Oracle to SQL Server
Hello,
Our organisation has over 500 tables in an Oracle database, about 10 GB in size.
A requirement has arisen that the data in these tables has to be replicated to a SQL Server 2008 instance periodically (roughly once a month). Change data capture is not required.
Truncating destination tables and re-loading all the data is enough.
What options do I have?
1) The simplest approach I can think of is to use SSIS and create a "Data Flow Source + Data Flow Destination" pair for each table. Repeating this 500+ times sounds tedious! I can't think of an easy way to generate the SSIS package automatically from metadata.
2) Another approach I thought of is to create a stored procedure in the Oracle server to export all the records in a given table to an XML document, use SSIS to pass it to the SQL Server, and create a stored procedure in SQL Server to shred the XML document and insert the data into the appropriate table. To me, this approach looks more generic and extensible. IIRC, the XML functionality of SQL Server/SSIS has a limitation that the maximum size of an XML document it can handle is 2 GB, but I think that won't be a problem for us.
As for generating the DDL statements for initially creating the matching tables in SQL Server, I intend to generate them in Oracle and use regular expressions to convert them to SQL Server syntax and data types.
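For what it's worth, a rough sketch of that regex-based type conversion might look like this (the type mappings below are illustrative assumptions, not a complete Oracle-to-SQL Server mapping; defaults, constraints and precision edge cases would need more care):

```python
import re

# Hypothetical sketch: convert a handful of common Oracle column types to
# SQL Server 2008 equivalents with regular expressions. A real conversion
# needs a fuller mapping (TIMESTAMP precision, RAW, constraints, etc.).
TYPE_MAP = [
    (re.compile(r"VARCHAR2\((\d+)(?:\s+(?:CHAR|BYTE))?\)", re.I), r"NVARCHAR(\1)"),
    # Crude: assumes scale-0 NUMBER values fit in INT; large precisions need BIGINT/DECIMAL.
    (re.compile(r"NUMBER\((\d+),\s*0\)", re.I), r"INT"),
    (re.compile(r"NUMBER\((\d+),\s*(\d+)\)", re.I), r"DECIMAL(\1,\2)"),
    (re.compile(r"\bDATE\b", re.I), "DATETIME"),
    (re.compile(r"\bCLOB\b", re.I), "NVARCHAR(MAX)"),
]

def oracle_to_mssql(ddl: str) -> str:
    """Apply each type-mapping regex in order to a piece of Oracle DDL."""
    for pattern, replacement in TYPE_MAP:
        ddl = pattern.sub(replacement, ddl)
    return ddl
```

Ordering matters: the scale-0 NUMBER pattern has to run before the general NUMBER(p,s) one.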
Thanks for your insights.
sman- Posts : 22
Join date : 2011-01-30
Re: Data replication of 500+ tables from Oracle to SQL Server
Don't use XML - it's painfully slow to load, and the XML files will be huge.
I would suggest using BCP. You will need to create 500 format files, but only one package to loop through them. You may be able to automate the format file creation.
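Automating the format file creation could be as simple as generating the non-XML format file text from column metadata. A minimal sketch, assuming plain character data with no prefix bytes (a real generator would pull the column list from the catalog views and handle collations):

```python
def make_format_file(columns, delimiter="|"):
    """Build a minimal non-XML BCP format file (10.0 = SQL Server 2008).

    Assumes every field is character data (SQLCHAR) with no length prefix;
    the last column is terminated by the row terminator \r\n instead of
    the field delimiter.
    """
    lines = ["10.0", str(len(columns))]
    for i, name in enumerate(columns, start=1):
        term = delimiter if i < len(columns) else r"\r\n"
        # host-field#  datatype  prefix-len  max-len  terminator  col#  col-name  collation
        lines.append(f'{i} SQLCHAR 0 0 "{term}" {i} {name} ""')
    return "\n".join(lines) + "\n"
```

Run once per table over the column metadata and you have your 500 format files without hand-editing.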
Re: Data replication of 500+ tables from Oracle to SQL Server
John,
IIRC, BCP does not fully support CSV. If the CSV files contain strings with CR+LF, or single and double quotes, then BCP cannot handle them.
sman- Posts : 22
Join date : 2011-01-30
Re: Data replication of 500+ tables from Oracle to SQL Server
John,
There are two problems with the CSV+BCP approach.
1. Jamie Thomson showed that BCP treats delimiting quotes (e.g. "hello","world") as data and inserts the quotes into the columns.
http://consultingblogs.emc.com/jamiethomson/archive/2005/06/13/SSIS_3A00_-Using-Bulk-Insert-Task-with-csv-files.aspx
This problem can be partly fixed by using a format file for BCP and specifying combined delimiters (comma and quotes) for the appropriate columns.
2. There is a problem distinguishing empty vs. NULL values in CSV.
Empty string in the 2nd column:
- Code:
"hello","","world"
NULL in the 2nd column:
- Code:
"hello",,"world"
A BCP format file does not work in this situation when a column has NULLs.
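Incidentally, this ambiguity isn't specific to BCP; a generic CSV parser can't tell the two rows apart either. A quick check with Python's csv module:

```python
import csv

# Sketch: a quoted empty string and a missing value parse to the same
# thing, so the NULL-vs-empty distinction is lost once the file is
# written in plain delimited form.
empty_row = next(csv.reader(['"hello","","world"']))
null_row = next(csv.reader(['"hello",,"world"']))

print(empty_row == null_row)  # True: both parse as ['hello', '', 'world']
```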
sman- Posts : 22
Join date : 2011-01-30
Re: Data replication of 500+ tables from Oracle to SQL Server
If you have control of the Oracle database, why is this a problem?
Who is writing the extracts?
Re: Data replication of 500+ tables from Oracle to SQL Server
John,
John Simon wrote:If you have control of the Oracle database, why is this a problem?
Who is writing the extracts?
I didn't quite understand your question. I can get Oracle DBA to produce CSVs.
The problem is with the limitations of BCP utility in SQL Server in handling CSVs.
sman- Posts : 22
Join date : 2011-01-30
Re: Data replication of 500+ tables from Oracle to SQL Server
Why does it need to be a csv? Why can't you use a pipe-delimited file?
Re: Data replication of 500+ tables from Oracle to SQL Server
John,
John Simon wrote:Why does it need to be a csv? Why can't you use a pipe-delimited file?
The delimiter could be any character, but the problem remains the same. The pipe itself could appear in the data as valid content (albeit less frequently than the comma).
sman- Posts : 22
Join date : 2011-01-30
Re: Data replication of 500+ tables from Oracle to SQL Server
What answer do you want? There must be some control character you can use.
It is entirely possible to use BULK INSERT with csv, pipe delimited or any other character you wish to use as a column break. I've worked on a large project where we loaded many files using this method of one package with many format files quite successfully.
Re: Data replication of 500+ tables from Oracle to SQL Server
John Simon wrote:I've worked on a large project where we loaded many files using this method of one package with many format files quite successfully
Suppose some columns in the source tables have NULLs and others have empty strings.
How do you differentiate these in comma (or any other character) delimited files?
Then, how do you get BCP to recognise that difference?
I have worked in a DW environment where daily extracts came in delimited files. We mostly used non-printable control characters as delimiters; however, some columns still contained them as data. Because of these messy problems, we specified a unique delimiter for each table. (I can't remember how we handled the NULL vs empty string situation at the time.) In another situation, I wrote a C# app to read the CSVs and load the tables. This was considerably slower than BCP; however, .NET has APIs to gracefully handle complex CSVs.
I would like to see how others use BCP and overcome this problem (i.e. NULL vs empty string).
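One common workaround (not BCP-specific) is to have the extract write an agreed sentinel token for NULL and translate it on load. A minimal sketch, assuming `\N` as the sentinel (any token guaranteed absent from the data would do):

```python
import csv
import io

NULL_TOKEN = r"\N"  # assumed sentinel; must never occur as real data


def encode(value):
    """Write NULLs as the sentinel so they survive the round trip."""
    return NULL_TOKEN if value is None else value


def decode(field):
    """Turn the sentinel back into None (NULL); '' stays an empty string."""
    return None if field == NULL_TOKEN else field


buf = io.StringIO()
csv.writer(buf).writerow([encode(v) for v in ["hello", None, ""]])
row = [decode(f) for f in next(csv.reader([buf.getvalue()]))]
print(row)  # ['hello', None, ''] - NULL and empty string are now distinct
```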
sman- Posts : 22
Join date : 2011-01-30
Re: Data replication of 500+ tables from Oracle to SQL Server
There is no concept in Oracle of an empty string - it's simply a NULL. So it's a non-issue.
Re: Data replication of 500+ tables from Oracle to SQL Server
John Simon wrote:There is no concept in Oracle of an empty string - it's simply a NULL. So it's a non-issue.
Thanks John for the clarification.
sman- Posts : 22
Join date : 2011-01-30
Re: Data replication of 500+ tables from Oracle to SQL Server
sman wrote:...I can't think of an easy way to automatically generate the SSIS package based on meta data....
I don't think there is an easy way, but in theory at least you could create an SSIS package for each table programmatically based on table-definition metadata.
http://blogs.msdn.com/b/mattm/archive/2008/12/30/samples-for-creating-ssis-packages-programmatically.aspx
http://lakshmik.blogspot.com/2005/05/how-to-programmatically-create-ssis.html
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: Data replication of 500+ tables from Oracle to SQL Server
I can see two options: SQL Server's built-in Replication facility and the Import Data option (right-click the database, Tasks > Import Data...). Both can move data from Oracle to SQL Server.
With the SQL Server Replication facility, you have three options: Transactional, Merge and Snapshot. Snapshot replication is the simplest to set up and is normally used for one-off operations; however, if the frequency is once a month, a snapshot may suffice. You can google how to use the replication facility.
The Import Data wizard can walk you through the setup and lets you save the configuration as an SSIS package, so you can further fine-tune the package to deal with the tricky issues mentioned above.
The good thing about these options is that SQL Server can automatically resolve most of the transformation logic for you, so you don't have to spend too much time going through each table.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia