Using 3rd party Sort packages in ETL stream
2 posters
Page 1 of 1
Using 3rd party Sort packages in ETL stream
I was wondering if anyone can share your experience with using a 3rd party Sort package (CoSort, SyncSort etc) in your ETL Stream.
I have been using Business Objects Data Integrator for the last 5 years and never had a chance to integrate a 3rd party Sort package into the ETL stream. My understanding is that it is always faster to process Flat file (source and target), compared to a database table. My dilemma is that by processing using a flat file, you lose the ability to query in the event you need to troubleshoot the data stream. I have on rare occasions used a Flat file staging for dumping my initial extract from source, followed by db tables downstream. All my aggregates are performed in the DB engine.
I guess my question really is when does it make sense (performance and storage) to integrate the use of a Flat file (staging, target?) with a 3rd Party Sort Package in the ETL stream? What are the disadvantages? This sound like an open ended question but any comment would be appreciated.
Thanks!
Brandon
I have been using Business Objects Data Integrator for the last 5 years and never had a chance to integrate a 3rd party Sort package into the ETL stream. My understanding is that it is always faster to process Flat file (source and target), compared to a database table. My dilemma is that by processing using a flat file, you lose the ability to query in the event you need to troubleshoot the data stream. I have on rare occasions used a Flat file staging for dumping my initial extract from source, followed by db tables downstream. All my aggregates are performed in the DB engine.
I guess my question really is when does it make sense (performance and storage) to integrate the use of a Flat file (staging, target?) with a 3rd Party Sort Package in the ETL stream? What are the disadvantages? This sound like an open ended question but any comment would be appreciated.
Thanks!
Brandon
juz_b- Posts : 17
Join date : 2009-02-07
Re: Using 3rd party Sort packages in ETL stream
It really depends upon how much data you are dealing with. I have used CoSort in the past (about 8 years ago) with DataStage and it provided significant improvement in sorting times.
Generally speaking, sort capabilities within an ETL tool are pretty poor compared to CoSort or SyncSort. However, with 64 bit CPUs and many GB's of RAM, the ETL tool's internal sort functionality will do a decent job if they can keep everything in memory.
But the most common practice today is to do the sorts in database queries. Sort performance in many database systems is comparable to external tools in most applications. But, if your data is not in a database, is of significant size and you really need to reduce your processing time, the sort tools are well worth considering.
Generally speaking, sort capabilities within an ETL tool are pretty poor compared to CoSort or SyncSort. However, with 64 bit CPUs and many GB's of RAM, the ETL tool's internal sort functionality will do a decent job if they can keep everything in memory.
But the most common practice today is to do the sorts in database queries. Sort performance in many database systems is comparable to external tools in most applications. But, if your data is not in a database, is of significant size and you really need to reduce your processing time, the sort tools are well worth considering.
Re: Using 3rd party Sort packages in ETL stream
Thanks for your input! This really helps clear some confusion.
juz_b- Posts : 17
Join date : 2009-02-07
Similar topics
» Too Many SSIS packages to manage!
» Sort Holiday in date Dimension
» Headquarters and Field Sales Sort
» Loading and cleaning "randomly" updated, timestamped data from a 3rd party
» 3rd Party Review
» Sort Holiday in date Dimension
» Headquarters and Field Sales Sort
» Loading and cleaning "randomly" updated, timestamped data from a 3rd party
» 3rd Party Review
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum