SSIS and the SQL 2008 MERGE Statement for Insert, Update and Delete
SSIS and the SQL 2008 MERGE Statement for Insert, Update and Delete
The 'MERGE' statement in SQL 2008 seems to be a powerful tool for managing incremental ETL.
There seem to be lots of examples of using it from SSMS, but few (none?) of calling it from within the SSIS framework.
Hopefully in Denali we'll see a package object OOTB that can better implement this functionality in SSIS.
Anyone have experience in deploying the MERGE* construct from within SSIS?
Peace.
- Savage
*note - not to be confused with the existing 'MERGE TRANSFORM' object in SSIS
NbleSavage- Posts : 24
Join date : 2010-11-28
Re: SSIS and the SQL 2008 MERGE Statement for Insert, Update and Delete
The MERGE command is pretty powerful for solving many ETL issues, and any Management Studio example you see could be run from within SSIS via the Execute SQL task. I have run code similar to the example shown in Design Tip #107 from within SSIS.
The problem from an SSIS perspective with MERGE, or any T-SQL command, is that it works on data structures within the relational environment. That means you have to have your data in SQL Server tables before you run the Execute SQL task from SSIS, which totally undermines SSIS's in-memory flow architecture. It would be great if SSIS had an equivalent construct. SQL 2011 will include a major upgrade to SSIS (like an UNDO command in the designer!). Maybe they will improve this as well.
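For reference, here is a simplified sketch of the kind of MERGE you can run from an Execute SQL task once your data is staged (the table and column names are illustrative, not taken from Design Tip #107):

    -- Assumed illustrative tables: stage.Customer (already loaded by SSIS)
    -- and dbo.DimCustomer (the target dimension).
    MERGE dbo.DimCustomer AS tgt
    USING stage.Customer AS src
        ON tgt.CustomerBusinessKey = src.CustomerBusinessKey
    WHEN MATCHED AND tgt.CustomerName <> src.CustomerName THEN
        UPDATE SET CustomerName = src.CustomerName
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerBusinessKey, CustomerName)
        VALUES (src.CustomerBusinessKey, src.CustomerName)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;  -- covers the delete case from the thread title

One statement handles the insert, update and delete paths that would otherwise take several SSIS components.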
warrent- Posts : 41
Join date : 2008-08-18
Re: SSIS and the SQL 2008 MERGE Statement for Insert, Update and Delete
Thanks, Warrent. I'm looking forward to the new SSIS as well.
NbleSavage- Posts : 24
Join date : 2010-11-28
Re: SSIS and the SQL 2008 MERGE Statement for Insert, Update and Delete
Hi Savage
I have used the MERGE command extensively in my ETL processes. I use SSIS primarily to control the flow of the ETL process rather than to do the transformations. My source data, staging area and final warehouse are all hosted in SQL Server, and I prefer to do data manipulation through stored procedures. I persist my source extracts in the staging area, build surrogate key maps, and then use the MERGE statement to do the Type 2 changes on the destination tables. I use a variable to build up the 'exec my_storedproc' string with whatever parameters I need, and then a simple Execute SQL task to run it.
Mind you, my warehouse is not exactly large, so I'm not sure how this would scale. I did not like the SSIS Type 2 update and preferred the control the stored procs gave me over the process, plus the ability to wrap them in transactions and stop/restart them.
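For illustration, the variable typically just evaluates to a call like this (the proc and parameter names here are made up), which the Execute SQL task then runs:

    EXEC dbo.usp_Merge_DimCustomer @BatchDate = '2011-02-02', @SourceSystem = 'CRM';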
Any particular questions/concerns you have with the MERGE statement?
Greg
gpotter- Posts : 1
Join date : 2011-02-02
Re: SSIS and the SQL 2008 MERGE Statement for Insert, Update and Delete
GPotter,
I think you provided a solid example of how you use SSIS to employ the MERGE construct to manage your inserts and updates (you call a sproc from within SSIS to accomplish this). If you'd be so inclined as to share a code snippet of your sproc, it would be greatly appreciated (if you'd prefer not to, I understand as well).
Thanks for your response!
NbleSavage- Posts : 24
Join date : 2010-11-28
Merge statement and SSIS
I also use it extensively, for significant data volumes. I created a couple of sprocs that I call from an Execute SQL task, so variables drive the MERGE dynamically. I also use it in conjunction with the "Multiple Hash" transform in my data flows, which you can find on CodePlex (a great add-on, btw). My T-SQL proc queries the data dictionary to discover the key fields, builds the whole MERGE statement, and runs it as dynamic SQL. Remember that there is a 4,000-character limit on SSIS expressions, so if you have large tables (lots of columns) then you have to do this in a stored proc.
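As a rough illustration, the key-discovery part of the proc does something along these lines (simplified; @SchemaName and @TableName stand in for the proc's real parameters, and the real thing goes on to build the full MERGE string from the results):

    -- Find the primary key columns of the target table, in key order
    SELECT kcu.COLUMN_NAME
    FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS tc
    JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS kcu
        ON kcu.CONSTRAINT_NAME = tc.CONSTRAINT_NAME
        AND kcu.CONSTRAINT_SCHEMA = tc.CONSTRAINT_SCHEMA
    WHERE tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
        AND tc.TABLE_SCHEMA = @SchemaName
        AND tc.TABLE_NAME = @TableName
    ORDER BY kcu.ORDINAL_POSITION;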
Someone mentioned the need for a future SSIS version to do a merge... that would be great, but it gets into the whole issue of the ETL tool doing a "push down", so that the ETL tool is really generating database-specific SQL rather than just simple DML statements. Informatica came out with this feature as an add-on a number of years ago and has kept adding new functions that can be "pushed down". That said, those of us in this field for many years (too many for me to admit) have been doing this as custom "push downs" via custom selects in the source query in all ETL tools.

I have a performance maturity graph I did once showing the exponential progression in performance as one goes from doing everything graphically in an ETL tool, on the bad-performance end of the spectrum, to the opposite end, the screaming performance of push down (totally non-graphical). The notorious problem of all ETL tools is that they try to "do it all" when you use the pre-built tasks & wizards. Meaning they normally try to suck all the data into memory a table at a time and then munge & mess (highly technical terms) with the result, maybe spilling onto disk and generally doing really nasty performance things. Can you say "bad idea"? Informatica was the first to pioneer the concept of keeping the tasks graphical yet automatically intelligent enough to push the operation down to the database where it makes sense. And they started with Teradata - wow. That's my kind of vendor. It is only in this scenario that you can "have your cake and eat it too" when it comes to providing the desired graphical view and data lineage of data movement along with high performance.

As to the hope that Denali will do this, don't hold your breath. This takes some rather clever programming by the ETL software development team, which I don't see SSIS getting any time soon. Heck, they often don't even let you sort parameters or run BIDS/SSIS in real client-server mode, let alone push SQL joins & merges down to the database. However, if I (and I'm not very smart) could code a T-SQL proc to do it, then why can't the developers, who are probably a set of brilliant programmers writing SSIS in Bellevue, do it? Makes you wonder, doesn't it? Maybe they're more concerned with developing the latest Xbox game than with SSIS. 'Course, SSIS is a free tool, after all ;-)
GMELHAFF- Posts : 3
Join date : 2011-03-01
Location : Olympia, WA
Lookup Caching as an alternative to MERGE for SSIS 2008+
I've been using Lookup Caching as an alternative to the MERGE statement for my SSIS packages (from 2008 onwards).
The basic design is that you load a Lookup Cache using a Cache Transform with any of the columns you want to preserve (at a minimum, the surrogate key and business key). Then truncate the target table, or delete the refresh date range if you are loading incrementally. Then at the end of your main Data Flow, add a Lookup using the Cache, joined on your Business Key, and return all the columns you want to preserve. The Match Output goes to an OLE DB Destination with Keep identity checked (to preserve your Surrogate Keys). The No Match Output goes to a separate OLE DB Destination. Both destinations must have Table lock unchecked (to avoid deadlocks), but can use fast load.
I realise this design is conceptually a long way away from the SQL MERGE statement and may be a stretch for SQL coders, but the end result can be identical. The amount of code required is probably an order of magnitude less than a SQL Merge. More complex requirements can be catered to with intermediate transformations between that last Lookup and the Destinations.
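For the SQL-minded, the Match / No Match split is roughly equivalent to the following (table names are purely illustrative; the Match branch keeps the existing surrogate key, while the No Match branch gets a new identity on insert):

    -- Match Output: business key found in the cache, old surrogate key preserved
    SELECT c.SurrogateKey, s.*
    FROM stage.Customer AS s
    JOIN cached.Customer AS c
        ON c.BusinessKey = s.BusinessKey;

    -- No Match Output: no cache hit, so a new surrogate key is generated on insert
    SELECT s.*
    FROM stage.Customer AS s
    LEFT JOIN cached.Customer AS c
        ON c.BusinessKey = s.BusinessKey
    WHERE c.BusinessKey IS NULL;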
This solution seems to work well, is reliable and very, very fast. I've scaled it as far as 500,000 rows without any dramas on a server with only 4GB of RAM running SQL Server, SSAS, SSIS and SharePoint - it runs in under 30 minutes.
Mike
Lookup Caching as an alternative to MERGE for SSIS 2008+
I'm attempting to use a SQL MERGE statement to replace lookups within one of my data flow tasks. My staging table will have approx. 5 million rows and requires several lookups. While the MERGE is certainly much faster, the problem I've experienced is that it fills up my transaction log. SSIS (including the lookup transformations) writes minimal activity to the transaction log, but it makes my data flow task run all night. I've tried a number of things to improve the performance of the lookup transformations, none of which seem to help. Since a SQL MERGE is recommended in this forum (and in other forums I've read), I was wondering what other people are doing to prevent their transaction logs from filling up.
Many thanks.
bellgodz- Posts : 1
Join date : 2011-10-10