ETL design and performance questions
2 posters
Page 1 of 1
ETL design and performance questions
I have some questions about ETL design and process optimization:
1) CDC systems: I know CDC systems used in ETL process that read the database log (for example Oracle redolog) in order to get the changes and create a input to the ETL process, I suppose there are more CDCsystems, which is the best one?, whar are the differences?
2) indexes: I have read that when you are making a load to database in a ETL process is better to disable indexes and enable it after loading process. is right?, is this the best practice?. I think the enable process cant take long....isn't it??
3) Foreign keys: I have read is better to validate integrity between dimensions key and fact table in the code of the process not with database constraints for performance reasons..is it rigth??
4) I have read is better to use sql native connections from ETL to Source database, in terms of performance, any other best practice in connectors?
5) Working in-memory in ETL: Can be problems doing ETL steps in memory with large datasets?, which is the best option?
6) ETL vs ELT: I have always thougth that is better to do the ETL work in the ETL tool than in database, but I know that there are ELT architectures like Oracle Data Integrator, which is the best options? advantages and disadvantages of each one?
7) If I want to define restart points in the ETL process, how can I do?, I have long ETL process and in case of errors I don't want to execute all the processes then I wnat to save an intermediate table to continue. is that possible in PowerCenter? ant other option?
datawarehouse appliance: is a server with configurations oriented to dwh processes??, what special configuration is this??, an example..teradata? netezza?
Thanks in advance
1) CDC systems: I know CDC systems used in ETL process that read the database log (for example Oracle redolog) in order to get the changes and create a input to the ETL process, I suppose there are more CDCsystems, which is the best one?, whar are the differences?
2) indexes: I have read that when you are making a load to database in a ETL process is better to disable indexes and enable it after loading process. is right?, is this the best practice?. I think the enable process cant take long....isn't it??
3) Foreign keys: I have read is better to validate integrity between dimensions key and fact table in the code of the process not with database constraints for performance reasons..is it rigth??
4) I have read is better to use sql native connections from ETL to Source database, in terms of performance, any other best practice in connectors?
5) Working in-memory in ETL: Can be problems doing ETL steps in memory with large datasets?, which is the best option?
6) ETL vs ELT: I have always thougth that is better to do the ETL work in the ETL tool than in database, but I know that there are ELT architectures like Oracle Data Integrator, which is the best options? advantages and disadvantages of each one?
7) If I want to define restart points in the ETL process, how can I do?, I have long ETL process and in case of errors I don't want to execute all the processes then I wnat to save an intermediate table to continue. is that possible in PowerCenter? ant other option?
datawarehouse appliance: is a server with configurations oriented to dwh processes??, what special configuration is this??, an example..teradata? netezza?
Thanks in advance
juanvg1972- Posts : 25
Join date : 2015-05-05
Re: ETL design and performance questions
#2 - Disabling indexes during load is a common practice, but it depends on the volume of data you are loading. This is usually done with fact tables, but rarely with dimensions since the load volumes (and the creation of new rows) is not that great. What to do is a matter of testing to see which saves time.
#3 - Yes. Enforcing referential integrity in the database is redundant if you have a process in place to assign surrogate keys. It isn't necessary.
#4 - That is a matter of the particular DBMS you are using. Sometimes it makes a difference, sometimes it does not. Generally if they provide a native driver, you should use it.
#6 - ELT is useful when dealing with very large data volumes. Downside is it puts a load on the DBMS which could affect user response times. ETL is fine for moderate volumes (most typical applications). They typically cache lookups and such and can perform very well.
#8 - I can speak on Netezza. It is Postgres based, but they stripped it down and do not support indexes or referential integrity. The transaction model supports serialization only. It is an MPP system, can handle huge data volumes and performs really well.
#3 - Yes. Enforcing referential integrity in the database is redundant if you have a process in place to assign surrogate keys. It isn't necessary.
#4 - That is a matter of the particular DBMS you are using. Sometimes it makes a difference, sometimes it does not. Generally if they provide a native driver, you should use it.
#6 - ELT is useful when dealing with very large data volumes. Downside is it puts a load on the DBMS which could affect user response times. ETL is fine for moderate volumes (most typical applications). They typically cache lookups and such and can perform very well.
#8 - I can speak on Netezza. It is Postgres based, but they stripped it down and do not support indexes or referential integrity. The transaction model supports serialization only. It is an MPP system, can handle huge data volumes and performs really well.
Similar topics
» NDS to DDS Design Questions
» Datawarehousing design questions
» Modeling multiple multivalued dimensions and other design questions
» Specific Questions (Basic Questions)
» Reasons to use a 3NF design over a Dimensional Model design for an EDW - Discussion
» Datawarehousing design questions
» Modeling multiple multivalued dimensions and other design questions
» Specific Questions (Basic Questions)
» Reasons to use a 3NF design over a Dimensional Model design for an EDW - Discussion
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|