Extracting Master/Detail data into Staging Tables
4 posters
Page 1 of 1
Extracting Master/Detail data into Staging Tables
I need to extract invoice data from our source system; this data is made up of header and line info as you'd expect.
My first question is should I denormalise the data at this stage and have just one invoice staging table or should I structure the staging area similar to the source and have both header and line staging tables? If it's relevant the data warehouse invoice fact table will be down to line level.
If the answer to the question above is to keep the data normalised in the staging area then I need to know best practice for extracting the data in separate packages without duplicating the source extraction logic. Possible answers that I've considered are:
Any help would be appreciated!
My first question is should I denormalise the data at this stage and have just one invoice staging table or should I structure the staging area similar to the source and have both header and line staging tables? If it's relevant the data warehouse invoice fact table will be down to line level.
If the answer to the question above is to keep the data normalised in the staging area then I need to know best practice for extracting the data in separate packages without duplicating the source extraction logic. Possible answers that I've considered are:
- Extract the header data into a staging table and then join the staging table with the source line table. The downside to this is the join will be across different databases on different servers so not sure what this will mean in terms of performance
- Retrieve the extracted header data from option 1 above and then cycle through each header record and retrieve the line info from the source system. The downside to this is the number of hits to the database but this is the option that feels more like how the ETL (SSIS) should work
- Extract the header and line data in a single query and then decompose the data into two staging tables. Downside is it seems like a lot of unnecessary data massaging will take place, which could introduce bugs
Any help would be appreciated!
monserob- Posts : 4
Join date : 2013-01-30
Re: Extracting Master/Detail data into Staging Tables
Hi monserob
I'd normally design a single query spanning both header and line info - the other options seem more complex and I'd expect them to be less efficient by a factor of 10 or more.
If volumes & requirements permit, I'd deliver the data from that query directly into the fact table. I only use staging tables where I have to, and you can easily deliver 10m rows using this method with modern hardware and ETL tools.
Good luck!
Mike
I'd normally design a single query spanning both header and line info - the other options seem more complex and I'd expect them to be less efficient by a factor of 10 or more.
If volumes & requirements permit, I'd deliver the data from that query directly into the fact table. I only use staging tables where I have to, and you can easily deliver 10m rows using this method with modern hardware and ETL tools.
Good luck!
Mike
Re: Extracting Master/Detail data into Staging Tables
I don't like the first 2 options. Assuming these databases are sitting on different servers, cross-database joins like those tend to perform very poorly.
As for option 3, its not clear why you would 'decompose' the data. Usually, ETL is running on a separate server so I doesn't impact the operational system. Depending on your volume and load on the operational environment, you can pull header and detail in the same query, but, it does require more resources on the operational machine. If you have sufficient information in the source tables that allow you to pull the tables individually (no joins) that would be a more efficient option, putting less load on the operational system. The latter is usually the way to do it in a 24/7 operational environment. If the operational system goes down every night, is it not a major consideration.
As for option 3, its not clear why you would 'decompose' the data. Usually, ETL is running on a separate server so I doesn't impact the operational system. Depending on your volume and load on the operational environment, you can pull header and detail in the same query, but, it does require more resources on the operational machine. If you have sufficient information in the source tables that allow you to pull the tables individually (no joins) that would be a more efficient option, putting less load on the operational system. The latter is usually the way to do it in a 24/7 operational environment. If the operational system goes down every night, is it not a major consideration.
Re: Extracting Master/Detail data into Staging Tables
If the volume of data is low you could pull all the header and line data over to a staging area and then do your processing there. Even if the volume is high it may make more sense to do that than doing cross database joins or cycling through header records and pulling lines for each header (depending on how it's implemented).
I would personally create two staging tables, header and line. Pull all the relevant headers into the header staging table and pull all the lines for the relevant headers into the line staging table. If you can determine what the relevant lines are without looking at headers that's great, if not then do a join and only pull back the line information. You're doing a little more work but I think it's a cleaner approach doing it this way.
I would personally create two staging tables, header and line. Pull all the relevant headers into the header staging table and pull all the lines for the relevant headers into the line staging table. If you can determine what the relevant lines are without looking at headers that's great, if not then do a join and only pull back the line information. You're doing a little more work but I think it's a cleaner approach doing it this way.
zip159- Posts : 6
Join date : 2013-06-24
Similar topics
» the master detail tables facts
» Extracting - separate source tables vs SQL query
» Extracting data when there is no timestamp in the source system
» Is surrogate key needed on master dimension tables?
» master-detail scenarios
» Extracting - separate source tables vs SQL query
» Extracting data when there is no timestamp in the source system
» Is surrogate key needed on master dimension tables?
» master-detail scenarios
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum