Data Quality Strategy
How do we go about performing a data quality analysis? Which aspects should be touched upon?
dwcurious- Posts : 20
Join date : 2011-04-14
Re: Data Quality Strategy
Hi,
Firstly, when you are inserting records during ETL - into either dimensions or facts:
1) Rather insert the records and set the fields you can't resolve (nulls etc.) to 'Unknown'.
DO NOT ignore records from the source system that are not fully populated - rather bring them into the data warehouse, but set the missing fields to 'Unknown'.
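Point 1 can be sketched like this - a minimal Python example of defaulting unresolved fields to 'Unknown' instead of dropping the record (the field names here are hypothetical):

```python
# Sketch of point 1: never drop an incomplete source record - load it
# with missing attributes defaulted to 'Unknown'. Field names
# (customer_name, region) are made up for illustration.

REQUIRED_FIELDS = ["customer_id", "customer_name", "region"]

def prepare_dimension_row(source_row: dict) -> dict:
    """Return a loadable row, substituting 'Unknown' for missing values."""
    row = {}
    for field in REQUIRED_FIELDS:
        value = source_row.get(field)
        # Treat nulls and empty strings as 'not found'
        row[field] = value if value not in (None, "") else "Unknown"
    return row

# A partially populated source record still makes it into the warehouse:
incomplete = {"customer_id": 42, "region": None}
print(prepare_dimension_row(incomplete))
# {'customer_id': 42, 'customer_name': 'Unknown', 'region': 'Unknown'}
```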
2) Logging - every ETL run should log the number of records inserted, updated and deleted for every dimension and fact.
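One way to implement point 2 is an audit table with one row per table per run - a sketch using SQLite (the table and column names are just an assumption):

```python
# Sketch of point 2: one audit row per dimension/fact per ETL run,
# recording inserted/updated/deleted counts. Schema is illustrative.
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE etl_audit (
        run_time      TEXT,
        table_name    TEXT,
        rows_inserted INTEGER,
        rows_updated  INTEGER,
        rows_deleted  INTEGER
    )
""")

def log_etl_run(table_name, inserted, updated, deleted):
    conn.execute(
        "INSERT INTO etl_audit VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), table_name,
         inserted, updated, deleted),
    )

# After loading each dimension/fact, record the counts:
log_etl_run("DimCustomer", inserted=120, updated=15, deleted=0)
log_etl_run("FactSales", inserted=10432, updated=0, deleted=0)

for row in conn.execute("SELECT table_name, rows_inserted FROM etl_audit"):
    print(row)
```

Comparing these counts run-over-run is often the quickest way to spot a feed that silently dropped records.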
3) Ask the business!
They should give you rules (which you can define in the source-to-target mapping) that will raise a flag - for example, an extra field in the warehouse that is set when a certain business rule check on a record fails.
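A minimal sketch of point 3 - each business rule is a check, and the flags for every failed rule get stamped onto the record in an extra field (the rule names and fields are hypothetical):

```python
# Sketch of point 3: business rules from the source-to-target mapping,
# each raising a flag on the record when its check fails.

BUSINESS_RULES = {
    "MISSING_BIRTHDATE": lambda r: r.get("birth_date") is not None,
    "NEGATIVE_BALANCE":  lambda r: r.get("balance", 0) >= 0,
}

def apply_rules(record: dict) -> dict:
    """Attach a dq_flags field listing every rule the record violates."""
    failed = [name for name, check in BUSINESS_RULES.items() if not check(record)]
    record["dq_flags"] = ";".join(failed)  # empty string = record is clean
    return record

flagged = apply_rules({"customer_id": 7, "birth_date": None, "balance": -50})
print(flagged["dq_flags"])  # MISSING_BIRTHDATE;NEGATIVE_BALANCE
```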
4) Otherwise, if you really cannot insert a fact record because the keys (say, all the foreign keys) to the dimensions cannot be looked up, then consider adding an exceptions fact table where you can at least insert those records - hopefully something useful pulls through, like the measure, that you can later use to correlate with the source system.
So if you have FactAffordability, you also need FactAffordabilityExceptions where those really bad records can be inserted.
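The routing in point 4 can be sketched as follows - the dimension contents and row fields are made up, but the idea is: failed key lookup means the row goes to the exceptions table with its measure intact, not to the bin:

```python
# Sketch of point 4: route a fact row to the exceptions table when its
# dimension key cannot be looked up; otherwise load it normally.

dim_customer = {"C1": 101, "C2": 102}   # natural key -> surrogate key

fact_rows, exception_rows = [], []

def load_fact(source_row: dict) -> None:
    surrogate = dim_customer.get(source_row["customer_nk"])
    if surrogate is None:
        # Key lookup failed: keep the whole row (measure included)
        # so it can be reconciled against the source system later.
        exception_rows.append(source_row)
    else:
        fact_rows.append({"customer_key": surrogate,
                          "amount": source_row["amount"]})

load_fact({"customer_nk": "C1", "amount": 250.0})
load_fact({"customer_nk": "C9", "amount": 99.0})   # unknown customer

print(len(fact_rows), len(exception_rows))  # 1 1
```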
Bottom line: you want everything to be pulled into the data warehouse somewhere, UNLESS the business has explicitly stated that you can ignore those 'faulty' records.
BUT more often than not, if the data warehouse's integrity comes into question, you must be able to prove that the ETL process did not fail - by pulling ALL source records into the data warehouse, even those where most foreign keys / values were unknown. That is why you must also keep a staging environment alive for the data warehouse, so that you can show the business which source records resulted in those odd fact records / fact exception records.
Hope this helps a bit!
- IF you have specific rules set up to identify faulty records, sit with the business and ask them for a quality score for each of these rules. Then, when you insert or update records, you can have a field indicating the quality score for that record - simply S1, S2, S3 or whatever - so that you can also group records by their quality.
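The quality-score idea above can be sketched like this - the business assigns a score per rule, the worst failed rule determines the record's score, and records can then be grouped by it (the scores and rule names are hypothetical, with S1 assumed worst):

```python
# Sketch of the quality-score idea: stamp each record with the score of
# the worst business rule it failed, so records can be grouped by quality.

RULE_SCORES = {"MISSING_BIRTHDATE": "S2", "NEGATIVE_BALANCE": "S1"}  # S1 = worst

def score_record(failed_rules: list) -> str:
    if not failed_rules:
        return "S0"  # clean record
    # 'S1' sorts before 'S2', so min() picks the worst score
    return min(RULE_SCORES[r] for r in failed_rules)

print(score_record([]))                                         # S0
print(score_record(["MISSING_BIRTHDATE"]))                      # S2
print(score_record(["MISSING_BIRTHDATE", "NEGATIVE_BALANCE"]))  # S1
```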
ian.coetzer- Posts : 57
Join date : 2010-05-31
Age : 44
Location : South Africa
Re: Data Quality Strategy
Everyone in any organisation has a role to play in improving data quality and sustaining that improvement:
• The maintainer standing in front of a pump is the expert on the data representing that pump;
• The call handler dealing with a complaint is the expert on that customer and their data issues;
• Business Intelligence staff can spot unusual data patterns and validity issues;
• Executives can spot areas where cross-business reports are not producing the expected results;
• and so on.
Data quality technology is only one part of the overall solution. A fundamental requirement is for that overall DQ solution to be defined, with clear standards, expectations and individual responsibilities. The role of technology within this solution should be defined, as should the areas that technology cannot solve. Delegated ownership responsibilities should be clear to all.
Data quality improvement should involve the whole organisation, and should address the root causes of data quality problems. It is unlikely that any organisation can reach an improved level of data quality, and sustain it, without the majority of the organisation being actively involved.
bbjumpman- Posts : 3
Join date : 2011-11-04