How to avoid duplicate Records in the fact table?
3 posters
Page 1 of 1
How to avoid duplicate Records in the fact table?
I have a periodic snapshot fact table to capture the business process "real estate sales". Then we have one row per property per buyer per day, that to say a property can be sold at most once in a day to one buyer.
Now I want to ensure data quality and avoid faulty duplicate records coming into the fact table (bad data everywhere, you never know). If the Dimension tables (Property, Buyer, Date) are not slowly changing, then I could put in the fact table a unique index on these 3 Surrogate FK Keys (PropertyKey, BuyerKey, DateKey), and that would be enough.
But what would you do, if one of the Dimension Tables is of SCD2 type? Then that unique index will be failing to prevent entering duplicate fact rows, because the new dimension Record for the same entity (property, buyer) gets a new Key in the fact table, although its still the same entity in real life. For example suppose the Buyer with the same Natural Key that is slowly changing (moving to a new address), gets a new BuyerKey but is still the same Buyer.
----------------------------------
DimBuyer
----------------------------------
BuyerKey (PK)
BuyerID (Natural Key)
City (SCD2)
----------------------------------
Now I want to ensure data quality and avoid faulty duplicate records coming into the fact table (bad data everywhere, you never know). If the Dimension tables (Property, Buyer, Date) are not slowly changing, then I could put in the fact table a unique index on these 3 Surrogate FK Keys (PropertyKey, BuyerKey, DateKey), and that would be enough.
But what would you do, if one of the Dimension Tables is of SCD2 type? Then that unique index will be failing to prevent entering duplicate fact rows, because the new dimension Record for the same entity (property, buyer) gets a new Key in the fact table, although its still the same entity in real life. For example suppose the Buyer with the same Natural Key that is slowly changing (moving to a new address), gets a new BuyerKey but is still the same Buyer.
----------------------------------
DimBuyer
----------------------------------
BuyerKey (PK)
BuyerID (Natural Key)
City (SCD2)
----------------------------------
JimBeam- Posts : 2
Join date : 2011-09-20
Re: How to avoid duplicate Records in the fact table?
That sounds like a transactional fact table rather than a periodic snapshot.JimBeam wrote:I have a periodic snapshot fact table to capture the business process "real estate sales".
If the BuyerID (Natural Key) is the only durable identifier that spans SCD2 rows for a given buyer, then you would have to use it to uniquely identify the buyer.
You could build a table that you would keep in your staging database (kitchen) which has an identifier for buyer, property, and date. You can then put a unique index on this table (could even be the clustered primary key).
DateKey, BuyerID, PropertyID*
During ETL, first insert into this table. If there is a duplicate it will fail. If it passes the load into this table, then load into your actual fact table.
*this could be PropertyKey if property is SCD1
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: How to avoid duplicate Records in the fact table?
Alternatively, you could use this query after-the-fact to see if a duplicate has crept into your fact table:
SELECT f.PropertyKey, b.BuyerID, f.DateKey, COUNT(1) AS [RecordCount]
FROM FactRealEstateSales f
INNER JOIN DimBuyer b
ON b.BuyerKey = f.BuyerKey
GROUP BY f.PropertyKey, b.BuyerID, f.DateKey
HAVING COUNT(1) > 1
SELECT f.PropertyKey, b.BuyerID, f.DateKey, COUNT(1) AS [RecordCount]
FROM FactRealEstateSales f
INNER JOIN DimBuyer b
ON b.BuyerKey = f.BuyerKey
GROUP BY f.PropertyKey, b.BuyerID, f.DateKey
HAVING COUNT(1) > 1
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: How to avoid duplicate Records in the fact table?
thanks for the tip's guys....
nmorgan96- Posts : 1
Join date : 2012-01-11
Similar topics
» Fact table and a duplicate one, please clarify
» Duplicate a dimension attribute in a fact table
» Updating records in a fact table
» Adding dummy Fact records to a Fact_SurveyAnswer table
» One Fact table having records at different granularity level
» Duplicate a dimension attribute in a fact table
» Updating records in a fact table
» Adding dummy Fact records to a Fact_SurveyAnswer table
» One Fact table having records at different granularity level
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum