SCD Type 2 Change Reasons

ebry74 Mon Jun 20, 2011 3:48 pm

Anyone willing to share the list of type 2 change reasons you use?

I'm looking to populate my Customer dimension with change reasons. My dimension has type 2 set on LastName, State, and PostalCode, and any other changes are type 1. I was thinking New Record, Type 1, Type 2 etc...but I'm not sure how useful this is. Ideally, I would love to actually say what changed, but feel it would be difficult with multiple type 2 columns.

Thanks in advance for any guidance.

ngalemmo Tue Jun 21, 2011 9:45 am

Never done it, and I don't think you could do it as you describe. For one thing, a type 1 change updates every row for the natural key, 'new record' is easily identified and short-lived, and a type 2 change is implied by the fact a new row was created.

If you must audit change history, you need to do it with another fact table (or just some kind of log table on the side).

BoxesAndLines Thu Jun 23, 2011 2:34 pm

Any change reason would have to come from the source. I'm with ngalemmo as well, I've never incorporated a derived change reason.

warrent Thu Jun 23, 2011 7:18 pm

Just to be clear, I think the OP is asking about why a row appears in a dimension. It's either a New, (or original) row, or we've added it because there was a change to one or more Type 2 columns. (There should never be a row change reason of Type 1 because we don't track Type 1 changes.) We don't need a reason from the source to do this.

I've used a change reason column several times, and found it to be quite useful. Many of the ETL tools have a Type 2 change tracking function, but I've never seen one that surfaces a list of the changed columns. When I've done it, I've had to do it myself, either in my own SQL-based Type 2 tracking code, or as a post-processing pass on the dimension using the same comparison code. Design Tip #80 shows this comparison code.

As the original question, I usually make up a two charater abbreviation for each Type 2 column, and append these characters on a string, separated by spaces. So, in your case, you might have one row that had a LastName change, another row that was new, and a third row that had State and PostalCode changes. The RowChangeReason for these two rows would look like this:
LN
NW
ST PC

What this does is help you track the dynamics of the dimension without having to go back to the previous row. For example, if someone asks, how many new customers did we get last year? You can easily answer this with a select count(*) from Customer WHERE RowChangeReason LIKE '%NW%' AND YEAR(Effective_Date) = 2010.

(Of course the LIKE isn't needed for NW rows, but you would need it for any other change reason code.)

If you want users to be able to work with the change reasons, you can add a decode dimension as described in this
2002 Intelligent Enterprise article.

Hope this helps!
--Warren

ATran Wed Sep 07, 2011 11:19 pm

hi, im trying to understand the 'What changed' article, and I need some help.

I get the general idea, but as the link for the figure does not work anymore, i don't get how the group table join with the scd column table.

here's is how i understand so far:

Type2CustomerDimension table:
CUST_ID,LAST_NAME,MARITAL_STATUS,CITY,STATE,GROUP_TABLE_KEY
1 Combs Single NYC AA 0
2 Combs Single IUY BB 1
3 Combs-Henry Maried LDN CC 5

ScdColumnTable:
COLUMN_TABLE_ID,COLUMN_NAME
1 LAST_NAME
2 MARITAL_STATUS
3 CITY
4 STATE

SCDGroupTable:
Group_ID,LIST_COLUMN_ID
0 new column
1 1
2 1,2
3 3,4
4 2
5 1,2,3,4
...

can anyone tell me if i misunderstood something ? or how im suppose to join the 3 tables ?

Thanks
Andy

KS_EDW Thu Sep 08, 2011 12:08 am

This concept/idea was practiced in an organization I have worked for, where users were concerned about changed data details (specific to free text notes). The shortfall is that, to get at the column detail the ETL tool’s delivered CDC logic could not be used; a column by column compare of data was used to capture column and row level change. This DRASTICALLY slowed down the ETLs involved.

The only data we captured from the change was the update, delete, or insert code of the COLUMN and user ID, and date of the ROW change… the night it was first observed as created, changed, or deleted. Just to clarify, this was captured on a column and row level.

Unfortunately this was not super accurate because several users can change the same row in the same day but may not have changed he specific column in question… the last user of the day gets stuck “holding the bag”. We looked into the table’s logs but they were simply too large to bring into EDW for this purpose.

It ended up being an only half useful exercise and was not repeated past POC. There are simply too many factors and moving parts that a reporting EDW just isn’t meant to capture. Type 3 or hybrid might be your best bet.

In my defense… I said it was inaccurate but the users insisted on it.

Honestly, just be creative… try something new. You might be onto something!!!

Hope this helps.

cjrinpdx Fri Sep 09, 2011 12:06 pm

The Task Factory Dimension Merge SCD Transform (v2.0) product will do this for you.

Sponsored content

SCD Type 2 Change Reasons

SCD Type 2 Change Reasons

Re: SCD Type 2 Change Reasons

Re: SCD Type 2 Change Reasons

Re: SCD Type 2 Change Reasons

Re: SCD Type 2 Change Reasons

Re: SCD Type 2 Change Reasons

Re: SCD Type 2 Change Reasons

Re: SCD Type 2 Change Reasons