Suggestions for cleaning data
4 posters
Page 1 of 1
Suggestions for cleaning data
I ran into data that had trailing spaces. I expect it from CHAR fields, but it took me by surprise in a VARCHAR field. The field was an ID Number that was defined as a VARCHAR. I'm thinking that it might be wise to use RTRIM() on all VARCHAR fields early on in the ETL process just to be safe.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Suggestions for cleaning data
I usually do isnull(ltrim(rtrim()),'no_string_supplied') for all varchar fields just in case. And those 'no_strings' are way more often than you'd expect.
treat '' empty set like NULLs
I like to treat a empty string '' like a NULL. So I do something like.
The LEN takes care of trailing spaces on VARCHAR (doesn't count them) and the LEN of a NULL is not > 0 so both scenarios trigger the ELSE.
- Code:
CASE WHEN LEN([FIELD_NAME]) > 0 THEN RTRIM([FIELD_NAME]) ELSE 'Unknown' END
The LEN takes care of trailing spaces on VARCHAR (doesn't count them) and the LEN of a NULL is not > 0 so both scenarios trigger the ELSE.
Re: Suggestions for cleaning data
benlotter wrote:I like to treat a empty string '' like a NULL. So I do something like.
- Code:
CASE WHEN LEN([FIELD_NAME]) > 0 THEN RTRIM([FIELD_NAME]) ELSE 'Unknown' END
The LEN takes care of trailing spaces on VARCHAR (doesn't count them) and the LEN of a NULL is not > 0 so both scenarios trigger the ELSE.
Old Oracle habit??? :-)
Similar topics
» Data Cleaning - replacing nulls and addresses
» Loading and cleaning "randomly" updated, timestamped data from a 3rd party
» When to implement cleaning?
» Looking for suggestions on ETL tools for DW/BI Project
» where to keep sql server data type text
» Loading and cleaning "randomly" updated, timestamped data from a 3rd party
» When to implement cleaning?
» Looking for suggestions on ETL tools for DW/BI Project
» where to keep sql server data type text
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum