Customer and Addresses
3 posters
Page 1 of 1
Customer and Addresses
Hi,
I have to design a dimensional relationship for Customer and Address.
In this case Customer and Address have many to many relationship. i.e. There could be joint customers and each customer has postal and physical address
I am thinking to create a Customer dimension and Address dimension and Create FactLess fact table in between.
My question is: What could be the business key for Address dimension?
Address dimension has the column e.g.: Address Key , Unit, Property, Street , City, Postcode, State, Country.
I have to design a dimensional relationship for Customer and Address.
In this case Customer and Address have many to many relationship. i.e. There could be joint customers and each customer has postal and physical address
I am thinking to create a Customer dimension and Address dimension and Create FactLess fact table in between.
My question is: What could be the business key for Address dimension?
Address dimension has the column e.g.: Address Key , Unit, Property, Street , City, Postcode, State, Country.
zahid_ash- Posts : 5
Join date : 2011-09-15
Address Hash
Your best bet is to use a HASH function on the fields that you would count as a unique address. For example, in Postgres, you could do something like:
Store the "addr_hash" as your business key.
- Code:
addr_string = upper(Unit) || upper(Property) || upper(Street) || upper(City) || upper(Postcode) || upper(State) || upper(Country)
addr_hash = md5(addr_string)
Store the "addr_hash" as your business key.
elmorejr- Posts : 25
Join date : 2011-10-20
Location : Columbus, Ohio, United States of America, Earth, Sol
Re: Customer and Addresses
Depends on the number of address, but generally, it is not a good idea to use a hash as a key, unless you are using something like SHA-256. Given that generates a 32 byte hash, your not gaining much over just using the address itself.
When you deal with hashes in a combinatorial situation (i.e. using it as a key, the hash value needs to be unique among all others in the table) the probability of a collision (two different addresses having the same hash value) increases very rapidly as the population (# rows in the table) increases. (For example, if you have 31 people in a room, the probability that any two people in the room have the same birthday is around 50%). There have been collisions found with md5 and SHA-1.
If the issue is index size and lookup performance, then one approach is to use a small hash, such as CRC-32, as a non-unique index on the table. This generates a 32-bit hash and will likely have collisions as the table reaches 1-2 million rows. But as a non unique index, it will quickly isolate 1-3 rows that you then check field for field.
When you deal with hashes in a combinatorial situation (i.e. using it as a key, the hash value needs to be unique among all others in the table) the probability of a collision (two different addresses having the same hash value) increases very rapidly as the population (# rows in the table) increases. (For example, if you have 31 people in a room, the probability that any two people in the room have the same birthday is around 50%). There have been collisions found with md5 and SHA-1.
If the issue is index size and lookup performance, then one approach is to use a small hash, such as CRC-32, as a non-unique index on the table. This generates a 32-bit hash and will likely have collisions as the table reaches 1-2 million rows. But as a non unique index, it will quickly isolate 1-3 rows that you then check field for field.
Re: Customer and Addresses
I have post a reply before, dont know where it has disappeared
--------------------------------------------------------
For every combination of strings their will be a new HASH value , If any column of an Address dimension changes like any street name then the HASH key will change so how can implement the Slowly changing dimension for Address dimension if we use the HASH key as a business Key?
--------------------------------------------------------
For every combination of strings their will be a new HASH value , If any column of an Address dimension changes like any street name then the HASH key will change so how can implement the Slowly changing dimension for Address dimension if we use the HASH key as a business Key?
zahid_ash- Posts : 5
Join date : 2011-09-15
By definition you can not / should not
If you plan to implement an address dimension, where each entry is a unique address, there should not be a need to implement Type 2 changes. Every address is unique. Thus, a change in address will become a new address entry in your dimension unless an entry for that address already exists.
In this scenario, the following are unique:
100 Anystreet, Apt. 2B, Columbus, Ohio 43210
100 Anystreet, Apt. 2C, Columbus, Ohio 43210
If you would like to implement Type 2 as it applies to the customer changing their address, then you would not want to implement a separate address dimension. You will want to include the address information in your customer dimension and apply Type 2 rules accordingly.
In this scenario, the following are unique:
100 Anystreet, Apt. 2B, Columbus, Ohio 43210
100 Anystreet, Apt. 2C, Columbus, Ohio 43210
If you would like to implement Type 2 as it applies to the customer changing their address, then you would not want to implement a separate address dimension. You will want to include the address information in your customer dimension and apply Type 2 rules accordingly.
elmorejr- Posts : 25
Join date : 2011-10-20
Location : Columbus, Ohio, United States of America, Earth, Sol
Re: Customer and Addresses
@ elmorejr
Thanks mate.
Got your point i.e. Address is not a SCD2 dimension and FactLess fact will hold the customer and address relationship for any change in address or customer.
Thanks mate.
Got your point i.e. Address is not a SCD2 dimension and FactLess fact will hold the customer and address relationship for any change in address or customer.
zahid_ash- Posts : 5
Join date : 2011-09-15
Similar topics
» Merging customer data from disparate sources to create a master customer dimension
» Customer & Multiple Addresses
» Addresses in Customer Dimension
» Customer dimension with multiple addresses
» Customer addresses in a high volume retail environment
» Customer & Multiple Addresses
» Addresses in Customer Dimension
» Customer dimension with multiple addresses
» Customer addresses in a high volume retail environment
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum