Too normalized? And question on aggregated fact
4 posters
Page 1 of 1
Too normalized? And question on aggregated fact
Hi,
Im designing a dimensional model for a datawarehouse for some kind of webshop.
I've got some attributes I wanna use, but I'm afraid it contains too much outriggers and is too much snowflaked. (i left some of attributes out for this example)
customer dimension
company_name
full_name
gender
email
registration_date : FK -> date dimension
birthday_date : FK -> date dimension
first_purchase_date : FK -> date dimension
last_purchase_date : FK -> date dimension
How many outriggers is acceptable?
Also, I'd like to add 'last_year_spendings' and maybe similair attributes as an aggregated fact. However I can't find much information on this topic. It's only half a page in the book 'The Complete Guide to Dimensional Modelling' by Kimball. How do these aggregated fact tables look?
At last I'd like to thanks this forum and the book. I've come far with my overall dimensional model thanks to these resources
Im designing a dimensional model for a datawarehouse for some kind of webshop.
I've got some attributes I wanna use, but I'm afraid it contains too much outriggers and is too much snowflaked. (i left some of attributes out for this example)
customer dimension
company_name
full_name
gender
registration_date : FK -> date dimension
birthday_date : FK -> date dimension
first_purchase_date : FK -> date dimension
last_purchase_date : FK -> date dimension
How many outriggers is acceptable?
Also, I'd like to add 'last_year_spendings' and maybe similair attributes as an aggregated fact. However I can't find much information on this topic. It's only half a page in the book 'The Complete Guide to Dimensional Modelling' by Kimball. How do these aggregated fact tables look?
At last I'd like to thanks this forum and the book. I've come far with my overall dimensional model thanks to these resources
Booma- Posts : 12
Join date : 2014-03-10
Re: Too normalized? And question on aggregated fact
For the date dimension, don't model a FK from the customer dimension. You can always join to the date dimension with a date column. Aggregate fact tables are just sums and group by's of a lower grain fact. Which columns you use or drop is driven by reporting. That is, I want my aggregate fact to handle the vast majority of queries while my detail grained fact should cover everything else.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Too normalized? And question on aggregated fact
Thank you for your fast reply.BoxesAndLines wrote:For the date dimension, don't model a FK from the customer dimension. You can always join to the date dimension with a date column. Aggregate fact tables are just sums and group by's of a lower grain fact. Which columns you use or drop is driven by reporting. That is, I want my aggregate fact to handle the vast majority of queries while my detail grained fact should cover everything else.
I dont really understand your comment on the customer dimension. You mean just put a Date field for registration_date, birthday_date etc? And if I need to group by month, for example, just join the tables on the DATE field? Would that be much slower than by ID?
Booma- Posts : 12
Join date : 2014-03-10
Re: Too normalized? And question on aggregated fact
You should have an index on the date column in the date dimension to address the joins. I typically add date dimension FK's to the fact table. If I have other dates that end up in dimensions, I leave them as date knowing that I can always join to the date dimension if needed.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Too normalized? And question on aggregated fact
I still don't really understand aggregated facts as a dimension attribute.
Here is a snippet of my dimensional model:
I have the standard Customer dimension
Customer dimension
gender
full_name
...other attributes...
spent_fact_id (FK to Spent aggregated fact)
Spent aggregated fact
id
spent_last_year (decimal that indicates the amount of dollars/euros)
average_lifetime_spending
I don't see how that is a fact table, since it contains no dimensions.
Let's say I want a report with the average spending of Male customers, which spent at least $1000 last year. I don't really how you would query that with this model.
How do I record historical spendings? How much did they spent last year, 2 years ago, 5 years ago. The last 5 years in total?
Here is a snippet of my dimensional model:
I have the standard Customer dimension
Customer dimension
gender
full_name
...other attributes...
spent_fact_id (FK to Spent aggregated fact)
Spent aggregated fact
id
spent_last_year (decimal that indicates the amount of dollars/euros)
average_lifetime_spending
I don't see how that is a fact table, since it contains no dimensions.
Let's say I want a report with the average spending of Male customers, which spent at least $1000 last year. I don't really how you would query that with this model.
How do I record historical spendings? How much did they spent last year, 2 years ago, 5 years ago. The last 5 years in total?
Booma- Posts : 12
Join date : 2014-03-10
Re: Too normalized? And question on aggregated fact
Customer attributes are dimensional values, last year's spend, for example, is a classification of the customer, not a fact, so it belongs in a dimension table.
In retail it is not unusual to have multiple customer related dimension tables. Each table would carry the same primary key value and one or more tables could be used in a query depending on what is needed. The reason for this is a lot of attributes are not static and often fully recalculated. Maintaining separate tables helps this process as often a table is replaced rather than updated.
As far as the date attributes go, it is probably not a good idea to snowflake to the date dimension, instead include the desired date attributes in the table itself. Customer is going to be your largest dimension, snowflaking such a dimension can significant impact performance.
In retail it is not unusual to have multiple customer related dimension tables. Each table would carry the same primary key value and one or more tables could be used in a query depending on what is needed. The reason for this is a lot of attributes are not static and often fully recalculated. Maintaining separate tables helps this process as often a table is replaced rather than updated.
As far as the date attributes go, it is probably not a good idea to snowflake to the date dimension, instead include the desired date attributes in the table itself. Customer is going to be your largest dimension, snowflaking such a dimension can significant impact performance.
Re: Too normalized? And question on aggregated fact
You're right! Thanks, I got it now.
This question has few to do with the other questions I asked.
What is the preferred way to determine age of a product/customer in a dimensional model? For example: What is the average age of all customers?
I thought of a dimension 'TimeDelta'
The day_amount could also be a measure for a fact. But with both solutions, it means that the fact table which contains this dimension needs to be updated daily.
This question has few to do with the other questions I asked.
What is the preferred way to determine age of a product/customer in a dimensional model? For example: What is the average age of all customers?
I thought of a dimension 'TimeDelta'
attribute name | example values |
id | 1 |
years | 1 |
months | 17 |
days | 523 |
The day_amount could also be a measure for a fact. But with both solutions, it means that the fact table which contains this dimension needs to be updated daily.
Booma- Posts : 12
Join date : 2014-03-10
Re: Too normalized? And question on aggregated fact
There are two ways to deal with this, depending on how the question is phrased. Do you want the current age of people who have purchased things or do you want the age of the purchaser?
The former is simply an age calculation off the birthdate attribute of the customer, the latter is the difference between the birthdate and the time of the transaction. You don't need additional structures.
The former is simply an age calculation off the birthdate attribute of the customer, the latter is the difference between the birthdate and the time of the transaction. You don't need additional structures.
Too normalized? And question on aggregated fact
Booma wrote:I still don't really understand aggregated facts as a dimension attribute.
Here is a snippet of my dimensional model:
I have the standard Customer dimension
Customer dimension
gender
full_name
...other attributes...
spent_fact_id (FK to Spent aggregated fact)
Spent aggregated fact
id
spent_last_year (decimal that indicates the amount of dollars/euros)
average_lifetime_spending
I don't see how that is a fact table, since it contains no dimensions.
Let's say I want a report with the average spending of Male customers, which spent at least $1000 last year. I don't really how you would query that with this model.
How do I record historical spendings? How much did they spent last year, 2 years ago, 5 years ago. The last 5 years in total?
Take out first_purchase_date and last_purchase_date from customer dimension and keep one purchase date in the fact table. So your Fact table has all the transactions date and amount in it for all customers. Like here:
Customer Purchase date Amount
Adam 1/1/2012 100
Adam 11/1/2012 200
Adam 5/12/2013 10
Adam 1/1/2014 20
Adam 3/1/2014 20
Now if you want to know how much Adam made in 2012 uses SQL between function. Sum purchase amount where purchase date between 1/1/2012 and 12/31/2012 group by customer is $300 for Adam
You can adjust sql to run for 2 years or 5 years . I hope you get the idea now.
zoom- Posts : 97
Join date : 2010-08-23
Location : Chicago
Similar topics
» Aggregated fact tables
» accumulating fact question
» FACT Design Question - How to design FACT Table when computation is required on the degenerate dimension?
» Modelling question - Multiple fact tables with one to many relationships and querying from many fact tables
» 1 Fact or 2, that is the question
» accumulating fact question
» FACT Design Question - How to design FACT Table when computation is required on the degenerate dimension?
» Modelling question - Multiple fact tables with one to many relationships and querying from many fact tables
» 1 Fact or 2, that is the question
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum