Modeling Sales & Visits
5 posters
Page 1 of 1
Modeling Sales & Visits
Here's my latest uncertainty...
I'm building a DW for a company that's tracking sales of items and visits to one of their outlets. I'm good with the sales, but not sure which path to go for the visits. Sales looks something like this (very generic):
SALES
============
ProductID
CustomerID
DateID
Quantity
Amount
My original plan led me to creating a Visit factless fact
CUST_VISITS
============
DateID
VisitTypeID
Visit (defaults to 1)
VISIT_TYPES
============
VisitTypeID
VisitDescr
This seemed best fit but then I started thinking I should just hang the visit dim off the sales fact and have something like:
SALES
============
ProductID
CustomerID
DateID
Quantity
Amount
VisitType1 (YES or NO)
VisitType2 (YES or NO)
VisitType3 (YES or NO)
VisitType4 (YES or NO)
My issue with doing this is the Visit and Sales transactions are at different granularity. Sales happen at an exact point in time (e.g. 3/6/12 @ 14:45:32) whereas a Visit=1 Day (e.g. 3/6/12). Sales can also have multiple records per a customer visit (e.g. there could be 5 sales records on 3/6/12 for customer 123 but only 1 Visit record for 3/6/12).
I'm sure this is very basic and I shouldn't be confused, but wanted to know what you guys all thought. Any comments would be appreciated! Thanks in advance.
I'm building a DW for a company that's tracking sales of items and visits to one of their outlets. I'm good with the sales, but not sure which path to go for the visits. Sales looks something like this (very generic):
SALES
============
ProductID
CustomerID
DateID
Quantity
Amount
My original plan led me to creating a Visit factless fact
CUST_VISITS
============
DateID
VisitTypeID
Visit (defaults to 1)
VISIT_TYPES
============
VisitTypeID
VisitDescr
This seemed best fit but then I started thinking I should just hang the visit dim off the sales fact and have something like:
SALES
============
ProductID
CustomerID
DateID
Quantity
Amount
VisitType1 (YES or NO)
VisitType2 (YES or NO)
VisitType3 (YES or NO)
VisitType4 (YES or NO)
My issue with doing this is the Visit and Sales transactions are at different granularity. Sales happen at an exact point in time (e.g. 3/6/12 @ 14:45:32) whereas a Visit=1 Day (e.g. 3/6/12). Sales can also have multiple records per a customer visit (e.g. there could be 5 sales records on 3/6/12 for customer 123 but only 1 Visit record for 3/6/12).
I'm sure this is very basic and I shouldn't be confused, but wanted to know what you guys all thought. Any comments would be appreciated! Thanks in advance.
wonka- Posts : 13
Join date : 2011-08-10
Declare the grain ans stick to it
Don't go with a mixed grain fact, breaks Ralph's first law of fact table design "keep to the grain".
rob.hawken- Posts : 13
Join date : 2010-09-19
Re: Modeling Sales & Visits
You are not really mixing grains in your sales fact table as adding CustomerID and VisitType to it does not render any measure (Quantity, Amount) non-additive. You are on right track by getting rid of the initial CUST_VISITS fact table, as you may easily achieve that by aggregation on sales fact table. All you need is to have a timestamp, in addition to the date key, in the sales fact table.
However I have a little advice on the 4 columns for VisitType in the fact table. The model would be much cleaner if you can wrap them up into a single junk dimension and connect them to the fact table by a single FK.
However I have a little advice on the 4 columns for VisitType in the fact table. The model would be much cleaner if you can wrap them up into a single junk dimension and connect them to the fact table by a single FK.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Modeling Sales & Visits
I would again go by Ralph view on creating fact as it says, a fact should represent what happened and whatever the dimensions involved during that particular thing happening.
This is how we start and come with dimension while creating a new system.
So when a sale is happening i.e at point of sale (POS), we have following things coming
1) Time is there
2) Customer is there
3) Store location is involved
4) Item is involved
Here when any sale is happening, "visit" is not involved so it should not at all go this fact.
You are also thinking from your comfort perspective, think about how a reports will be delivered out of your this design where you have all the things in a single fact. Performance won't be that good and un-necessary you are also increasing the size of a fact table.
Thanks
This is how we start and come with dimension while creating a new system.
So when a sale is happening i.e at point of sale (POS), we have following things coming
1) Time is there
2) Customer is there
3) Store location is involved
4) Item is involved
Here when any sale is happening, "visit" is not involved so it should not at all go this fact.
You are also thinking from your comfort perspective, think about how a reports will be delivered out of your this design where you have all the things in a single fact. Performance won't be that good and un-necessary you are also increasing the size of a fact table.
Thanks
Last edited by Vishy on Wed Mar 07, 2012 1:31 am; edited 1 time in total (Reason for editing : typo)
Vishy- Posts : 75
Join date : 2012-02-28
Location : India
Re: Modeling Sales & Visits
Vishy, what is available in 'Visits' but not available in sales transaction fact, VisitType or anything else you can see from this case. Are you saying VisitType should not be in sales fact? then how can you analyse the sales down to the grain of the time in a day by VisitType.
To me, 'visits' also applies to time of a day if you 'allocate' VisitType down to sales fact table. a daily 'Visit' by customer is just an aggregate based on sales fact table. So only one base fact table as follows:
Sales fact table
DateKey,
SalesDateTime
ProductKey,
CustomerKey,
VisitTypeKey (junk dimension)
Count=1
Don't get carried away by SalesDateTime. Very likely you still load the sales transactions on daily basis to the fact table in data warehouse. So all the dimensions should be available.
Now with this base fact table, you may work out all kinds of aggregate facts, including 'visit'. You may summarise the count by day by customer, considering a customer could visit twice, or have a distinct count on customer. You may create views ( maybe materialised) or physical aggregate tables for performance reason, or you just feed the base fact to cube and let cube aggregate everything for you.
The point is, you don't treat 'Visit' as part of base model. If it is a physical table, it's just a performance booster like many other aggregate (rollup) fact tables.
To me, 'visits' also applies to time of a day if you 'allocate' VisitType down to sales fact table. a daily 'Visit' by customer is just an aggregate based on sales fact table. So only one base fact table as follows:
Sales fact table
DateKey,
SalesDateTime
ProductKey,
CustomerKey,
VisitTypeKey (junk dimension)
Count=1
Don't get carried away by SalesDateTime. Very likely you still load the sales transactions on daily basis to the fact table in data warehouse. So all the dimensions should be available.
Now with this base fact table, you may work out all kinds of aggregate facts, including 'visit'. You may summarise the count by day by customer, considering a customer could visit twice, or have a distinct count on customer. You may create views ( maybe materialised) or physical aggregate tables for performance reason, or you just feed the base fact to cube and let cube aggregate everything for you.
The point is, you don't treat 'Visit' as part of base model. If it is a physical table, it's just a performance booster like many other aggregate (rollup) fact tables.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Modeling Sales & Visits
Thanks for the suggestions. This is what I'm leaning towards doing...
Sales
=============
CustomerID
ProductID
DateID
VisitTypeID
DateTime
Quantity
Amount
Sample records would be:
CustomerID|ProductID|DateID|VisitTypeID|DateTime|Quantity|Amount
123|2|20110203|1001|2011-02-03 06:00:00|1|950.00
123|4|20110203|1001|2011-02-03 15:00:00|5|25.00
dimVisitType
=============
VisitTypeID
Walk (visit without a sale)
Sale (sale generated)
FirstVisit (first visit to store)
Reactivated (first visit to store in a year)
Sample record would be:
VisitTypeID|Walk|Sale|FirstVisit|Reactivated
1000|YES|NO|NO|NO
1001|NO|YES|NO|NO
1002|NO|YES|YES|NO
...
...
Is this kind of more along the lines of what you guys are suggesting?
Sales
=============
CustomerID
ProductID
DateID
VisitTypeID
DateTime
Quantity
Amount
Sample records would be:
CustomerID|ProductID|DateID|VisitTypeID|DateTime|Quantity|Amount
123|2|20110203|1001|2011-02-03 06:00:00|1|950.00
123|4|20110203|1001|2011-02-03 15:00:00|5|25.00
dimVisitType
=============
VisitTypeID
Walk (visit without a sale)
Sale (sale generated)
FirstVisit (first visit to store)
Reactivated (first visit to store in a year)
Sample record would be:
VisitTypeID|Walk|Sale|FirstVisit|Reactivated
1000|YES|NO|NO|NO
1001|NO|YES|NO|NO
1002|NO|YES|YES|NO
...
...
Is this kind of more along the lines of what you guys are suggesting?
wonka- Posts : 13
Join date : 2011-08-10
Re: Modeling Sales & Visits
Sales are not visits and visits are not sales. They are two different events and should be tracked by two different fact tables. Of course, one wonders how on earth they are going to get visit information if the customer never buys anything. Its hard enough to know who the customer really is when you have a sale (I assume this is retail and not a membership club). On-line is about the only mechanism to get visit counts, but you almost never know who the person really is.
Re: Modeling Sales & Visits
When I consolidate seemingly two different fact into one, I firstly check the dimensionality and see if they could share exactly the same dimension set. In this case, I guess the only questionable attribute is the DateTime. If there are different registering processes, the DateTime would be different from the sales even if the visit is linked and it would be really hard to properly connect the two fact together.
However as you said, the visit would only be on day level, so obviously the visiting time can only be the sale time, and that makes me wonder if there is anything else that stops you from consolidating the two facts. Would consolidating introduce double counting on any measures, I guess not if you sensibly set them to 0 or null for non sale visit and the value itself is an rough indicator without looking further into VisitType dimension.
Of course splitting them up is easy and consolidating is always harder as it involves more insight into the business. Actually what is even harder is when you either have two many small fact tables or some fact tables are just too wide. What I do is to make an extra effort trying to find out if the small fact tables can be combined based on dimensionalities, and wide fact tables spitted based on the business focus or process to strike out a properly balanced schema.
Why consolidating if possible? Because you don't have to leave the consolidating to user if the link is inherently in the same table. Whereas splitting up an overloaded fact based on business focus may avoid double counting and sparsely populated fact table. So based on these criteria, I would consolidate in this case.
However as you said, the visit would only be on day level, so obviously the visiting time can only be the sale time, and that makes me wonder if there is anything else that stops you from consolidating the two facts. Would consolidating introduce double counting on any measures, I guess not if you sensibly set them to 0 or null for non sale visit and the value itself is an rough indicator without looking further into VisitType dimension.
Of course splitting them up is easy and consolidating is always harder as it involves more insight into the business. Actually what is even harder is when you either have two many small fact tables or some fact tables are just too wide. What I do is to make an extra effort trying to find out if the small fact tables can be combined based on dimensionalities, and wide fact tables spitted based on the business focus or process to strike out a properly balanced schema.
Why consolidating if possible? Because you don't have to leave the consolidating to user if the link is inherently in the same table. Whereas splitting up an overloaded fact based on business focus may avoid double counting and sparsely populated fact table. So based on these criteria, I would consolidate in this case.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Modeling Sales & Visits
Transaction fact table itself tells even if you don't have typeID/visit there.
It tells you
1) Customer visited or not on a given day
2) customer bought something or not on a given day
so only thing the fact table is not showing is customer's "window shopping".
Hang, I was not able to understand clearly what did you mean but I am from reporting background also at that sometimes pushes me to have that kind of design so that report is easy and fast.
customer visit itself has too many other attributes also associated with it other then attributes mentioned. Like following
------------------------------------------------------
Date
Customer ID
Walk (visit without a sale) --sale generated or not can be calculated from this column itself
FirstVisit (first visit to store)
Reactivated (first visit to store in a year)
Visit after how many days?
What kind of items (category) purchased ( baby related,grocry,hardwares etc)
reason of not purchasing (wanted to purchase but didn't find that item or price issue)
store location ID
----------------------------------------------------
I am not sure if you get few of these columns details from source but over the period of time you might start getting them or customer may demand this.
I can tell you that both facts can be joined at reporting level, getting details for a (day+customer) from this factless fact table and then for the same (day+customer) from the transaction and then have left outer join b/w these 2 outputs.
Having all the details in your fact is almost doubling the size of the main fact and for a retailed store kind of thing where you have lot many customers+lot many transactions , this fact would grow like anything....
It tells you
1) Customer visited or not on a given day
2) customer bought something or not on a given day
so only thing the fact table is not showing is customer's "window shopping".
Hang, I was not able to understand clearly what did you mean but I am from reporting background also at that sometimes pushes me to have that kind of design so that report is easy and fast.
customer visit itself has too many other attributes also associated with it other then attributes mentioned. Like following
------------------------------------------------------
Date
Customer ID
Walk (visit without a sale) --sale generated or not can be calculated from this column itself
FirstVisit (first visit to store)
Reactivated (first visit to store in a year)
Visit after how many days?
What kind of items (category) purchased ( baby related,grocry,hardwares etc)
reason of not purchasing (wanted to purchase but didn't find that item or price issue)
store location ID
----------------------------------------------------
I am not sure if you get few of these columns details from source but over the period of time you might start getting them or customer may demand this.
I can tell you that both facts can be joined at reporting level, getting details for a (day+customer) from this factless fact table and then for the same (day+customer) from the transaction and then have left outer join b/w these 2 outputs.
Having all the details in your fact is almost doubling the size of the main fact and for a retailed store kind of thing where you have lot many customers+lot many transactions , this fact would grow like anything....
Vishy- Posts : 75
Join date : 2012-02-28
Location : India
Re: Modeling Sales & Visits
Vishy, I think what you and Ngalemmo said makes sense, as I found out the product may not be involved in the visit. So separating them looks better idea.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Similar topics
» Modeling Invoice Level Sales With a Volatile Sales Org
» Modeling large sales dimension?
» rethinking sales invoice line modeling
» Modeling a Sales Order to Billing to Shipping consolidated data model
» Dimensional modelling for Sales and Marketing (opportunity, quotes, sales orders)
» Modeling large sales dimension?
» rethinking sales invoice line modeling
» Modeling a Sales Order to Billing to Shipping consolidated data model
» Dimensional modelling for Sales and Marketing (opportunity, quotes, sales orders)
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum