Representing data as (Factless) vs (Dimension Outriggers)

a_sherbeeny Tue Dec 13, 2011 7:53 am

Hi All,

We have a dimensional modeling question -- >

We need to model the following information

Customer : ID,Name ,City and Assigned_Sales_Agent

The question is , what is the best practice to model the previous info :

Choice1 : As Factless Fact table which is dimensioned by the following dimensions
== Customer_DIM : Cust_ID, Cust_Name
== City_Dim : City_ID, City_Name
== Assigned_Sales_Agent_Dim: ASA_ID, ASA_Name

Choice2 : Embedding all information into one dimension as follows:
== Customer_DIM: Cust_ID, Cust_Name,City_ID [Outriger], ASA_ID [Outrigger]

Taking into consideration that City dimension and Sales_Agent dimension are used in other business processes in the enterprise

ngalemmo Tue Dec 13, 2011 10:16 am

Generally, option 2, as it avoids a fact table that has little value. However, you don't "outrig" anything. While you may carry the keys to the salesman and city dimensions on the customer dimension, you do it solely for maintenance purposes, NOT for queries. The customer dimension should carry attributes from city and salesman that are appropriate for customer. Fact tables would carry, in addition to customer, references to city and salesman dimensions as needed.

Or you may just do none of the above. How do salesmen get a list of customers assigned to them now?

a_sherbeeny Tue Dec 13, 2011 11:25 am

Thanks ngalemmo for your quick reply.

may be I didn't specify the case clearly ,City and Assigned_Sales_Agent dimensions have a lot of attributes rather than the name only ,they have hierarchies and other attributes related to them. Does this change your answer ?

Another thing , how to answer you question in last line in your reply ?

ngalemmo Tue Dec 13, 2011 6:05 pm

Maybe yes, maybe no. From a fact table point of view (not the factless fact you mentioned originally), it should reference all three dimensions when applicable, giving you access to all attributes.

The issue of adding things to the customer dimension depend on how much you really need to know from that point of view. If you need everything, then the original factless fact table would be a better choice. But all it can be used for is dimensional reporting, such as customer lists by city or sales agent without regard to any activity. Which leads to the question as to wither it belongs in the DW at all, or is it better retrieved from the operational system.

If it goes into the DW, you need to deal with building, documenting and maintaining the fact table as well as latency between the operational system where the assignment is made and the update to the DW (usually that night). If you don't need a running history of these relationships, putting it in the DW has more downside than upside.

mostafa_mahrous75 Thu Dec 15, 2011 11:45 am

I think the question is originally about how to model the Customer's City (not the city where the sales transaction happened) and the Customer's Sales Agent who communicates with the customer (not the sales person who handled the sales transaction). in other words, Customers' City and Customers' Sales Agent are two additional attributes for the Customer information.

Customer's City and Customer's Sales Agent can be considered as two hierarchies of the Customer; ex. we can list customers by region then by city, also we can list customers by assigned sales department then by assigned sales agent. That is why we need to embed both attributes as outriggers into the Customer dimension to easily slice and dice by both hierarchies

If we apply Option 1, we'll always need to visit that factless fact table in our queries whenever we need to retrieve information about Customer's City or Customer's Assigned Sales Agent

ngalemmo Thu Dec 15, 2011 4:03 pm

need to embed both attributes as outriggers

This is the part I am struggling with. Do you really need to snowflake?

Customers' City and Customers' Sales Agent are two additional attributes for the Customer information.

Better describes what is going on. They are attributes of customer.

Storing them in customer versus a snowflake has tradeoffs. Putting them in the dimension takes up space, snowflake impacts performance (join complexity can be hidden in a view). I generally value performance over space requirements. Either way works, what you do depends on your situation.

hang Thu Dec 15, 2011 8:02 pm

Agree with ngalemmo. In general, you don't snowflake/outrig a dimensions unless it falls into the following categories:

Multivalued attributes - For instance, a customer has multiple addresses so you don't have other options but snowflake the 1-m relationship in another table.

Monster dimension - The dimension is too big (many million records). Then minimizing repeating groups in the dimension becomes significant in space/performance saving. Snowflaking or normalizing the monster dimension can make the size of monster dimension horizontally much smaller. If it's type2, moving low cardinality and fast changing attributes into mini or standalone dimensions can dramatically reduce the size both vertically and horizontally.

Highly sharable outrigger dimension - A typical example is location outrigger which may carry a whole hierarchy of attributes that are shared by many other dimensions. Centralizing the location dimension in the outrigger will help to simplify the maintenance and keep the consistency.

a_sherbeeny Sat Dec 17, 2011 3:27 am

Thanks you Hang and Ngalemmo for your reply. But I have a little question here ,

Can't we consider my option#1 as a coverage factless fact table? to get benefit of dimension conformation with other facts. for example City & Sales_Agent dimension ,as they are used in other fact tables.

Regards
Ahmed

hang Sat Dec 17, 2011 8:26 am

I would not think of Customer-City association as factless fact. If SCD 1 is not sufficient, at most, the city and its related attributes, or its FK in case of location outrigger, are the SCD 2 of customer dimension if previous residing cities are of interest. Resolving such an association within dimensions is better for dimension conformance as well, as you can connect different fact tables by single dimension entry.

However the relationship between Customer and Sales Agent could well be considered as a coverage fact, as the Sales Agent is not an attribute of Customer dimension, not even SCD2. Customer-Sales Agent association is a predetermined fact regardless of the relevant sales transaction fact. With Customer_Sales Agent coverage factless fact table, you can easily query to measure sales agent performance, such as who has not soled to their customers.

mostafa_mahrous75 Sat Dec 17, 2011 12:48 pm

I'm totally agree with you Hang, your justification seems reasonable...

Can we set a rule to differentiate between the case where we should resolve the correspondence between two entities within the same dimension (such as Customer's City case) and the other case where we should use a Coverage Factless Table (such as Customer's Sales Agent case)

Thank you Hang and Ngalemmo for your great support....

hang Sat Dec 17, 2011 6:50 pm

The best thing about dimensional modeling is, we don't normally set rules, instead, we have guidelines based on Kimball methodology. We apply the guidelines flexibly to suit each business case.

As I said in my initial response to the question, in general, you don't use fact(less) table to reflect dimension relationships if they are non-transacitonal, meaning the relationship is not determined at the time when a transaction occurs, unless you have to in case of monster dimensions where the relationship is too dynamic to reasonably control the size of the dimension. So the tradeoff is to use the fact table to track the relationship changes, but you may miss some relationships if the relevant dimension record has not appeared in the fact yet.

Now with factless fact table, or coverage fact table for that matter, you may pre-populate all the possible correlations between two dimensions. However there are two issues. Firstly, how are you going to use the relationship in your real fact table, say transaction fact. Secondly, how are you going to share that relationship in other fact or business processes for the purpose of dimension conformance? One rule about dimensional modeling is not to relate the fact tables referentially, but through the common dimensionality.

The reason I classify the Customer-Sales Agent association as coverage fact is that the relationship is a changing relationship corresponding to the one in sales transaction so that you can query for non_achieving sales agents. You could have one customer looked after by more than one sales agent at the same time (multivalued) or different time. You most likely need to snapshot that changing relationship periodically. So you need a date key in this coverage table instead of SCD date pair. The date key, instead of SCD date, is the driving time series which can be used to partition the fact table, and when you have such a date key in a relationship table, it is a fact(less) table.

So the answer is, reasonably denormalise the relationships into a single dimension so that you can share/conform them by other facts through a single dimension entry, and leverage factless fact table to simplify the SCD 2 complications if necessary.

Sponsored content

Representing data as (Factless) vs (Dimension Outriggers)

Representing data as (Factless) vs (Dimension Outriggers)

Re: Representing data as (Factless) vs (Dimension Outriggers)

Re: Representing data as (Factless) vs (Dimension Outriggers)

Re: Representing data as (Factless) vs (Dimension Outriggers)

Re: Representing data as (Factless) vs (Dimension Outriggers)

Re: Representing data as (Factless) vs (Dimension Outriggers)

Re: Representing data as (Factless) vs (Dimension Outriggers)

Re: Representing data as (Factless) vs (Dimension Outriggers)

Re: Representing data as (Factless) vs (Dimension Outriggers)

Re: Representing data as (Factless) vs (Dimension Outriggers)

Re: Representing data as (Factless) vs (Dimension Outriggers)

Re: Representing data as (Factless) vs (Dimension Outriggers)