single denormalized dimension or 2 separate dimensions?

salaman Fri Jan 03, 2014 1:11 pm

Hi,

I'm trying to model a monitoring environment where objects (such as network cards, disks, services) are monitored and alerts are raised if something goes wrong with these objects. The specific process being modelled is "raised_alerts" so there's a fact table called fact_raised_alerts. Amongst various dimensions there's a "monitored_object" dimension.

Now, each monitored object lives on a computer so my initial thought was to snow-flake off the "monitored_object" dimension to a "monitored_computer" dimension, which would contain all of the attributes that belong to a computer (e.g. operating_system, service_pack, amount_of_ram, is_virtual_machine, etc....).

However, in an attempt to avoid snow-flaking, I figured I would denormalize the "monitored_object" dimension and add all of the computer-related attributes to this dimension. However, this doesn't quite feel right - partly because there are thousands of monitored objects on a given computer so repeating attributes like operating_system thousands of times seems like an overhead; and partly because computer-related attributes don't logically sound like they belong in this dimension.

So, would a valid alternative be to create a completely separate dimension called "monitored_compter" and then just have 2 separate FKs in my fact table? Any input/feedback would be much appreciated. I've included a brief sample below.

Option 1 (single dimension table):

dim_monitored_object

object_key	object_name	object_type	computer_name	computer_domain	operating_system	amount_of_ram	is_clustered	is_virtual_machine
1	Drive C:	Disk	Srv1	ABC.COM	Windows Server	4 GB	No	No
2	Drive D:	Disk	Srv1	ABC.COM	Windows Server	4 GB	No	No
3	rpc.exe	Service	WkSt1	ABC.COM	Windows Vista	1 GB	No	No

Option 2 (2 separate dimension tables):

dim_monitored_object

object_key	object_name	object_type
1	Drive C:	Disk
2	Drive D:	Disk
3	rpc.exe	Service

and...

dim_monitored_computer

computer_key	computer_name	computer_domain	operating_system	amount_of_ram	is_clustered	is_virtual_machine
1	Srv1	ABC.COM	Windows Server	4 GB	No	No
2	Srv1	ABC.COM	Windows Server	4 GB	No	No
3	WkSt1	ABC.COM	Windows Vista	1 GB	No	No

Option 3: Something else?

ngalemmo Fri Jan 03, 2014 4:33 pm

Why snowflake? Your question about the 'valid alternative' is backward. The idea of separate dimensions and two foreign keys on the fact is not a 'valid alternative'… it is the primary way to do it. Snowflaking and consolidating the dimension are the alternatives.

salaman Fri Jan 03, 2014 5:45 pm

Thanks ngalemmo,

The reason my logic sounded "backwards" is because I approached the original problem "backwards" (i.e. snowflake first) - probably as a result of my relational modelling background.

Out of interest, would there ever be a situation in which the alternatives (snowflake/consolidate) would be preferable?

ngalemmo Fri Jan 03, 2014 6:02 pm

Offhand, no. But Ralph wrote about it a (long) while back and identified situations where snowflaking is preferable, if not necessary. You may be able to find it by searching the main site.

Sponsored content

single denormalized dimension or 2 separate dimensions?

single denormalized dimension or 2 separate dimensions?

Re: single denormalized dimension or 2 separate dimensions?

Re: single denormalized dimension or 2 separate dimensions?

Re: single denormalized dimension or 2 separate dimensions?

Re: single denormalized dimension or 2 separate dimensions?