single denormalized dimension or 2 separate dimensions?
2 posters
Page 1 of 1
single denormalized dimension or 2 separate dimensions?
Hi,
I'm trying to model a monitoring environment where objects (such as network cards, disks, services) are monitored and alerts are raised if something goes wrong with these objects. The specific process being modelled is "raised_alerts" so there's a fact table called fact_raised_alerts. Amongst various dimensions there's a "monitored_object" dimension.
Now, each monitored object lives on a computer so my initial thought was to snow-flake off the "monitored_object" dimension to a "monitored_computer" dimension, which would contain all of the attributes that belong to a computer (e.g. operating_system, service_pack, amount_of_ram, is_virtual_machine, etc....).
However, in an attempt to avoid snow-flaking, I figured I would denormalize the "monitored_object" dimension and add all of the computer-related attributes to this dimension. However, this doesn't quite feel right - partly because there are thousands of monitored objects on a given computer so repeating attributes like operating_system thousands of times seems like an overhead; and partly because computer-related attributes don't logically sound like they belong in this dimension.
So, would a valid alternative be to create a completely separate dimension called "monitored_compter" and then just have 2 separate FKs in my fact table? Any input/feedback would be much appreciated. I've included a brief sample below.
Option 1 (single dimension table):
dim_monitored_object
Option 2 (2 separate dimension tables):
dim_monitored_object
and...
dim_monitored_computer
Option 3: Something else?
I'm trying to model a monitoring environment where objects (such as network cards, disks, services) are monitored and alerts are raised if something goes wrong with these objects. The specific process being modelled is "raised_alerts" so there's a fact table called fact_raised_alerts. Amongst various dimensions there's a "monitored_object" dimension.
Now, each monitored object lives on a computer so my initial thought was to snow-flake off the "monitored_object" dimension to a "monitored_computer" dimension, which would contain all of the attributes that belong to a computer (e.g. operating_system, service_pack, amount_of_ram, is_virtual_machine, etc....).
However, in an attempt to avoid snow-flaking, I figured I would denormalize the "monitored_object" dimension and add all of the computer-related attributes to this dimension. However, this doesn't quite feel right - partly because there are thousands of monitored objects on a given computer so repeating attributes like operating_system thousands of times seems like an overhead; and partly because computer-related attributes don't logically sound like they belong in this dimension.
So, would a valid alternative be to create a completely separate dimension called "monitored_compter" and then just have 2 separate FKs in my fact table? Any input/feedback would be much appreciated. I've included a brief sample below.
Option 1 (single dimension table):
dim_monitored_object
object_key | object_name | object_type | computer_name | computer_domain | operating_system | amount_of_ram | is_clustered | is_virtual_machine |
1 | Drive C: | Disk | Srv1 | ABC.COM | Windows Server | 4 GB | No | No |
2 | Drive D: | Disk | Srv1 | ABC.COM | Windows Server | 4 GB | No | No |
3 | rpc.exe | Service | WkSt1 | ABC.COM | Windows Vista | 1 GB | No | No |
Option 2 (2 separate dimension tables):
dim_monitored_object
object_key | object_name | object_type |
1 | Drive C: | Disk |
2 | Drive D: | Disk |
3 | rpc.exe | Service |
dim_monitored_computer
computer_key | computer_name | computer_domain | operating_system | amount_of_ram | is_clustered | is_virtual_machine |
1 | Srv1 | ABC.COM | Windows Server | 4 GB | No | No |
2 | Srv1 | ABC.COM | Windows Server | 4 GB | No | No |
3 | WkSt1 | ABC.COM | Windows Vista | 1 GB | No | No |
Option 3: Something else?
salaman- Posts : 21
Join date : 2011-03-24
Re: single denormalized dimension or 2 separate dimensions?
Why snowflake? Your question about the 'valid alternative' is backward. The idea of separate dimensions and two foreign keys on the fact is not a 'valid alternative'… it is the primary way to do it. Snowflaking and consolidating the dimension are the alternatives.
Re: single denormalized dimension or 2 separate dimensions?
Thanks ngalemmo,
The reason my logic sounded "backwards" is because I approached the original problem "backwards" (i.e. snowflake first) - probably as a result of my relational modelling background.
Out of interest, would there ever be a situation in which the alternatives (snowflake/consolidate) would be preferable?
The reason my logic sounded "backwards" is because I approached the original problem "backwards" (i.e. snowflake first) - probably as a result of my relational modelling background.
Out of interest, would there ever be a situation in which the alternatives (snowflake/consolidate) would be preferable?
salaman- Posts : 21
Join date : 2011-03-24
Re: single denormalized dimension or 2 separate dimensions?
Offhand, no. But Ralph wrote about it a (long) while back and identified situations where snowflaking is preferable, if not necessary. You may be able to find it by searching the main site.
Similar topics
» A single Dimension table Or separate the Dimensions?
» Correlated - Separate Dimensions OR Single Dimensions ?
» Hierarchy within single dimension or two dimensions
» Multiple dimensions Vs. Single dimension and hierarchy
» Combined Date Time dimension vs. Separate Day and Time dimensions and timezones
» Correlated - Separate Dimensions OR Single Dimensions ?
» Hierarchy within single dimension or two dimensions
» Multiple dimensions Vs. Single dimension and hierarchy
» Combined Date Time dimension vs. Separate Day and Time dimensions and timezones
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum