Conformed dimensions
4 posters
Page 1 of 1
Conformed dimensions
Hello,
This questions relates to a data model that I am designing, according to Kimball methodology.
This data ware house combines data from a CMDB that contains information about computers (12,000) and scanning software that scans all computers for all installed software products.
So simplified there would be a factless fact table called Software Installation, with some dimensions such as Computer, Date Discovered, and Software Product.
The problem lies in the fact that with software there is a hierarchy of Vendor, Product and Version (Adobe, Photoshop, CS4). Most reports that have to be built have to ignore the version (but not all). If a computer has both Photoshop CS3 and CS4 installed (so two versions but one product) in a report this computer would have to be counted as 1 for the line Adobe Photoshop.
My idea is to make a SoftwareProductDimension, and conformed to that, a SoftwareProductVersionDimension. I could then use two fact tables: SoftwareProductInstallation and SoftwareProductVersionInstallation.
What would you think of that solution?
I just need to save information about software products, such as description and unit price, that are on a product level and not on a version level. Could I just add these extra attributes to the SoftwareProductDimension, or should I create a separate dimension for this.
I appreciate anyone for helping me a step further.
Feel free to let me know if you would have any questions about my case.
Jochem
This questions relates to a data model that I am designing, according to Kimball methodology.
This data ware house combines data from a CMDB that contains information about computers (12,000) and scanning software that scans all computers for all installed software products.
So simplified there would be a factless fact table called Software Installation, with some dimensions such as Computer, Date Discovered, and Software Product.
The problem lies in the fact that with software there is a hierarchy of Vendor, Product and Version (Adobe, Photoshop, CS4). Most reports that have to be built have to ignore the version (but not all). If a computer has both Photoshop CS3 and CS4 installed (so two versions but one product) in a report this computer would have to be counted as 1 for the line Adobe Photoshop.
My idea is to make a SoftwareProductDimension, and conformed to that, a SoftwareProductVersionDimension. I could then use two fact tables: SoftwareProductInstallation and SoftwareProductVersionInstallation.
What would you think of that solution?
I just need to save information about software products, such as description and unit price, that are on a product level and not on a version level. Could I just add these extra attributes to the SoftwareProductDimension, or should I create a separate dimension for this.
I appreciate anyone for helping me a step further.
Feel free to let me know if you would have any questions about my case.
Jochem
jochem_van_grondelle- Posts : 11
Join date : 2009-09-22
Re: Conformed dimensions
I imagine you could build another fact, but are you not overcomplicating things? Could you not get the same result with a query using an embedded select?
Re: Conformed dimensions
jochem_van_grondelle wrote:My idea is to make a SoftwareProductDimension, and conformed to that, a SoftwareProductVersionDimension.
What you are describing sounds like snowflaked dimension (rather than conformed dimensions.) It is quite common for new dimensional modelers to create snowflake dimensions because it follows best practices 3NF relational design. However, snowflaking is generally frowned upon in the DW world because it adds complexity (especially for end users that have trouble understanding relational models) and reduces query performance.
Instead, you should denormalize your dimensions in order to create simple star schemas. In your case, this would mean a single SoftwareProductDimension with attributes for Vendor, Product, and Version. There would be one row for each possible version of a product. (Your ETL process will need to handle creating a new row whenever a new vendor/product/row shows up.) You can also add Description and Price* to this dimenson. It is OK to repeat the description information for as many rows as there are versions of a product. This is the denormalized aspect of the dimension.
You can then have a single fact table SoftwareInstalledFact which would point to the appropriate row in SoftwareProductDimension as well as other dimensions for machine (which could include attributes for department, physical location, etc.) and date.
*There are two ways to handle price: Either as an attribute in SoftwareProductDimension or in a fact table of its own. What happens when a price changes? How will you receive the price information? How often will prices change? Do you want to keep historical price information, or only the most current price? If you need only the most current price for a given version of the software, this can easily be stored as a dimension attribute. Tracking changes would require more advanced SCD2 (Slowly Changing Dimension Type 2) techniques or the use of a periodic snapshot fact table to capture price.
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: Conformed dimensions
Thank you for your response.ngalemmo wrote:I imagine you could build another fact, but are you not overcomplicating things? Could you not get the same result with a query using an embedded select?
For multiple reasons I would not stick to one fact table and one productdimension.
-Information about products (such as price, reseller and description) is provided on a product level (not version). Should I repeat all of these extra attributes in every version of a product?
-For all users that will have access to a reporting environment, I would like it easy for them to distinguish Product vs Product+Version, and to avoid them to make any mistakes. Because there will be big differences between aggregated data on a Product vs. Product Version level.
-As my case was simplified, there will be an even more grained fact table, concentrating on individual files that were found on a computer that identify a product version (and so a product), which should be designed the same way. Product -> Product Version -> Product Version Files. From this quite detailed data, I could quickly build up the other - kind of aggregated - fact tables.
I hope you understand these, but I am still very curious to your opinion.
Just to understand you well about the embedded select; Is this what you mean if I would stick to one fact table?
- Code:
Computer Name | Product | Version
--------------+-----------+----------
Comp1 | Photoshop | CS3
Comp2 | Photoshop | CS3
Comp2 | Photoshop | CS4
SELECT Product, Count(*)
FROM
(SELECT DISTINCT Computer Name, Product From VersionInstallations) AS ProductInstallations
GROUP BY Product
jochem_van_grondelle- Posts : 11
Join date : 2009-09-22
Re: Conformed dimensions
VHF wrote:What you are describing sounds like snowflaked dimension (rather than conformed dimensions.)
Thanks for your explanation.
I am aware of the important differences between snowflakes and denormalized dimensional modeling.
I believe that in the case I tried to explain, I am still referring to a dimension model. Maybe I have caused some misunderstanding?
My solution would be (very simplified)
FACTS
SoftwareVersionInstallation (FK to Computer, SoftwareProduct, SoftwareProductVersion)
SoftwareInstallation (FK to Computer, SoftwareProduct)
DIMENSIONS
Computer (Computer Name, Owner, IP)
SoftwareProduct (Vendor, Product Name, Description, Price)
SoftwareProductVersion (Vendor, Product Name, Version)
As history of pricing is not important, I could add price as an attribute to the SoftwareProductDimensionVHF wrote:*There are two ways to handle price: Either as an attribute in SoftwareProductDimension or in a fact table of its own. What happens when a price changes? How will you receive the price information? How often will prices change? Do you want to keep historical price information, or only the most current price? If you need only the most current price for a given version of the software, this can easily be stored as a dimension attribute. Tracking changes would require more advanced SCD2 (Slowly Changing Dimension Type 2) techniques or the use of a periodic snapshot fact table to capture price.
jochem_van_grondelle- Posts : 11
Join date : 2009-09-22
Re: Conformed dimensions
Yes, that is best practice in dimensional modeling. It is very hard when coming from the 3NF world to resist the temptation to factor out repeated values!jochem_van_grondelle wrote:Should I repeat all of these extra attributes in every version of a product?
Let's try it out with a single fact table:
FACTS
SoftwareInstallation (FK to Computer, FK to SoftwareProduct)
DIMENSIONS
Computer (Computer Name, Owner, IP)
SoftwareProduct (Vendor, Product Name, Version, Description, Price)
This allows you get information by version:
SELECT Vendor, Product Name, Version, COUNT(*)
FROM SoftwareInstallation AS f1
JOIN SoftwareProduct AS d1 ON d1.SoftwareProductKey = f1.SoftwareProductKey
GROUP BY Vendor, Product Name, Version
Or just by product:
SELECT Vendor, Product Name, COUNT(*)
FROM SoftwareInstallation AS f1
JOIN SoftwareProduct AS d1 ON d1.SoftwareProductKey = f1.SoftwareProductKey
GROUP BY Vendor, Product Name
(Note that there is no need for a DISTINCT subquery--just GROUP BY whatever level you are after.)
You are concerned about what your users will see and how they will query the system. In general, a single star schema for each subject area keeps things as simple as possible for them. What tool(s) will they be using to query the DW? Most BI tools automatically do the grouping based on which fields the users drag into their reports.
(No concerns with you creating a separate fact table to identify which files were found on the scanned systems. Presumably only IT-insiders will be using that one!)
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: Conformed dimensions
BTW, I see now that you are not snowflaking. You are just creating another dimension and another fact at different grains. Sometimes this is required, so it isn't always a wrong approach, although I still feel that in your case everything can be handled with a single version-grained star schema.
Are there any queries you still feel couldn't be easily answered using the single fact table?
Are there any queries you still feel couldn't be easily answered using the single fact table?
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: Conformed dimensions
Thanks again.
I need to think a bit. I will get back on this!
I need to think a bit. I will get back on this!
jochem_van_grondelle- Posts : 11
Join date : 2009-09-22
Re: Conformed dimensions
VHF's model should work. Just remember the grain of the product dimension is down to the version level. The product name/id is just a grouping (aggregate) attribute in the dimension. The natural key would be on two columns, product id and version number.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Conformed dimensions
I have no problem having separate product and product version dimensions. There is some minor redundancy, but it is no big deal.
I question the need for a second fact table in order to avoid counting multiple versions of the same product twice for a given user. A count of distinct users (i.e. COUNT(DISTINCT) ) grouped by product (no version) would handle the problem in a version level fact table.
I question the need for a second fact table in order to avoid counting multiple versions of the same product twice for a given user. A count of distinct users (i.e. COUNT(DISTINCT
Similar topics
» Design all dimensions as conformed dimensions
» About Conformed dimensions
» Conformed Dimensions
» Conformed Dimensions
» Conformed Dimensions
» About Conformed dimensions
» Conformed Dimensions
» Conformed Dimensions
» Conformed Dimensions
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum