Large number of snowflake code tables per dimension
2 posters
Page 1 of 1
Large number of snowflake code tables per dimension
Ten(10) or so key dimensions(lets call them base dimensions) each has about 25 parent code tables in the 3rd normal operational source. Each code has a short and long description. In the multidimensional model If a fact table has say 3 base dimensions then I could end up with 3 * 25 extra code dimensions (minus the common code across the 3 base dimensions). This will be way too many dimensions. Any suggestions would be appreciated.
dennisf- Posts : 9
Join date : 2009-04-14
Re: Large number of snowflake code tables per dimension
Why would you create copies of dimension tables for each fact table? But that's another topic...
What I do is have a 'list of values' table for each of the little code/description sets. On the dimension tables I store the code and description so that I do not have a snowflake (exposing the code to users is optional). I then put a trigger on the list of value tables that, when a description for a code changes, updates the dimension tables that hold the description with the new description based on the code. I also use the list of value tables during ETL as a lookup to get the descriptions when I am updating or inserting a dimension row.
The user never uses the list of value tables in queries. However, they do come in handy as a list source for most BI tool interfaces. It avoids the BI tool from doing a distinct query against the dimension when presenting a drop-down selection list.
What I do is have a 'list of values' table for each of the little code/description sets. On the dimension tables I store the code and description so that I do not have a snowflake (exposing the code to users is optional). I then put a trigger on the list of value tables that, when a description for a code changes, updates the dimension tables that hold the description with the new description based on the code. I also use the list of value tables during ETL as a lookup to get the descriptions when I am updating or inserting a dimension row.
The user never uses the list of value tables in queries. However, they do come in handy as a list source for most BI tool interfaces. It avoids the BI tool from doing a distinct query against the dimension when presenting a drop-down selection list.
Large number of code values
I would have included the short and long descriptions in the base dimension table but for another feature we have to provide i.e. multi language descriptions. Currently we provide descriptions for 10 languages. If a base dimesnion table had 25 codes that would mean 25 * 2 * 10 description columns to cater for the short and long descriptions for all languages. Thats 500 additional columns just for descriptions, 25 for the codes and the other columns in the table.
dennisf- Posts : 9
Join date : 2009-04-14
Re: Large number of snowflake code tables per dimension
When dealing with multiple languages, one approach is to have a simple star schema with multiple copies of a dimension table, each in a different language. Each copy would have identical structures and the same primary key, just the descriptive fields would be in a particular language. You would then use database synonyms or views to present the appropriate dimension based on a user's language.
For example, if you are using Oracle, you would have a common schem containing the fact tables and multiple language specific schema with the dimension tables. A user is logged into a language specific schema which contain synonyms to the fact tables. From a user/BI tool point of view, everthing looks the same regardless of the language. This approach does not work very well if you need to provide multiple languages in the same query.
For example, if you are using Oracle, you would have a common schem containing the fact tables and multiple language specific schema with the dimension tables. A user is logged into a language specific schema which contain synonyms to the fact tables. From a user/BI tool point of view, everthing looks the same regardless of the language. This approach does not work very well if you need to provide multiple languages in the same query.
Large number of code tables
In fact there is a requirement that the end user be allowed to select a language from a scroll down list at report execution time after he has loggen in.
dennisf- Posts : 9
Join date : 2009-04-14
Re: Large number of snowflake code tables per dimension
Ok then... the main thing you want to avoid is have multiple columns for each language. It makes queries or using 3rd party tools very, very difficult.
You could always go with the snowflake you described at the outset, including a language code as part of the key and maintaining multiple rows for a code based on language, or you can denormalize into a single dimension table with a compound primary key made up of the surrogate key and a language code. For any given dimension entry, there would be multiple rows with the same surrogate key but different language codes. This way your fact table only carries a single foreign key, regardless of language. The language filter would need to be applied globally to all dimensions (which would be the case with the snowflake approach as well).
If your objective is reducing the number of tables, then the latter approach is the way to go. It not only allows users to choose a language, but it also allows you create the same report in multiple languages in a single query.
You could always go with the snowflake you described at the outset, including a language code as part of the key and maintaining multiple rows for a code based on language, or you can denormalize into a single dimension table with a compound primary key made up of the surrogate key and a language code. For any given dimension entry, there would be multiple rows with the same surrogate key but different language codes. This way your fact table only carries a single foreign key, regardless of language. The language filter would need to be applied globally to all dimensions (which would be the case with the snowflake approach as well).
If your objective is reducing the number of tables, then the latter approach is the way to go. It not only allows users to choose a language, but it also allows you create the same report in multiple languages in a single query.
Large number of snowflake code tables per dimension
I prefer the following: Say the surrogate key of the base dimension table say BD is SK. I would have another table BD_OUTRIGGER with a composite primary of LANGUAGE and SK list partioned on LANGUAGE. A single record in the base dimension table with surrogate key say SK1 would have many records in the BD_OUTRIGGER table with pKs of (English,SK1), (Chinese,SK1) etc ( approx. 10 languages are supported now). This means that once the language is selected at report execution time there would be a one to one relationship between table BD and BD_OUTRIGGER. The only problem with this solution is that if table BD is very large say 500 million rows then BD_OUTRIGGER will be500 million times the number of languages supported.
What do you think about the size problem the above solution poses?
What do you think about the size problem the above solution poses?
dennisf- Posts : 9
Join date : 2009-04-14
Re: Large number of snowflake code tables per dimension
What dimension has 500M rows and needs reporting in 10 different languages?
It would be appreciated, if you are looking for advice, to be a little more forthcoming with the details. It's real difficult to propose a workable solution dealing with presumptions and hypotheticals. Obviously, if you are dealing with 500M rows, you would probably think about reworking the model a little bit...
It would be appreciated, if you are looking for advice, to be a little more forthcoming with the details. It's real difficult to propose a workable solution dealing with presumptions and hypotheticals. Obviously, if you are dealing with 500M rows, you would probably think about reworking the model a little bit...
Similar topics
» Number of Columns in Fact Tables vs. Dimension Tables
» SCD Type 2 on large and wide dimension tables
» Fact and dimension tables - avoiding same number of rows in both
» Language translation for large number of codes
» Large number of late arriving facts
» SCD Type 2 on large and wide dimension tables
» Fact and dimension tables - avoiding same number of rows in both
» Language translation for large number of codes
» Large number of late arriving facts
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum