Language translation for large number of codes
2 posters
Page 1 of 1
Language translation for large number of codes
The significant dimension tables in our data model each contain 25 or so codes. Example currency code, category code
etc. Each of these dimension tables have different sets of codes i.e they do not share identical sets of codes. Each
code requires a short and long description for each supported language. The descriptions do not change frequently.
One model would be to store the three attributes required for each code i.e code, short desc and long desc with a
language prefix in the dimension table itself thus:
-- Language English - Currency Code
English Currency Code
English Currency Short Description
English Currency Long Description
-- Language Chinese - Currency Code
Chinese Currency Code
Chinese Currency Short Description
Chinese Currency Long Description
-- Language English - Category Code
English Category Code
English Category Short Description
English Category Long Description
-- Language Chinese - Category Code
Chinese Category Code
Chinese Category Short Description
Chinese Category Long Description
...........
...........
...........
If there were 15 supported languages then 15 * 25* 3 = 1125 distinct attributes would be required. The BI layer, Cognos, knows
the language from the login id. The attribute named say CURRENCY SHORT DESCRIPTION in the Cognos metadata layer
could be a CASE statement returning the appropriate attribute based on the language code of the logged in user. The
other two attributes can be treated similarly. However, the problem with this approach is the large number of
attributes in each dimension table. I would appreciate any suggestions/comments/solutions.
etc. Each of these dimension tables have different sets of codes i.e they do not share identical sets of codes. Each
code requires a short and long description for each supported language. The descriptions do not change frequently.
One model would be to store the three attributes required for each code i.e code, short desc and long desc with a
language prefix in the dimension table itself thus:
-- Language English - Currency Code
English Currency Code
English Currency Short Description
English Currency Long Description
-- Language Chinese - Currency Code
Chinese Currency Code
Chinese Currency Short Description
Chinese Currency Long Description
-- Language English - Category Code
English Category Code
English Category Short Description
English Category Long Description
-- Language Chinese - Category Code
Chinese Category Code
Chinese Category Short Description
Chinese Category Long Description
...........
...........
...........
If there were 15 supported languages then 15 * 25* 3 = 1125 distinct attributes would be required. The BI layer, Cognos, knows
the language from the login id. The attribute named say CURRENCY SHORT DESCRIPTION in the Cognos metadata layer
could be a CASE statement returning the appropriate attribute based on the language code of the logged in user. The
other two attributes can be treated similarly. However, the problem with this approach is the large number of
attributes in each dimension table. I would appreciate any suggestions/comments/solutions.
Last edited by dennisf on Sat Aug 08, 2009 3:57 pm; edited 1 time in total
dennisf- Posts : 9
Join date : 2009-04-14
Re: Language translation for large number of codes
I would not consider 60 columns large. I think your design is fine.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Language translation for large number of codes
Thanks for the response. I have two comments/questions.
1. These code columns are not the only columns in each of the dimension tables. So how many columns is large for an Oracle table? Is there a performance impact? If there are a large number of columns then there are fewer rows in each block. When a read and a subsequent update is execeuted all the columns not impacted by the update still go for the ride consuming IO channel bandwidth and memory in the Oracle cache?
What are the considerations for separating attributes into multiple tables that share the same primary key? What considertions justify the two joins instead of one?
2. I thought that these 20 * 3 code attributes should be in a separate table with the same primary key as in a MINI DIMENSION, but on closer examination the reasons for a mini dimension does not exist? My understanding of MINI DIMENSIONS is when you need TYPE 2 for a large dimension. Since a new record duplicates every single attribute we isolate the SLOWLY CHANGING dimensions into a separate table and apply TYPE 2 only on that table. The other non slowly changing dimensions are not duplicated in vain.
I would appreciate any suggestions/comments.
Cheers!
1. These code columns are not the only columns in each of the dimension tables. So how many columns is large for an Oracle table? Is there a performance impact? If there are a large number of columns then there are fewer rows in each block. When a read and a subsequent update is execeuted all the columns not impacted by the update still go for the ride consuming IO channel bandwidth and memory in the Oracle cache?
What are the considerations for separating attributes into multiple tables that share the same primary key? What considertions justify the two joins instead of one?
2. I thought that these 20 * 3 code attributes should be in a separate table with the same primary key as in a MINI DIMENSION, but on closer examination the reasons for a mini dimension does not exist? My understanding of MINI DIMENSIONS is when you need TYPE 2 for a large dimension. Since a new record duplicates every single attribute we isolate the SLOWLY CHANGING dimensions into a separate table and apply TYPE 2 only on that table. The other non slowly changing dimensions are not duplicated in vain.
I would appreciate any suggestions/comments.
Cheers!
dennisf- Posts : 9
Join date : 2009-04-14
Re: Language translation for large number of codes
I don't make mini dimensions or other tuning changes unless I know I have a problem. If the basic dimensional model will support your queries then don't bother tweaking every last bit of performance out of Oracle. The most common problem I see is developers putting all of the dimensional columns on the fact table along with the surrogate dimension key. Kind of defeats the whole principle of dimensional modeling. Unless your dimensions are in the multi million row range your joins should be fine.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Language translation for codes
My original post had a math error which I have correc etd. The number of attributes should be thus:
15 languages * 25 codes * 3 columns each code = 1125 attributes.
15 languages * 25 codes * 3 columns each code = 1125 attributes.
dennisf- Posts : 9
Join date : 2009-04-14
Similar topics
» Large number of snowflake code tables per dimension
» Large number of late arriving facts
» Multi Language hybrid approach
» Codes and indicators
» How to handle bad zip codes
» Large number of late arriving facts
» Multi Language hybrid approach
» Codes and indicators
» How to handle bad zip codes
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum