Generating a Junk dimension - Cross join or based on actual values in fact?
4 posters
Page 1 of 1
Generating a Junk dimension - Cross join or based on actual values in fact?
Hi there,
Are there any problems if I create a junk dimension and maintain it like a Type 1 SCD, compared to creating the junk dimension as a cross join to build an exhaustive list of permutations.
The rationale for a Type 1 styled Junk dimension is that a cross join results in 10,000 rows, while a distinct of attributes in the fact table results in only 100 rows. This 100 rows may never reach 10,000 rows.
Thanks!
ML
Are there any problems if I create a junk dimension and maintain it like a Type 1 SCD, compared to creating the junk dimension as a cross join to build an exhaustive list of permutations.
The rationale for a Type 1 styled Junk dimension is that a cross join results in 10,000 rows, while a distinct of attributes in the fact table results in only 100 rows. This 100 rows may never reach 10,000 rows.
Thanks!
ML
kletos- Posts : 1
Join date : 2013-02-13
RE: Generating a Junk dimension - Cross join or based on actual values in fact?
As per the kimball group advice we should not be creating the exhaustive list with the cross join, to begin with. In your scenario it is still manageable if the junk dimension will go up to 10K rows if created through cross join, but there can be cases where the distinct values for each of the columns may be little high and there can be a number of columns to be placed in the junk dimension which may lead to a huge dimension in terms of rows.
Its better to build your junk dimension based upon data realities i.e. populate them with new rows as they come from source. But if you are sure the number of rows by cross join will be at max 10K i.e. source has some restriction already enforced, distinct values for the columns are predefined then you may build the exhaustive list and may not have to worry about maintaining the ETL job to populate the junk dimension.
Its better to build your junk dimension based upon data realities i.e. populate them with new rows as they come from source. But if you are sure the number of rows by cross join will be at max 10K i.e. source has some restriction already enforced, distinct values for the columns are predefined then you may build the exhaustive list and may not have to worry about maintaining the ETL job to populate the junk dimension.
rathjeevesh- Posts : 15
Join date : 2013-02-16
Re: Generating a Junk dimension - Cross join or based on actual values in fact?
rathjeevesh wrote:As per the kimball group advice we should not be creating the exhaustive list with the cross join, to begin with. In your scenario it is still manageable if the junk dimension will go up to 10K rows if created through cross join, but there can be cases where the distinct values for each of the columns may be little high and there can be a number of columns to be placed in the junk dimension which may lead to a huge dimension in terms of rows.
Its better to build your junk dimension based upon data realities i.e. populate them with new rows as they come from source. But if you are sure the number of rows by cross join will be at max 10K i.e. source has some restriction already enforced, distinct values for the columns are predefined then you may build the exhaustive list and may not have to worry about maintaining the ETL job to populate the junk dimension.
This is correct. There may be a few times when a cross-join makes sense, but most of the time, just create rows for combinations that actually exist.
Re: Generating a Junk dimension - Cross join or based on actual values in fact?
I like to test with:
select count(*) from (select distinct [column] from [table]>);
and in the case that it is customer ids, then leave it out of the select to get the distinct values of what would be the junk dimension and add columns in to the inner select and see what ones make it sky rocket to get an idea of the best combo.
I did that today for a demographic junk dimension and it was only 1037 for every combo! I was very happy. But I want to keep even the old values because the source system just stomps over them. So if I were to say just truncate the table and recreate it, i would lose accurate demographic history, because some combos would not link to that person...so add it as it comes to maintain accurate history on the fact table at the time of the event.
select count(*) from (select distinct [column] from [table]>);
and in the case that it is customer ids, then leave it out of the select to get the distinct values of what would be the junk dimension and add columns in to the inner select and see what ones make it sky rocket to get an idea of the best combo.
I did that today for a demographic junk dimension and it was only 1037 for every combo! I was very happy. But I want to keep even the old values because the source system just stomps over them. So if I were to say just truncate the table and recreate it, i would lose accurate demographic history, because some combos would not link to that person...so add it as it comes to maintain accurate history on the fact table at the time of the event.
chade25- Posts : 29
Join date : 2012-04-12
Age : 44
Location : Oregon
Similar topics
» modelling Product dimension for Pizza outlet
» A design based on junk dimension
» Connecting Actual and Budget Fact table to same Product and Customer Dimension
» Aggregate Dimension Based on Fact?
» Fact to Dimension Join (Best Practice)
» A design based on junk dimension
» Connecting Actual and Budget Fact table to same Product and Customer Dimension
» Aggregate Dimension Based on Fact?
» Fact to Dimension Join (Best Practice)
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum