Location and population dim/fact
4 posters
Page 1 of 1
Location and population dim/fact
Hi, I'm new to dimensional modeling and I’m trying to wrap my head around all this.
I'm trying to create a model that allows me to represent locations to populations.
I think my dimension tables looks like this:
Table: DimLocation
LocationId
Country
State
County
City
ZipCode
Table: DimYear
YearId
Year
I’m guessing that I should store a dimension record at the lowest level, say zip code?
And Fact
Table: FactPopulation
PopulationId
LocationId
YearId
TotalPopulation
TotalHouseholds
I stuck with representing each location type above zip codes. For example, I can easily see how I can represent a zip code, pretty straight forward… but what about at the State level, or County Level, etc.?
Thanks in advance for any direction or help.
I'm trying to create a model that allows me to represent locations to populations.
I think my dimension tables looks like this:
Table: DimLocation
LocationId
Country
State
County
City
ZipCode
Table: DimYear
YearId
Year
I’m guessing that I should store a dimension record at the lowest level, say zip code?
And Fact
Table: FactPopulation
PopulationId
LocationId
YearId
TotalPopulation
TotalHouseholds
I stuck with representing each location type above zip codes. For example, I can easily see how I can represent a zip code, pretty straight forward… but what about at the State level, or County Level, etc.?
Thanks in advance for any direction or help.
shanemcmurray- Posts : 5
Join date : 2011-07-18
Re: Location and population dim/fact
State and city are just aggregations of zips. You join on location_id where state = 'TX'. That will give you all households and total population for Texas.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Location and population dim/fact
Thanks for the input. so, store it at the zip level in the fact table and sum it up to get the state level?
like:
select sum(TotalPopulation)
from factpopulation a join dimlocation b on a.location_id = b.location_id
where b.state = 'tx'
I was thinking the data could be stored in the fact table at any level, like state or county, without rolling it up.
would this be an incorrect approach?
Store it at the lowest grain, then aggregate it up? is that a better approach?
Thanks.
like:
select sum(TotalPopulation)
from factpopulation a join dimlocation b on a.location_id = b.location_id
where b.state = 'tx'
I was thinking the data could be stored in the fact table at any level, like state or county, without rolling it up.
would this be an incorrect approach?
Store it at the lowest grain, then aggregate it up? is that a better approach?
Thanks.
shanemcmurray- Posts : 5
Join date : 2011-07-18
Re: Location and population dim/fact
You should always store at the lowest level needed and aggregate up.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Location and population dim/fact
Your SQL query is correct. That's the real power of Dimensional Modeling--the ability to "slice and dice" using any combinations of dimension attributes. As already mentioned, you should always store your fact data at the lowest level (grain) available and aggregate up as needed.
For a large DW, you might alsocreate aggreate fact tables to support common queries, for example a fact table the stores population by state. But don't even think about this for a fact tables with <10M rows. As long as queries are acceptably fast (and most modern database engines are pretty fast at handling star-schema queries) there is no need for the addition of an aggregate table.
For a large DW, you might alsocreate aggreate fact tables to support common queries, for example a fact table the stores population by state. But don't even think about this for a fact tables with <10M rows. As long as queries are acceptably fast (and most modern database engines are pretty fast at handling star-schema queries) there is no need for the addition of an aggregate table.
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: Location and population dim/fact
VHF, if I were to create an aggreate fact table for say, state population totals, do you use the same dimension table? or create a seperate dim table to represent the state level (grain)?
shanemcmurray- Posts : 5
Join date : 2011-07-18
Separate dimension for aggregate fact table
Either approach is possible.
You can create a separate State dimension. As long as the State attribute in the State dimension contains the same values as the State attribute in your Location dimension these would be said to be conformed dimensions. (In fact, you could create your State dimension by doing a distinct query on your Location dimension.)
Or you can create special "state" records in your Location dimension. These rows would represent an entire state and have either NULL, empty, or a special value such as 'NA' in the City and Zip fields. It is often useful to have a flag that designates a state row.
Some folks here have a preference for one design pattern or the other, and can perhaps chime in with pros/cons. I have done both, and haven't settled on a single approach as best. I try to take into account how the design will affect what the user will see in the BI tools.
Some BI tools (such as Business Objects) have specific support for aggregates. Once configured, the tools will use either the low-level fact table or the aggregate fact table depending on the user’s query. Maintaining a single dimension table (with "special rows" for aggregate use) might facilitate setting up aggregate awareness with some tools.
P.S. I would recommend Mastering Data Warehouse Aggregates by Christopher Adamson with forward by Ralph Kimball.
You can create a separate State dimension. As long as the State attribute in the State dimension contains the same values as the State attribute in your Location dimension these would be said to be conformed dimensions. (In fact, you could create your State dimension by doing a distinct query on your Location dimension.)
Or you can create special "state" records in your Location dimension. These rows would represent an entire state and have either NULL, empty, or a special value such as 'NA' in the City and Zip fields. It is often useful to have a flag that designates a state row.
Some folks here have a preference for one design pattern or the other, and can perhaps chime in with pros/cons. I have done both, and haven't settled on a single approach as best. I try to take into account how the design will affect what the user will see in the BI tools.
Some BI tools (such as Business Objects) have specific support for aggregates. Once configured, the tools will use either the low-level fact table or the aggregate fact table depending on the user’s query. Maintaining a single dimension table (with "special rows" for aggregate use) might facilitate setting up aggregate awareness with some tools.
P.S. I would recommend Mastering Data Warehouse Aggregates by Christopher Adamson with forward by Ralph Kimball.
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: Location and population dim/fact
okay, got it. Thanks for all the help.
shanemcmurray- Posts : 5
Join date : 2011-07-18
Re: Location and population dim/fact
I’m guessing that I should store a dimension record at the lowest level, say zip code?
Actually it would be at whatever level you are getting the data. If this is Census data, you would probably be best storing it at the tract level.
Also, ZIP codes cross political boundaries. There are codes that cross states, however almost all of these are in sparsely populated areas. It is common for codes to cross municipal boundaries, so city aggregates, particularly in suburban areas, can be misleading. There are also codes for military (FPO) and large commercial buildings, which may have zero population, causing problems with per capita calculations.
Re: Location and population dim/fact
ngalemmo, I see what you are saying. in this case would you recommend having fact tables for each geo type with matching dim tables to group/slice at that level? For example:
DimZipcode:
LocationId
Country
State
County
City
ZipCode
FactZipcode:
PopulationId
LocationId
TotalPopulation
TotalHouseholds
DimCity:
LocationId
Country
State
County
City
Factcity:
PopulationId
LocationId
TotalPopulation
TotalHouseholds
DimCounty:
LocationId
Country
State
County
FactCounty:
PopulationId
LocationId
TotalPopulation
TotalHouseholds
OR
Is there a better way to model this?
DimZipcode:
LocationId
Country
State
County
City
ZipCode
FactZipcode:
PopulationId
LocationId
TotalPopulation
TotalHouseholds
DimCity:
LocationId
Country
State
County
City
Factcity:
PopulationId
LocationId
TotalPopulation
TotalHouseholds
DimCounty:
LocationId
Country
State
County
FactCounty:
PopulationId
LocationId
TotalPopulation
TotalHouseholds
OR
Is there a better way to model this?
shanemcmurray- Posts : 5
Join date : 2011-07-18
Re: Location and population dim/fact
I wouldn't necessarily go that far. The aggregates are only useful if you can distinquish where zips cross bondaries. It all boils down to the level of detail of the mesures being received. The location dimension should reflect that level of detail.
If all you get is population by zip code, there isn't much else you can do about it and having the aggregtes don't change anything since you would not know the correct city or state anyway... just the one where the post office is located. However if this is census tract data, the location dimension would (and should) by at the tract level with city, state, zip and other attributes.
If the data is by city/state/zip (zip repeats with different values when crossing political boundaries), then that should be the NK of the location dimension. Again you would not need the additional aggregate fact tables unless performance is an issue.
If all you get is population by zip code, there isn't much else you can do about it and having the aggregtes don't change anything since you would not know the correct city or state anyway... just the one where the post office is located. However if this is census tract data, the location dimension would (and should) by at the tract level with city, state, zip and other attributes.
If the data is by city/state/zip (zip repeats with different values when crossing political boundaries), then that should be the NK of the location dimension. Again you would not need the additional aggregate fact tables unless performance is an issue.
Similar topics
» how to design? Vehicle location status fact table
» Best practices for a Fact table that contains a row per date/hour/location/patient stay
» Dimensional model for projecting households and population
» Location in Each Dimension?
» Location and department (n:m)
» Best practices for a Fact table that contains a row per date/hour/location/patient stay
» Dimensional model for projecting households and population
» Location in Each Dimension?
» Location and department (n:m)
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|