type 1 dimension - new requirements for attributes that will be updated often
I have a type 1 product dimension which gets updated maybe once every 6-8 weeks. New requirements are for attributes that will likely be updated quite often, possibly once a day, but most likely 2-3 times a week. The product dimension is quite large, so I thought to break the new attributes out into their own dimension (for performance reasons). The new attributes are pretty unique to the product, so they are essentially 1-to-1 with the product itself. How should this be modeled? Should the new dimension have the same PK as the product dimension? Should it include the natural keys as well?
Note that there is no reason to track history for any of the data.
Thanks!
topcat (Posts: 19, Join date: 2012-08-09)
Re: type 1 dimension - new requirements for attributes that will be updated often
What are you concerned about that would make you consider splitting the table?
Daily updates, particularly in a type 1 dimension, should not be a major concern. Do you have performance or service-level requirements that you are not able to meet?
Anyway, if you do, the technique is called 'vertical partitioning'. You would use the same PK on both tables; only one needs to carry the natural key.
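If you do split it, a minimal sketch of what the two tables could look like, with hypothetical table and column names (both tables share the surrogate key; only the base dimension carries the natural key):

```sql
-- Hypothetical schema: the stable, rarely updated attributes stay in the
-- base dimension, which is the only table carrying the natural key.
CREATE TABLE dim_product (
    product_key   INTEGER NOT NULL,   -- shared surrogate PK
    product_nk    VARCHAR(50),        -- natural key, carried here only
    product_name  VARCHAR(200),
    category      VARCHAR(100)
    -- ...remaining type 1 attributes...
);

-- The frequently updated attributes move to a 1-to-1 companion table
-- keyed on the same surrogate.
CREATE TABLE dim_product_behavior (
    product_key     INTEGER NOT NULL, -- same surrogate PK as dim_product
    times_ordered   INTEGER,
    last_order_date DATE
    -- ...remaining volatile attributes...
);
```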
Re: type 1 dimension - new requirements for attributes that will be updated often
I'm concerned about performance: the dimension is very wide and will potentially contain a quarter million records. The new attributes are "behavioral" dimensions derived from facts, for example, something along the lines of how many times the product was ordered (it's not that exactly, but something similar). So overall, I am concerned about the DB engine updating 100K+ records which are each 10K characters wide. The other thing I didn't mention is that the DB engine (Netezza) works on an insert-only strategy, so updates are really inserts with a later reclaim process that removes the old, inactive records.
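To make the insert-only behavior concrete: a type 1 UPDATE on Netezza logically deletes the old row version and inserts a new one, and the dead versions are physically reclaimed later, typically with GROOM. A sketch, reusing the hypothetical table name from above:

```sql
-- A type 1 update: Netezza marks the old row version deleted
-- and inserts the new version.
UPDATE dim_product_behavior
   SET times_ordered = times_ordered + 1
 WHERE product_key = 12345;

-- A later maintenance pass physically reclaims the deleted versions.
GROOM TABLE dim_product_behavior RECORDS ALL;
```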
topcat (Posts: 19, Join date: 2012-08-09)
Re: type 1 dimension - new requirements for attributes that will be updated often
Netezza... OK. Vertical partitioning on a Netezza platform is an effective way to improve query performance on a very wide table. But you need to break it down in a way that minimizes how often a typical query needs more than one table. One of the vertical partitions should be the 'rarely used' collection of attributes; hopefully these make up a sizable chunk of the row that you remove from query overhead almost all the time. This may mean having some attributes in common across tables.
Make sure all tables are distributed on the primary key, and use the same primary key for all tables. Organizing the tables on one or two commonly used predicates will also help. In V7.0 they are going to a finer-grained zone map (smaller disk blocks), which will improve the results of organization and zone maps in general.
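A sketch of those clauses on one of the split tables; the organizing column is a placeholder for whatever predicate your queries actually filter on:

```sql
CREATE TABLE dim_product_behavior (
    product_key     INTEGER NOT NULL,
    times_ordered   INTEGER,
    last_order_date DATE
)
DISTRIBUTE ON (product_key)     -- same distribution key on every partition
ORGANIZE ON (last_order_date);  -- clusters rows to sharpen zone maps
```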
Another effective mechanism is to declare materialized views. A Netezza materialized view is restricted to columns from one table (no joins). Its purpose is to create vertical partitions that help query performance. The difference is that it materializes a copy of those columns from the main table and keeps the content in sync with the base table. Another advantage is that it is transparent to the query: the user submits a query against the main table, and the query optimizer will check whether there is an appropriate materialized view and use that instead. The user doesn't need to know or care about the views.
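A minimal example of such a view, again with hypothetical names; the SELECT may project columns from a single base table only:

```sql
-- A thin projection of the wide dimension. Queries against dim_product
-- that touch only these columns can be rerouted here by the optimizer.
CREATE MATERIALIZED VIEW mv_product_slim AS
SELECT product_key, product_name, category
  FROM dim_product;
```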
A 100K+ row table is no big deal. If you have a moderately sized machine with 48 SPUs, you are dealing with 48 little tables of around 2K rows each. At an average size of 10K bytes, each SPU would need to read, at most, 20MB of disk, cache what it needs in memory (it only pulls the attributes you use in the query, so the cache used is much, much less than 20MB), then do a hash join to the fact table (with any other needed dimensional attributes in memory). It's very quick. If you can come up with no more than 3-4 different subsets (because of the overhead of maintaining these tables when updating the base table), materialized views may be a more effective use of the machine.
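For illustration, a typical star query then joins the fact to whichever partitions it needs on the shared key (fact table and measure names are hypothetical):

```sql
SELECT p.category,
       SUM(f.sales_amount) AS total_sales
  FROM fact_sales f
  JOIN dim_product          p ON p.product_key = f.product_key
  JOIN dim_product_behavior b ON b.product_key = f.product_key
 WHERE b.times_ordered > 100
 GROUP BY p.category;
```

If the fact table is also distributed on product_key, the hash joins are fully co-located on each SPU; otherwise Netezza will typically broadcast or redistribute the smaller dimension tables.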

» Design Alternatives for Late Arriving/Updated Type 2 Dimension Data
» Handling of SCD type 2 attributes in outrigger dimension
» Design Standards - Numeric Type 2 Dimension Attributes
» dimension table design question for around 100 attributes and higher level calculated attributes
» Type 2 dimension or type 2 column?