Index Strategy on FACT Table with 300 Million records
+2
hang
dwman
6 posters
Page 1 of 1
Index Strategy on FACT Table with 300 Million records
I have a FACT table with 30 Foreign Keys and 4 Alternate Keys. This table will have 300 million records and is going to be partitioned based on Date key.
Would like to some advise on Index strategy. Thinking of,
1.Create single column all FK indexes
2.Not sure about the primary key ? - One options is create to create a composite clustered index on the Alternate keys or create a surrogate Key ?
DWMan
Would like to some advise on Index strategy. Thinking of,
1.Create single column all FK indexes
2.Not sure about the primary key ? - One options is create to create a composite clustered index on the Alternate keys or create a surrogate Key ?
DWMan
dwman- Posts : 7
Join date : 2010-11-08
Re: Index Strategy on FACT Table with 300 Million records
30 dimension FKs in a single fact should ring alarm bell as it sounds like a centipede fact. Kimball’s rule of thumb is that less than 15 dimension fact covers most cases, 25 dimensions would be excessive for a single fact table. For design and performance purpose, you may need to review the model and try to combine correlated dimensions or break down the fact into reasonable measure groups.
As far as performance goes, you don't need to create indexes for any FKs with low selectivity, as indexes on such fields will not improve performance but rather a waste of resources.
The combination of the dimension FKs, including date key, is the primary key of your fact. You may partition your fact based on the date key if it is part of primary key regardless of clustered or not.
As far as performance goes, you don't need to create indexes for any FKs with low selectivity, as indexes on such fields will not improve performance but rather a waste of resources.
The combination of the dimension FKs, including date key, is the primary key of your fact. You may partition your fact based on the date key if it is part of primary key regardless of clustered or not.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Index Strategy on FACT Table with 300 Million records
What database system?
Why do you have alternate keys?
Why do you have alternate keys?
Index Strategy on FACT Table with 300 Million records
Platform - SQL-Server 2008.
Sorry No alternate keys.
it was a 3 column composite Primary key.
Sorry No alternate keys.
it was a 3 column composite Primary key.
dwman- Posts : 7
Join date : 2010-11-08
Re: Index Strategy on FACT Table with 300 Million records
Forget a primary key - it doesn't add value, and takes up unnecessary column width.
First of all, as was suggested above, look at the number of foreign keys. You may want to combine some of those dimensions.
As far as indexing, create a clustered index on the most often user date key (you may only have one). Then put nonclustered indexes on the most often used foreign keys - e.g. product and customer. Your indexing strategy should be based on your query usage - so run a trace on the common queries used, and check any indexes you create against those queries to ensure you've created adequate and appropriate indexes.
First of all, as was suggested above, look at the number of foreign keys. You may want to combine some of those dimensions.
As far as indexing, create a clustered index on the most often user date key (you may only have one). Then put nonclustered indexes on the most often used foreign keys - e.g. product and customer. Your indexing strategy should be based on your query usage - so run a trace on the common queries used, and check any indexes you create against those queries to ensure you've created adequate and appropriate indexes.
Re: Index Strategy on FACT Table with 300 Million records
Keep in mind that you can create nonclustered indexes on combinations of fields. If columns 1, 2, and 3 are frequently queried together, then create a nonclustered index for columns 1, 2, and 3. But remember, that if only columns 2 and 3 are in the query, the index won't be used.
The best way is to start minimally. Watch for performance issues and apply indexes targetted at the performance issues. You may find that you need a summary table to improve performance.
Be careful when testing the performance gains from indexing. Depending on the type of disk, you might be fooled into thinking that you're seeing a performance gain from the index when in fact the data used in the test is still in your disk's memory. We're using a EMC clariion SAN. Whenever it reads data from the disk, it holds it in it's own memory (seperate from the server's memory).
The best way is to start minimally. Watch for performance issues and apply indexes targetted at the performance issues. You may find that you need a summary table to improve performance.
Be careful when testing the performance gains from indexing. Depending on the type of disk, you might be fooled into thinking that you're seeing a performance gain from the index when in fact the data used in the test is still in your disk's memory. We're using a EMC clariion SAN. Whenever it reads data from the disk, it holds it in it's own memory (seperate from the server's memory).
Jeff Smith- Posts : 471
Join date : 2009-02-03
Index Strategy on FACT Table with 300 Million records
Are you suggesting no Primary Key on the FACT table ?
I don't have any test data to run a Trace to find out the potential candidates for the Non-clustered indexes ? So probably I need to start with minimum Indexes
I don't have any test data to run a Trace to find out the potential candidates for the Non-clustered indexes ? So probably I need to start with minimum Indexes
dwman- Posts : 7
Join date : 2010-11-08
Re: Index Strategy on FACT Table with 300 Million records
With the size of 300 million and growing, indexing alone is not enough to achieve reasonable performance. You need to seriously consider partitioning the table.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Index Strategy on FACT Table with 300 Million records
yes, I have already designed for a monthly partition which will result in 5- 10 million records in a partition file group.
I am mainly concerned about the 30 Foreign keys making Centipede fact. Probably too late to change the FACT design to bring down the number of dimensions. Any other thoughts put a effective indexes ?
I am mainly concerned about the 30 Foreign keys making Centipede fact. Probably too late to change the FACT design to bring down the number of dimensions. Any other thoughts put a effective indexes ?
dwman- Posts : 7
Join date : 2010-11-08
Re: Index Strategy on FACT Table with 300 Million records
Partitions only help with loading. They don't seem to do a lot for query performance.
The question you need to ask is: How will the data be used? If you have several reports, then as I said, run some traces on the queries that will use the fact table to get an understanding of which columns to index.
You can probably make some assumptions for the reports based on the joins and where clauses. But you need to test.
As I said above, start off with a clustered index on the date, and non-clustered indexes on the most commonly used foreign keys. Keep it minimal and then test against your reports.
The question you need to ask is: How will the data be used? If you have several reports, then as I said, run some traces on the queries that will use the fact table to get an understanding of which columns to index.
You can probably make some assumptions for the reports based on the joins and where clauses. But you need to test.
As I said above, start off with a clustered index on the date, and non-clustered indexes on the most commonly used foreign keys. Keep it minimal and then test against your reports.
Re: Index Strategy on FACT Table with 300 Million records
Sure they do. Partition pruning is highly beneficial on reads. This is especially true on snapshot fact tables.John Simon wrote:Partitions only help with loading. They don't seem to do a lot for query performance.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Index Strategy on FACT Table with 300 Million records
BoxesAndLines wrote:Sure they do. Partition pruning is highly beneficial on reads. This is especially true on snapshot fact tables.John Simon wrote:Partitions only help with loading. They don't seem to do a lot for query performance.
Sometimes they do, depending on the query.
Check this blog:
http://blogs.mssqltips.com/blogs/chadboyd/archive/2008/03/19/partitioning-data-for-query-performance-where-s-the-benefit.aspx
SQL Table Partition with Partition Key not part of primary Key
thanks folks for your advise & thoughts.
One more question, Have any of you have tried the following situation.
1. Partition key is not part of primary key.
2. But a clustered index will be created on the partition key
3. Primary key be implemented as non-clustered Index
Any pros, cons & issues would be great.
One more question, Have any of you have tried the following situation.
1. Partition key is not part of primary key.
2. But a clustered index will be created on the partition key
3. Primary key be implemented as non-clustered Index
Any pros, cons & issues would be great.
dwman- Posts : 7
Join date : 2010-11-08
Re: Index Strategy on FACT Table with 300 Million records
Check out this post by the SQL Server Customer Advisory Team for best practices on Data Warehouses.
http://sqlcat.com/top10lists/archive/2008/02/06/top-10-best-practices-for-building-a-large-scale-relational-data-warehouse.aspx
http://sqlcat.com/top10lists/archive/2008/02/06/top-10-best-practices-for-building-a-large-scale-relational-data-warehouse.aspx
Re: Index Strategy on FACT Table with 300 Million records
dwman wrote:One more question, Have any of you have tried the following situation.
1. Partition key is not part of primary key.
2. But a clustered index will be created on the partition key
3. Primary key be implemented as non-clustered Index
Any pros, cons & issues would be great.
Why do you have a primary key declared on the fact table?
Anyway, the partitioning rules are independent of the primary key. Use whatever makes sense for paritioning criteria. The tips John references are well worth reading.
SQL Table Partition with Partition Key not part of primary Key
This is not for a FACT table. I am considering Partition for a Large Log table. By taking Partition approach the old data can be deleted from the table without running the expensive delete command.
One of the Rule when you implement Partition is "Partition Key should be part of Primary Key or Clustered Index"
While I have done the partition using Primary Key approach, I haven' tried the second option.
So, looking for other people's views
One of the Rule when you implement Partition is "Partition Key should be part of Primary Key or Clustered Index"
While I have done the partition using Primary Key approach, I haven' tried the second option.
So, looking for other people's views
dwman- Posts : 7
Join date : 2010-11-08
Similar topics
» Primary Key of Fact Table
» My Index Strategy
» Fact Table Indexing Strategy
» Updating records in a fact table
» Dimension with million of records - Performance on delivery
» My Index Strategy
» Fact Table Indexing Strategy
» Updating records in a fact table
» Dimension with million of records - Performance on delivery
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum