Index Strategy on FACT Table with 300 Million records

Page 1 of 2 1, 2  Next

View previous topic View next topic Go down

Index Strategy on FACT Table with 300 Million records

Post  dwman on Mon Nov 08, 2010 1:08 am

I have a FACT table with 30 Foreign Keys and 4 Alternate Keys. This table will have 300 million records and is going to be partitioned based on Date key.

Would like to some advise on Index strategy. Thinking of,

1.Create single column all FK indexes

2.Not sure about the primary key ? - One options is create to create a composite clustered index on the Alternate keys or create a surrogate Key ?

DWMan

dwman

Posts: 7
Join date: 2010-11-08

View user profile

Back to top Go down

Re: Index Strategy on FACT Table with 300 Million records

Post  hang on Mon Nov 08, 2010 4:38 pm

30 dimension FKs in a single fact should ring alarm bell as it sounds like a centipede fact. Kimball’s rule of thumb is that less than 15 dimension fact covers most cases, 25 dimensions would be excessive for a single fact table. For design and performance purpose, you may need to review the model and try to combine correlated dimensions or break down the fact into reasonable measure groups.

As far as performance goes, you don't need to create indexes for any FKs with low selectivity, as indexes on such fields will not improve performance but rather a waste of resources.

The combination of the dimension FKs, including date key, is the primary key of your fact. You may partition your fact based on the date key if it is part of primary key regardless of clustered or not.

hang

Posts: 453
Join date: 2010-05-07
Location: Brisbane, Australia

View user profile

Back to top Go down

Re: Index Strategy on FACT Table with 300 Million records

Post  ngalemmo on Tue Nov 09, 2010 7:41 pm

What database system?

Why do you have alternate keys?

ngalemmo

Posts: 1729
Join date: 2009-05-15
Location: Los Angeles

View user profile http://aginity.com

Back to top Go down

Index Strategy on FACT Table with 300 Million records

Post  dwman on Sun Nov 28, 2010 6:04 pm

Platform - SQL-Server 2008.

Sorry No alternate keys.

it was a 3 column composite Primary key.

dwman

Posts: 7
Join date: 2010-11-08

View user profile

Back to top Go down

Re: Index Strategy on FACT Table with 300 Million records

Post  John Simon on Sun Nov 28, 2010 8:26 pm

Forget a primary key - it doesn't add value, and takes up unnecessary column width.

First of all, as was suggested above, look at the number of foreign keys. You may want to combine some of those dimensions.

As far as indexing, create a clustered index on the most often user date key (you may only have one). Then put nonclustered indexes on the most often used foreign keys - e.g. product and customer. Your indexing strategy should be based on your query usage - so run a trace on the common queries used, and check any indexes you create against those queries to ensure you've created adequate and appropriate indexes.

John Simon

Posts: 170
Join date: 2009-05-28
Location: Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: Index Strategy on FACT Table with 300 Million records

Post  Jeff Smith on Mon Nov 29, 2010 6:56 am

Keep in mind that you can create nonclustered indexes on combinations of fields. If columns 1, 2, and 3 are frequently queried together, then create a nonclustered index for columns 1, 2, and 3. But remember, that if only columns 2 and 3 are in the query, the index won't be used.

The best way is to start minimally. Watch for performance issues and apply indexes targetted at the performance issues. You may find that you need a summary table to improve performance.

Be careful when testing the performance gains from indexing. Depending on the type of disk, you might be fooled into thinking that you're seeing a performance gain from the index when in fact the data used in the test is still in your disk's memory. We're using a EMC clariion SAN. Whenever it reads data from the disk, it holds it in it's own memory (seperate from the server's memory).

Jeff Smith

Posts: 311
Join date: 2009-02-03

View user profile

Back to top Go down

Index Strategy on FACT Table with 300 Million records

Post  dwman on Mon Nov 29, 2010 1:28 pm

Are you suggesting no Primary Key on the FACT table ?

I don't have any test data to run a Trace to find out the potential candidates for the Non-clustered indexes ? So probably I need to start with minimum Indexes

dwman

Posts: 7
Join date: 2010-11-08

View user profile

Back to top Go down

Re: Index Strategy on FACT Table with 300 Million records

Post  hang on Mon Nov 29, 2010 3:16 pm

With the size of 300 million and growing, indexing alone is not enough to achieve reasonable performance. You need to seriously consider partitioning the table.

hang

Posts: 453
Join date: 2010-05-07
Location: Brisbane, Australia

View user profile

Back to top Go down

Index Strategy on FACT Table with 300 Million records

Post  dwman on Mon Nov 29, 2010 4:54 pm

yes, I have already designed for a monthly partition which will result in 5- 10 million records in a partition file group.

I am mainly concerned about the 30 Foreign keys making Centipede fact. Probably too late to change the FACT design to bring down the number of dimensions. Any other thoughts put a effective indexes ?

dwman

Posts: 7
Join date: 2010-11-08

View user profile

Back to top Go down

Re: Index Strategy on FACT Table with 300 Million records

Post  John Simon on Mon Nov 29, 2010 5:09 pm

Partitions only help with loading. They don't seem to do a lot for query performance.

The question you need to ask is: How will the data be used? If you have several reports, then as I said, run some traces on the queries that will use the fact table to get an understanding of which columns to index.

You can probably make some assumptions for the reports based on the joins and where clauses. But you need to test.
As I said above, start off with a clustered index on the date, and non-clustered indexes on the most commonly used foreign keys. Keep it minimal and then test against your reports.


John Simon

Posts: 170
Join date: 2009-05-28
Location: Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: Index Strategy on FACT Table with 300 Million records

Post  BoxesAndLines on Mon Nov 29, 2010 7:35 pm

John Simon wrote:Partitions only help with loading. They don't seem to do a lot for query performance.


Sure they do. Partition pruning is highly beneficial on reads. This is especially true on snapshot fact tables.

BoxesAndLines

Posts: 621
Join date: 2009-02-03
Location: USA

View user profile

Back to top Go down

Re: Index Strategy on FACT Table with 300 Million records

Post  John Simon on Mon Nov 29, 2010 9:37 pm

BoxesAndLines wrote:
John Simon wrote:Partitions only help with loading. They don't seem to do a lot for query performance.


Sure they do. Partition pruning is highly beneficial on reads. This is especially true on snapshot fact tables.


Sometimes they do, depending on the query.
Check this blog:
http://blogs.mssqltips.com/blogs/chadboyd/archive/2008/03/19/partitioning-data-for-query-performance-where-s-the-benefit.aspx

John Simon

Posts: 170
Join date: 2009-05-28
Location: Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

SQL Table Partition with Partition Key not part of primary Key

Post  dwman on Tue Nov 30, 2010 8:30 pm

thanks folks for your advise & thoughts.

One more question, Have any of you have tried the following situation.

1. Partition key is not part of primary key.

2. But a clustered index will be created on the partition key

3. Primary key be implemented as non-clustered Index

Any pros, cons & issues would be great.


dwman

Posts: 7
Join date: 2010-11-08

View user profile

Back to top Go down

Re: Index Strategy on FACT Table with 300 Million records

Post  John Simon on Tue Nov 30, 2010 8:55 pm

Check out this post by the SQL Server Customer Advisory Team for best practices on Data Warehouses.

http://sqlcat.com/top10lists/archive/2008/02/06/top-10-best-practices-for-building-a-large-scale-relational-data-warehouse.aspx

John Simon

Posts: 170
Join date: 2009-05-28
Location: Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: Index Strategy on FACT Table with 300 Million records

Post  ngalemmo on Wed Dec 01, 2010 8:25 pm

dwman wrote:One more question, Have any of you have tried the following situation.

1. Partition key is not part of primary key.

2. But a clustered index will be created on the partition key

3. Primary key be implemented as non-clustered Index

Any pros, cons & issues would be great.



Why do you have a primary key declared on the fact table?

Anyway, the partitioning rules are independent of the primary key. Use whatever makes sense for paritioning criteria. The tips John references are well worth reading.

ngalemmo

Posts: 1729
Join date: 2009-05-15
Location: Los Angeles

View user profile http://aginity.com

Back to top Go down

Page 1 of 2 1, 2  Next

View previous topic View next topic Back to top

- Similar topics

Permissions in this forum:
You cannot reply to topics in this forum