Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Index Strategy on FACT Table with 300 Million records

+2
hang
dwman
6 posters

Go down

Index Strategy on FACT Table with 300 Million records Empty Index Strategy on FACT Table with 300 Million records

Post  dwman Mon Nov 08, 2010 5:08 am

I have a FACT table with 30 Foreign Keys and 4 Alternate Keys. This table will have 300 million records and is going to be partitioned based on Date key.

Would like to some advise on Index strategy. Thinking of,

1.Create single column all FK indexes

2.Not sure about the primary key ? - One options is create to create a composite clustered index on the Alternate keys or create a surrogate Key ?

DWMan

dwman

Posts : 7
Join date : 2010-11-08

Back to top Go down

Index Strategy on FACT Table with 300 Million records Empty Re: Index Strategy on FACT Table with 300 Million records

Post  hang Mon Nov 08, 2010 8:38 pm

30 dimension FKs in a single fact should ring alarm bell as it sounds like a centipede fact. Kimball’s rule of thumb is that less than 15 dimension fact covers most cases, 25 dimensions would be excessive for a single fact table. For design and performance purpose, you may need to review the model and try to combine correlated dimensions or break down the fact into reasonable measure groups.

As far as performance goes, you don't need to create indexes for any FKs with low selectivity, as indexes on such fields will not improve performance but rather a waste of resources.

The combination of the dimension FKs, including date key, is the primary key of your fact. You may partition your fact based on the date key if it is part of primary key regardless of clustered or not.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Index Strategy on FACT Table with 300 Million records Empty Re: Index Strategy on FACT Table with 300 Million records

Post  ngalemmo Tue Nov 09, 2010 11:41 pm

What database system?

Why do you have alternate keys?
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Index Strategy on FACT Table with 300 Million records Empty Index Strategy on FACT Table with 300 Million records

Post  dwman Sun Nov 28, 2010 10:04 pm

Platform - SQL-Server 2008.

Sorry No alternate keys.

it was a 3 column composite Primary key.

dwman

Posts : 7
Join date : 2010-11-08

Back to top Go down

Index Strategy on FACT Table with 300 Million records Empty Re: Index Strategy on FACT Table with 300 Million records

Post  John Simon Mon Nov 29, 2010 12:26 am

Forget a primary key - it doesn't add value, and takes up unnecessary column width.

First of all, as was suggested above, look at the number of foreign keys. You may want to combine some of those dimensions.

As far as indexing, create a clustered index on the most often user date key (you may only have one). Then put nonclustered indexes on the most often used foreign keys - e.g. product and customer. Your indexing strategy should be based on your query usage - so run a trace on the common queries used, and check any indexes you create against those queries to ensure you've created adequate and appropriate indexes.

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

http://jsimonbi.wordpress.com

Back to top Go down

Index Strategy on FACT Table with 300 Million records Empty Re: Index Strategy on FACT Table with 300 Million records

Post  Jeff Smith Mon Nov 29, 2010 10:56 am

Keep in mind that you can create nonclustered indexes on combinations of fields. If columns 1, 2, and 3 are frequently queried together, then create a nonclustered index for columns 1, 2, and 3. But remember, that if only columns 2 and 3 are in the query, the index won't be used.

The best way is to start minimally. Watch for performance issues and apply indexes targetted at the performance issues. You may find that you need a summary table to improve performance.

Be careful when testing the performance gains from indexing. Depending on the type of disk, you might be fooled into thinking that you're seeing a performance gain from the index when in fact the data used in the test is still in your disk's memory. We're using a EMC clariion SAN. Whenever it reads data from the disk, it holds it in it's own memory (seperate from the server's memory).

Jeff Smith

Posts : 471
Join date : 2009-02-03

Back to top Go down

Index Strategy on FACT Table with 300 Million records Empty Index Strategy on FACT Table with 300 Million records

Post  dwman Mon Nov 29, 2010 5:28 pm

Are you suggesting no Primary Key on the FACT table ?

I don't have any test data to run a Trace to find out the potential candidates for the Non-clustered indexes ? So probably I need to start with minimum Indexes

dwman

Posts : 7
Join date : 2010-11-08

Back to top Go down

Index Strategy on FACT Table with 300 Million records Empty Re: Index Strategy on FACT Table with 300 Million records

Post  hang Mon Nov 29, 2010 7:16 pm

With the size of 300 million and growing, indexing alone is not enough to achieve reasonable performance. You need to seriously consider partitioning the table.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Index Strategy on FACT Table with 300 Million records Empty Index Strategy on FACT Table with 300 Million records

Post  dwman Mon Nov 29, 2010 8:54 pm

yes, I have already designed for a monthly partition which will result in 5- 10 million records in a partition file group.

I am mainly concerned about the 30 Foreign keys making Centipede fact. Probably too late to change the FACT design to bring down the number of dimensions. Any other thoughts put a effective indexes ?

dwman

Posts : 7
Join date : 2010-11-08

Back to top Go down

Index Strategy on FACT Table with 300 Million records Empty Re: Index Strategy on FACT Table with 300 Million records

Post  John Simon Mon Nov 29, 2010 9:09 pm

Partitions only help with loading. They don't seem to do a lot for query performance.

The question you need to ask is: How will the data be used? If you have several reports, then as I said, run some traces on the queries that will use the fact table to get an understanding of which columns to index.

You can probably make some assumptions for the reports based on the joins and where clauses. But you need to test.
As I said above, start off with a clustered index on the date, and non-clustered indexes on the most commonly used foreign keys. Keep it minimal and then test against your reports.


John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

http://jsimonbi.wordpress.com

Back to top Go down

Index Strategy on FACT Table with 300 Million records Empty Re: Index Strategy on FACT Table with 300 Million records

Post  BoxesAndLines Mon Nov 29, 2010 11:35 pm

John Simon wrote:Partitions only help with loading. They don't seem to do a lot for query performance.

Sure they do. Partition pruning is highly beneficial on reads. This is especially true on snapshot fact tables.
BoxesAndLines
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

Back to top Go down

Index Strategy on FACT Table with 300 Million records Empty Re: Index Strategy on FACT Table with 300 Million records

Post  John Simon Tue Nov 30, 2010 1:37 am

BoxesAndLines wrote:
John Simon wrote:Partitions only help with loading. They don't seem to do a lot for query performance.

Sure they do. Partition pruning is highly beneficial on reads. This is especially true on snapshot fact tables.

Sometimes they do, depending on the query.
Check this blog:
http://blogs.mssqltips.com/blogs/chadboyd/archive/2008/03/19/partitioning-data-for-query-performance-where-s-the-benefit.aspx

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

http://jsimonbi.wordpress.com

Back to top Go down

Index Strategy on FACT Table with 300 Million records Empty SQL Table Partition with Partition Key not part of primary Key

Post  dwman Wed Dec 01, 2010 12:30 am

thanks folks for your advise & thoughts.

One more question, Have any of you have tried the following situation.

1. Partition key is not part of primary key.

2. But a clustered index will be created on the partition key

3. Primary key be implemented as non-clustered Index

Any pros, cons & issues would be great.


dwman

Posts : 7
Join date : 2010-11-08

Back to top Go down

Index Strategy on FACT Table with 300 Million records Empty Re: Index Strategy on FACT Table with 300 Million records

Post  John Simon Wed Dec 01, 2010 12:55 am

Check out this post by the SQL Server Customer Advisory Team for best practices on Data Warehouses.

http://sqlcat.com/top10lists/archive/2008/02/06/top-10-best-practices-for-building-a-large-scale-relational-data-warehouse.aspx

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

http://jsimonbi.wordpress.com

Back to top Go down

Index Strategy on FACT Table with 300 Million records Empty Re: Index Strategy on FACT Table with 300 Million records

Post  ngalemmo Thu Dec 02, 2010 12:25 am

dwman wrote:One more question, Have any of you have tried the following situation.

1. Partition key is not part of primary key.

2. But a clustered index will be created on the partition key

3. Primary key be implemented as non-clustered Index

Any pros, cons & issues would be great.


Why do you have a primary key declared on the fact table?

Anyway, the partitioning rules are independent of the primary key. Use whatever makes sense for paritioning criteria. The tips John references are well worth reading.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Index Strategy on FACT Table with 300 Million records Empty SQL Table Partition with Partition Key not part of primary Key

Post  dwman Thu Dec 02, 2010 12:47 am

This is not for a FACT table. I am considering Partition for a Large Log table. By taking Partition approach the old data can be deleted from the table without running the expensive delete command.
One of the Rule when you implement Partition is "Partition Key should be part of Primary Key or Clustered Index"
While I have done the partition using Primary Key approach, I haven' tried the second option.
So, looking for other people's views

dwman

Posts : 7
Join date : 2010-11-08

Back to top Go down

Index Strategy on FACT Table with 300 Million records Empty Re: Index Strategy on FACT Table with 300 Million records

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum