index

dellsters Thu Jun 17, 2010 11:12 am

For a junk dimension, do you create individual indexes on columns or one index with multiple columns? How do you determine which way to go with?

Also in the fact table, do you create one index with all the foreign keys or each individually?

ngalemmo Mon Jun 21, 2010 11:50 am

For fact tables, create an index for each foreign key. In an unstructured ad-hoc query environment, compound keys are, for the most part, useless. If you are using Oracle, enable star schema optimization and use bit map indexes for the facts.

For dimensions you should have a PK index on the surrogate key and an index for the natural key (although the latter is optional if you are using a ETL tool and intend to cache dimension lookups). Individual attribute indexes are optional.

dipakpaudel Fri May 25, 2012 4:23 pm

ngalemmo wrote:For fact tables, create an index for each foreign key. In an unstructured ad-hoc query environment, compound keys are, for the most part, useless. If you are using Oracle, enable star schema optimization and use bit map indexes for the facts.

For dimensions you should have a PK index on the surrogate key and an index for the natural key (although the latter is optional if you are using a ETL tool and intend to cache dimension lookups). Individual attribute indexes are optional.

Hello,
I am a BI administrator/developer. One of my report is taking more than 2 hours to run. It is using Time table, Product table, and Fact table, and Customer table. Fact table has 30 million rows, product table has about 500 rows and customer table has about 950000 rows.I saw that my DBA created index on all columns(about 12) on Customer table. this is kind of shocking to me because this is not what I have ever seen before. Do you think report is taking so long because of wrong indexing on Customer table? I don't understand why we need index on all columns from Customer table when it is joind to Fact table using only ProductID?? Thanks a lot for your suggestion.

Dipak

dipakpaudel Fri May 25, 2012 4:25 pm

***Customer table joined using CustomerID, not ProductID.

LAndrews Fri May 25, 2012 4:58 pm

As mentioned earlier, the composite index probably isn't helping much, particularly for the join.

There are lots of factors that could be impacting the report performance, but here's some to look at (Assuming an Oracle DB)

1. Indexes on the dimensions and facts - correct columns? Correct types?(e.g. Bitmap on the surrogate keys and facts) Not only for the join columns, but also consider the columns used for the "where" clause on your report

2. Database statistics - invalid/stale statistics can impact the performance

3. Star-Transformation should be enabled

4. Report SQL - get the generated SQL from your BI tool and sit down with the DBA's to assess options.... an example you may need to look at partitioning the fact and customer tables

dipakpaudel Fri May 25, 2012 5:22 pm

It is Oracle DB. Index on Dimension tabe is I am more concerned about. All indexes on attribute colums are 'Bitmap' and index in Pk are both Bitmap and Normal. Customer dimension has over 900000 rows which are probably causing trouble. When I filter report by just 20-30 customers, report runs in less than 30 seconds but when I run report with entire customers, it takes forever.
Report requirement is to find all customer which is using only 1 product in last 12 months.
So, I am counting (Distinct Product for the customer, and a month) and then if count is more than 1 then filter it OUT on next tabular model(Cognos report Studio). While doing this count and filtering, I am guessing it is doing full-table scan.
Thanks for your help. I will generate SQL from REPORT and sit with DBA to look at stat and let you all know.
Dipak

ngalemmo Sat May 26, 2012 6:53 pm

A bitmap index on a PK is useless. All PK values are unique, a bitmap only functions well with non-unique values, and the fewer the better.

As far as run times go, they seem pretty long. How often to you rebuild the bitmap indexes? They have a tendancy to bloat as tables are updated. Assuming you are partitioning the tables, are the indexes local or global? They should be local indexes.

BoxesAndLines Mon May 28, 2012 10:51 am

LAndrews is on the right track here. There's no way for anyone to tune your query without looking at the explain plan and underlying table structure. That's what you and your DBA should be doing as well.

Mike Honey Mon May 28, 2012 7:20 pm

Hi Dipak,

I haven't worked with Cognos for a while, now but this scenario is so so familiar ...

The Cognos engine typically resolves reports using Cognos-SQL functions (e.g. Count Distinct ... For ... ) by pulling all the required rows out of the RDBMS into cache / temp files and then reprocessing those within the Cognos engine to work out the result. As your filter depends on a Cognos-SQL function it will probably be pulling out the entire fact table. This is perhaps why your report sort of runs for a very limited set of data (I'd say 30 secs for 30 customers is still very slow - a big red flag), but totally blows up when you try to scale it.

I suggest you write a pure Oracle SQL query that will return the results needed for your report, then Edit your report query to use that SQL, and ignore the generated SQL. The Oracle SQL may be complex to write, but the performance should be much better, and it can leverage whatever your index strategy ends up being.

In my experience this is the typical shortcoming of all the SQL-generating tools (Cognos, BO, Microstrategy etc) - once you get into the complexity of "real world" reporting requirements they dont scale.

Good luck!
Mike

Jeff Smith Wed May 30, 2012 4:54 pm

I don't see how indexes will help this query.

I think that since your query comes back quickly when you filter for 20-30 customers that suggests your indexes are working well. It's the counting distinct on everything and then applying the filter on the results that is taking time.

I didn't see a record count for the fact table but since the customer dimension has 9,000,000, I'm assuming the fact table is much, much bigger. The query is counting distinct for at least 10s of millions of records by 9 million records. That in itself is huge. Then it's doing a filter on the results. This is using Cognos Report Studio so it isn't the same thing as writing 1 sql statement with "Having distinct product count = 1". Report studio has it's own way of doing it.

I would do it outside of Cognos and if that isn't an option, I would split it into 2 querys. The first one counting distinct. The second query uses the first query as the query subject and filters on distinct product count = 1.

dipakpaudel Thu May 31, 2012 6:47 pm

Thank you all for your reply.

Cognos Generated SQL was taking too long so we built a MV based on this critirea and wrote reports against it. Since there were 10-12 reports using this MV, I was able to convince my DBA to create a MV. Now all reports runs in less than 3-4 minutes. I also suggested him to remove bitmap index on PKs and he agreed.

Cheers!

Dipak

bela374 Sat Jun 22, 2013 1:20 am

For dimensions you should have a PK index on the surrogate key and an index for the natural key although the latter is optional if you are using a ETL tool and intend to cache dimension lookups. Individual attribute indexes are optional.

Sponsored content

index

index

Re: index

Re: index

Re: index

Re: index

Re: index

Re: index

Re: index

Re: index

Re: index

Re: index

Re: index

Re: index