index
+4
LAndrews
dipakpaudel
ngalemmo
dellsters
8 posters
Page 1 of 1
index
For a junk dimension, do you create individual indexes on columns or one index with multiple columns? How do you determine which way to go with?
Also in the fact table, do you create one index with all the foreign keys or each individually?
Also in the fact table, do you create one index with all the foreign keys or each individually?
dellsters- Posts : 39
Join date : 2009-02-11
Re: index
For fact tables, create an index for each foreign key. In an unstructured ad-hoc query environment, compound keys are, for the most part, useless. If you are using Oracle, enable star schema optimization and use bit map indexes for the facts.
For dimensions you should have a PK index on the surrogate key and an index for the natural key (although the latter is optional if you are using a ETL tool and intend to cache dimension lookups). Individual attribute indexes are optional.
For dimensions you should have a PK index on the surrogate key and an index for the natural key (although the latter is optional if you are using a ETL tool and intend to cache dimension lookups). Individual attribute indexes are optional.
Re: index
ngalemmo wrote:For fact tables, create an index for each foreign key. In an unstructured ad-hoc query environment, compound keys are, for the most part, useless. If you are using Oracle, enable star schema optimization and use bit map indexes for the facts.
For dimensions you should have a PK index on the surrogate key and an index for the natural key (although the latter is optional if you are using a ETL tool and intend to cache dimension lookups). Individual attribute indexes are optional.
Hello,
I am a BI administrator/developer. One of my report is taking more than 2 hours to run. It is using Time table, Product table, and Fact table, and Customer table. Fact table has 30 million rows, product table has about 500 rows and customer table has about 950000 rows.I saw that my DBA created index on all columns(about 12) on Customer table. this is kind of shocking to me because this is not what I have ever seen before. Do you think report is taking so long because of wrong indexing on Customer table? I don't understand why we need index on all columns from Customer table when it is joind to Fact table using only ProductID?? Thanks a lot for your suggestion.
Dipak
dipakpaudel- Posts : 4
Join date : 2012-05-25
Re: index
***Customer table joined using CustomerID, not ProductID.
dipakpaudel- Posts : 4
Join date : 2012-05-25
Re: index
As mentioned earlier, the composite index probably isn't helping much, particularly for the join.
There are lots of factors that could be impacting the report performance, but here's some to look at (Assuming an Oracle DB)
1. Indexes on the dimensions and facts - correct columns? Correct types?(e.g. Bitmap on the surrogate keys and facts) Not only for the join columns, but also consider the columns used for the "where" clause on your report
2. Database statistics - invalid/stale statistics can impact the performance
3. Star-Transformation should be enabled
4. Report SQL - get the generated SQL from your BI tool and sit down with the DBA's to assess options.... an example you may need to look at partitioning the fact and customer tables
There are lots of factors that could be impacting the report performance, but here's some to look at (Assuming an Oracle DB)
1. Indexes on the dimensions and facts - correct columns? Correct types?(e.g. Bitmap on the surrogate keys and facts) Not only for the join columns, but also consider the columns used for the "where" clause on your report
2. Database statistics - invalid/stale statistics can impact the performance
3. Star-Transformation should be enabled
4. Report SQL - get the generated SQL from your BI tool and sit down with the DBA's to assess options.... an example you may need to look at partitioning the fact and customer tables
LAndrews- Posts : 132
Join date : 2010-05-13
Location : British Columbia, Canada
Re: index
It is Oracle DB. Index on Dimension tabe is I am more concerned about. All indexes on attribute colums are 'Bitmap' and index in Pk are both Bitmap and Normal. Customer dimension has over 900000 rows which are probably causing trouble. When I filter report by just 20-30 customers, report runs in less than 30 seconds but when I run report with entire customers, it takes forever.
Report requirement is to find all customer which is using only 1 product in last 12 months.
So, I am counting (Distinct Product for the customer, and a month) and then if count is more than 1 then filter it OUT on next tabular model(Cognos report Studio). While doing this count and filtering, I am guessing it is doing full-table scan.
Thanks for your help. I will generate SQL from REPORT and sit with DBA to look at stat and let you all know.
Dipak
Report requirement is to find all customer which is using only 1 product in last 12 months.
So, I am counting (Distinct Product for the customer, and a month) and then if count is more than 1 then filter it OUT on next tabular model(Cognos report Studio). While doing this count and filtering, I am guessing it is doing full-table scan.
Thanks for your help. I will generate SQL from REPORT and sit with DBA to look at stat and let you all know.
Dipak
dipakpaudel- Posts : 4
Join date : 2012-05-25
Re: index
A bitmap index on a PK is useless. All PK values are unique, a bitmap only functions well with non-unique values, and the fewer the better.
As far as run times go, they seem pretty long. How often to you rebuild the bitmap indexes? They have a tendancy to bloat as tables are updated. Assuming you are partitioning the tables, are the indexes local or global? They should be local indexes.
As far as run times go, they seem pretty long. How often to you rebuild the bitmap indexes? They have a tendancy to bloat as tables are updated. Assuming you are partitioning the tables, are the indexes local or global? They should be local indexes.
Re: index
LAndrews is on the right track here. There's no way for anyone to tune your query without looking at the explain plan and underlying table structure. That's what you and your DBA should be doing as well.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: index
Hi Dipak,
I haven't worked with Cognos for a while, now but this scenario is so so familiar ...
The Cognos engine typically resolves reports using Cognos-SQL functions (e.g. Count Distinct ... For ... ) by pulling all the required rows out of the RDBMS into cache / temp files and then reprocessing those within the Cognos engine to work out the result. As your filter depends on a Cognos-SQL function it will probably be pulling out the entire fact table. This is perhaps why your report sort of runs for a very limited set of data (I'd say 30 secs for 30 customers is still very slow - a big red flag), but totally blows up when you try to scale it.
I suggest you write a pure Oracle SQL query that will return the results needed for your report, then Edit your report query to use that SQL, and ignore the generated SQL. The Oracle SQL may be complex to write, but the performance should be much better, and it can leverage whatever your index strategy ends up being.
In my experience this is the typical shortcoming of all the SQL-generating tools (Cognos, BO, Microstrategy etc) - once you get into the complexity of "real world" reporting requirements they dont scale.
Good luck!
Mike
I haven't worked with Cognos for a while, now but this scenario is so so familiar ...
The Cognos engine typically resolves reports using Cognos-SQL functions (e.g. Count Distinct ... For ... ) by pulling all the required rows out of the RDBMS into cache / temp files and then reprocessing those within the Cognos engine to work out the result. As your filter depends on a Cognos-SQL function it will probably be pulling out the entire fact table. This is perhaps why your report sort of runs for a very limited set of data (I'd say 30 secs for 30 customers is still very slow - a big red flag), but totally blows up when you try to scale it.
I suggest you write a pure Oracle SQL query that will return the results needed for your report, then Edit your report query to use that SQL, and ignore the generated SQL. The Oracle SQL may be complex to write, but the performance should be much better, and it can leverage whatever your index strategy ends up being.
In my experience this is the typical shortcoming of all the SQL-generating tools (Cognos, BO, Microstrategy etc) - once you get into the complexity of "real world" reporting requirements they dont scale.
Good luck!
Mike
Re: index
I don't see how indexes will help this query.
I think that since your query comes back quickly when you filter for 20-30 customers that suggests your indexes are working well. It's the counting distinct on everything and then applying the filter on the results that is taking time.
I didn't see a record count for the fact table but since the customer dimension has 9,000,000, I'm assuming the fact table is much, much bigger. The query is counting distinct for at least 10s of millions of records by 9 million records. That in itself is huge. Then it's doing a filter on the results. This is using Cognos Report Studio so it isn't the same thing as writing 1 sql statement with "Having distinct product count = 1". Report studio has it's own way of doing it.
I would do it outside of Cognos and if that isn't an option, I would split it into 2 querys. The first one counting distinct. The second query uses the first query as the query subject and filters on distinct product count = 1.
I think that since your query comes back quickly when you filter for 20-30 customers that suggests your indexes are working well. It's the counting distinct on everything and then applying the filter on the results that is taking time.
I didn't see a record count for the fact table but since the customer dimension has 9,000,000, I'm assuming the fact table is much, much bigger. The query is counting distinct for at least 10s of millions of records by 9 million records. That in itself is huge. Then it's doing a filter on the results. This is using Cognos Report Studio so it isn't the same thing as writing 1 sql statement with "Having distinct product count = 1". Report studio has it's own way of doing it.
I would do it outside of Cognos and if that isn't an option, I would split it into 2 querys. The first one counting distinct. The second query uses the first query as the query subject and filters on distinct product count = 1.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: index
Thank you all for your reply.
Cognos Generated SQL was taking too long so we built a MV based on this critirea and wrote reports against it. Since there were 10-12 reports using this MV, I was able to convince my DBA to create a MV. Now all reports runs in less than 3-4 minutes. I also suggested him to remove bitmap index on PKs and he agreed.
Cheers!
Dipak
Cognos Generated SQL was taking too long so we built a MV based on this critirea and wrote reports against it. Since there were 10-12 reports using this MV, I was able to convince my DBA to create a MV. Now all reports runs in less than 3-4 minutes. I also suggested him to remove bitmap index on PKs and he agreed.
Cheers!
Dipak
dipakpaudel- Posts : 4
Join date : 2012-05-25
Re: index
For dimensions you should have a PK index on the surrogate key and an index for the natural key although the latter is optional if you are using a ETL tool and intend to cache dimension lookups. Individual attribute indexes are optional.
bela374- Posts : 1
Join date : 2013-06-22
Similar topics
» ETL Load - Dropping Indexes and Constraints
» My Index Strategy
» Primary Key of Fact Table
» Index Strategy on FACT Table with 300 Million records
» Is a Master Person Index (MPI) in conflict with the concepts of a dimensional structure?
» My Index Strategy
» Primary Key of Fact Table
» Index Strategy on FACT Table with 300 Million records
» Is a Master Person Index (MPI) in conflict with the concepts of a dimensional structure?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum