How to write SQL query to flatten the hierarchy
3 posters
Page 1 of 1
How to write SQL query to flatten the hierarchy
Hi,
I have person table which has recursive hierarchy and I wish to flatten it upto 5 levels.
I need to do this by using SQL query and I would be using it in SAP Business Objects Universe.
I thought users of this forum might help me in writing such a query.
I am using Oracle10g and I have written following query to flatten the hierarchy
SELECT
ID,
level lvl,
REGEXP_SUBSTR (SYS_CONNECT_BY_PATH (fname||' '||lname, '/'), '[^/]+', 1, 1) AS level_1,
REGEXP_SUBSTR (SYS_CONNECT_BY_PATH (fname||' '||lname, '/'), '[^/]+', 1, 2) AS level_2,
REGEXP_SUBSTR (SYS_CONNECT_BY_PATH (fname||' '||lname, '/'), '[^/]+', 1, 3) AS level_3,
REGEXP_SUBSTR (SYS_CONNECT_BY_PATH (fname||' '||lname, '/'), '[^/]+', 1, 4) AS level_4,
REGEXP_SUBSTR (SYS_CONNECT_BY_PATH (fname||' '||lname, '/'), '[^/]+', 1, 5) AS level_5
FROM cmt_person
CONNECT BY manager_id = PRIOR id
and level<=5
Here I am getting the correct output but this query is taking a lot of time to run.
I wrote a new query without using regular expressions and connect by and here is the code:
SELECT
cmt_person.id ,
cmt_person.fname as mgr1,
case when cmt_person2.manager_id=cmt_person.id then (cmt_person2.fname ) end as mgr2,
case when cmt_person3.manager_id=cmt_person2.id then (cmt_person3.fname ) end as mgr3,
case when cmt_person4.manager_id=cmt_person3.id then cmt_person4.fname end as mgr4,
case when cmt_person5.manager_id=cmt_person4.id then cmt_person5.fname end as mgr5
FROM
cmt_person,
cmt_person cmt_person2,
cmt_person cmt_person3,
cmt_person cmt_person4,
cmt_person cmt_person5
WHERE
cmt_person2.manager_id(+)=cmt_person.id and
cmt_person3.manager_id(+)=cmt_person2.id and
cmt_person4.manager_id(+)=cmt_person3.id and
cmt_person5.manager_id(+)=cmt_person4.id
And I got following output:
emplo000000000200100 Bobby Khasha Rahul Rajesh
emplo000000000200099 Bobby Khasha Rahul Ajay
emplo000000000200101 Bobby Khasha Rahul Swati
emplo000000000200320 Bobby Khasha Rahul Jinesh
emplo000000000201231 Bobby Khasha Test1
emplo000000000201230 Bobby Khasha User1
emplo000000000201227 Bobby Khasha User2
emplo000000000200104 Bobby Khasha Yitzik Natalia
emplo000000000200103 Bobby Khasha Yitzik Andrew
total 9 rows
But this is not correct output.
Here is the expected output:
emplo000000000200097 Bobby
emplo000000000200087 Bobby Khasha
emplo000000000200102 Bobby Khasha Yitzik
emplo000000000200103 Bobby Khasha Yitzik Andrew
emplo000000000200104 Bobby Khasha Yitzik Natalia
emplo000000000201227 Bobby Khasha User2
emplo000000000201231 Bobby Khasha Test1
emplo000000000201230 Bobby Khasha User1
emplo000000000200098 Bobby Khasha Rahul
emplo000000000200099 Bobby Khasha Rahul Ajay
emplo000000000200100 Bobby Khasha Rahul Rajesh
emplo000000000200320 Bobby Khasha Rahul Jinesh
emplo000000000200101 Bobby Khasha Rahul Swati
total 13 rows
Do you know what I should do to get this output.
Thanks,
Raghvendra
I have person table which has recursive hierarchy and I wish to flatten it upto 5 levels.
I need to do this by using SQL query and I would be using it in SAP Business Objects Universe.
I thought users of this forum might help me in writing such a query.
I am using Oracle10g and I have written following query to flatten the hierarchy
SELECT
ID,
level lvl,
REGEXP_SUBSTR (SYS_CONNECT_BY_PATH (fname||' '||lname, '/'), '[^/]+', 1, 1) AS level_1,
REGEXP_SUBSTR (SYS_CONNECT_BY_PATH (fname||' '||lname, '/'), '[^/]+', 1, 2) AS level_2,
REGEXP_SUBSTR (SYS_CONNECT_BY_PATH (fname||' '||lname, '/'), '[^/]+', 1, 3) AS level_3,
REGEXP_SUBSTR (SYS_CONNECT_BY_PATH (fname||' '||lname, '/'), '[^/]+', 1, 4) AS level_4,
REGEXP_SUBSTR (SYS_CONNECT_BY_PATH (fname||' '||lname, '/'), '[^/]+', 1, 5) AS level_5
FROM cmt_person
CONNECT BY manager_id = PRIOR id
and level<=5
Here I am getting the correct output but this query is taking a lot of time to run.
I wrote a new query without using regular expressions and connect by and here is the code:
SELECT
cmt_person.id ,
cmt_person.fname as mgr1,
case when cmt_person2.manager_id=cmt_person.id then (cmt_person2.fname ) end as mgr2,
case when cmt_person3.manager_id=cmt_person2.id then (cmt_person3.fname ) end as mgr3,
case when cmt_person4.manager_id=cmt_person3.id then cmt_person4.fname end as mgr4,
case when cmt_person5.manager_id=cmt_person4.id then cmt_person5.fname end as mgr5
FROM
cmt_person,
cmt_person cmt_person2,
cmt_person cmt_person3,
cmt_person cmt_person4,
cmt_person cmt_person5
WHERE
cmt_person2.manager_id(+)=cmt_person.id and
cmt_person3.manager_id(+)=cmt_person2.id and
cmt_person4.manager_id(+)=cmt_person3.id and
cmt_person5.manager_id(+)=cmt_person4.id
And I got following output:
emplo000000000200100 Bobby Khasha Rahul Rajesh
emplo000000000200099 Bobby Khasha Rahul Ajay
emplo000000000200101 Bobby Khasha Rahul Swati
emplo000000000200320 Bobby Khasha Rahul Jinesh
emplo000000000201231 Bobby Khasha Test1
emplo000000000201230 Bobby Khasha User1
emplo000000000201227 Bobby Khasha User2
emplo000000000200104 Bobby Khasha Yitzik Natalia
emplo000000000200103 Bobby Khasha Yitzik Andrew
total 9 rows
But this is not correct output.
Here is the expected output:
emplo000000000200097 Bobby
emplo000000000200087 Bobby Khasha
emplo000000000200102 Bobby Khasha Yitzik
emplo000000000200103 Bobby Khasha Yitzik Andrew
emplo000000000200104 Bobby Khasha Yitzik Natalia
emplo000000000201227 Bobby Khasha User2
emplo000000000201231 Bobby Khasha Test1
emplo000000000201230 Bobby Khasha User1
emplo000000000200098 Bobby Khasha Rahul
emplo000000000200099 Bobby Khasha Rahul Ajay
emplo000000000200100 Bobby Khasha Rahul Rajesh
emplo000000000200320 Bobby Khasha Rahul Jinesh
emplo000000000200101 Bobby Khasha Rahul Swati
total 13 rows
Do you know what I should do to get this output.
Thanks,
Raghvendra
Raghvendra Deshpande- Posts : 4
Join date : 2012-01-05
Re: How to write SQL query to flatten the hierarchy
1) add a Materialized View for Connect By query — a table storing this query results will be recalculated automagically on data refresh and will be used to answer your query automagically
2) create a proper hierarchy table with required number of levels and refresh it in ETL
There's an upper limit in where tuning recursive queries will bring you, so creating a table to store this hierarchy is a good idea. Check that ID field has an index anyway.
2) create a proper hierarchy table with required number of levels and refresh it in ETL
There's an upper limit in where tuning recursive queries will bring you, so creating a table to store this hierarchy is a good idea. Check that ID field has an index anyway.
Re: How to write SQL query to flatten the hierarchy
Thanks for your reply.
Actually I intend to use this query in SAP Business Objects semantic layer. There I would use this query to flatten the hierarchy "on the fly". We have OLTP schema, no ETL tool and we are directly reporting on top of OLTP using SAP Business Objects tools.
Hence we can't create materialized views or any table in database. I need to do whatever I want in semantic layer or at report level only.
Anyway, I got a query which serves this purpose and doesn't use connect by. Here is the query:
SELECT /* 5 levels */
coalesce(cmt_person5.id ,cmt_person4.id,cmt_person3.id,cmt_person2.id,cmt_person1.id) as id,
cmt_person1.fname as mgr1,
cmt_person2.fname as mgr2,
cmt_person3.fname as mgr3,
cmt_person4.fname as mgr4,
cmt_person5.fname as mgr5
FROM
cmt_person cmt_person1,
cmt_person cmt_person2,
cmt_person cmt_person3,
cmt_person cmt_person4,
cmt_person cmt_person5
WHERE
cmt_person2.manager_id=cmt_person1.id and
cmt_person3.manager_id=cmt_person2.id and
cmt_person4.manager_id=cmt_person3.id and
cmt_person5.manager_id=cmt_person4.id
UNION /* Results in distinct values */
SELECT /* 4 levels */
coalesce(cmt_person4.id,cmt_person3.id,cmt_person2.id,cmt_person1.id) as id,
cmt_person1.fname as mgr1,
cmt_person2.fname as mgr2,
cmt_person3.fname as mgr3,
cmt_person4.fname as mgr4,
to_char(NULL) as mgr5
FROM
cmt_person cmt_person1,
cmt_person cmt_person2,
cmt_person cmt_person3,
cmt_person cmt_person4
WHERE
cmt_person2.manager_id=cmt_person1.id and
cmt_person3.manager_id=cmt_person2.id and
cmt_person4.manager_id=cmt_person3.id
UNION /* Results in distinct values */
SELECT /* 3 levels */
coalesce(cmt_person3.id,cmt_person2.id,cmt_person1.id) as id,
cmt_person1.fname as mgr1,
cmt_person2.fname as mgr2,
cmt_person3.fname as mgr3,
to_char(NULL) as mgr4,
to_char(NULL) as mgr5
FROM
cmt_person cmt_person1,
cmt_person cmt_person2,
cmt_person cmt_person3
WHERE
cmt_person2.manager_id=cmt_person1.id and
cmt_person3.manager_id=cmt_person2.id
UNION /* Results in distinct values */
SELECT /* 2 levels */
coalesce(cmt_person2.id,cmt_person1.id) as id,
cmt_person1.fname as mgr1,
cmt_person2.fname as mgr2,
to_char(NULL) as mgr3,
to_char(NULL) as mgr4,
to_char(NULL) as mgr5
FROM
cmt_person cmt_person1,
cmt_person cmt_person2
WHERE
cmt_person2.manager_id=cmt_person1.id
UNION /* Results in distinct values */
SELECT /* 1 level (managers only) */
cmt_person1.id as id,
cmt_person1.fname as mgr1,
to_char(NULL) as mgr2,
to_char(NULL) as mgr3,
to_char(NULL) as mgr4,
to_char(NULL) as mgr5
FROM
cmt_person cmt_person1,
cmt_person cmt_person2
WHERE
cmt_person2.manager_id=cmt_person1.id
I tested this query on a table having just few thousand records and the query is running quick.
Need to know what would happen if there are more than a million records to flatten the hierarchy.
Our management would consider creating materialized views or tables only if there are serious performance problems.
Any comments/suggestions on this query are most welcome.
Thanks,
Raghvendra Deshpande
Actually I intend to use this query in SAP Business Objects semantic layer. There I would use this query to flatten the hierarchy "on the fly". We have OLTP schema, no ETL tool and we are directly reporting on top of OLTP using SAP Business Objects tools.
Hence we can't create materialized views or any table in database. I need to do whatever I want in semantic layer or at report level only.
Anyway, I got a query which serves this purpose and doesn't use connect by. Here is the query:
SELECT /* 5 levels */
coalesce(cmt_person5.id ,cmt_person4.id,cmt_person3.id,cmt_person2.id,cmt_person1.id) as id,
cmt_person1.fname as mgr1,
cmt_person2.fname as mgr2,
cmt_person3.fname as mgr3,
cmt_person4.fname as mgr4,
cmt_person5.fname as mgr5
FROM
cmt_person cmt_person1,
cmt_person cmt_person2,
cmt_person cmt_person3,
cmt_person cmt_person4,
cmt_person cmt_person5
WHERE
cmt_person2.manager_id=cmt_person1.id and
cmt_person3.manager_id=cmt_person2.id and
cmt_person4.manager_id=cmt_person3.id and
cmt_person5.manager_id=cmt_person4.id
UNION /* Results in distinct values */
SELECT /* 4 levels */
coalesce(cmt_person4.id,cmt_person3.id,cmt_person2.id,cmt_person1.id) as id,
cmt_person1.fname as mgr1,
cmt_person2.fname as mgr2,
cmt_person3.fname as mgr3,
cmt_person4.fname as mgr4,
to_char(NULL) as mgr5
FROM
cmt_person cmt_person1,
cmt_person cmt_person2,
cmt_person cmt_person3,
cmt_person cmt_person4
WHERE
cmt_person2.manager_id=cmt_person1.id and
cmt_person3.manager_id=cmt_person2.id and
cmt_person4.manager_id=cmt_person3.id
UNION /* Results in distinct values */
SELECT /* 3 levels */
coalesce(cmt_person3.id,cmt_person2.id,cmt_person1.id) as id,
cmt_person1.fname as mgr1,
cmt_person2.fname as mgr2,
cmt_person3.fname as mgr3,
to_char(NULL) as mgr4,
to_char(NULL) as mgr5
FROM
cmt_person cmt_person1,
cmt_person cmt_person2,
cmt_person cmt_person3
WHERE
cmt_person2.manager_id=cmt_person1.id and
cmt_person3.manager_id=cmt_person2.id
UNION /* Results in distinct values */
SELECT /* 2 levels */
coalesce(cmt_person2.id,cmt_person1.id) as id,
cmt_person1.fname as mgr1,
cmt_person2.fname as mgr2,
to_char(NULL) as mgr3,
to_char(NULL) as mgr4,
to_char(NULL) as mgr5
FROM
cmt_person cmt_person1,
cmt_person cmt_person2
WHERE
cmt_person2.manager_id=cmt_person1.id
UNION /* Results in distinct values */
SELECT /* 1 level (managers only) */
cmt_person1.id as id,
cmt_person1.fname as mgr1,
to_char(NULL) as mgr2,
to_char(NULL) as mgr3,
to_char(NULL) as mgr4,
to_char(NULL) as mgr5
FROM
cmt_person cmt_person1,
cmt_person cmt_person2
WHERE
cmt_person2.manager_id=cmt_person1.id
I tested this query on a table having just few thousand records and the query is running quick.
Need to know what would happen if there are more than a million records to flatten the hierarchy.
Our management would consider creating materialized views or tables only if there are serious performance problems.
Any comments/suggestions on this query are most welcome.
Thanks,
Raghvendra Deshpande
Raghvendra Deshpande- Posts : 4
Join date : 2012-01-05
Persist the data
I concur with ykud...
If performance is an issue, you should persist the output of the query in a materialized view or an actual table. Then reference that table in your universe. A once a day refresh should suffice, unless your hierarchy is very dynamic. If so, you can refresh intraday as well.
If the issue is that you can not put the table/mat. view alongside the OLTP tables (i.e. you will violate some type of license agreement), use another database schema, one specifically designed for these types of scenarios.
Otherwise, your users are going to have to suffer through the performance problems.
Indexing may help, but sounds like your DB is locked down where new indexes are also not allowed...
If performance is an issue, you should persist the output of the query in a materialized view or an actual table. Then reference that table in your universe. A once a day refresh should suffice, unless your hierarchy is very dynamic. If so, you can refresh intraday as well.
If the issue is that you can not put the table/mat. view alongside the OLTP tables (i.e. you will violate some type of license agreement), use another database schema, one specifically designed for these types of scenarios.
Otherwise, your users are going to have to suffer through the performance problems.
Indexing may help, but sounds like your DB is locked down where new indexes are also not allowed...
elmorejr- Posts : 25
Join date : 2011-10-20
Location : Columbus, Ohio, United States of America, Earth, Sol
Re: How to write SQL query to flatten the hierarchy
Raghvendra Deshpande wrote:
I tested this query on a table having just few thousand records and the query is running quick.
Need to know what would happen if there are more than a million records to flatten the hierarchy.
Our management would consider creating materialized views or tables only if there are serious performance problems.
Five self-joins of a million rows table won't be “quick”, unless you have a totally massive hardware. Persisting this query is really the only sensible way to go.
And you should use union all instead of union, since you're implying a very performance costy distinct select (usually requires full sort & merge) over all results by using “union”.
Re: How to write SQL query to flatten the hierarchy
Thanks for your suggestions.
Let me give you more idea about how we are going to use these queries. My company has a product and we will ship SAP Business Objects universe and reports along with our product. We have more than 1500 clients. So every client has a database on top of which the SAP Business Objects reports are created. So if we decide to add a table or materialized view, all our existing customers will have to do the same and they will have to have ETL to load these tables. May be our product needs to be modified to accommodate this change. And we are trying not to do any changes in database and just trying to report off the existing normalized database.
Till now we used to ship Crystal Reports(which is not ad hoc reporting tool) along with our product. But now we are moving towards ad hoc SAP Web Intelligence tool. And the expectations from ad hoc reporting are very high. It should be simple to use for end business users, should support drill down(on normal and recursive hierarchies) and performance should be good.
In absence of dimensional model, we are finding it hard to achieve our goals particularly performance related issues and simplicity of creating reports. I will definitely try to make our management think about creating tables or materialized views in order to improve the performance.
Please share your views on sticking to normalized schema with over 1000 tables in database. Should we go for dimensional modeling?
Thanks,
Raghvendra
Let me give you more idea about how we are going to use these queries. My company has a product and we will ship SAP Business Objects universe and reports along with our product. We have more than 1500 clients. So every client has a database on top of which the SAP Business Objects reports are created. So if we decide to add a table or materialized view, all our existing customers will have to do the same and they will have to have ETL to load these tables. May be our product needs to be modified to accommodate this change. And we are trying not to do any changes in database and just trying to report off the existing normalized database.
Till now we used to ship Crystal Reports(which is not ad hoc reporting tool) along with our product. But now we are moving towards ad hoc SAP Web Intelligence tool. And the expectations from ad hoc reporting are very high. It should be simple to use for end business users, should support drill down(on normal and recursive hierarchies) and performance should be good.
In absence of dimensional model, we are finding it hard to achieve our goals particularly performance related issues and simplicity of creating reports. I will definitely try to make our management think about creating tables or materialized views in order to improve the performance.
Please share your views on sticking to normalized schema with over 1000 tables in database. Should we go for dimensional modeling?
Thanks,
Raghvendra
Raghvendra Deshpande- Posts : 4
Join date : 2012-01-05
Re: How to write SQL query to flatten the hierarchy
Raghvendra Deshpande wrote:
In absence of dimensional model, we are finding it hard to achieve our goals particularly performance related issues and simplicity of creating reports. I will definitely try to make our management think about creating tables or materialized views in order to improve the performance.
Please share your views on sticking to normalized schema with over 1000 tables in database. Should we go for dimensional modeling?
If you're shipping some kind of OLTP system and want to give user ad-hoc reporting capabilities on the same normalised schema, chances are that you're doing them more of a bad thing. Unless data volumes are really small (which is unlikely given your quote of millions of employees), these ad-hoc query will severely diminish OLTP performance of the system. Reporting over normalised schemas should be supported by a number of specifically created indexes, materialised views and automatically updated end-of-period balance tables. Without all that — it'll be hundreds of joins and full-scans all around.
Since you're the owners of the product development, you can create a reporting package with a separate dimensional model and all the procedures required to fill the data. If you're on Oracle, you can set up CDC integration between these schemes to allow “near-realtime” reporting.
Re: How to write SQL query to flatten the hierarchy
Thanks ykud. It really makes sense to go for dimensional model which will improve not only performance but will be easy from design standpoint to end ad hoc business users.
Thanks,
Raghvendra
Thanks,
Raghvendra
Raghvendra Deshpande- Posts : 4
Join date : 2012-01-05
Similar topics
» How can I write an MDX statement or query that selects the clients who paid last month but have not paid this(Current Month) month?
» Hierarchies: Flatten vs. Bridge
» Design Query on Dimensions
» Single Attribute Dimensions - why not instead write textual attribute into Fact table?
» Query on Dimensions
» Hierarchies: Flatten vs. Bridge
» Design Query on Dimensions
» Single Attribute Dimensions - why not instead write textual attribute into Fact table?
» Query on Dimensions
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum