data puzzle - looking for a query to solve this
3 posters
Page 1 of 1
data puzzle - looking for a query to solve this
i have rows in a table that i need to convert to a type 2 dimension, looking for a query to do this (in bulk, no cursors, looping).
simple thought is to group on value and take min/max of date, but the values can and will repeat.
current data is rows with a date and a value, assume each day has a value. ie
date, value
01/01/15, 10
01/02/15, 10
01/03/15, 10
01/04/15, 10
01/05/15, 12
01/06/15, 12
01/07/15, 12
01/08/15, 09
01/09/15, 10
01/10/15, 10
want to convert this to value and start/end dates, ie.
value, start date, end date
10, 01/01/15, 01/04/15
12, 01/05/15, 01/07/15
09, 01/08/15, 01/08/15
10, 01/09/15, 01/10/15
help me solve this puzzle!
simple thought is to group on value and take min/max of date, but the values can and will repeat.
current data is rows with a date and a value, assume each day has a value. ie
date, value
01/01/15, 10
01/02/15, 10
01/03/15, 10
01/04/15, 10
01/05/15, 12
01/06/15, 12
01/07/15, 12
01/08/15, 09
01/09/15, 10
01/10/15, 10
want to convert this to value and start/end dates, ie.
value, start date, end date
10, 01/01/15, 01/04/15
12, 01/05/15, 01/07/15
09, 01/08/15, 01/08/15
10, 01/09/15, 01/10/15
help me solve this puzzle!
topcat- Posts : 19
Join date : 2012-08-09
Re: data puzzle - looking for a query to solve this
Hi - got a couple of questions:
1. If you are trying to create an SCD2 dimension, and presumably these are the effective start and end dates, then what is the business key of the dimension? The only other field you've given is the value but this can't be the BK as in the example you've given the dates for the same value are not contiguous e.g. for value 10 you've got:
10, 01/01/15, 01/04/15
10, 01/09/15, 01/10/15
whereas the end date of the the 1st row would need to be 1 less that the start date of the 2nd row if this was an SCD2 with the value as the BK
2. Which DB are you using? There may well be solutions to this that use db-specific SQL and, if I can come up with a solution, I don't want to give you a solution that only works with Oracle if you are using SQL Server.
Regards
1. If you are trying to create an SCD2 dimension, and presumably these are the effective start and end dates, then what is the business key of the dimension? The only other field you've given is the value but this can't be the BK as in the example you've given the dates for the same value are not contiguous e.g. for value 10 you've got:
10, 01/01/15, 01/04/15
10, 01/09/15, 01/10/15
whereas the end date of the the 1st row would need to be 1 less that the start date of the 2nd row if this was an SCD2 with the value as the BK
2. Which DB are you using? There may well be solutions to this that use db-specific SQL and, if I can come up with a solution, I don't want to give you a solution that only works with Oracle if you are using SQL Server.
Regards
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: data puzzle - looking for a query to solve this
You can use the LEAD and LAG functions. They are categorized as 'windowing' functions that are in the SQL standard but are not implemented by everybody (or may have a variation). It allows you to look at another row based on a grouping and sequence.
With LEAD you can get the next value to determine if it is the last instance of the current value. With LAG you get the previous value so you can determine the first instance of the current value. You should then be able to figure out your date ranges.
Assuming your table had column D containing the date and column V containing the value of interest, the query would look like this:
SELECT D, V,
LEAD(V) OVER (ORDER BY D) NEXT_V,
LAG(V) OVER (ORDER BY D) PREV_V
FROM…
Then, if you filter out rows where V, NEXT_V and PREV_V are the same, use another LEAD function to combine the begin and end dates into a single row. If you embed the first query you can do it in a single statement.
With LEAD you can get the next value to determine if it is the last instance of the current value. With LAG you get the previous value so you can determine the first instance of the current value. You should then be able to figure out your date ranges.
Assuming your table had column D containing the date and column V containing the value of interest, the query would look like this:
SELECT D, V,
LEAD(V) OVER (ORDER BY D) NEXT_V,
LAG(V) OVER (ORDER BY D) PREV_V
FROM…
Then, if you filter out rows where V, NEXT_V and PREV_V are the same, use another LEAD function to combine the begin and end dates into a single row. If you embed the first query you can do it in a single statement.
Its not really a type 2 dim, but it was easier to explain this way
this is for a retail site where we get price feeds on a daily basis. there are multiple type of prices, for example "standard", "sale", "one day sale", and each of these has a precedence. sale overrides standard, one day sale overrides both sale and standard. and i am simplifying this somewhat but essentially this is what we are provided.
product A, price type Standard, start date 1/1/15, end date, 3/31/15, price 20
product A, price type Sale, start date 1/5/15, end date, 1/10/15, price 15
product A, price type One day dale, start date 1/7/15, end date, 1/17/15, price 10
now i store all the data in a price fact, keys to the dates, product, and price type. so in this case there would be 3 fact records.
to make this data meaningful to our users, we create a rollup/agg that will convert the data into the following, kinda like a scd2.
product A, price type Standard, start date 1/1/15, end date, 1/4/15, price 20
product A, price type Sale, start date 1/5/15, end date, 1/6/15, price 15
product A, price type One day dale, start date 1/7/15, end date, 1/7/15, price 10
product A, price type Sale, start date 1/8/15, end date, 1/10/15, price 15
product A, price type Standard, start date 1/11/15, end date, 3/31/15, price 20
Note that these prices are sent to 3rd parties who want the data this way ...
And also note that there can be all type of overlaps, sale prices spanning multiple standard prices, and more than 3 tiers of pricing (standard, sale, one day sale is just my example).
what i do know, is that i can assign a priority to the prices, meaning that i know that "one day sale" is higher priority than "sale" which is a higher priority than "standard".
What i can do with the given data, is blow the data down the daily level using the priorities and end up with a single price per day. then i need to roll the data back up as shown above with start end dates. its the rollup that i have been struggling with. I have used lead/lag before, but couldn't figure out how to apply it here.
i will take a look at it again.
also the database is Netezza which works really well with bulk queries but is horrible with any kind of looping, in fact any kind of looping is not even an option. it does not have connect-by but does have most analytic functions.
thanks!
product A, price type Standard, start date 1/1/15, end date, 3/31/15, price 20
product A, price type Sale, start date 1/5/15, end date, 1/10/15, price 15
product A, price type One day dale, start date 1/7/15, end date, 1/17/15, price 10
now i store all the data in a price fact, keys to the dates, product, and price type. so in this case there would be 3 fact records.
to make this data meaningful to our users, we create a rollup/agg that will convert the data into the following, kinda like a scd2.
product A, price type Standard, start date 1/1/15, end date, 1/4/15, price 20
product A, price type Sale, start date 1/5/15, end date, 1/6/15, price 15
product A, price type One day dale, start date 1/7/15, end date, 1/7/15, price 10
product A, price type Sale, start date 1/8/15, end date, 1/10/15, price 15
product A, price type Standard, start date 1/11/15, end date, 3/31/15, price 20
Note that these prices are sent to 3rd parties who want the data this way ...
And also note that there can be all type of overlaps, sale prices spanning multiple standard prices, and more than 3 tiers of pricing (standard, sale, one day sale is just my example).
what i do know, is that i can assign a priority to the prices, meaning that i know that "one day sale" is higher priority than "sale" which is a higher priority than "standard".
What i can do with the given data, is blow the data down the daily level using the priorities and end up with a single price per day. then i need to roll the data back up as shown above with start end dates. its the rollup that i have been struggling with. I have used lead/lag before, but couldn't figure out how to apply it here.
i will take a look at it again.
also the database is Netezza which works really well with bulk queries but is horrible with any kind of looping, in fact any kind of looping is not even an option. it does not have connect-by but does have most analytic functions.
thanks!
topcat- Posts : 19
Join date : 2012-08-09
Lag/Lead solution is very close
there are cases where the lag/lead value will be the same but should not be filtered out.
if a product is put on a one day sale. then the lag (yesterday's price) and lead (tomorrow's price) will be the same.
if a product is put on a one day sale. then the lag (yesterday's price) and lead (tomorrow's price) will be the same.
topcat- Posts : 19
Join date : 2012-08-09
whoops, my mistake
i didnt read the solution properly, says "V, lag V, and lead V" to be filtered out ...
this does seem to work!
thanks!
this does seem to work!
thanks!
topcat- Posts : 19
Join date : 2012-08-09
Similar topics
» Query Using Aggregate or Detail Data
» Combining data from different fact tables in a query
» Data Integration and ETL is Shifting to the End-User
» Query on Dimensions
» Unable to update BE using sql query
» Combining data from different fact tables in a query
» Data Integration and ETL is Shifting to the End-User
» Query on Dimensions
» Unable to update BE using sql query
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum