How to lookup a small group of records as one
4 posters
Page 1 of 1
How to lookup a small group of records as one
Hi,
I have source data which contains [Customer], [EventKey] and [ProcedureType], and the grain of this table is Procedure. [EventKey] is sometimes repeated, and this means there were many Procedures at one time. I need to fill a fact table that has Procedure level grain, and I need to fill a cost field which is the overall cost for the [EventKey] divided by the number of Procedures.
The problem is that the cost is determined by the set of procedures that happened at one time. E.g. Procedure A has one cost, B has another, and A & B combined do not have the costs of A and B added, they have some arbitrary cost from a table that I was given.
So I need somehow to lookup the set of several ProcedureTypes to find the cost. How would you do this? I'm hoping to use SSIS, but its lookup gadget seems to use only one record at a time.
Al Wood
I have source data which contains [Customer], [EventKey] and [ProcedureType], and the grain of this table is Procedure. [EventKey] is sometimes repeated, and this means there were many Procedures at one time. I need to fill a fact table that has Procedure level grain, and I need to fill a cost field which is the overall cost for the [EventKey] divided by the number of Procedures.
The problem is that the cost is determined by the set of procedures that happened at one time. E.g. Procedure A has one cost, B has another, and A & B combined do not have the costs of A and B added, they have some arbitrary cost from a table that I was given.
So I need somehow to lookup the set of several ProcedureTypes to find the cost. How would you do this? I'm hoping to use SSIS, but its lookup gadget seems to use only one record at a time.
Al Wood
Al Wood- Posts : 46
Join date : 2010-12-08
Re: How to lookup a small group of records as one
I think it comes down to an SQL script that picks up only one record out of many records. I can think of a couple of options to achieve that depending on the business requirement.
1. Use max(PK) or min(PK) aggregate SQL function to pick up the record with highest or lowest PK.
select *
from tableA
where PK=(select max(PK) from tableA)
2. Use row_number function with window clause '[partition by (column list)] order by (sort column list)', to pick up Nth record in a specified sequence, as follows:
select * from
(
select row_number() over (order by fieldX desc/asc) as rec_order, *
from tableA
) as A
where rec_order=N
1. Use max(PK) or min(PK) aggregate SQL function to pick up the record with highest or lowest PK.
select *
from tableA
where PK=(select max(PK) from tableA)
2. Use row_number function with window clause '[partition by (column list)] order by (sort column list)', to pick up Nth record in a specified sequence, as follows:
select * from
(
select row_number() over (order by fieldX desc/asc) as rec_order, *
from tableA
) as A
where rec_order=N
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
I don't understand
Hi,
I don't understand how your suggestion will help; maybe I didn't explain my problem very well. In the source data there are many sets of [ProcedureTypes]. e.g.
I have cost information like this:
TypeA & TypeB and TypeC = 25
TypeA = 17
TypeA & TypeD = 18
TypeD = 11
I need to gather all the Procedure Types for each Event Key, whether it's one Procedure Type or many, and lookup the cost up for all of them as a bunch.
Many Thanks,
Al Wood
I don't understand how your suggestion will help; maybe I didn't explain my problem very well. In the source data there are many sets of [ProcedureTypes]. e.g.
EventKey | ProcedureType |
1 | TypeA |
1 | TypeB |
1 | TypeC |
5 | TypeA |
2 | TypeA |
2 | TypeD |
3 | TypeD |
4 | TypeA |
4 | TypeB |
4 | TypeC |
I have cost information like this:
TypeA & TypeB and TypeC = 25
TypeA = 17
TypeA & TypeD = 18
TypeD = 11
I need to gather all the Procedure Types for each Event Key, whether it's one Procedure Type or many, and lookup the cost up for all of them as a bunch.
Many Thanks,
Al Wood
Al Wood- Posts : 46
Join date : 2010-12-08
Re: How to lookup a small group of records as one
Ok, obviously it's something else. But the solution is still relevant. So basically, you would have following lookup table:
ProcedureTypeStr - Cost
ABC - 25
A - 17
AD - 18
D - 11
You need to firstly convert your EventKey-ProedureType table into EventKey-ProcedureTypeStr using row_Number() as in my previous post, and perhaps using cursor if it's too hard by a SQL script. Then you should have one to one lookup.
ProcedureTypeStr - Cost
ABC - 25
A - 17
AD - 18
D - 11
You need to firstly convert your EventKey-ProedureType table into EventKey-ProcedureTypeStr using row_Number() as in my previous post, and perhaps using cursor if it's too hard by a SQL script. Then you should have one to one lookup.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
What we need..
Thanks for the reply.
I was afraid you'd say something like that! I was hoping to use a SSIS toolkit component, but now I think I'll have to start coding SQL.
What I need is someone to write a SSIS toolkit component like the KSCD!
Al
I was afraid you'd say something like that! I was hoping to use a SSIS toolkit component, but now I think I'll have to start coding SQL.
What I need is someone to write a SSIS toolkit component like the KSCD!
Al
Al Wood- Posts : 46
Join date : 2010-12-08
Re: How to lookup a small group of records as one
I would never count on a component in SSIS to completely resolve issues like what you described. Most of the time in real practice, you have to resort to some SQL scripts as up-stream or down-stream process around the component to complete the task. It's worth checking with Joy and Warren, two SQL Sever BI experts in the forum, to see whether my point is valid or not.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: How to lookup a small group of records as one
Hi,
Why not to create another fact table with event granularity ?
You will have a dimension Event Type
Key Name Cost No of procedures
1 TypeA 17 1
2 TypeB 18 1
3 Type D 19 1
4 TypeA & TypeB & TypeC 25 3
5 TypeA & TypeD 18 2
From the first 3 rows in your fact table you will load 1 event row with Type key=4 .
Of course you will need to gather all the Procedure Types for each Event Key during loading the new Event fact table, but later during reporting and analysis you will have the proper data. ( It is a little bit similar with summary management)
Why not to create another fact table with event granularity ?
You will have a dimension Event Type
Key Name Cost No of procedures
1 TypeA 17 1
2 TypeB 18 1
3 Type D 19 1
4 TypeA & TypeB & TypeC 25 3
5 TypeA & TypeD 18 2
From the first 3 rows in your fact table you will load 1 event row with Type key=4 .
Of course you will need to gather all the Procedure Types for each Event Key during loading the new Event fact table, but later during reporting and analysis you will have the proper data. ( It is a little bit similar with summary management)
gvarga- Posts : 43
Join date : 2010-12-15
The problem of how to perform the lookup remains
Hi,
I might consider the aggregate fact table that you suggest; in any case, the problem of how to perform the lookup remains.
http://www.projectdmx.com/tsql/rowconcatenate.aspx
This website has a number of methods to concatenate rows into strings. I hope to modify one of them to produce a sorted list of procedure IDs for each [EventRef], which I could then use in a normal SSIS lookup.
The result would be at the granularity of the aggregate fact table that you suggest. I could then use it to update the cost in my most granular fact table, even if I don't retain the aggregate fact table.
Al Wood
I might consider the aggregate fact table that you suggest; in any case, the problem of how to perform the lookup remains.
http://www.projectdmx.com/tsql/rowconcatenate.aspx
This website has a number of methods to concatenate rows into strings. I hope to modify one of them to produce a sorted list of procedure IDs for each [EventRef], which I could then use in a normal SSIS lookup.
The result would be at the granularity of the aggregate fact table that you suggest. I could then use it to update the cost in my most granular fact table, even if I don't retain the aggregate fact table.
Al Wood
Al Wood- Posts : 46
Join date : 2010-12-08
Re: How to lookup a small group of records as one
Did you ever come up with a solution?
If you could get your data into a format something like this then you could easily look up your costs:
Transforming the data into this format isn't too hard, but probably requires a cursor (or better yet looping through a table variable if using SQL Server).
Here's another idea... How many unique procedures are there? If there aren't too many, you could use powers-of-two as numeric identifiers for each procedure:
Type A = 1
Type B = 2
Type C = 4
Type D = 8
etc.
The sum of the identifiers would represent combinations of procedures, i.e. A+B+C = 7, B+D = 10. Each possible combination would have a unique value you could use to look up the costing. As long as there are less than 32 individual procedures, the sum would always fit in a 32-bit integer.
The main advantage of this approach is that you could take your original data, easily subsititue the power-of-2 procedure identifiers, and then use a SQL GROUP BY query on EVENT and take the SUM of the procedure ids to get the identifier for the combination of procedures for each EVENT. No cursor or looping required!
In essense this approach is bit-mapping your procedures!
If you could get your data into a format something like this then you could easily look up your costs:
EventKey | ProcedureType |
1 | ABC |
5 | A |
2 | AD |
3 | D |
4 | ABC |
Transforming the data into this format isn't too hard, but probably requires a cursor (or better yet looping through a table variable if using SQL Server).
Here's another idea... How many unique procedures are there? If there aren't too many, you could use powers-of-two as numeric identifiers for each procedure:
Type A = 1
Type B = 2
Type C = 4
Type D = 8
etc.
The sum of the identifiers would represent combinations of procedures, i.e. A+B+C = 7, B+D = 10. Each possible combination would have a unique value you could use to look up the costing. As long as there are less than 32 individual procedures, the sum would always fit in a 32-bit integer.
The main advantage of this approach is that you could take your original data, easily subsititue the power-of-2 procedure identifiers, and then use a SQL GROUP BY query on EVENT and take the SUM of the procedure ids to get the identifier for the combination of procedures for each EVENT. No cursor or looping required!
In essense this approach is bit-mapping your procedures!
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Intended solution
Hi,
Nice idea, but there easily could be more than 32 codes.
I think I will use concatenation. The sql turns out to be not too complex and quite quick for small datasets.
I realised that there is possibly a bug in the idea of concatenating the procedure codes themselves. They are typed in by users, and what if they insert a comma into one of them? E.g. these procedure codes:
These concatenate to "TypeA,TypeB,Type,E" which looks like four codes and can't be seperated from the case where "Type" or "E" are codes in themselves.
So I'm going to try to find a character that is forbidden in the source system, e.g. "¬" and concatenate with that, e.g. "TypeA¬TypeB¬Type,E". If I cant, I will use a small permanent table, in the data prepearation area, that auto-numbers any new codes:
and I will concatenate the ID values, e.g. "1,2,3"
Then I create an SSIS lookup using the "1,2,3" string, remembering to sort them first, so I don't get "1,3,2"
The only slight doubt I have is what happens when the Data Steward has to maintain the cost lookup table with "1,2,3" = 872.33. Will they cope, or will I need to write some kind of Access Form or other frontend to make it plain as day for them. It should be fun!
Nice idea, but there easily could be more than 32 codes.
I think I will use concatenation. The sql turns out to be not too complex and quite quick for small datasets.
I realised that there is possibly a bug in the idea of concatenating the procedure codes themselves. They are typed in by users, and what if they insert a comma into one of them? E.g. these procedure codes:
TypeA |
TypeB |
Type,E |
These concatenate to "TypeA,TypeB,Type,E" which looks like four codes and can't be seperated from the case where "Type" or "E" are codes in themselves.
So I'm going to try to find a character that is forbidden in the source system, e.g. "¬" and concatenate with that, e.g. "TypeA¬TypeB¬Type,E". If I cant, I will use a small permanent table, in the data prepearation area, that auto-numbers any new codes:
1 | TypeA |
2 | TypeB |
3 | Type,E |
and I will concatenate the ID values, e.g. "1,2,3"
Then I create an SSIS lookup using the "1,2,3" string, remembering to sort them first, so I don't get "1,3,2"
The only slight doubt I have is what happens when the Data Steward has to maintain the cost lookup table with "1,2,3" = 872.33. Will they cope, or will I need to write some kind of Access Form or other frontend to make it plain as day for them. It should be fun!
Last edited by Al Wood on Thu Jan 06, 2011 6:36 am; edited 2 times in total (Reason for editing : Typos)
Al Wood- Posts : 46
Join date : 2010-12-08
Re: How to lookup a small group of records as one
Good idea to use delimiters in your concatinated string! I frequently use a pipe character "|" as a delimiter; users seem to undertand a pipe-delimited list: TypeA|TypeB|TypeC. However, if your source system doesn't prevent them from typing that character it would still be possible to get into trouble.
You might also want to validate the user-entered values against a list of legal values during your ETL process. At the very least you could produce an exception report to let you know that you are loading some bad values.
You might also want to validate the user-entered values against a list of legal values during your ETL process. At the very least you could produce an exception report to let you know that you are loading some bad values.
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: How to lookup a small group of records as one
Why do you need to prefix the the string with 'Type' and delimit each character anyway? The type string is an open ended lookup code. The worst thing is to make it unnecessarily long, and in this case it's 5 times longer, unless you could have event type with more than 1 character.VHF wrote:users seem to undertand a pipe-delimited list: TypeA|TypeB|TypeC.
I'm not sure if the users ever need to see the type string code. Even if they do, the column name should indicate word 'Type' and I can't see the code 'TypeA|TypeB|TypeC' is more descriptive than 'ABC' under the column name 'Event type string'.
If there is any data quality issue, it should be fixed up by a process called ETL and it's hard and dirty. Using cursor or whatever means to get data in order could consume a lot of time, but that's what ETL used for, and there is plenty time during nightly load.
I think the core issue in the problem is how to structure and load the lookup table and the tricky bit is to prepare the event type character list in proper order before concatenating them.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Similar topics
» What's the best sorting utility out there for a small windows system? (less expensive than SyncSort)
» Using Tableau for small DW (less than 1 TB)
» Too many small dimension tables
» Very Small Dimension Table
» Fact table granulartity to small or just built wrong
» Using Tableau for small DW (less than 1 TB)
» Too many small dimension tables
» Very Small Dimension Table
» Fact table granulartity to small or just built wrong
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum