Large volume of hospital data into fact table
3 posters
Page 1 of 1
Large volume of hospital data into fact table
Hi all,
I am working in INDIA for a hospital network, my requirement is create a data model to capture hospital patient register data, the hospital has chain of sub hospital(around 15) and it's spread across all state(25), single state data volume is around 6 mil row, if I bring all state data then it will come around 1 to 2TB data. Please advise me as how to handle this volume of data.
Here are the kind of info I will capture
patient info(name, where he or she from,...)
diagnosis
procedure
I am thinking about creating fact tables as
patient diagnosis/procedure as separate
I am working in INDIA for a hospital network, my requirement is create a data model to capture hospital patient register data, the hospital has chain of sub hospital(around 15) and it's spread across all state(25), single state data volume is around 6 mil row, if I bring all state data then it will come around 1 to 2TB data. Please advise me as how to handle this volume of data.
Here are the kind of info I will capture
patient info(name, where he or she from,...)
diagnosis
procedure
I am thinking about creating fact tables as
patient diagnosis/procedure as separate
inddatadm- Posts : 3
Join date : 2015-10-15
Large volume of hospital data into fact table
Do you know what business user is asking to measure and the population for it ....all states or just one state? I am not worry about the data volume but the requirements to built on. Your dim tables are going to be smaller than the Fact table? If you partition Fact table on patient visit's date MONTH then performance is going to be fast if you bring all states data. BTW, keep diagnosis and procedure together. A doctor wants to see them together. Keeping them separated would cause performance issue and joining them together, if kept separate, would be a nightmare.
I do not know if you have done any data profile on the patient data, but a very sick patient can have multiple diagnosis and procedures in a day. Please read Kimball book on how to create bridge table for that purpose. He described it in detail on how to do data modeling for Health care industry.
Data volume is the least challenging part when working with patient's data. The most challenging part is how to de-dup patient's demographic data to create a master/unique patient dim.
I do not know if you have done any data profile on the patient data, but a very sick patient can have multiple diagnosis and procedures in a day. Please read Kimball book on how to create bridge table for that purpose. He described it in detail on how to do data modeling for Health care industry.
Data volume is the least challenging part when working with patient's data. The most challenging part is how to de-dup patient's demographic data to create a master/unique patient dim.
zoom- Posts : 97
Join date : 2010-08-23
Location : Chicago
Large volume of hospital data into fact table
The requirement is to bring meausre from all state, which is why I am worried about query performance. Currentnly we have diagnosis and procedure as separate fact with each diagnosis or procedure as one row (basically if the patient has 10 diagnosis then , in the diagnosis table we will have 10 rows with sequence number) same way for procedures.
where as I have separate table for visit which is 1 row for each visit.
where as I have separate table for visit which is 1 row for each visit.
inddatadm- Posts : 3
Join date : 2015-10-15
Re: Large volume of hospital data into fact table
Generally, diagnosis is modeled as a multi-valued dimension, with a bridge table between a fact and the dimension table. Procedures, on the other hand are stored as a fact at the procedure grain(procedures usually have a lot of measures associated with them). Both visit and procedure facts would reference diagnosis through the bridge.
There are two ways to store diagnosis. You can either have a bridge keyed by visit and diagnosis, or keyed by diagnosis group and diagnosis. The diagnosis group is a derived value based on the unique combination of diagnoses. The latter is more difficult to construct, but my experience doing this with ICD-9 coding was a significantly smaller bridge table (approximately 90% smaller). This was due to the fact doctors usually don't code very many, there is a high correlation between diagnosis used and most of the time a large numbers of patients have the same diagnosis. By its nature, the number of groups grows very quickly when initially constructed, but then grows very slowly as most diagnosis combinations have been encountered.
Note that this grouping is based solely on the combination of codes and should not be confused with diagnosis groups, such as the John Hopkins algorithm, which consider patient demographics and severity of the diagnosis. Building groups is a matter of constructing a natural key made up of the primary diagnosis followed by the remaining diagnoses sorted in ascending sequence. The bridge should include a 'is primary' flag to identify the primary diagnosis or you can place a separate primary diagnosis key on the facts if that is commonly used in analysis to avoid using the bridge.
However, with the higher precision built into ICD-10 coding, it is not clear if you would experience as significant reduction in size, but it may be worth looking into.
Also, considering the population you need to deal with, it may be worth looking into high power platforms, such as Netezza.
There are two ways to store diagnosis. You can either have a bridge keyed by visit and diagnosis, or keyed by diagnosis group and diagnosis. The diagnosis group is a derived value based on the unique combination of diagnoses. The latter is more difficult to construct, but my experience doing this with ICD-9 coding was a significantly smaller bridge table (approximately 90% smaller). This was due to the fact doctors usually don't code very many, there is a high correlation between diagnosis used and most of the time a large numbers of patients have the same diagnosis. By its nature, the number of groups grows very quickly when initially constructed, but then grows very slowly as most diagnosis combinations have been encountered.
Note that this grouping is based solely on the combination of codes and should not be confused with diagnosis groups, such as the John Hopkins algorithm, which consider patient demographics and severity of the diagnosis. Building groups is a matter of constructing a natural key made up of the primary diagnosis followed by the remaining diagnoses sorted in ascending sequence. The bridge should include a 'is primary' flag to identify the primary diagnosis or you can place a separate primary diagnosis key on the facts if that is commonly used in analysis to avoid using the bridge.
However, with the higher precision built into ICD-10 coding, it is not clear if you would experience as significant reduction in size, but it may be worth looking into.
Also, considering the population you need to deal with, it may be worth looking into high power platforms, such as Netezza.
Large volume of hospital data into fact table
Thanks for your input, one of the other problem is the registration id(which I am storing as degenerated dimension in the fact table, because each hospital has it's own registration number and it can be same for other hospital) so I am leaving that as DD and storing in Fact table, my problem is now we have all other attributes related to registrationn(like when patient admit, discharge dates and other comment fields) are going to come in, if I don't get them into a dimension table then all other attributes will also end up in fact table , any input ?
if I bring these registration id by each hospital , the dimension table will become like a transaction table .. so not sure what I should di
if I bring these registration id by each hospital , the dimension table will become like a transaction table .. so not sure what I should di
inddatadm- Posts : 3
Join date : 2015-10-15
Re: Large volume of hospital data into fact table
Registration ID does not have to be a dimension table. The ID itself can be degenerate. You can represent attributes associated with the registration as other dimensions of the fact (such as admit date), either conforming or junk, as needed.
Re: Large volume of hospital data into fact table
Dates should exist in the fact table. I resolved that "comments" issue by creating a bridge table. The bridge table had all the procedure done for a patient and it also had the comments for a procedure. Below is what I had as a design.... an arrow -> means 1:m . For example 1 or many patient rows exists in the fact table.
Fact table <- patient dim -> patient bridge <- procedure codes
This is a workable solutions for me. Like I mentioned before if you partition fact table on a date, performance should not be an issue.
Fact table <- patient dim -> patient bridge <- procedure codes
This is a workable solutions for me. Like I mentioned before if you partition fact table on a date, performance should not be an issue.
zoom- Posts : 97
Join date : 2010-08-23
Location : Chicago
Similar topics
» Daily snapshot fact table-any chance to reduce data volume?
» Healthcare Hospital Stay fact table - transaction fact table - a row for In and a row for Out?
» Design a snapshot Fact table in data mart from compliance-enable fact table in enterprise data warehouse
» Volume and Weight in Same Fact Table
» Large Dimension table compared to fact table?
» Healthcare Hospital Stay fact table - transaction fact table - a row for In and a row for Out?
» Design a snapshot Fact table in data mart from compliance-enable fact table in enterprise data warehouse
» Volume and Weight in Same Fact Table
» Large Dimension table compared to fact table?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum