Health Catalyst - Healthcare DW/BI
Has anyone had exposure to this organization? They make intriguing claims about a "late binding bus architecture".
Any thoughts?
http://www.healthcatalyst.com/company/
After determining that the predominant approaches to data modeling weren’t effective for healthcare data, they discovered the solution, which is now known as the Adaptive Data Architecture. Using a late-binding bus architecture, Catalyst’s adaptive data model is agile, flexible, and can be implemented in a matter of weeks, compared to the months or years traditional approaches require.
brownp123me- Posts : 2
Join date : 2013-02-05
Re: Health Catalyst - Healthcare DW/BI
What does "late binding" mean?
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Health Catalyst - Healthcare DW/BI
That's what I'm trying to figure out. I have asked someone from the company and will post when I hear back. Should be interesting.
brownp123me- Posts : 2
Join date : 2013-02-05
Re: Health Catalyst - Healthcare DW/BI
If they are talking about fact/dimension relationships, they are probably using a timestamp qualifier in joins.
A strict dimensional model uses 'early binding'. When a fact table is loaded, FK relationships to type 2 dimensions are assigned at load time. This becomes a fixed, unambiguous relationship that identifies both the member and the version of the member associated with the fact.
In a 'late binding' scenario, the fact is associated with the member but not with a specific version (i.e. it stores a type 1 FK with the fact). When the dimension is referenced, a timestamp associated with the fact is used to locate the proper version of the dimension row: the lookup uses a composite key (type 1 key and timestamp) to access the dimension.
In late binding, late-arriving dimension data is not a problem because the fact/dimension binding occurs at the time of use rather than at the time of load. If a retroactive dimension update occurs, subsequent queries for facts in that timeframe pick up the changed attributes without the facts needing to be rekeyed.
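To make the early vs. late binding difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (provider_dim, visit_fact, valid_from, valid_to) are made up for illustration and are not taken from any vendor's model. The fact row carries both a type 2 key (bound at load) and a type 1 key plus a date (bound at query time), and a retroactive dimension change is applied after the fact is loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE provider_dim (
    provider_version_key INTEGER PRIMARY KEY,  -- type 2 key: one per version
    provider_key         INTEGER NOT NULL,     -- type 1 key: one per member
    specialty            TEXT NOT NULL,
    valid_from           TEXT NOT NULL,        -- inclusive, ISO date
    valid_to             TEXT NOT NULL         -- exclusive, ISO date
);
CREATE TABLE visit_fact (
    visit_id             INTEGER PRIMARY KEY,
    provider_version_key INTEGER NOT NULL,     -- early binding: fixed at load
    provider_key         INTEGER NOT NULL,     -- late binding: member only
    visit_date           TEXT NOT NULL
);

-- Only one version of the provider is known when the fact is loaded.
INSERT INTO provider_dim VALUES (1, 10, 'General', '2012-01-01', '9999-12-31');
INSERT INTO visit_fact VALUES (100, 1, 10, '2012-08-15');

-- Retroactive update: the provider was actually Cardiology from 2012-07-01.
UPDATE provider_dim SET valid_to = '2012-07-01' WHERE provider_version_key = 1;
INSERT INTO provider_dim VALUES (2, 10, 'Cardiology', '2012-07-01', '9999-12-31');
""")

# Early binding: join on the version key that was assigned at load time.
EARLY = """
SELECT d.specialty
FROM visit_fact f
JOIN provider_dim d ON d.provider_version_key = f.provider_version_key
"""

# Late binding: join on the member key and pick the version by fact date.
LATE = """
SELECT d.specialty
FROM visit_fact f
JOIN provider_dim d
  ON d.provider_key = f.provider_key
 AND f.visit_date >= d.valid_from
 AND f.visit_date <  d.valid_to
"""

early_result = conn.execute(EARLY).fetchall()
late_result = conn.execute(LATE).fetchall()
print("early:", early_result)
print("late: ", late_result)
```

The early-bound join still points at the version that existed at load time, while the late-bound join picks up the retroactive change without touching the fact row. The price is a range predicate in every join.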
Re: Health Catalyst - Healthcare DW/BI
How is the performance with such a database? Do the complex joins cause performance to drag? I would think that performance would suffer since you can't define a primary key. You couldn't even build a clustered index on the dimension key and the date fields without really slowing load performance.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Health Catalyst - Healthcare DW/BI
If you are using a strict dimensional pattern, no, the impact is usually not significant. Netezza is very effective with star schemas. Generally, dimensions tend to be small (under 1GB or so), so Netezza will pull these into memory and use the memory image (containing only the needed columns) to join en masse with the rows in the fact table. It's a giant merge in one pass through the data.
If there is a large dimension that is commonly used in queries, it is usually beneficial to distribute both the dimension and the fact table by the same key. This means joins between these two large tables (large dimension table and fact table) are performed on the same SPU. Another strategy is to distribute on a dimension key that is commonly used for grouping. When aggregations are performed, they operate in parallel, which can significantly improve the performance of aggregate queries that use that dimension. The challenge with this strategy is getting an appropriately smooth distribution. The more columns you organize on, the less likely that organization is useful to a query.
Also, Netezza has no indexes. It cannot enforce PK or FK constraints, so it doesn't. It allows you to declare them for documentary purposes. Some BI tools and other query tools may use this information to support the user experience.
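The co-location idea above can be illustrated with a toy simulation in plain Python. This is not Netezza's actual hashing; `spu_for`, `NUM_SPUS`, and the sample tables are hypothetical stand-ins. The point is only that when the fact and the dimension are distributed on the same key, each SPU can join its own slices without moving rows, and that you can check how evenly the rows are spread. (In Netezza itself, the distribution key is declared with a `DISTRIBUTE ON (...)` clause on `CREATE TABLE`.)

```python
NUM_SPUS = 4  # hypothetical number of processing units


def spu_for(key: int) -> int:
    """Toy stand-in for a hash distribution function (not Netezza's)."""
    return key % NUM_SPUS


def distribute(rows, key):
    """Assign each row to an SPU slice based on its distribution key."""
    slices = {spu: [] for spu in range(NUM_SPUS)}
    for row in rows:
        slices[spu_for(row[key])].append(row)
    return slices


# Hypothetical sample data: 8 patients, 40 visits spread across them.
patient_dim = [{"patient_key": k, "name": f"patient-{k}"} for k in range(1, 9)]
visit_fact = [{"visit_id": i, "patient_key": 1 + i % 8} for i in range(40)]

# Both tables are distributed on the same key.
dim_slices = distribute(patient_dim, "patient_key")
fact_slices = distribute(visit_fact, "patient_key")

# Co-located join: each SPU joins only the rows it already holds.
joined = []
for spu in range(NUM_SPUS):
    local_dim = {d["patient_key"]: d for d in dim_slices[spu]}
    for f in fact_slices[spu]:
        # A KeyError here would mean the matching dimension row is elsewhere.
        joined.append((f["visit_id"], local_dim[f["patient_key"]]["name"]))

# Skew check: how many fact rows each SPU ended up with.
rows_per_spu = {spu: len(rows) for spu, rows in fact_slices.items()}
print(len(joined), rows_per_spu)
```

With real data the key values are rarely this uniform, so the skew check is the part worth running against your own candidate distribution key before committing to it.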
Late Binding in data warehousing
ngalemmo captures the essence of our late binding methodology very well. I'm a senior VP with Health Catalyst, but a CIO and data warehousing guy first. If you would like to learn more about our methodology, please give me a shout. You can also Google "late binding data warehouse slideshare" for a slide deck that provides an overview.
Dale Sanders, dale.sanders@healthcatalyst.com
drsanders- Posts : 1
Join date : 2013-05-02
Re: Health Catalyst - Healthcare DW/BI
Along the same lines as ngalemmo mentioned, we have used a similar approach, defining the dimension with a composite key such as (Dim_id, version_number).
Dim_id is a surrogate key that stays the same for a given natural key, while the version number increases each time a change is detected for that natural key. For example:
For natural key (employee_id = 1000), EMP_DIM_ID = 80 always remains 80. But every time any attribute of employee_id = 1000 changes, the version number increases.
When reporting, the business user always points to the latest employee record with the query below:
SELECT *
FROM fact f
JOIN emp_dim e ON f.emp_dim_id = e.emp_dim_id
WHERE e.emp_current_rec = 1  -- the record is current and not expired
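For anyone who wants to try the versioned-dimension pattern described above, here is a small runnable sketch with Python's sqlite3. The emp_dim and fact column names follow the post; the department attribute, the fact_id and amount columns, and the `apply_change` helper are assumptions added only to have something to version and load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp_dim (
    emp_dim_id      INTEGER NOT NULL,  -- stable surrogate per natural key
    version_number  INTEGER NOT NULL,  -- bumps on every detected change
    employee_id     INTEGER NOT NULL,  -- natural key
    department      TEXT NOT NULL,     -- hypothetical tracked attribute
    emp_current_rec INTEGER NOT NULL,  -- 1 = current, 0 = expired
    PRIMARY KEY (emp_dim_id, version_number)
);
CREATE TABLE fact (
    fact_id    INTEGER PRIMARY KEY,
    emp_dim_id INTEGER NOT NULL,       -- points at the member, not a version
    amount     REAL NOT NULL
);
INSERT INTO emp_dim VALUES (80, 1, 1000, 'Billing', 1);
INSERT INTO fact VALUES (1, 80, 120.0);
""")


def apply_change(conn, emp_dim_id, department):
    """Expire the current version and insert the next one (illustrative)."""
    employee_id, version = conn.execute(
        "SELECT employee_id, version_number FROM emp_dim "
        "WHERE emp_dim_id = ? AND emp_current_rec = 1",
        (emp_dim_id,),
    ).fetchone()
    conn.execute(
        "UPDATE emp_dim SET emp_current_rec = 0 "
        "WHERE emp_dim_id = ? AND version_number = ?",
        (emp_dim_id, version),
    )
    conn.execute(
        "INSERT INTO emp_dim VALUES (?, ?, ?, ?, 1)",
        (emp_dim_id, version + 1, employee_id, department),
    )


# Employee 1000 changes department; EMP_DIM_ID stays 80, version goes to 2.
apply_change(conn, 80, "Radiology")

CURRENT_QUERY = """
SELECT f.fact_id, e.version_number, e.department
FROM fact f
JOIN emp_dim e ON f.emp_dim_id = e.emp_dim_id
WHERE e.emp_current_rec = 1
"""
current_rows = conn.execute(CURRENT_QUERY).fetchall()
print(current_rows)
```

Because the fact only stores emp_dim_id, no fact rows have to be rekeyed when a new version arrives. Note that without the emp_current_rec filter the join would return one row per version, so the filter (or a date range, as in the late binding posts above) is not optional.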
sachij3u- Posts : 19
Join date : 2013-07-11
Age : 43
Location : Herndon, VA