Modeling In-Memory
Modeling In-Memory
Hello,
I am an avid supporter of dimensional modeling, but we are running into design concerns when applying its principles and practices to a completely in-memory solution. Our initial design is such that source system tables are replicated into an appliance and the modeling is entirely virtual (versus ETL in a typical solution). This means basic principles, such as surrogate keys, are no longer practical.
Has anyone had success extending dimensional methodologies in an entirely virtual layer? The simplest approach is to duplicate the data into dims/facts, but this is not cost-effective (memory storage is very expensive), and the base tables must be retained for unrelated reasons. Any ideas on how to approach this, or is dimensional modeling the wrong approach in this scenario?
Thanks!
real_mhardy- Posts : 4
Join date : 2014-06-13
Re: Modeling In-Memory
What are you asking? What do you mean by 'virtual'? Surely the data exists in some structure in the memory-based repository. Assuming the data is normalized (to reduce storage requirements), do you mean creating views to give the appearance of a dimensional model? Sure, why not?
I mean, a materialized dimensional model on traditional storage devices provides some performance advantages for large analytic queries. But, if the data resides in a high-speed memory device, the performance difference (if any) becomes negligible.
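For illustration only, a minimal sketch of what such a view layer might look like, with invented table and column names (not anyone's actual schema): the dimensional "appearance" comes from flattening the normalized sources in a view.

-- Hypothetical replicated source tables: src_customer, src_customer_address.
CREATE VIEW dim_customer AS
SELECT c.customer_id        AS customer_key,   -- natural key doubles as the join key
       c.customer_name,
       c.customer_segment,
       a.city,
       a.country
FROM   src_customer c
LEFT JOIN src_customer_address a
       ON a.customer_id = c.customer_id
      AND a.address_type = 'PRIMARY';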
Re: Modeling In-Memory
The data would physically exist in the memory-based repository, replicated 1-for-1 as it exists in the source system (with the exception of additional metadata columns). The dimensionalization and conformation would happen as a set of views built on top of the source system data model. However, using this approach I believe a number of best practices no longer make sense. For example, surrogate keys are no longer possible, and junk dimensions/attributes would have to go on the fact table directly. I don't even know how to approach type 2 history. Any suggestions on how to proceed?
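To illustrate what I mean (table and column names invented): without an ETL step there is nowhere to assign sequence-generated surrogate keys, so the views can only expose keys the source already carries, and junk-style flags end up as plain columns on the fact view.

CREATE VIEW fact_order_line AS
SELECT ol.customer_id   AS customer_key,   -- natural key standing in for a surrogate
       ol.product_id    AS product_key,
       ol.order_date,
       ol.rush_order_flag,                 -- would normally be moved to a junk dimension
       ol.gift_flag,
       ol.quantity,
       ol.extended_amount
FROM   src_order_line ol;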
Thanks!
real_mhardy- Posts : 4
Join date : 2014-06-13
Re: Modeling In-Memory
Don't confuse hardware with data modeling. The reasons why models are the way they are have nothing to do with the hardware.
The ideas behind dimensional models and surrogate keys have to do with the long term stability of the relationships among data in the model. A copy of the operational tables does nothing to ensure this stability, or history for that matter.
What you are doing is replicating a bunch of tables and throwing views over them. Which, for what it is, is not a bad thing, but it is not a data warehouse (or at least not a proper one). It doesn't matter what media you are using to store the data.
If you are concerned about type 2 history, you need to build a proper model. What you intend to do won't cut it.
Re: Modeling In-Memory
I'm with ngalemmo. You still have to load the tables to memory. Creating one super-wide table with all columns has the same drawbacks whether it resides in-memory or on disk. Your in-memory dimensional model should run exceptionally fast, as disk access is still the slowest part of a given query.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Modeling In-Memory
Thanks ngalemmo and BoxesAndLines for your replies.
One thing to note is that we have a requirement to load the source system tables as-is into the memory appliance. We are trying to avoid instantiating the data physically a second time, hence why we are considering a virtual data warehouse through views built on the replicated source system tables. Sorry to sound naive, but why can't the modeled data exist virtually through views rather than physically? Wouldn't the same principles apply?
ngalemmo: I don't think I'm confusing hardware with data modeling. Hardware allows for different approaches and/or removes previous limitations. The model is still the model and should look the same regardless of technology.
BoxesAndLines: I wasn't thinking of creating super wide views as it limits the ability to reuse and conform.
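To make that concrete (again with invented names), what I picture is one conformed dimension view shared by several fact views, rather than one super-wide view per subject area:

CREATE VIEW dim_product AS
SELECT product_id AS product_key,
       product_name,
       category
FROM   src_product;

-- Both fact views join to the same dim_product, so the attribute
-- definitions stay conformed across subject areas.
CREATE VIEW fact_sales AS
SELECT s.product_id AS product_key, s.sale_date, s.amount
FROM   src_sales s;

CREATE VIEW fact_inventory AS
SELECT i.product_id AS product_key, i.snapshot_date, i.on_hand_qty
FROM   src_inventory i;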
Thanks!
real_mhardy- Posts : 4
Join date : 2014-06-13
Re: Modeling In-Memory
real_mhardy wrote: Thanks ngalemmo and BoxesAndLines for your replies.
One thing to note is that we have a requirement to load the source system tables as-is into the memory appliance...
I'm beginning to see the problem. Speaking as a data modeling professional: nobody on the business side is qualified to impose that requirement. That said, if you can virtualize a dimensional model off of arbitrarily normalized source tables, that's great! I know Teradata has been claiming to be able to do that for years, but I've yet to see it done successfully (i.e. with good performance).
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Modeling In-Memory
If the data model is properly normalized you should have no problem creating a dimensional view of the same.
You cannot support change history unless you make changes to the data model.
Why would building a proper model incur another instance of the data? You have an operational instance and an analytic instance. Why would using a different model for the analytics instance result in a third copy?
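To make the history point concrete, here is a minimal sketch (invented names) of the kind of structure a type 2 dimension implies. It has to be persisted somewhere, because it holds rows the source no longer does:

CREATE TABLE dim_customer_history (
    customer_key      INTEGER      NOT NULL,  -- surrogate key assigned at load time
    customer_id       INTEGER      NOT NULL,  -- natural key from the source
    customer_name     VARCHAR(100),
    customer_segment  VARCHAR(30),
    effective_date    DATE         NOT NULL,
    expiry_date       DATE         NOT NULL,  -- e.g. 9999-12-31 for the current row
    is_current        CHAR(1)      NOT NULL
);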
Re: Modeling In-Memory
ngalemmo: We are hoping to have only one physical instance of the data, the operational instance (the source system model). The analytical instance would not exist physically, but as a collection of views sitting on top that denormalize, format, and apply business rules.
Is this a viable approach? I think we recognise that for certain requirements, such as type 2 history, storing the data a second time may be necessary. Do you know if there is any literature on the subject or is this a "bad idea"?
Thanks!
real_mhardy- Posts : 4
Join date : 2014-06-13
Re: Modeling In-Memory
real_mhardy wrote:
Is this a viable approach? I think we recognise that for certain requirements, such as type 2 history, storing the data a second time may be necessary. Do you know if there is any literature on the subject or is this a "bad idea"?
From talking to my SAP HANA colleagues, this isn't necessarily seen as an issue yet, so there isn't literature on it. Thinking about it more deeply after those conversations, there are two cases. (I recognise you might not be talking HANA, but it should still be relevant.)
1) Live analytics within the application. Based on live data, no need for history. One of the biggest sales pitches for HANA is removing the layers and latency within this model to enable operational reports (and remove the need for a separate operational data store).
2) Historic analytics - needs 'as-was' data for analytic and reporting purposes.
Both can run in memory on a database that supports ACID (again, e.g. HANA).
Case 1) is the usual sales pitch for HANA - running analytics based on live data, which doesn't need history and so can be run on top of the views on the ERP.
2) is the hard case that isn't often part of the sales pitch, and it involves reengineering the data warehouse/BW to strip out the view and summary layers that make up the current architecture. This is where history tracking and dimension conformance come in. History tracking, whether done the CIF way, with data vault, or with SCDs, involves persistent inserts and updates to some table that is different from the live production tables (whether CRM, ERP or core platform). That clearly has to be a separate functional requirement and a separate piece of functionality. Analytic results from case 2) may be fed back into case 1) for live analytics. In case 2), the SCD customer is maintained separately from the source customer, and similarly the fact tables are maintained separately; otherwise key management is not implemented properly and you don't have history.
From a dimensional modelling perspective, you can certainly model the views in case 1 dimensionally, but don't expect history tracking (or conformance, if you're not in a full SAP world - I'm simplifying dramatically here). Case 2 is simply a data warehouse - dimensional or CIF as you wish to model (and obviously here isn't the place to argue for the second of those) - and it has to manage all the things a data warehouse manages. Given the cost of RAM, I imagine there also needs to be some dynamic caching so that older, less-used data can be hosted on slower disk and SSD rather than RAM.
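To illustrate that last point, a case 2 load is a separate process writing to a separate structure. A rough sketch of the two statements involved, reusing the invented dim_customer_history structure from earlier in the thread (no claim that this is how HANA/BW implements it):

-- Step 1: close off current dimension rows whose tracked attributes changed in the source.
UPDATE dim_customer_history
SET    expiry_date = CURRENT_DATE,
       is_current  = 'N'
WHERE  is_current = 'Y'
AND    EXISTS (SELECT 1
               FROM   src_customer s
               WHERE  s.customer_id = dim_customer_history.customer_id
               AND   (s.customer_name    <> dim_customer_history.customer_name
                   OR  s.customer_segment <> dim_customer_history.customer_segment));

-- Step 2: insert a fresh current row for every source row with no current version
--         (brand-new customers plus the ones just expired above).
INSERT INTO dim_customer_history
      (customer_key, customer_id, customer_name, customer_segment,
       effective_date, expiry_date, is_current)
SELECT NEXT VALUE FOR customer_key_seq,   -- placeholder: key generation varies by platform
       s.customer_id, s.customer_name, s.customer_segment,
       CURRENT_DATE, DATE '9999-12-31', 'Y'
FROM   src_customer s
LEFT JOIN dim_customer_history d
       ON d.customer_id = s.customer_id
      AND d.is_current  = 'Y'
WHERE  d.customer_id IS NULL;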
nathanjones77- Posts : 11
Join date : 2014-06-03
Location : Zurich