Real time DWH - Choosing Right Technologies
3 posters
Page 1 of 1
Real time DWH - Choosing Right Technologies
Hi
Bascially we are in need of choosing a suitable database for our Data warehouse requirement.
For this i need a template where comparitive study carried across different databases based on different parameters.
If you have that kind of template please pass it on..
I would like to give you a brief on what is the requirement about.
Input to the system => SWIFT MESSAGES + INTERNAL CASHFLOW MESSAGES + EXTERNAL MESSAGES
The above inputs land in Java MEssaging Service Queues
SOURCE(QUEUES)----->Target DB(May be Teradata or any other database )----->Reporting
These messages are completely online and their frequency is in terms of secs/milliseconds.
Currently they are running online reports by using the data in in-memory DB. They calculate
Projections and various other business info on the fly and show it in screen.
Requirement:
They need a kind of REAL TIMe Datawarehouse to store the transformed data online with the source information as Messaging Queue.
They want to compare this real time transformed data with previous months transformed data i.e
they want to compare the historical information with the current transformed online data.
Details available:
Curent source : MEssage Queues (where message is stored)
Volume : 3 million recs/day
Size in Bytes : 18 - 20 GB per day
Expected Growth in Vol : 18 million in next 3 years
Expected volume in bytes: 70 GB per day
We feel that the current OLTP database ORACLE 11g won't be suitable for building a new Dataware house with this volume.
We need facts/details as to what parameters that we need to prove that ORACLE won't work out..
Latency provided to land to the target DB from source MEssage Queue is just 3 minutes..
I am feeling that ETL tool which is traditionally a Batch processing tools might not fit here..
I am suggesting JAVA Spring/Hibernate here to pull the messages and put it to the TARGET so that we get the transformed info immediately..
KINDLY HELP. Let me know how to narrow down to correct tools and technologies
Bascially we are in need of choosing a suitable database for our Data warehouse requirement.
For this i need a template where comparitive study carried across different databases based on different parameters.
If you have that kind of template please pass it on..
I would like to give you a brief on what is the requirement about.
Input to the system => SWIFT MESSAGES + INTERNAL CASHFLOW MESSAGES + EXTERNAL MESSAGES
The above inputs land in Java MEssaging Service Queues
SOURCE(QUEUES)----->Target DB(May be Teradata or any other database )----->Reporting
These messages are completely online and their frequency is in terms of secs/milliseconds.
Currently they are running online reports by using the data in in-memory DB. They calculate
Projections and various other business info on the fly and show it in screen.
Requirement:
They need a kind of REAL TIMe Datawarehouse to store the transformed data online with the source information as Messaging Queue.
They want to compare this real time transformed data with previous months transformed data i.e
they want to compare the historical information with the current transformed online data.
Details available:
Curent source : MEssage Queues (where message is stored)
Volume : 3 million recs/day
Size in Bytes : 18 - 20 GB per day
Expected Growth in Vol : 18 million in next 3 years
Expected volume in bytes: 70 GB per day
We feel that the current OLTP database ORACLE 11g won't be suitable for building a new Dataware house with this volume.
We need facts/details as to what parameters that we need to prove that ORACLE won't work out..
Latency provided to land to the target DB from source MEssage Queue is just 3 minutes..
I am feeling that ETL tool which is traditionally a Batch processing tools might not fit here..
I am suggesting JAVA Spring/Hibernate here to pull the messages and put it to the TARGET so that we get the transformed info immediately..
KINDLY HELP. Let me know how to narrow down to correct tools and technologies
rkraj- Posts : 12
Join date : 2012-06-29
Re: Real time DWH - Choosing Right Technologies
I don't think your issue is technology, but rather the implied architecture of your approach.
You say you want to compare incoming data against the previous month. Is that on a transaction-by-transaction basis? Where does the 'real time' data warehouse come in? If your intent is to store the incoming data into the warehouse before you compare it to the previous month, your approach is flawed. If you are simply looking for a platform to hold the previous month's data for query, then you are not talking about a 'real time' data warehouse.
Given a throughput level of 3MM incoming transactions a day, you need something that can handle roughly 60-70 queries/second. How much data is in a previous month, and can it be aggregated? And what's wrong with the way they are doing it now?
You say you want to compare incoming data against the previous month. Is that on a transaction-by-transaction basis? Where does the 'real time' data warehouse come in? If your intent is to store the incoming data into the warehouse before you compare it to the previous month, your approach is flawed. If you are simply looking for a platform to hold the previous month's data for query, then you are not talking about a 'real time' data warehouse.
Given a throughput level of 3MM incoming transactions a day, you need something that can handle roughly 60-70 queries/second. How much data is in a previous month, and can it be aggregated? And what's wrong with the way they are doing it now?
Re: Real time DWH - Choosing Right Technologies
Hi
Thanks for your reply... Here is the answers for your question
Q1. You say you want to compare incoming data against the previous month. Is that on a transaction-by-transaction basis?
Though its a 100% valid question which i am not able imagine if they can do that kind of transaction by transaction comparison.
Taking some kind of eg) Say at 10:31 AM on Mar 21 2014 they had an Account A1 for which they arrive at a Liqudity ($X) of an account on that day.. and today say Jun 07th 2014 at approx 11:00 AM i had the same account A1 which had a $Y. So they we cannot
expect transaction happening at the same time , but approximately within that day any time i would take the scenario and compare with
the MArch 21st day transaction. Yes they want to store at a transaction level..that is 100% sure.
Let me know if you have questions.
Q2. Where does the 'real time' data warehouse come in?
The incoming messages are coming 24 hrs (due to varous regions like Europe, US, ASIA), So we need to read them and do a
transform and store it immediately(do calculations like intraday liquidity ,projection etc). So that the users can read that transformed message from the Database.The latency given is 2mts. We can say "Near- real time" and NOT REAL TIME
Q3.And what's wrong with the way they are doing it now?
They don;t have a DWH currently. So they don't have the ability to store a historical comparison.They have currently data stored for only 4 days of data and that too in an in-memory database. They use this in-memory database for online reporting.
Q4. How much data is in a previous month, and can it be aggregated?
Previous month vol i have not checked, but little bit lesser or more than 3MM.Yes it can be aggregated.
NOt able to understand your statement
"you need something that can handle roughly 60-70 queries/second"
and how do you say that
"If your intent is to store the incoming data into the warehouse before you compare it to the previous month, your approach is flawed"
Regards
raj_rk
Thanks for your reply... Here is the answers for your question
Q1. You say you want to compare incoming data against the previous month. Is that on a transaction-by-transaction basis?
Taking some kind of eg) Say at 10:31 AM on Mar 21 2014 they had an Account A1 for which they arrive at a Liqudity ($X) of an account on that day.. and today say Jun 07th 2014 at approx 11:00 AM i had the same account A1 which had a $Y. So they we cannot
expect transaction happening at the same time , but approximately within that day any time i would take the scenario and compare with
the MArch 21st day transaction. Yes they want to store at a transaction level..that is 100% sure.
Let me know if you have questions.
Q2. Where does the 'real time' data warehouse come in?
transform and store it immediately(do calculations like intraday liquidity ,projection etc). So that the users can read that transformed message from the Database.The latency given is 2mts. We can say "Near- real time" and NOT REAL TIME
Q3.And what's wrong with the way they are doing it now?
Q4. How much data is in a previous month, and can it be aggregated?
NOt able to understand your statement
"you need something that can handle roughly 60-70 queries/second"
and how do you say that
"If your intent is to store the incoming data into the warehouse before you compare it to the previous month, your approach is flawed"
Regards
raj_rk
rkraj- Posts : 12
Join date : 2012-06-29
Re: Real time DWH - Choosing Right Technologies
The query/sec comment was a rough estimate of the capacity you would need to support the specific application given the 3MM/day volume you discussed. It is a very significant workload for a typical DW environment.
You should consider a two tiered architecture. Basically an ODS to support the specific application and a traditional DW for everything else. The ODS would serve multiple purposes, which include:
1. A means to stage incoming transactions for periodic load into the DW
2. A location to store staged historical data specific to the real-time analysis you need to perform. Source the data from the DW and keep it to the absolute minimum you need to support the application. If the database platform allows, you should force the entire table into memory or locate it on a memory (RAM) based device to increase performance (SSD's, while faster than disk, may not cut it). Structure the data for the specific use, not a more generalized data model.
A dimensional model is a general purpose model geared toward handling large analytic queries. It simply doesn't work well against very high volumes of very small queries. You need to design a data store for the specific application and use the DW to support it.
You should consider a two tiered architecture. Basically an ODS to support the specific application and a traditional DW for everything else. The ODS would serve multiple purposes, which include:
1. A means to stage incoming transactions for periodic load into the DW
2. A location to store staged historical data specific to the real-time analysis you need to perform. Source the data from the DW and keep it to the absolute minimum you need to support the application. If the database platform allows, you should force the entire table into memory or locate it on a memory (RAM) based device to increase performance (SSD's, while faster than disk, may not cut it). Structure the data for the specific use, not a more generalized data model.
A dimensional model is a general purpose model geared toward handling large analytic queries. It simply doesn't work well against very high volumes of very small queries. You need to design a data store for the specific application and use the DW to support it.
Re: Real time DWH - Choosing Right Technologies
Regarding the question of streaming, my colleagues have had good experience in a similar environment (on line, real time risk management) with a stack including sybase event stream processor, sap hana and panopticon.
nathanjones77- Posts : 11
Join date : 2014-06-03
Location : Zurich
Similar topics
» Real time datawarehousing
» Near Real time ETL and ETL cloud
» Real Time Data Aggregation
» ETL Design Problems for Real time
» Buildiong Star Schema in real time ?
» Near Real time ETL and ETL cloud
» Real Time Data Aggregation
» ETL Design Problems for Real time
» Buildiong Star Schema in real time ?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum