Real time DWH - Choosing Right Technologies

View previous topic View next topic Go down

Real time DWH - Choosing Right Technologies

Post  rkraj on Sun Jun 08, 2014 12:46 am

Hi



Bascially we are in need of choosing a suitable database for our Data warehouse requirement.
For this i need a template where comparitive study carried across different databases based on different parameters.
If you have that kind of template please pass it on..



I would like to give you a brief on what is the requirement about.
Input to the system => SWIFT MESSAGES + INTERNAL CASHFLOW MESSAGES + EXTERNAL MESSAGES
The above inputs land in Java MEssaging Service Queues

SOURCE(QUEUES)----->Target DB(May be Teradata or any other database )----->Reporting

These messages are completely online and their frequency is in terms of secs/milliseconds.

Currently they are running online reports by using the data in in-memory DB. They calculate
Projections and various other business info on the fly and show it in screen.

Requirement:
They need a kind of REAL TIMe Datawarehouse to store the transformed data online with the source information as Messaging Queue.
They want to compare this real time transformed data with previous months transformed data i.e
they want to compare the historical information with the current transformed online data.

Details available:
Curent source : MEssage Queues (where message is stored)
Volume : 3 million recs/day
Size in Bytes : 18 - 20 GB per day
Expected Growth in Vol : 18 million in next 3 years
Expected volume in bytes: 70 GB per day

We feel that the current OLTP database ORACLE 11g won't be suitable for building a new Dataware house with this volume.
We need facts/details as to what parameters that we need to prove that ORACLE won't work out..


Latency provided to land to the target DB from source MEssage Queue is just 3 minutes..
I am feeling that ETL tool which is traditionally a Batch processing tools might not fit here..
I am suggesting JAVA Spring/Hibernate here to pull the messages and put it to the TARGET so that we get the transformed info immediately..

KINDLY HELP. Let me know how to narrow down to correct tools and technologies





rkraj

Posts : 12
Join date : 2012-06-29

View user profile

Back to top Go down

Re: Real time DWH - Choosing Right Technologies

Post  ngalemmo on Sun Jun 08, 2014 3:21 am

I don't think your issue is technology, but rather the implied architecture of your approach.

You say you want to compare incoming data against the previous month. Is that on a transaction-by-transaction basis? Where does the 'real time' data warehouse come in? If your intent is to store the incoming data into the warehouse before you compare it to the previous month, your approach is flawed. If you are simply looking for a platform to hold the previous month's data for query, then you are not talking about a 'real time' data warehouse.

Given a throughput level of 3MM incoming transactions a day, you need something that can handle roughly 60-70 queries/second. How much data is in a previous month, and can it be aggregated? And what's wrong with the way they are doing it now?
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Real time DWH - Choosing Right Technologies

Post  rkraj on Sun Jun 08, 2014 7:05 am

Hi
Thanks for your reply... Here is the answers for your question

Q1. You say you want to compare incoming data against the previous month. Is that on a transaction-by-transaction basis?
Though its a 100% valid question which i am not able imagine if they can do that kind of transaction by transaction comparison.
Taking some kind of eg) Say at 10:31 AM on Mar 21 2014 they had an Account A1 for which they arrive at a Liqudity ($X) of an account on that day.. and today say Jun 07th 2014 at approx 11:00 AM i had the same account A1 which had a $Y. So they we cannot
expect transaction happening at the same time , but approximately within that day any time i would take the scenario and compare with
the MArch 21st day transaction. Yes they want to store at a transaction level..that is 100% sure.
Let me know if you have questions.

Q2. Where does the 'real time' data warehouse come in?
The incoming messages are coming 24 hrs (due to varous regions like Europe, US, ASIA), So we need to read them and do a
transform and store it immediately(do calculations like intraday liquidity ,projection etc). So that the users can read that transformed message from the Database.The latency given is 2mts. We can say "Near- real time" and NOT REAL TIME

Q3.And what's wrong with the way they are doing it now?
They don;t have a DWH currently. So they don't have the ability to store a historical comparison.They have currently data stored for only 4 days of data and that too in an in-memory database. They use this in-memory database for online reporting.

Q4. How much data is in a previous month, and can it be aggregated?
Previous month vol i have not checked, but little bit lesser or more than 3MM.Yes it can be aggregated.

NOt able to understand your statement
"you need something that can handle roughly 60-70 queries/second"

and how do you say that
"If your intent is to store the incoming data into the warehouse before you compare it to the previous month, your approach is flawed"


Regards
raj_rk


rkraj

Posts : 12
Join date : 2012-06-29

View user profile

Back to top Go down

Re: Real time DWH - Choosing Right Technologies

Post  ngalemmo on Fri Jun 13, 2014 12:36 pm

The query/sec comment was a rough estimate of the capacity you would need to support the specific application given the 3MM/day volume you discussed. It is a very significant workload for a typical DW environment.

You should consider a two tiered architecture. Basically an ODS to support the specific application and a traditional DW for everything else. The ODS would serve multiple purposes, which include:

1. A means to stage incoming transactions for periodic load into the DW
2. A location to store staged historical data specific to the real-time analysis you need to perform. Source the data from the DW and keep it to the absolute minimum you need to support the application. If the database platform allows, you should force the entire table into memory or locate it on a memory (RAM) based device to increase performance (SSD's, while faster than disk, may not cut it). Structure the data for the specific use, not a more generalized data model.

A dimensional model is a general purpose model geared toward handling large analytic queries. It simply doesn't work well against very high volumes of very small queries. You need to design a data store for the specific application and use the DW to support it.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Real time DWH - Choosing Right Technologies

Post  nathanjones77 on Thu Jun 19, 2014 8:42 am

Regarding the question of streaming, my colleagues have had good experience in a similar environment (on line, real time risk management) with a stack including sybase event stream processor, sap hana and panopticon.

nathanjones77

Posts : 11
Join date : 2014-06-03
Location : Zurich

View user profile

Back to top Go down

Re: Real time DWH - Choosing Right Technologies

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top


 
Permissions in this forum:
You cannot reply to topics in this forum