Accounts subject area ODS daily loading disaster!!
4 posters
Page 1 of 1
Accounts subject area ODS daily loading disaster!!
Dear All,
Greetings,
Kindly provide your support to solve my case:
- I'm now building my Accounts subject area for Banking Data Warehouse (ODS).
- When i have started just one source (Credit Cards) i found that i have about 85,000 accounts with changing balances daily.
- For my 3 years plan with data warehouse storage i will find:
85,000 * 365 = 31,025,000 records per year .....
and 93,075,000 records for the credit cards only changing balances.
I still have many different sources for banking accounts (Assets, Liabilities, ATM, ......etc.)
Please Advise how can i keep the logic of Operational Data Store in the data warehouse without facing this disaster of storage.
Great thanks in Advance
Galileo
Greetings,
Kindly provide your support to solve my case:
- I'm now building my Accounts subject area for Banking Data Warehouse (ODS).
- When i have started just one source (Credit Cards) i found that i have about 85,000 accounts with changing balances daily.
- For my 3 years plan with data warehouse storage i will find:
85,000 * 365 = 31,025,000 records per year .....
and 93,075,000 records for the credit cards only changing balances.
I still have many different sources for banking accounts (Assets, Liabilities, ATM, ......etc.)
Please Advise how can i keep the logic of Operational Data Store in the data warehouse without facing this disaster of storage.
Great thanks in Advance
Galileo
Galelio- Posts : 4
Join date : 2014-08-03
Re: Accounts subject area ODS daily loading disaster!!
Hi,
I'm not clear exactly what the problem is that you are asking for help with - so if you could clarify that would be great.
If you need to store a particular set of data then just size your platform appropriately; if you don't need to store data permanently then delete it once you have consumed it elsewhere in your system.
Just as an FYI, the system I am currently working on is looking to handle 30-35 million records per day and hold data for up to 7 years - so we have devised a DB storage strategy to cope with this volume of data - working with our DB vendor to understand the options. None of this is particularly challenging, we just need to pick the right tool for the job based on the tools already out there doing this kind of work already.
Regards,
I'm not clear exactly what the problem is that you are asking for help with - so if you could clarify that would be great.
If you need to store a particular set of data then just size your platform appropriately; if you don't need to store data permanently then delete it once you have consumed it elsewhere in your system.
Just as an FYI, the system I am currently working on is looking to handle 30-35 million records per day and hold data for up to 7 years - so we have devised a DB storage strategy to cope with this volume of data - working with our DB vendor to understand the options. None of this is particularly challenging, we just need to pick the right tool for the job based on the tools already out there doing this kind of work already.
Regards,
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Accounts subject area ODS daily loading disaster!!
Hi,
Great thanks for your reply,
To be clear, my question is: if i have about 1 Million changing records per day:
1) I'm confirming with you that i will use the SCD-2 (to keep the history)
2) As i understood from your reply, i shouldn't afraid of the storage if i managed it carefully with the DB Vendor.(Do you mean that the vendor can provide me with an administration and storing solutions like "Compression" or etc.?)
Thanks in advance,
Galileo
Great thanks for your reply,
To be clear, my question is: if i have about 1 Million changing records per day:
1) I'm confirming with you that i will use the SCD-2 (to keep the history)
2) As i understood from your reply, i shouldn't afraid of the storage if i managed it carefully with the DB Vendor.(Do you mean that the vendor can provide me with an administration and storing solutions like "Compression" or etc.?)
Thanks in advance,
Galileo
Galelio- Posts : 4
Join date : 2014-08-03
Re: Accounts subject area ODS daily loading disaster!!
If the business wants day-end balances for 3 years, it is what it is. Just wait until you need to maintain the transactional data, volumes will be a lot higher than that. The thing is, everything is doable, it just becomes a matter of time and money. Assuming you are an architect, your job would be to work out what is needed to implement the solution. It is up to the business to decide if the cost fits the perceived value. If it doesn't, then there needs to be some adjustment, such as reducing the amount of history, to fit the budget.
If you want to take a long term view, you need to assess what the end-state is going to look like and what the total data and service level requirements will be. Based on that you may want to evaluate your current platform and determine if it is the best choice going forward.
If you want to take a long term view, you need to assess what the end-state is going to look like and what the total data and service level requirements will be. Based on that you may want to evaluate your current platform and determine if it is the best choice going forward.
Re: Accounts subject area ODS daily loading disaster!!
Hi,
Great thanks for your reply,
You told us sir "If the business wants day-end balances for 3 years, it is what it is"
I'd like to let you be informed that We are Involving our business in Data mart modeling as in the the Data mart we put the data in representative form for the business needs.
But, our policy in building the ODS is to collect all the available sources and model it to be available when the business ask for it after the awareness by the DWH value.
So, we are as a DWH architect modelers say "It is a shame on us if the Business asked to get a piece of information and we are unable to support him from our ODS directly" with out telling us before what he wants!
What do you think?
Thanks in advance, I really appreciate your valuable discussion
Galileo
Great thanks for your reply,
You told us sir "If the business wants day-end balances for 3 years, it is what it is"
I'd like to let you be informed that We are Involving our business in Data mart modeling as in the the Data mart we put the data in representative form for the business needs.
But, our policy in building the ODS is to collect all the available sources and model it to be available when the business ask for it after the awareness by the DWH value.
So, we are as a DWH architect modelers say "It is a shame on us if the Business asked to get a piece of information and we are unable to support him from our ODS directly" with out telling us before what he wants!
What do you think?
Thanks in advance, I really appreciate your valuable discussion
Galileo
Galelio- Posts : 4
Join date : 2014-08-03
Re: Accounts subject area ODS daily loading disaster!!
There lies the path to failure.
"Build it and they will come" is a waste of your time and the company's money.
Work on one process at a time, implemented as a star schema, driven by business demand.
Use a data warehouse automation tool (there is a list here) to quickly prototype and modify your warehouse as it evolves to meet that demand.
Providing fast iteration value to business users is the key to data warehouse success.
Ron.
"Build it and they will come" is a waste of your time and the company's money.
Work on one process at a time, implemented as a star schema, driven by business demand.
Use a data warehouse automation tool (there is a list here) to quickly prototype and modify your warehouse as it evolves to meet that demand.
Providing fast iteration value to business users is the key to data warehouse success.
Ron.
Re: Accounts subject area ODS daily loading disaster!!
I basically agree with what everyone else has said. Couple of points:
1. You talk about an ODS - while this is a topic that everyone has an opinion on and is discussed endlessly, an ODS is not normally used to describe a dimensional model. An ODS is where you gather/hold all the source data that you then load into your DM (and you may well use it for other purposes). Therefore talking about SCD2s in the context of an ODS doesn't make a lot of sense
2. Storage: storage capacity, in itself, is so cheap these days that it is often considered to be effectively free: in the sense that others costs will be orders of magnitude greater so the storage costs are not a driving factor in anything. How you manage and access this storage is another issue and this is where you need to look at look at your architectural strategy for 'normal' DBs (such as Oracle, SQL Server, DB2), such as your partitioning strategy, or start looking at dedicated appliances such as Oracle Exadata, Neteeza, etc. (which are definitely not 'effectively free' !). If your ODS (not your DM) is really large, and I don't think yours is yet, then maybe look at a Big Data solution such as Hadoop.
And just to re-iterate what Ron Dunn said - and I think this is the most important point - everything should be driven by user requirements. Start with what you know and can achieve in a reasonable time scale and then build on it. One of the big benefits of a DM is that it is easily extensible without affecting what has already been built and/or the reports that are already running against it
1. You talk about an ODS - while this is a topic that everyone has an opinion on and is discussed endlessly, an ODS is not normally used to describe a dimensional model. An ODS is where you gather/hold all the source data that you then load into your DM (and you may well use it for other purposes). Therefore talking about SCD2s in the context of an ODS doesn't make a lot of sense
2. Storage: storage capacity, in itself, is so cheap these days that it is often considered to be effectively free: in the sense that others costs will be orders of magnitude greater so the storage costs are not a driving factor in anything. How you manage and access this storage is another issue and this is where you need to look at look at your architectural strategy for 'normal' DBs (such as Oracle, SQL Server, DB2), such as your partitioning strategy, or start looking at dedicated appliances such as Oracle Exadata, Neteeza, etc. (which are definitely not 'effectively free' !). If your ODS (not your DM) is really large, and I don't think yours is yet, then maybe look at a Big Data solution such as Hadoop.
And just to re-iterate what Ron Dunn said - and I think this is the most important point - everything should be driven by user requirements. Start with what you know and can achieve in a reasonable time scale and then build on it. One of the big benefits of a DM is that it is easily extensible without affecting what has already been built and/or the reports that are already running against it
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Accounts subject area ODS daily loading disaster!!
Dear All,
I'd like to thank you all,
Please confirm the loading type for my ACCOUNT_BALANCES table should it will be a snapshot or UPSERT method?
- SNAPSHOT, I mean everyday i will load all accounts in new records even it have no any changes in balances.
- UPSERT, I mean i will reload only the updated accounts balances records for the new day.
The following to examples to clear my question for to two accounts loaded in ODS in first day. Second day; one balance of them changed and the another one remain as it is.
So, the please advice me by the loading method??
SNAPSHOT METHOD
UPSERT METHOD
Great thanks in advance,
Galileo
I'd like to thank you all,
Please confirm the loading type for my ACCOUNT_BALANCES table should it will be a snapshot or UPSERT method?
- SNAPSHOT, I mean everyday i will load all accounts in new records even it have no any changes in balances.
- UPSERT, I mean i will reload only the updated accounts balances records for the new day.
The following to examples to clear my question for to two accounts loaded in ODS in first day. Second day; one balance of them changed and the another one remain as it is.
So, the please advice me by the loading method??
SNAPSHOT METHOD
"ACCOUNT_ID" " | BALANCE" " | DAY" |
4441 | 1500 | FIRST DAY |
4442 | 3600 | FIRST DAY |
4441 | 1500 | SECOND DAY |
4442 | 4000 | SECOND DAY |
"ACCOUNT_ID" " | BALANCE" " | DAY" |
4441 | 1500 | FIRST DAY |
4442 | 3600 | FIRST DAY |
4442 | 4000 | SECOND DAY |
Galileo
Galelio- Posts : 4
Join date : 2014-08-03
Re: Accounts subject area ODS daily loading disaster!!
I think we can separate two concepts - fact table type, and update method.
There are three types of fact table: Transactional, Periodic Snapshot and Accumulating Snapshot. In your case, it seems that a Periodic Snapshot is the likely solution.
Upsert is one of the methods by which you might update your fact and dimension tables.
It is rare to update fact tables, usually only in the case of Accumulating Snapshots. In Transactional and Periodic snapshots, it is more common that new data will be inserted to the table.
Assuming that you have an Account dimension, you might use Upsert/Merge logic to record changes in the account number, name, domicile branch, etc.
Driving your design from the requirements, rather than the data, would help you to see how this data should be structured. Having worked on quite a few Banking scenarios, I'm going to guess that typical requirements might be to analyse end-of-day balances by branch, or product. Be careful here, because you're going to run into the issue of semi-additive facts - because these are balances, which are point-in-time, you won't be able to accumulate/add in the time dimension.
There are three types of fact table: Transactional, Periodic Snapshot and Accumulating Snapshot. In your case, it seems that a Periodic Snapshot is the likely solution.
Upsert is one of the methods by which you might update your fact and dimension tables.
It is rare to update fact tables, usually only in the case of Accumulating Snapshots. In Transactional and Periodic snapshots, it is more common that new data will be inserted to the table.
Assuming that you have an Account dimension, you might use Upsert/Merge logic to record changes in the account number, name, domicile branch, etc.
Driving your design from the requirements, rather than the data, would help you to see how this data should be structured. Having worked on quite a few Banking scenarios, I'm going to guess that typical requirements might be to analyse end-of-day balances by branch, or product. Be careful here, because you're going to run into the issue of semi-additive facts - because these are balances, which are point-in-time, you won't be able to accumulate/add in the time dimension.
Similar topics
» Split Fact Tables by Subject Area or Granularity?
» A/R Accounts Receivable Star Schema
» customer and accounts
» Staging area.. is it a must?
» Parent and Child Accounts
» A/R Accounts Receivable Star Schema
» customer and accounts
» Staging area.. is it a must?
» Parent and Child Accounts
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum