Data Architecture for Single Source System (Normalised).
Hi,
I am working on a project to implement a reporting platform, consisting of what should be a hosted data warehouse provided through a supplier/agency. They are also developing the data architecture while the project is in flight. When the reporting platform goes live, I will be maintaining, supporting, and developing the platform, with a view to expanding our in-house function into a team of BI developers.
The source system is accessible in the following ways, based on the supplier's proposal for the three environments we have requested:
Sandbox - UAT application (operational) database is directly accessible.
QA/UAT - UAT application (operational) database is directly accessible.
Production - application database is pushed via replication into the DW and refreshed/updated every 15 mins.
The source system/database should be fully normalised.
The supplier/agency has replicated the source database into the production DW via a replication service. They have then created a DW schema that appears to be a number of tables designed specifically for a suite of operational reports. So this DW schema (by my understanding) skips any notion of staging tables, a data quality system, and an NDS/ODS, as the source system database is already normalised.
It also means that the DW schema is missing certain pieces of information from the source system.
Note: the DW schema is not dimensional.
My question is: what sort of data architecture is the best solution where there is (for the moment) only one source, that source is fully normalised, and it is pushed into the DW? Is it wise to disregard the typical staging > DQ > NDS/ODS > MDB model, especially if 12 months from now someone asks to report on data from a different source?
Regards,
M.
MarkW- Posts : 14
Join date : 2015-11-25
Re: Data Architecture for Single Source System (Normalised).
It's not entirely clear to me what is being built for you but it doesn't sound like anything that would normally be described as a data warehouse.
Given that you've posted to the Kimball forum you'll be unsurprised to be told that your reporting schema should be a dimensional model, designed using the Kimball methodology.
I guess replicating your Prod source DB into your DW is one way of creating what are effectively staging tables and as long as the overall data volumes are low then I guess it would work. It would be more usual to only bring changes into your DW but the method you describe is not necessarily wrong.
Presumably there are then load routines that identify differences between the source tables and your reporting schema and load the deltas into your reporting tables? Do you have slowly changing dimensions, and does the load cope with them?
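For illustration, the sort of delta routine I have in mind might look like this minimal sketch (hypothetical Python with made-up names, purely to show the idea; your actual solution would presumably be T-SQL):

```python
# Sketch: identify deltas between a replicated source snapshot and the
# reporting copy, keyed on a business key. Names are illustrative only.

def compute_deltas(source_rows, target_rows, key="id"):
    """Return (inserts, updates) by comparing two row sets on a key."""
    target_by_key = {row[key]: row for row in target_rows}
    inserts, updates = [], []
    for row in source_rows:
        existing = target_by_key.get(row[key])
        if existing is None:
            inserts.append(row)       # new in source, not yet in reporting
        elif existing != row:
            updates.append(row)       # present in both, but changed
    return inserts, updates

source = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
target = [{"id": 1, "name": "Anne"}]
ins, upd = compute_deltas(source, target)
# ins holds the new id-2 row; upd holds the changed id-1 row
```

The point is simply that something, somewhere, has to do this comparison (or read a change log) before deltas can be applied to the reporting tables.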
Thinking about it further, is this meant to be a pre-canned reporting solution for a specific application? If it is, and it works, then I guess it's an OK solution - basically you're looking at a COTS solution and if it meets your requirements then you don't really care how it gets there.
However, if you are looking for this to be the starting point for a generic data warehouse that your company, rather than the agency, are going to develop in the future then this architecture looks pretty dodgy to me.
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Data Architecture for Single Source System (Normalised).
Hi Nick,
Thanks for your comments. I don't think the business knows, or is sure for that matter, what it is trying to implement. At a very high level they are implementing a COTS solution for L&D. They have asked about reporting and explained what their requirements are, and what they might be in the future. The supplier has subsequently sold an 'analytics' product/service in addition, as the COTS solution on its own would not allow them to consume/build dashboards and the like.
The supplier's track record with regards to the additional analytics product/service has been delivering it to clients where THEY develop, maintain, and support the solution, whilst the client simply consumes the report, dashboards, etc.
I would agree that it's not entirely clear what is being built. It's not what I would expect in terms of a data warehouse. The data warehouse schema that they've developed is not a dimensional one. It's more like a relational database. If it were me, without changing the current set-up too much, I would (as you have said) have opted for a dimensional data store only architecture with the load routines pulling (incrementally) from the replicated source database, as it is effectively the stage.
What they have developed, as far as a load routine is concerned, is the 'whole table every time' approach, but I cannot be sure, as they will not be providing the load scripts/packages until we approach the handover period. I can't say I've spotted any soft deletions in the data warehouse schema, for example, so I reckon they are dropping the entire table and reloading it. How that works when the development and QA environments are pulling from the operational UAT application database is a cause for concern, but we'll see. There's no evidence of slowly changing dimensions either.
The supplier was effectively briefed to re-engineer existing reports from our legacy system, as well as a suite of KPI/metric reports, and a dashboard. The business also wanted its own in-house function to develop reports, feeds, dashboards, etc. The supplier now (for one reason or another) is only committed to delivering the first deliverable. So, although not the original intention, when we go live with this 'analytics' product/service, it will effectively be a pre-canned solution of sorts. Not sure if that last paragraph explains it, but there you go.
M.
MarkW- Posts : 14
Join date : 2015-11-25
Re: Data Architecture for Single Source System (Normalised).
I guess as long as they deliver the "existing reports from our legacy system, as well as a suite of KPI/metric reports, and a dashboard" and they perform acceptably (especially in the future when presumably your data volumes will increase) then your supplier has met their commitments and how they have actually achieved it is immaterial.
Is the supplier going to continue to support and develop the data model? If they are, then again I guess you'll be OK: if you have a reporting requirement, it's up to them to deliver the data to support the report you are building. My only concern would be costs - what they are designing sounds very inflexible, so what sounds like a small change may involve a lot of work. Again, as long as your contract with them keeps this sort of support/development cost to a reasonable level you should be fine.
However, if you are going to have to enhance the data model and build your new reports then I think you're pretty much stuffed - any enhancements are going to be difficult, and costly, to implement compared to a proper data warehouse and, assuming the supplier provides a warranty for what they've delivered, as soon as you touch anything that warranty will be void, so you'll have no comeback if their reports stop working - even if you know that your changes haven't broken them.
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Data Architecture for Single Source System (Normalised).
Sounds like they are delivering a reporting solution, not a data warehouse. Basically a replication of the operational database is just that, an exact copy. Most operational systems do not maintain historical state information, so one would not expect such information would be available. This may or may not matter to your business.
Their reporting solution is essentially a metadata layer over the operational DB to make it more user friendly. This will work within the scope of the data available.
The future challenge will come when you try to integrate other data from other systems. The purpose of a separate data warehouse model (normalized or dimensional) and the ETL processes is to address data integration and historical stability.
Re: Data Architecture for Single Source System (Normalised).
nick_white wrote: I guess as long as they deliver the "existing reports from our legacy system, as well as a suite of KPI/metric reports, and a dashboard" and they perform acceptably (especially in the future when presumably your data volumes will increase) then your supplier has met their commitments and how they have actually achieved it is immaterial.
They've reneged on this commitment and are only delivering the existing suite of reports from the legacy system, all of which are tabular reports.
nick_white wrote: Is the supplier going to continue to support and develop the data model? If they are then again I guess you'll be OK as if you have a reporting requirement then it's up to them to deliver the data to support the report you are building.
Not really. It effectively lands in my lap when this whole thing goes live and becomes operational.
nick_white wrote: My only concern would be costs - what they are designing sounds very inflexible so requesting what sounds like a small change may involve a lot of work. Again, as long as your contract with them keeps this sort of support/development cost to a reasonable level you should be fine.
We can ask them to carry out this work whether it be support and development, but there would be a charge incurred under professional services. Support and maintenance would only cover hosting and management of the solution. So, what they are developing effectively gets handed over at the end of the project.
nick_white wrote: However, if you are going to have to enhance the data model and build your new reports then I think you're pretty much stuffed - any enhancements are going to be difficult, and costly, to implement compared to a proper data warehouse and, assuming the supplier provides a warranty for what they've delivered, as soon as you touch anything that warranty will be void and so you'll have no comeback if their reports stop working - even if you know that your changes haven't broken them.
In most cases I would be developing any new reports, which means that changes would need to be made to the existing DW schema, or another one created for anything that had nothing to do with the re-engineered legacy reports.
I'd be interested to know what the ideal implementation/data architecture would be given what I will be left with come go-live, and what you might do with the existing DW schema?
- Source > Multi-Dimensional/Database Tables
- Source > Operational/Normalised Data Store > Multi-Dimensional/Database Tables
- Source > Staging Tables > Operational/Normalised Data Store > Multi-Dimensional/Database Tables
MarkW- Posts : 14
Join date : 2015-11-25
Re: Data Architecture for Single Source System (Normalised).
ngalemmo wrote: Sounds like they are delivering a reporting solution, not a data warehouse. Basically a replication of the operational database is just that, an exact copy. Most operational systems do not maintain historical state information, so one would not expect such information would be available. This may or may not matter to your business.
Their reporting solution is essentially a metadata layer over the operational DB to make it more user friendly. This will work within the scope of the data available.
The future challenge will come when you try to integrate other data from other systems. The purpose of a separate data warehouse model (normalized or dimensional) and the ETL processes is to address data integration and historical stability.
They probably are in a fashion, but it's not what their literature says with regards to their analytics product. They call it a business intelligence solution, and they also talk about integrating other data sources, even external data marts. So, whilst they haven't outright said 'data warehouse', their literature for the product suggests that it is something more than a reporting solution.
I heard there was an interesting conversation between the supplier and one of our BAs regarding 'point in time', which points to capturing historical data in the DW and SCDs. I don't think they ever got an answer or got to the bottom of why they couldn't do it, or weren't prepared to talk about it. Conversely, there is a request in our product backlog for an operational report to capture user/cost centre changes over a period of time. So that would require building historical tables for users.
Anyway, whatever is in the product backlog I will be developing anyway. It's just trying to figure out what to do with the existing DW schema.
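If I end up building that history myself, a Type-2-style history table for users could be sketched roughly like this (hypothetical Python purely for illustration; the column and function names are made up, and the real thing would be T-SQL against the DW):

```python
# Sketch: Type-2 history of a user's cost centre. Each change closes the
# current row and opens a new one with an effective date range.
from datetime import date

def apply_scd2(history, user_id, cost_centre, as_of):
    """Record a cost-centre observation for user_id as of a given date."""
    current = next((r for r in history
                    if r["user_id"] == user_id and r["end_date"] is None), None)
    if current and current["cost_centre"] == cost_centre:
        return history                    # nothing changed, keep current row
    if current:
        current["end_date"] = as_of       # close the old version
    history.append({"user_id": user_id, "cost_centre": cost_centre,
                    "start_date": as_of, "end_date": None})
    return history

hist = []
apply_scd2(hist, 42, "CC-100", date(2016, 1, 1))
apply_scd2(hist, 42, "CC-200", date(2016, 6, 1))
# hist now holds two rows: CC-100 (closed 2016-06-01) and CC-200 (open)
```

That range-dated structure is what would let the 'point in time' and cost-centre-change reports in the backlog be answered, which a straight replicated copy of the operational database cannot do.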
MarkW- Posts : 14
Join date : 2015-11-25
Re: Data Architecture for Single Source System (Normalised).
I suggest that you need to get the supplier to properly document the architecture and the details of what it is they are delivering and you should also ask them to provide example approaches of how you should enhance this solution once they have handed it over to you given the scenarios you can envisage e.g. provide historic reporting on existing data sets, integrate new data sources, etc.
Once you've got this information you'll then be in a position to make an informed decision whether enhancing what they are delivering is a viable solution or whether you're actually better off building a "proper" data warehouse from scratch.
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Data Architecture for Single Source System (Normalised).
So, this has been a bit of a struggle. In essence what they have developed is the bare minimum of what might constitute a DW:
Source ---> ETL ---> Set of de-normalised tables - not fact or dimension tables, just de-normalised tables.
This is all hard coded T-SQL views and stored procs.
Trying to get the system architecture diagrams has been a game of cat and mouse, and they keep saying that we don't need to see these as it is SaaS and comes under the SLA.
We have no idea whether the solution is load balanced and part of a failover cluster.
The solution is intended to be a global solution, so pretty much available 24/7. Our view is that it is Platform as a service as we have our own in-house function, so there will be some, if not a lot, of in-house development.
Additionally, we're running into issues with standard, run-of-the-mill aspects of SQL Server development, such as not being able to access estimated and actual execution plans for our ETL scripts from source, not being able to partition tables, etc. We've also discovered that the ETL process is a full bulk load, every single time. One of the tables in the source system contains 7.5 million records and will grow by 1.5 million records per year. I can't understand why this is not being loaded incrementally.
So, we still have a lot of questions remaining unanswered.
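For contrast, an incremental load driven by a high-water mark would be something like this minimal sketch (hypothetical Python over an in-memory SQLite database, just to show the pattern; it assumes the source table carries a reliable last-modified timestamp, which I haven't confirmed it does):

```python
# Sketch: high-water-mark incremental load instead of a full bulk reload.
# Table and column names are invented for illustration.
import sqlite3

def incremental_load(conn, watermark):
    """Copy only rows modified since the last successful load."""
    rows = conn.execute(
        "SELECT id, payload, modified_at FROM src WHERE modified_at > ?",
        (watermark,),
    ).fetchall()
    conn.executemany(
        "INSERT OR REPLACE INTO dw (id, payload, modified_at) VALUES (?, ?, ?)",
        rows,
    )
    # advance the watermark to the newest row seen this run
    return max((r[2] for r in rows), default=watermark)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, payload TEXT, modified_at TEXT)")
conn.execute("CREATE TABLE dw  (id INTEGER PRIMARY KEY, payload TEXT, modified_at TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?, ?)",
                 [(1, "a", "2016-01-01"), (2, "b", "2016-02-01")])
wm = incremental_load(conn, "2016-01-15")
# only the 2016-02-01 row is copied; the watermark advances to it
```

With 7.5 million records growing by 1.5 million a year, touching only the changed rows each cycle seems far more defensible than dropping and reloading the whole table.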
MarkW- Posts : 14
Join date : 2015-11-25
Re: Data Architecture for Single Source System (Normalised).
"Source ---> ETL ---> Set of de-normalised tables - not fact or dimension tables, just de-normalised tables"
... I've seen sh*t like that before. Basically "ETL" is a report they ran against the operational system that happens to write to a table (their "DW" to use the term loosely).
Something like this is not a data warehouse. It is not an integrated analytic environment. It is simply a collection of reports in database form.
Re: Data Architecture for Single Source System (Normalised).
The reports use stored procs to get the data out of the DW schema. However, the stored procs, which reference views, do have table joins - users and medical history, as a generic example. So maybe I misunderstood, but to be clear, the tables aren't versions of the reports in table/database form as such.
We've also been told that for our 'dedicated SQL Server instance', if we want to create any additional databases we will have to pay 5k for the privilege, and there is a set quota on the number of SQL Server Agent jobs as well. So our access to the SQL Server instance and the levers that we can pull are somewhat limited, as per my previous message.
MarkW- Posts : 14
Join date : 2015-11-25
Re: Data Architecture for Single Source System (Normalised).
I guess the vendor forgot about the "service" in SaaS. Sounds like there's still time to find another job before this career ruining freight train lands on your desk.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Data Architecture for Single Source System (Normalised).
I guess I misunderstood what the 'source' is that you are referring to. From previous comments I had assumed it to be the operational system. What is the DW Schema you are referring to?
Re: Data Architecture for Single Source System (Normalised).
Possibly not. In essence, a replicated copy of the OLTP database is pushed onto the SQL Server instance; this is then in effect the staging area. So the source is the operational system. However, we have plans to integrate multiple data sources in the future. A number of scheduled stored procedures pull/load the data from the OLTP database into a database (called 'DW Schema') containing a set of de-normalised tables. These tables are not modelled dimensionally; they are simply de-normalised tables, so no fact or dimension tables, as before.
MarkW- Posts : 14
Join date : 2015-11-25
Re: Data Architecture for Single Source System (Normalised).
Then I reiterate my previous comments. It is not an architecture that, as a company, you can move forward with. It is not sustainable.