How to best organize my weird datamart
2 posters
Page 1 of 1
How to best organize my weird datamart
Hi all,
This is my first post here, and I could really use some help.
Here's my situation.
I'm building a datamart in SQL Server 2005, which will furnish a single report (for now). We have 26 clients (medical organizations) that will be using this report. Each client will have between 1 and 2,000 users accessing it at any given time (most likely to average in the 50's but it needs to scale). There will be 26 versions of the same rdl, one for each client. Each one will access its own respective client database.
The interface for said report is SSRS 2012. This is our first 2012 report - the rest are still 2005, and the rdl will access a 2005 database. I am not using OLAP; the SSRS report runs SPs and views.
We built a virtual server and installed SQL 2012 to host the report (both these virtual servers live on the same physical machine, in case that matters). Nothing else will be running on the SQL 2012 virtual server.
These are the facts about the environment...
Our system is not OLTP. With one exception, which I'll describe below, it's all Read (except for the ETL of course).
We have one client-facing database per client, 26 altogether. In these we store transactional data, rolled-up data, some report-ready flat tables and a ton of T-SQL code that crunches numbers when clients pull up reports in SSRS. I'll refer to these databases as "operational" because, for me, they will essentially function as ODS's.
The operational databases are loaded by a horrific ETL process (clients have varying load schedules - usually monthly or weekly). I'll be building a mini ETL process (hopefully not so horrific) to populate the datamart from those operational databases.
All of the datamart's dimensions are conforming, but due to HIPAA restrictions, some of them (like Physician and Patient) can't be stored in a central database, and neither can the fact table. So there will need to be 26 versions of the same fact and dimension tables, one for each client.
There is a real-time component to our system. Physicians and nurses can enter transactional data via our site and all reports need to reflect the changes immediately. As for the datamart, they'll only affect the fact table. For what it's worth, this is why I decided not to use SSAS. I know diffs process really quickly but it just feels like too many moving parts.
I plan to create a trickle-in fact table with a view that combines it with the main fact table. Again, I'll need 26 of these. A pared down version of my new ETL process will need to run upon each user edit.
Here are my questions...
1. Where should I store the 26 sets of datamart tables?
2. Where should I store the 26 trickle-in fact tables?
3. Are there any other important questions that I should be asking?
With all of these questions, I’m concerned about good design practice, but mostly about the performance of the report and the performance of the ETL that will need to run upon user edit.
I hope all of this made sense. Please let me know if I should provide more information.
I would very much appreciate any feedback, and thank you for taking the time to read this!
This is my first post here, and I could really use some help.
Here's my situation.
I'm building a datamart in SQL Server 2005, which will furnish a single report (for now). We have 26 clients (medical organizations) that will be using this report. Each client will have between 1 and 2,000 users accessing it at any given time (most likely to average in the 50's but it needs to scale). There will be 26 versions of the same rdl, one for each client. Each one will access its own respective client database.
The interface for said report is SSRS 2012. This is our first 2012 report - the rest are still 2005, and the rdl will access a 2005 database. I am not using OLAP; the SSRS report runs SPs and views.
We built a virtual server and installed SQL 2012 to host the report (both these virtual servers live on the same physical machine, in case that matters). Nothing else will be running on the SQL 2012 virtual server.
These are the facts about the environment...
Our system is not OLTP. With one exception, which I'll describe below, it's all Read (except for the ETL of course).
We have one client-facing database per client, 26 altogether. In these we store transactional data, rolled-up data, some report-ready flat tables and a ton of T-SQL code that crunches numbers when clients pull up reports in SSRS. I'll refer to these databases as "operational" because, for me, they will essentially function as ODS's.
The operational databases are loaded by a horrific ETL process (clients have varying load schedules - usually monthly or weekly). I'll be building a mini ETL process (hopefully not so horrific) to populate the datamart from those operational databases.
All of the datamart's dimensions are conforming, but due to HIPAA restrictions, some of them (like Physician and Patient) can't be stored in a central database, and neither can the fact table. So there will need to be 26 versions of the same fact and dimension tables, one for each client.
There is a real-time component to our system. Physicians and nurses can enter transactional data via our site and all reports need to reflect the changes immediately. As for the datamart, they'll only affect the fact table. For what it's worth, this is why I decided not to use SSAS. I know diffs process really quickly but it just feels like too many moving parts.
I plan to create a trickle-in fact table with a view that combines it with the main fact table. Again, I'll need 26 of these. A pared down version of my new ETL process will need to run upon each user edit.
Here are my questions...
1. Where should I store the 26 sets of datamart tables?
- On a dedicated 2005 server, away from the SSRS server and the operational databases?
- On the same server as the operational databases but in dedicated dds databases?
- Within the operational databases themselves?
- On the SQL 2012 reporting server?
- On the moon? Other?
2. Where should I store the 26 trickle-in fact tables?
- In the same database as the main fact tables?
- On the same server as the operational databases but in dedicated DDS databases?
- Within the operational databases themselves? This seems logical to me given that they'll need to be unioned at runtime...
- Should I create a central database for non-sensitive dimensions?
- Maybe create a cloning process to copy them to the individual DDS's?
- Or is it simpler to just have 26 of these darned things as well?
3. Are there any other important questions that I should be asking?
With all of these questions, I’m concerned about good design practice, but mostly about the performance of the report and the performance of the ETL that will need to run upon user edit.
I hope all of this made sense. Please let me know if I should provide more information.
I would very much appreciate any feedback, and thank you for taking the time to read this!
anna.rwfh- Posts : 6
Join date : 2013-05-15
Re: How to best organize my weird datamart
You don't want anything related to the BI side on the operational servers. Wither you place the marts on the same server as the fact tables depends on performance. Using a dedicated server gives you some scaling options. The trickle feed should go to the same server as the facts you are loading. You can consider the Moon as a disaster recovery site.
Given the sensitivity of the data, I would look at the 26 instances as clones each with their own databases and data streams.
Given the sensitivity of the data, I would look at the 26 instances as clones each with their own databases and data streams.
Re: How to best organize my weird datamart
Thank you for your insight, ngalemmo!
EDIT:
Actually, your response made me think a little deeper. In a normal situation I would never consider putting BI related tables on an operational server. But in this situation, the datamart will serve the same purpose as many of the tables in the operational databases: they all supply data to our external clients via Reporting Services reports that are pulled up from our online application. Also, this datamart will be loaded directly from the operational databases (maybe it would be better to call them "reporting" databases). There is no intermediary at this time. But given that user edits do happen in the operational databases, I'm a bit concerned about the performance hit when creating a new fact record. Each time an edit happens, a stored procedure will need to run that hits tables in one of the "reporting" databases. This means cross server read/write. Does this information change your opinion at all?
EDIT:
Actually, your response made me think a little deeper. In a normal situation I would never consider putting BI related tables on an operational server. But in this situation, the datamart will serve the same purpose as many of the tables in the operational databases: they all supply data to our external clients via Reporting Services reports that are pulled up from our online application. Also, this datamart will be loaded directly from the operational databases (maybe it would be better to call them "reporting" databases). There is no intermediary at this time. But given that user edits do happen in the operational databases, I'm a bit concerned about the performance hit when creating a new fact record. Each time an edit happens, a stored procedure will need to run that hits tables in one of the "reporting" databases. This means cross server read/write. Does this information change your opinion at all?
Last edited by anna.rwfh on Wed May 15, 2013 6:37 pm; edited 1 time in total (Reason for editing : Had some additional ideas.)
anna.rwfh- Posts : 6
Join date : 2013-05-15
Re: How to best organize my weird datamart
My original reading of "operational" database was the one supporting the execution of business activity (i.e. POS, accounting, inventory, whatever). But your comment appears to contradict my understanding. If by "operational" you mean the system that supports reporting then, there is no reason why the marts could not coexist on the same server.anna.rwfh wrote:... the operational databases (maybe it would be better to call them "reporting" databases). ...
As far as update contention, it could be an issue. It's difficult to say without a better understanding of what the real time interface actually does.
Re: How to best organize my weird datamart
Yeah, that was lousy wording on my part. The reason I called them "operational" is because they're the closest thing we have. We have no internal systems to report against and the product we provide to our clients is the specialized reporting of their own data. So we have ETL servers to load our client's source files and a PROD server where we store the results of the ETL. When clients log into our app, they are connecting to their database on said PROD server. They view their data, run reports off their data, perform limited adds/edits on their data, and that's about it. So yeah, they're probably just reporting databases, not operational.ngalemmo wrote:My original reading of "operational" database was the one supporting the execution of business activity (i.e. POS, accounting, inventory, whatever).
But those reporting databases are a NIGHTMARE to maintain and have huge performance issues. So I'm doing a proof-of-concept project. I'm creating a star schema and a new reporting interface for their most important report, which is also the report our client's complain about most often. As it is, the report actually runs a stored procedure that gathers its results by compiling transactional data at run-time; and the transactional data rarely changes. It's craziness. So I'm hoping this star will really help.
I'm really glad to hear you say that. I was starting to get a sour stomach thinking about the logistics involved with cloning the transactional tables.ngalemmo wrote:If by "operational" you mean the system that supports reporting then, there is no reason why the marts could not coexist on the same server.
Understood.ngalemmo wrote:As far as update contention, it could be an issue. It's difficult to say without a better understanding of what the real time interface actually does.
Thanks so much!
anna.rwfh- Posts : 6
Join date : 2013-05-15
Similar topics
» Multiple Datamart Architecture
» Do we need an additional layer on top of Datamart?
» How to make reports against a Datamart
» Payer datamart question
» Loading DataMart Current and History
» Do we need an additional layer on top of Datamart?
» How to make reports against a Datamart
» Payer datamart question
» Loading DataMart Current and History
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum