Some basic knowledge about data warehouse (confused )
3 posters
Page 1 of 1
Some basic knowledge about data warehouse (confused )
Hi all,
My problems:
1/ Illustration:
+ Data Source --> NDS --> DDS
--> Fact & dimensions contain data ??? Or just in NDS.
2/ Logical in DDS - ex Star diagram.
Physical - tables (related)
--> If I using mdx (pivot excel ...) to calculated measures in fact , the process will start from logical (to know structure of dims and fact) to physical (mapping logical --> physical). Is it true ?
3/ I can use data in NDS to calculated (ex: pivot excel ...) instead of creating DDS . Is it ok ?
4/ In 1 tables (dimension), we can > 1 hierarchy ?
Some problems, I can not clearly describe. Sry abt that.
Thanks for helping me !
Regards,
Luân.
My problems:
1/ Illustration:
+ Data Source --> NDS --> DDS
--> Fact & dimensions contain data ??? Or just in NDS.
2/ Logical in DDS - ex Star diagram.
Physical - tables (related)
--> If I using mdx (pivot excel ...) to calculated measures in fact , the process will start from logical (to know structure of dims and fact) to physical (mapping logical --> physical). Is it true ?
3/ I can use data in NDS to calculated (ex: pivot excel ...) instead of creating DDS . Is it ok ?
4/ In 1 tables (dimension), we can > 1 hierarchy ?
Some problems, I can not clearly describe. Sry abt that.
Thanks for helping me !
Regards,
Luân.
komaty- Posts : 7
Join date : 2013-01-11
Re: Some basic knowledge about data warehouse (confused )
Sorry, I'm not sure what your question is. What is DDS? What is NDS?
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Some basic knowledge about data warehouse (confused )
@BoxesAndLines: I think we can use data in NDS to calculate (ex: we have a star diagram but the dim tables is empty & when calculating we get data from NDS).
komaty- Posts : 7
Join date : 2013-01-11
Re: Some basic knowledge about data warehouse (confused )
If you mean Normalized Data Store and Dimensional Data Store, then you are basically describing an Inmon architecture. In a Kimball architecture, all data is stored in a dimensional model (usually on a relational platform). You can achieve the same ends with either architecture, going directly to a dimensional model reduces the cost and complexity of the implementation.
The Inmon architecture (as you describe in #1) is a store and publish architecture. The normalized data warehouse serves as a means to store data and is not intended for direct end-user queries. It is expected that data would be published from the data warehouse to another physical structure for end-user consumption. The physical structure could be a dimensional model, it could also be some other form, such as pre-generated reports, or data extracts for use in another tool. The basic reason behind this is a normalized data model is not well suited for large ad-hoc queries and heavy analytics, so you publish to distance the end-users from the complexities of the normalized warehouse. It could also be accomplished by creating views, however, query performance is usually much better if the contents of the view is materialized, rather than repeatedly querying the normalized repository.
I don't understand the question in #2. Queries execute against physical models. But, if you are talking about a meta-layer that is part of the end user interface (tools such as Business Objects, MicroStrategy, Cognos, etc...), the, yes, it is common to have logical models that provide a business interpretation of the data, isolating the end-user from the physical implementation. An, yes, you can hide a lot of 'sins' on the DW side. What you do against what is a matter of your environment and the performance of the DBMS. It is not usual to move data to a MDDBMS (cube database, such as SSAS, EssBase, PowerPlay, etc...) to provide high performance and functionality to the end user. An enterprise wide implementation would use a lot of different strategies to get information to the end-user. There usually isn't one solution that suits all challenges.
On #3, if you have a normalized DW, where and how you do queries is a matter of performance and usability. On Netezza and many other platforms, a star schema can perform better than a normalized schema, particularly with large analytic queries. Doing queries to look at a handful of lines on a specific order, makes a normalized model look good, but it will still perform much poorer on Netezza rather than a typical relational database. This is intentional in Netezza's architecture. Which is why it deals so well with extremely large amounts of data.
On #4 - Its very common to have many different hierarchies for the same set of data.
The Inmon architecture (as you describe in #1) is a store and publish architecture. The normalized data warehouse serves as a means to store data and is not intended for direct end-user queries. It is expected that data would be published from the data warehouse to another physical structure for end-user consumption. The physical structure could be a dimensional model, it could also be some other form, such as pre-generated reports, or data extracts for use in another tool. The basic reason behind this is a normalized data model is not well suited for large ad-hoc queries and heavy analytics, so you publish to distance the end-users from the complexities of the normalized warehouse. It could also be accomplished by creating views, however, query performance is usually much better if the contents of the view is materialized, rather than repeatedly querying the normalized repository.
I don't understand the question in #2. Queries execute against physical models. But, if you are talking about a meta-layer that is part of the end user interface (tools such as Business Objects, MicroStrategy, Cognos, etc...), the, yes, it is common to have logical models that provide a business interpretation of the data, isolating the end-user from the physical implementation. An, yes, you can hide a lot of 'sins' on the DW side. What you do against what is a matter of your environment and the performance of the DBMS. It is not usual to move data to a MDDBMS (cube database, such as SSAS, EssBase, PowerPlay, etc...) to provide high performance and functionality to the end user. An enterprise wide implementation would use a lot of different strategies to get information to the end-user. There usually isn't one solution that suits all challenges.
On #3, if you have a normalized DW, where and how you do queries is a matter of performance and usability. On Netezza and many other platforms, a star schema can perform better than a normalized schema, particularly with large analytic queries. Doing queries to look at a handful of lines on a specific order, makes a normalized model look good, but it will still perform much poorer on Netezza rather than a typical relational database. This is intentional in Netezza's architecture. Which is why it deals so well with extremely large amounts of data.
On #4 - Its very common to have many different hierarchies for the same set of data.
Re: Some basic knowledge about data warehouse (confused )
@ngalemmo: thank you very much.
In #2 :
Logical - I have a star diagram include: Dim1, Dim2, Dim3 and Fact_1 .
Physical - I have dim tables mapping from Dim1, Dim2, Dim3.
When I query (using MDX), the process will start from logical or physical.
In your answer, I understand about problem.
In #2 :
Logical - I have a star diagram include: Dim1, Dim2, Dim3 and Fact_1 .
Physical - I have dim tables mapping from Dim1, Dim2, Dim3.
When I query (using MDX), the process will start from logical or physical.
In your answer, I understand about problem.
komaty- Posts : 7
Join date : 2013-01-11
Re: Some basic knowledge about data warehouse (confused )
Any query (MDX or SQL) is executed against the physical model. Cubes (MDDBMS) are queried using MDX, while relational databases are queried using SQL. Data only exists in the physical, so the query must execute against the physical model. Thats where it ends.
If the query definition starts at a logical level (via a metadata layer and UI of some sort) it needs to be translated into a proper physical query (or queries) for the physical database.
If the query definition starts at a logical level (via a metadata layer and UI of some sort) it needs to be translated into a proper physical query (or queries) for the physical database.
Re: Some basic knowledge about data warehouse (confused )
Ah, I understand what u say.
Fantastic !
Thanks again !
Fantastic !
Thanks again !
komaty- Posts : 7
Join date : 2013-01-11
Re: Some basic knowledge about data warehouse (confused )
@ngalemmo: hi.
Can I ask you some questions ? I can not understand this.
Ex:
Dimension_Customer: ID, code_id, name, gender.
Dimension_Time : ID, Value_Year.
Fact_XXXX: dim_key_1, dim_key_2, measure: list customers' gender (male / female) every Year.
I create 3 correlative tables in physical & ETL data from source to here (of course via staging).
But I don understand how it can group: every year --> count customers' gender. Cuz in here, in 2 dimension dont have relation .
So, In my report can not have: (dragging to report).
2011 --> 10 cus (male) - 15 cus (female)
2012 --> 16 cus (male) - 18 cus (female)
Plz explaining to me.
Thanks !
Can I ask you some questions ? I can not understand this.
Ex:
Dimension_Customer: ID, code_id, name, gender.
Dimension_Time : ID, Value_Year.
Fact_XXXX: dim_key_1, dim_key_2, measure: list customers' gender (male / female) every Year.
I create 3 correlative tables in physical & ETL data from source to here (of course via staging).
But I don understand how it can group: every year --> count customers' gender. Cuz in here, in 2 dimension dont have relation .
So, In my report can not have: (dragging to report).
2011 --> 10 cus (male) - 15 cus (female)
2012 --> 16 cus (male) - 18 cus (female)
Plz explaining to me.
Thanks !
komaty- Posts : 7
Join date : 2013-01-11
Re: Some basic knowledge about data warehouse (confused )
Here my example
Dim_Location
ID COUNTRY CITY ....
1 USA A
2 USA B
3 USA C
Fact_Sale
dim_local_key measure_A
1 100
......
As we see in the fact , DIM_KEY is only 1 (data type: number) . So I think It just save dim_keys (not value: USA, A ...). Is it right ?
May be my knowledge is not good, i hope s.o can instruct me several books to read.
Thanks.
Dim_Location
ID COUNTRY CITY ....
1 USA A
2 USA B
3 USA C
Fact_Sale
dim_local_key measure_A
1 100
......
As we see in the fact , DIM_KEY is only 1 (data type: number) . So I think It just save dim_keys (not value: USA, A ...). Is it right ?
May be my knowledge is not good, i hope s.o can instruct me several books to read.
Thanks.
komaty- Posts : 7
Join date : 2013-01-11
Similar topics
» Some very basic question regarding overall ETL and Modeling strategy/best practices for Data warehouse project
» Ebook The Data Warehouse Lifecycle Toolkit, 2nd Edition: Practical Techniques for Building Data Warehouse and Business Intelligence Systems
» data warehouse or not ? when is it okay to use OLAP without a data warehouse database
» data warehouse and data warehouse system
» difference between data mart and data warehouse at logical/physical level
» Ebook The Data Warehouse Lifecycle Toolkit, 2nd Edition: Practical Techniques for Building Data Warehouse and Business Intelligence Systems
» data warehouse or not ? when is it okay to use OLAP without a data warehouse database
» data warehouse and data warehouse system
» difference between data mart and data warehouse at logical/physical level
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum