Customer looking for sub second performance in a relational implementation.
4 posters
Page 1 of 1
Customer looking for sub second performance in a relational implementation.
Customer is looking for a sub second performance on a query against a fact that is close to 0.7 TB. The query is running in 10 secs. I built an aggregate table on high level of granularity and met the expectation because the size of the aggregate table was significantly smaller however the solution does not satisfy queries at item level. Would appreciate any suggestions. All the queries are for a supplier.
For Supplier: There are 4 fact tables, 1) is snapshot table bulit every month 2) Daily delta table 3) Aggregate Snap table at Geo level 4) Aggregate Delta table at Geo Level.
For Product: There are 2 fact tables, one is snapshot table bulit every month and the 2nd one is a daily delta table.
Given below is the data model:
DIM_PRODUCT
--------------
PRODUCT_ID
SUBJECT_ID
CATEGORY_ID
DEPT_ID
PRODUCT_NAME
AUTHOR_NAME
DIM_GEO
---------
MARKET_REGION_ID
MARKET_REGION_ID
GEO_ID
ZIP_CODE
DESCRITPION
RUN_ID
CHEKSUM32
DIM_ORG (STORE)
------------------
CHANNEL_ID (ONLINE, STORE, etc.)
ORG_ID (Store)
ZIP_CODE
HAS_SALES
HAS_INVENTORY
LAT
LONG
RUN_ID
CHECKSUM32
FACT_SUPPLIER_MONTHLY_SNAP
-------------------------------
SUPPLIER_ID
DATE_ID
PRODUCT_ID
ORG_ID
GEO_ID
FRONTLIST
ON_HAND_QTY
ON_ORDER_QTY
AVL_QTY
SI_MODEL_QTY
SI_RETURN_QTY
SI_OUT_OF_STOCK_QTY
DC_RESERVE_COM_OH_QTY
DC_RESERVE_RET_OH_QTY
DC_RESERVE_COL_OH_QTY
RUN_ID
FACT_SUPPLIER_DAILY_DELTA
------------------------------
SUPPLIER_ID
DATE_ID
PRODUCT_ID
GEO_ID
ORG_ID
FRONTLIST
SALES_QTY
FREE_QTY
SALES_AMT
ON_HAND_QTY_DELTA
ON_ORDER_QTY_DELTA
AVL_QTY_DELTA
SI_MODEL_QTY_DELTA
SI_RETURN_QTY_DELTA
SI_OUT_OF_STOCK_DELTA
DC_RESERVE_COM_OH_QTY_DELTA
DC_RESERVE_RET_OH_QTY_DELTA
DC_RESERVE_COL_OH_QTY_DELTA
RUN_ID
FACT_PROD_DAILY_STORE_INVENTORY
--------------------------------------
DATE_ID
PRODUCT_ID
ORG_ID
GEO_ID
FRONTLIST
ON_HAND_QTY
ON_ORDER_QTY
AVL_QTY
MODEL_QTY
RETURN_QTY
OUT_OF_STOCK_QTY
RUNID
FACT_PROD_DAILY_STORE_INVENTORY_DELTA
---------------------------------------------
DATE_ID
PRODUCT_ID
ORG_ID
GEO_ID
FRONTLIST
ON_HAND_QTY_DELTA
ON_ORDER_QTY_DELTA
AVL_QTY_DELTA
MODEL_QTY_DELTA
RETURN_QTY_DELTA
OUT_OF_STOCK_QTY_DELTA
RUNID
create table FACT_SUPP_DELTA_AGGREGATE AS
select SUPPLIER_ID,
DATE_ID,
PRODUCT_GROUP_ID,
CATEGORY_ID,
SUBJECT_ID,
CHANNEL_ID,
GEO_ID,
FRONTLIST,
sum(ON_HAND_QTY) ON_HAND_QTY,
sum(ON_ORDER_QTY) ON_ORDER_QTY,
sum(AVAILABLE_QTY) AVAILABLE_QTY,
sum(SI_MODEL_QTY) SI_MODEL_QTY,
sum(SI_RETURN_QTY) SI_RETURN_QTY,
sum(SI_OUT_OF_STOCK_QTY) SI_OUT_OF_STOCK_QTY,
sum(DC_RESERVE_COM_OH_QTY) DC_RESERVE_COM_OH_QTY,
sum(DC_RESERVE_RET_OH_QTY) DC_RESERVE_RET_OH_QTY,
sum(DC_RESERVE_COL_OH_QTY) DC_RESERVE_COL_OH_QTY
from fact_supp_daily_inventory_snap a
join dim_product p on a.product_id = p.product_id
join dim_organization o on a.org_id = o.org_id
where DATE_ID >= 20091201
GROUP BY SUPPLIER_ID, DATE_ID, PRODUCT_GROUP_ID, CATEGORY_ID,
SUBJECT_ID, CHANNEL_ID, GEO_ID, FRONTLIST
ORDER BY SUPPLIER_ID, DATE_ID;
create table FACT_SUPP_DELTA_AGGREGATE
select SUPPLIER_ID,
DATE_ID,
PRODUCT_GROUP_ID,
CATEGORY_ID,
SUBJECT_ID,
CHANNEL_ID,
GEO_ID,
FRONTLIST,
sum(SALES_QTY) SALES_QTY,
sum(FREE_QTY) FREE_QTY,
sum(SALES_AMT) SALES_AMT,
sum(ON_HAND_QTY_DELTA) ON_HAND_QTY,
sum(ON_ORDER_QTY_DELTA) ON_ORDER_QTY,
sum(AVAILABLE_QTY_DELTA) AVAILABLE_QTY,
sum(SI_MODEL_QTY_DELTA) SI_MODEL_QTY,
sum(SI_RETURN_QTY_DELTA) SI_RETURN_QTY,
sum(SI_OUT_OF_STOCK_QTY_DELTA) SI_OUT_OF_STOCK_QTY,
sum(DC_RESERVE_COM_OH_QTY_DELTA) DC_RESERVE_COM_OH_QTY,
sum(DC_RESERVE_RET_OH_QTY_DELTA) DC_RESERVE_RET_OH_QTY,
sum(DC_RESERVE_COL_OH_QTY_DELTA) DC_RESERVE_COL_OH_QTY
from fact_supp_daily_salesinv_delta a
join dim_product p on a.product_id = p.product_id
join dim_organization o on a.org_id = o.org_id
where (DATE_ID >= 20101127) or (date_id between 20091215 and 20100115)
GROUP BY SUPPLIER_ID, DATE_ID, PRODUCT_GROUP_ID, CATEGORY_ID,
SUBJECT_ID, CHANNEL_ID, GEO_ID, FRONTLIST
ORDER BY SUPPLIER_ID, DATE_ID;
For Supplier: There are 4 fact tables, 1) is snapshot table bulit every month 2) Daily delta table 3) Aggregate Snap table at Geo level 4) Aggregate Delta table at Geo Level.
For Product: There are 2 fact tables, one is snapshot table bulit every month and the 2nd one is a daily delta table.
Given below is the data model:
DIM_PRODUCT
--------------
PRODUCT_ID
SUBJECT_ID
CATEGORY_ID
DEPT_ID
PRODUCT_NAME
AUTHOR_NAME
DIM_GEO
---------
MARKET_REGION_ID
MARKET_REGION_ID
GEO_ID
ZIP_CODE
DESCRITPION
RUN_ID
CHEKSUM32
DIM_ORG (STORE)
------------------
CHANNEL_ID (ONLINE, STORE, etc.)
ORG_ID (Store)
ZIP_CODE
HAS_SALES
HAS_INVENTORY
LAT
LONG
RUN_ID
CHECKSUM32
FACT_SUPPLIER_MONTHLY_SNAP
-------------------------------
SUPPLIER_ID
DATE_ID
PRODUCT_ID
ORG_ID
GEO_ID
FRONTLIST
ON_HAND_QTY
ON_ORDER_QTY
AVL_QTY
SI_MODEL_QTY
SI_RETURN_QTY
SI_OUT_OF_STOCK_QTY
DC_RESERVE_COM_OH_QTY
DC_RESERVE_RET_OH_QTY
DC_RESERVE_COL_OH_QTY
RUN_ID
FACT_SUPPLIER_DAILY_DELTA
------------------------------
SUPPLIER_ID
DATE_ID
PRODUCT_ID
GEO_ID
ORG_ID
FRONTLIST
SALES_QTY
FREE_QTY
SALES_AMT
ON_HAND_QTY_DELTA
ON_ORDER_QTY_DELTA
AVL_QTY_DELTA
SI_MODEL_QTY_DELTA
SI_RETURN_QTY_DELTA
SI_OUT_OF_STOCK_DELTA
DC_RESERVE_COM_OH_QTY_DELTA
DC_RESERVE_RET_OH_QTY_DELTA
DC_RESERVE_COL_OH_QTY_DELTA
RUN_ID
FACT_PROD_DAILY_STORE_INVENTORY
--------------------------------------
DATE_ID
PRODUCT_ID
ORG_ID
GEO_ID
FRONTLIST
ON_HAND_QTY
ON_ORDER_QTY
AVL_QTY
MODEL_QTY
RETURN_QTY
OUT_OF_STOCK_QTY
RUNID
FACT_PROD_DAILY_STORE_INVENTORY_DELTA
---------------------------------------------
DATE_ID
PRODUCT_ID
ORG_ID
GEO_ID
FRONTLIST
ON_HAND_QTY_DELTA
ON_ORDER_QTY_DELTA
AVL_QTY_DELTA
MODEL_QTY_DELTA
RETURN_QTY_DELTA
OUT_OF_STOCK_QTY_DELTA
RUNID
create table FACT_SUPP_DELTA_AGGREGATE AS
select SUPPLIER_ID,
DATE_ID,
PRODUCT_GROUP_ID,
CATEGORY_ID,
SUBJECT_ID,
CHANNEL_ID,
GEO_ID,
FRONTLIST,
sum(ON_HAND_QTY) ON_HAND_QTY,
sum(ON_ORDER_QTY) ON_ORDER_QTY,
sum(AVAILABLE_QTY) AVAILABLE_QTY,
sum(SI_MODEL_QTY) SI_MODEL_QTY,
sum(SI_RETURN_QTY) SI_RETURN_QTY,
sum(SI_OUT_OF_STOCK_QTY) SI_OUT_OF_STOCK_QTY,
sum(DC_RESERVE_COM_OH_QTY) DC_RESERVE_COM_OH_QTY,
sum(DC_RESERVE_RET_OH_QTY) DC_RESERVE_RET_OH_QTY,
sum(DC_RESERVE_COL_OH_QTY) DC_RESERVE_COL_OH_QTY
from fact_supp_daily_inventory_snap a
join dim_product p on a.product_id = p.product_id
join dim_organization o on a.org_id = o.org_id
where DATE_ID >= 20091201
GROUP BY SUPPLIER_ID, DATE_ID, PRODUCT_GROUP_ID, CATEGORY_ID,
SUBJECT_ID, CHANNEL_ID, GEO_ID, FRONTLIST
ORDER BY SUPPLIER_ID, DATE_ID;
create table FACT_SUPP_DELTA_AGGREGATE
select SUPPLIER_ID,
DATE_ID,
PRODUCT_GROUP_ID,
CATEGORY_ID,
SUBJECT_ID,
CHANNEL_ID,
GEO_ID,
FRONTLIST,
sum(SALES_QTY) SALES_QTY,
sum(FREE_QTY) FREE_QTY,
sum(SALES_AMT) SALES_AMT,
sum(ON_HAND_QTY_DELTA) ON_HAND_QTY,
sum(ON_ORDER_QTY_DELTA) ON_ORDER_QTY,
sum(AVAILABLE_QTY_DELTA) AVAILABLE_QTY,
sum(SI_MODEL_QTY_DELTA) SI_MODEL_QTY,
sum(SI_RETURN_QTY_DELTA) SI_RETURN_QTY,
sum(SI_OUT_OF_STOCK_QTY_DELTA) SI_OUT_OF_STOCK_QTY,
sum(DC_RESERVE_COM_OH_QTY_DELTA) DC_RESERVE_COM_OH_QTY,
sum(DC_RESERVE_RET_OH_QTY_DELTA) DC_RESERVE_RET_OH_QTY,
sum(DC_RESERVE_COL_OH_QTY_DELTA) DC_RESERVE_COL_OH_QTY
from fact_supp_daily_salesinv_delta a
join dim_product p on a.product_id = p.product_id
join dim_organization o on a.org_id = o.org_id
where (DATE_ID >= 20101127) or (date_id between 20091215 and 20100115)
GROUP BY SUPPLIER_ID, DATE_ID, PRODUCT_GROUP_ID, CATEGORY_ID,
SUBJECT_ID, CHANNEL_ID, GEO_ID, FRONTLIST
ORDER BY SUPPLIER_ID, DATE_ID;
mahajas- Posts : 5
Join date : 2010-08-14
Re: Customer looking for sub second performance in a relational implementation.
Don't bang your head against the wall trying to do something the hardware probably can't do.
Figure out what it would take to move the data into a multidimensional data store (i.e. a cube) such a Analysis Services or Hyperion. Figure out the cost for hardware and your time, give them an estimate, then discover if 'sub-second response' is really that important.
Figure out what it would take to move the data into a multidimensional data store (i.e. a cube) such a Analysis Services or Hyperion. Figure out the cost for hardware and your time, give them an estimate, then discover if 'sub-second response' is really that important.
Re: Customer looking for sub second performance in a relational implementation.
Depends on the database software. One option might be an indexed view if using SQL Server 2008.
Jeff Smith- Posts : 471
Join date : 2009-02-03
MPP database
The client has purchased MPP database so it has the horsepower (the query that dynamically builds aggregates using windowing features and heavy computation runs in 7 secs) but the SQL MDX cubes are running slightly better (obviously because they are pre-aggregated data). Web app renders the page mush in case SQL server a little bit faster than against the MPP DB. I am trying to build cubes in MPP DB by doing a GORUP and the UNION ALL different combinations but it's taking a very long time to build these cubes (more than 2 days).
Thanks
Thanks
mahajas- Posts : 5
Join date : 2010-08-14
Re: Customer looking for sub second performance in a relational implementation.
I guess SSAS cubes should give you an instant result once processed. The issue is cube process takes too long and would go beyond the load window for nightly or any periodical ETL process. Have you considered partitioning on both fact tables and cubes so that you can carry out incremental processing after the initial time consuming build.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Customer looking for sub second performance in a relational implementation.
A multidimensional database will almost always peform better than a relational one, regardless of hardware. They just don't have the ability to store large volumes of detailed data.
Just curious as to why you have an ORDER BY in your CREATE TABLE statment. Ordering the table doesn't do anything for you (other than take time) unless you are using a Netezza box and are optimizing the zone map.
Just curious as to why you have an ORDER BY in your CREATE TABLE statment. Ordering the table doesn't do anything for you (other than take time) unless you are using a Netezza box and are optimizing the zone map.
Similar topics
» Merging customer data from disparate sources to create a master customer dimension
» De-normalizing Customer Information to create a Customer Dimension
» Customer Ship to Vs Customer Dimension
» Relational Datawarehouse
» SCD 2 implementation
» De-normalizing Customer Information to create a Customer Dimension
» Customer Ship to Vs Customer Dimension
» Relational Datawarehouse
» SCD 2 implementation
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|