Kimball Forum

Customer looking for sub-second performance in a relational implementation.


Post by mahajas, Sun Jan 16, 2011 8:42 pm

The customer is looking for sub-second performance on a query against a fact table that is close to 0.7 TB. The query currently runs in 10 seconds. I built an aggregate table at a high level of granularity and met the expectation, because the aggregate table was significantly smaller; however, that solution does not satisfy queries at the item level. I would appreciate any suggestions. All the queries are filtered to a single supplier.

For Supplier: there are 4 fact tables: 1) a snapshot table built every month, 2) a daily delta table, 3) an aggregate snapshot table at the Geo level, and 4) an aggregate delta table at the Geo level.

For Product: there are 2 fact tables: one is a snapshot table built every month, and the second is a daily delta table.


Given below is the data model:

DIM_PRODUCT
--------------

PRODUCT_ID
SUBJECT_ID
CATEGORY_ID
DEPT_ID
PRODUCT_NAME
AUTHOR_NAME

DIM_GEO
---------

MARKET_REGION_ID
GEO_ID
ZIP_CODE
DESCRIPTION
RUN_ID
CHECKSUM32

DIM_ORG (STORE)
------------------

CHANNEL_ID (ONLINE, STORE, etc.)
ORG_ID (Store)
ZIP_CODE
HAS_SALES
HAS_INVENTORY
LAT
LONG
RUN_ID
CHECKSUM32

FACT_SUPPLIER_MONTHLY_SNAP
-------------------------------

SUPPLIER_ID
DATE_ID
PRODUCT_ID
ORG_ID
GEO_ID
FRONTLIST
ON_HAND_QTY
ON_ORDER_QTY
AVL_QTY
SI_MODEL_QTY
SI_RETURN_QTY
SI_OUT_OF_STOCK_QTY
DC_RESERVE_COM_OH_QTY
DC_RESERVE_RET_OH_QTY
DC_RESERVE_COL_OH_QTY
RUN_ID

FACT_SUPPLIER_DAILY_DELTA
------------------------------

SUPPLIER_ID
DATE_ID
PRODUCT_ID
GEO_ID
ORG_ID
FRONTLIST
SALES_QTY
FREE_QTY
SALES_AMT
ON_HAND_QTY_DELTA
ON_ORDER_QTY_DELTA
AVL_QTY_DELTA
SI_MODEL_QTY_DELTA
SI_RETURN_QTY_DELTA
SI_OUT_OF_STOCK_DELTA
DC_RESERVE_COM_OH_QTY_DELTA
DC_RESERVE_RET_OH_QTY_DELTA
DC_RESERVE_COL_OH_QTY_DELTA
RUN_ID

FACT_PROD_DAILY_STORE_INVENTORY
--------------------------------------

DATE_ID
PRODUCT_ID
ORG_ID
GEO_ID
FRONTLIST
ON_HAND_QTY
ON_ORDER_QTY
AVL_QTY
MODEL_QTY
RETURN_QTY
OUT_OF_STOCK_QTY
RUN_ID

FACT_PROD_DAILY_STORE_INVENTORY_DELTA
---------------------------------------------

DATE_ID
PRODUCT_ID
ORG_ID
GEO_ID
FRONTLIST
ON_HAND_QTY_DELTA
ON_ORDER_QTY_DELTA
AVL_QTY_DELTA
MODEL_QTY_DELTA
RETURN_QTY_DELTA
OUT_OF_STOCK_QTY_DELTA
RUN_ID

create table FACT_SUPP_SNAP_AGGREGATE AS
select SUPPLIER_ID,
DATE_ID,
PRODUCT_GROUP_ID,
CATEGORY_ID,
SUBJECT_ID,
CHANNEL_ID,
GEO_ID,
FRONTLIST,
sum(ON_HAND_QTY) ON_HAND_QTY,
sum(ON_ORDER_QTY) ON_ORDER_QTY,
sum(AVAILABLE_QTY) AVAILABLE_QTY,
sum(SI_MODEL_QTY) SI_MODEL_QTY,
sum(SI_RETURN_QTY) SI_RETURN_QTY,
sum(SI_OUT_OF_STOCK_QTY) SI_OUT_OF_STOCK_QTY,
sum(DC_RESERVE_COM_OH_QTY) DC_RESERVE_COM_OH_QTY,
sum(DC_RESERVE_RET_OH_QTY) DC_RESERVE_RET_OH_QTY,
sum(DC_RESERVE_COL_OH_QTY) DC_RESERVE_COL_OH_QTY
from fact_supp_daily_inventory_snap a
join dim_product p on a.product_id = p.product_id
join dim_organization o on a.org_id = o.org_id
where DATE_ID >= 20091201
GROUP BY SUPPLIER_ID, DATE_ID, PRODUCT_GROUP_ID, CATEGORY_ID,
SUBJECT_ID, CHANNEL_ID, GEO_ID, FRONTLIST
ORDER BY SUPPLIER_ID, DATE_ID;


create table FACT_SUPP_DELTA_AGGREGATE AS
select SUPPLIER_ID,
DATE_ID,
PRODUCT_GROUP_ID,
CATEGORY_ID,
SUBJECT_ID,
CHANNEL_ID,
GEO_ID,
FRONTLIST,
sum(SALES_QTY) SALES_QTY,
sum(FREE_QTY) FREE_QTY,
sum(SALES_AMT) SALES_AMT,
sum(ON_HAND_QTY_DELTA) ON_HAND_QTY,
sum(ON_ORDER_QTY_DELTA) ON_ORDER_QTY,
sum(AVAILABLE_QTY_DELTA) AVAILABLE_QTY,
sum(SI_MODEL_QTY_DELTA) SI_MODEL_QTY,
sum(SI_RETURN_QTY_DELTA) SI_RETURN_QTY,
sum(SI_OUT_OF_STOCK_QTY_DELTA) SI_OUT_OF_STOCK_QTY,
sum(DC_RESERVE_COM_OH_QTY_DELTA) DC_RESERVE_COM_OH_QTY,
sum(DC_RESERVE_RET_OH_QTY_DELTA) DC_RESERVE_RET_OH_QTY,
sum(DC_RESERVE_COL_OH_QTY_DELTA) DC_RESERVE_COL_OH_QTY
from fact_supp_daily_salesinv_delta a
join dim_product p on a.product_id = p.product_id
join dim_organization o on a.org_id = o.org_id
where (DATE_ID >= 20101127) or (date_id between 20091215 and 20100115)
GROUP BY SUPPLIER_ID, DATE_ID, PRODUCT_GROUP_ID, CATEGORY_ID,
SUBJECT_ID, CHANNEL_ID, GEO_ID, FRONTLIST
ORDER BY SUPPLIER_ID, DATE_ID;
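
One middle-ground aggregate worth sketching (illustrative only; the table name FACT_SUPP_ITEM_AGGREGATE and the choice of measures are my assumptions, not part of the original model) keeps PRODUCT_ID so item-level queries can still hit the aggregate, while collapsing the store and geography dimensions, which is typically where the row count explodes:

create table FACT_SUPP_ITEM_AGGREGATE AS
select SUPPLIER_ID,
DATE_ID,
PRODUCT_ID,
FRONTLIST,
sum(SALES_QTY) SALES_QTY,
sum(FREE_QTY) FREE_QTY,
sum(SALES_AMT) SALES_AMT,
sum(ON_HAND_QTY_DELTA) ON_HAND_QTY,
sum(ON_ORDER_QTY_DELTA) ON_ORDER_QTY,
sum(AVL_QTY_DELTA) AVL_QTY
from FACT_SUPPLIER_DAILY_DELTA
GROUP BY SUPPLIER_ID, DATE_ID, PRODUCT_ID, FRONTLIST;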


Post by ngalemmo, Mon Jan 17, 2011 3:41 pm

Don't bang your head against the wall trying to do something the hardware probably can't do.

Figure out what it would take to move the data into a multidimensional data store (i.e., a cube) such as Analysis Services or Hyperion. Figure out the cost for hardware and your time, give them an estimate, and then find out whether 'sub-second response' is really that important.

Post by Jeff Smith, Mon Jan 17, 2011 5:35 pm

It depends on the database software. One option might be an indexed view, if you are using SQL Server 2008.
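
For example, a minimal sketch over the daily delta fact, assuming SQL Server 2008 and the columns from the model above (the view and index names are invented; note that indexed views require SCHEMABINDING, a COUNT_BIG(*) when grouping, and non-nullable inputs to SUM):

CREATE VIEW dbo.V_SUPP_DAILY_AGG
WITH SCHEMABINDING
AS
SELECT SUPPLIER_ID,
DATE_ID,
PRODUCT_ID,
SUM(SALES_QTY) AS SALES_QTY,
SUM(SALES_AMT) AS SALES_AMT,
COUNT_BIG(*) AS ROW_CNT  -- mandatory when the indexed view has GROUP BY
FROM dbo.FACT_SUPPLIER_DAILY_DELTA
GROUP BY SUPPLIER_ID, DATE_ID, PRODUCT_ID;
GO
-- The unique clustered index is what materializes the view and keeps it
-- maintained automatically as the fact table changes.
CREATE UNIQUE CLUSTERED INDEX IX_V_SUPP_DAILY_AGG
ON dbo.V_SUPP_DAILY_AGG (SUPPLIER_ID, DATE_ID, PRODUCT_ID);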


MPP database

Post by mahajas, Wed Jan 19, 2011 12:26 am

The client has purchased an MPP database, so it has the horsepower (the query that dynamically builds aggregates using windowing features and heavy computation runs in 7 seconds), but the SQL Server MDX cubes perform slightly better (obviously, because the data is pre-aggregated). The web app also renders the page a little faster against SQL Server than against the MPP DB. I am trying to build cubes in the MPP DB by doing a GROUP BY and then a UNION ALL over the different combinations, but it is taking a very long time to build these cubes (more than 2 days).
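
If the MPP database supports standard SQL GROUPING SETS (support and syntax vary by engine, so treat this as a hedged sketch using the thread's table names), the separate GROUP BY passes glued together with UNION ALL can often be collapsed into a single scan of the fact table:

select SUPPLIER_ID,
DATE_ID,
CATEGORY_ID,
GEO_ID,
sum(SALES_QTY) SALES_QTY,
sum(SALES_AMT) SALES_AMT
from fact_supp_daily_salesinv_delta a
join dim_product p on a.product_id = p.product_id
GROUP BY GROUPING SETS (
(SUPPLIER_ID, DATE_ID, CATEGORY_ID, GEO_ID),  -- finest combination
(SUPPLIER_ID, DATE_ID, CATEGORY_ID),          -- rolled up over geography
(SUPPLIER_ID, DATE_ID)                        -- supplier/day totals
);

Columns omitted from a grouping set come back as NULL in those rows; the GROUPING() function can distinguish them from genuine NULLs.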

Thanks


Post by hang, Thu Jan 20, 2011 1:26 am

I guess SSAS cubes should give you an instant result once processed. The issue is that cube processing takes too long and would go beyond the load window for a nightly or other periodic ETL process. Have you considered partitioning both the fact tables and the cubes, so that you can carry out incremental processing after the initial, time-consuming build?
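
As a rough relational-side illustration (SQL Server syntax; the function, scheme, and boundary values are invented for the example), range partitioning on DATE_ID lets each nightly load touch only the newest slice:

CREATE PARTITION FUNCTION PF_DATE_ID (int)
AS RANGE RIGHT FOR VALUES (20101101, 20101201, 20110101);

CREATE PARTITION SCHEME PS_DATE_ID
AS PARTITION PF_DATE_ID ALL TO ([PRIMARY]);

-- Rebuild the fact's clustered index on the scheme so rows land by DATE_ID.
CREATE CLUSTERED INDEX IX_FACT_SUPP_DELTA_DATE
ON FACT_SUPPLIER_DAILY_DELTA (DATE_ID)
ON PS_DATE_ID (DATE_ID);

On the cube side, SSAS measure-group partitions defined over the same date ranges can then be processed incrementally (for example, ProcessAdd on just the current partition) instead of reprocessing the whole cube.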


Post by ngalemmo, Thu Jan 20, 2011 2:26 am

A multidimensional database will almost always perform better than a relational one, regardless of hardware. They just don't have the ability to store large volumes of detailed data.

Just curious as to why you have an ORDER BY in your CREATE TABLE statement. Ordering the table doesn't do anything for you (other than take time) unless you are using a Netezza box and are optimizing the zone map.