Data mart for mobile devices and applications

View previous topic View next topic Go down

Data mart for mobile devices and applications

Post  naresh1234 on Thu Jan 30, 2014 8:51 pm

Hello

We have a requirement to create a data mart where we store mobile applications details and count on a weekly basis

Let us say there are 50,000 mobile devices and each could have say 10 applications . so a total of 500,000 appliications
We want to know what is the count of application either at one point of time or over a few points of time in history, say for the last 12 weeks or say for the last 6 months, every week.

so from a time dimension we want to store weekly data...including details of what the application type is , version...carrier...
ex.
type - .androis , ios,...
carrier - att , verion,...
region to where the device belongs...ex. americas, emea, (based on the employee who owns the device)...so there is a hr dimension also invlived.

I was intiailly thinking a fact table which will have

Date_id (typically last day of the week)
DEVICE_ID
Application_id
emploee_id
and then there need not be any real measures...as we could do
count of (distinct devic id)...to get device count
count if (application _id) to get total appcation count...

But this design may not be sustainable as if we store the data weekly then its way too much data...
from above numbers...
weekly ...500,000
monthly ...2 million
yearly ...24 million

I am also thinking that the one on HR data mart might be good one
http://www.kimballgroup.com/1998/02/02/human-resources-data-marts/
Just that for me its more like a Application data mart...where application is installed (hired) or deleted (terminated)...

The design at the url above is perhaps perfect to get count of application (with all required details) at one poiint of time...but doubt that will work if I want a trend on the count...or say I want historic count (example count for the alst 12 months)

Any inputs are much appreciated... !!

Thanks for your valuable time Gurus !!


naresh1234

Posts : 6
Join date : 2014-01-30

View user profile

Back to top Go down

Re: Data mart for mobile devices and applications

Post  BoxesAndLines on Fri Jan 31, 2014 9:07 am

24 million rows is not a lot of data. With database partitioning the performance will be fine.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Data mart for mobile devices and applications

Post  nick_white on Fri Jan 31, 2014 10:15 am

A couple of thoughts...

1) How about having 2 date keys on your fact table - date added and date deleted; set the date deleted to a date far in the future for any non-deleted fact record? You can then find the count at any point in time (such as today) by querying where "today between date added and date deleted"

2) You need to declare the grain of your fact table and then stick to it. You state the measures are:
count of (distinct device id)...to get device count
count of (application _id) to get total application count

these are at different grains (one at the device level and one at the app/device level) which breaks one of the fundamental rules of the Kimball methodology. If you want both these measures then you need 2 fact tables. It would also be better to set the measures = 1 as you would then use a SUM when query the tables - which is generally faster than a COUNT.

Hope this helps

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Partiotion

Post  naresh1234 on Fri Jan 31, 2014 12:41 pm

BoxesAndLines wrote:24 million rows is not a lot of data.  With database partitioning the performance will be fine.

Thanks ! will explore this option further !

naresh1234

Posts : 6
Join date : 2014-01-30

View user profile

Back to top Go down

Re: Data mart for mobile devices and applications

Post  naresh1234 on Fri Jan 31, 2014 3:06 pm

nick_white wrote:A couple of thoughts...

1) How about having 2 date keys on your fact table - date added and date deleted; set the date deleted to a date far in the future for any non-deleted fact record? You can then find the count at any point in time (such as today) by querying where "today between date added and date deleted"

2) You need to declare the grain of your fact table and then stick to it. You state the measures are:
count of (distinct device id)...to get device count
count of (application _id) to get total application count

these are at different grains (one at the device level and one at the app/device level) which breaks one of the fundamental rules of the Kimball methodology. If you want both these measures then you need 2 fact tables. It would also be better to set the measures = 1 as you would then use a SUM when query the tables - which is generally faster than a COUNT.

Hope this helps

Nick,
Thanks for the reply !

1.
I like this idea for sure and its more like what is mentioned at http://www.kimballgroup.com/1998/02/02/human-resources-data-marts/
This will definately give me counts at one point in time. But not sure how that would work for historic trend ? Say I wanted count over past 12 month , so want the count over 12 different points of time...say last day or every month. any ideas would be appreciated.

2.
Agreed the sum(measure), ie sum(1) to get the count...though the reason I would like both DEVICE and APP in the same table is, we need count of apps per device also.
Some flavors we need are
1. count of apps per month
2. counter of apps per operating system (and opertaing system is associated to device and not the app)
3. count of apps per region (region is based on employee who owns the device)
4. baiscally counts based on diferent combinations of attributes of the device and/or application or user.

Any further thoughts are much appreciated !!

Thanks a lot !!!

naresh1234

Posts : 6
Join date : 2014-01-30

View user profile

Back to top Go down

Re: Data mart for mobile devices and applications

Post  nick_white on Mon Feb 03, 2014 3:59 am

Hi,

regarding point 1: historic trends. The facts are semi-additive i.e. you can't sum across the date dimension (the count today + the count tomorrow has no meaning) so it needs a bit more thought. If you just want to pick a particular day of the month then you can run a query and constrain your date dimension on this (e.g. day = 1, 1st Monday in the month, etc.) - which is obviously easier if you have the relevant attribute in your date dimension. If your requirement is more complex, such as the month values used in the historic trend analysis are the average daily counts in the month then I would create a monthly snapshot fact table

Point 2: as long as your measure is count of apps and this is the grain of your fact table then all the examples are quite straightforward assuming you have the Dimensions you describe attached to the Fact. Where you will start getting into difficulties is if you try and add a measure for "count of devices" to this fact as it is at a different grain and you can't then just sum up the measures. Put this is a separate fact table and then as long as you are using conformed dimensions you can combine the facts by "querying across" e.g. combine "sum app count from app_fact group by date and operating system" with "sum device count from device_fact group by date and operating system": you can then perform calculations such as apps per device in the query rather than trying to hold this value in a fact table

Hope this helps

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Here is what I am thinking

Post  naresh1234 on Mon Feb 10, 2014 7:58 pm

Thanks Nick for the reply !
Here is what I am thinking in case you get a chance to take a look at it and comment..

APP_FACT
Device_ID...ex..D123
App_ID......ex..APP123
Start_Date...
end_Date
Counter...always will have the value 1
DUMMY_INT_ID...always will have the value 1


DATE_DIM..This table will store only Fridays or will filter out only Friday from this table...as snapshots are taken only one a week for the project....

DUMMY_INT_ID...always have the value 1
DATE_VALUE...
WEEK_ID
Month
Year


the idea then is to create a cartesian product between the 2 tables (to use the DATE_VALUE from the DIM table and generate a row for every row in fact table against every selected row from the DATE_DIM table)


SELECT
 DATE_DIM.YEAR,DATE_DIM.DATE_VALUE
 sum
( case when  DATE_DIM.DATE_VALUE between  APP_FACT.START_DATE and APP_FACT.END_DATE then
  APP_FACT.COUNTER else 0 end)
FROM
 DATE_DIM,
 APP_FACT
WHERE
 ( APP_FACT.DUMMY_INT_ID=DATE_DIM.DUMMY_INT_ID)
 AND  
 ( DATE_DIM.DATE_VALUE =  )
GROUP BY
 DATE_DIM.YEAR


===========
for a monthly snap shot

SELECT
 DATE_DIM.YEAR,
 DATE_DIM.MONTH,
 sum
( case when  DATE_DIM.DATE_VALUE between  APP_FACT.START_DATE and APP_FACT.END_DATE then
APP_FACT.COUNTER else 0 end)
FROM
 DATE_DIM,
 APP_FACT
WHERE
 ( APP_FACT.DUMMY_INT_ID=DATE_DIM.DUMMY_INT_ID)
 AND  
 ( DATE_DIM.DATE_VALUE =( select max(D1.DATE_VALUE) from DATE_DIM D1 WHERE DATE_DIM.MONTH = D1.MONTH AND  DATE_DIM.YEAR = D1.YEAR )  )
GROUP BY
 DATE_DIM.YEAR,
 DATE_DIM.MONTH

======

for a yearly snap shot...

SELECT
 DATE_DIM.YEAR,
 sum
( case when  DATE_DIM.DATE_VALUE between  APP_FACT.START_DATE and APP_FACT.END_DATE then
APP_FACT.COUNTER else 0 end)
FROM
 DATE_DIM,
 APP_FACT
WHERE
 ( APP_FACT.DUMMY_INT_ID=DATE_DIM.DUMMY_INT_ID)
 AND  
 ( DATE_DIM.DATE_VALUE =( select max(D1.DATE_VALUE) from SV_DATA.ZAFAN_DATE D1 WHERE   SV_DATA.ZAFAN_DATE.YEAR = D1.YEAR )  )
GROUP BY
 DATE_DIM.YEAR


any pitfalls that you see ?

naresh1234

Posts : 6
Join date : 2014-01-30

View user profile

Back to top Go down

Re: Data mart for mobile devices and applications

Post  nick_white on Tue Feb 11, 2014 4:35 am

A few thoughts:

1. Your DATE_DIM should be a conformed Dim (as should all your Dims) that can be used across your DW - rather than storing only Fridays it should definitely have every day in it that could be referenced in your DW i.e. from prior to the date of the oldest record you hold to a date in the future past when you expect your DW to still be in use e.g. 2050

2. If you are going to have monthly and yearly snapshots then you will need to aggregate your Date_Dim table up to Month_dim and Year_Dim

3. I'm not clear what the purpose is of your first SQL statement or the DUMMY_INT_ID column. Assuming your Date_dim has a year-month column (holding values such as 201402) and a last day of the month flag, the SQL to create you monthly snapshot might be something like (assuming you want to snapshot the position on the last day of the month):

SELECT
DATE_DI.YEAR_MONTH, -- This will be the SK value in your month_dim
SUM (APP_FACT.COUNTER)
FROM
APP_FACT
INNER JOIN DATE_DIM ON APP_FACT.START_DATE <= DATE_DIM.DATE_VALUE AND APP_FACT.END_DATE >= DATE_DIM.DATE_VALUE
WHERE DATE_DIM.LAST_DAY_IN_MONTH_FLAG = 'Y'

Logic for your yearly snapshot would be similar - up to you whether you have a "last day in year" flag on your date dim or just put month_day = 1231 in your WHERE clause

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Thanks Again !

Post  naresh1234 on Tue Feb 11, 2014 7:12 pm

Thanks Again Nick !

1. Agreed. I was just using a quick and dirty example, I will be using an already existing DIM that goes by what you are saying.

3. Agreed, much better approach, I will test it out.

2. I am not sure I got this one. Are you talking of aggregating the data in the FACT table ? In the FACT table from what I am thinking and the design I proposed in the tables (with ur help) does not require aggregating the data in the table ? did I miss something ?


Regards,

naresh1234

Posts : 6
Join date : 2014-01-30

View user profile

Back to top Go down

Re: Data mart for mobile devices and applications

Post  nick_white on Wed Feb 12, 2014 4:43 am

Hi,

point 2: sorry, probably wasn't that clear. When you aggregate your fact data from the day level up to the month (or year level) and put it into a new fact table you can't have an SK in the new fact table referencing your DATE_DIM (because the 'grain' of that Dim is days). So you need to create a new MONTH_DIM (and/or YEAR_DIM) that takes all the month/year specific attributes from your DATE_DIM - the following table illustrates this though unless you can think of more useful attributes to add to the YEAR_DIM it would make more sense to treat this as a degenerate dimension on the Fact table rather than creating a Dim table for it:

DATE_DIMMONTH_DIMYEAR_DIM
D_DATE_SKD_MONTH_SKD_YEAR_SK
D_DATE_BKD_MONTH_BKD_YEAR_BK
FULL_DATE
DAY_SHORT_NAME
DAY_FULL_NAME
DAY_NUMBER_IN_WEEK
DAY_NUMBER_IN_MONTH
DAY_NUMBER_IN_YEAR
WEEK_NUMBER_IN_MONTH
WEEK_NUMBER_IN_YEAR
MONTHMONTH
MONTH_NUMBERMONTH_NUMBER
MONTH_SHORT_NAMEMONTH_SHORT_NAME
MONTH_FULL_NAMEMONTH_FULL_NAME
DAYS_IN_MONTHDAYS_IN_MONTH
QUARTER_NUMBERQUARTER_NUMBER
BEG_OF_QUARTERBEG_OF_QUARTER
END_OF_QUARTEREND_OF_QUARTER
DAYS_IN_QUARTERDAYS_IN_QUARTER
YEARYEARYEAR
WEEK_DAY_FLAG
WEEK_END_FLAG
DAYS_IN_YEARDAYS_IN_YEAR
DAY_DESC
MONTH_DESCMONTH_DESC
WEEK_DESC
QUARTER_DESCQUARTER_DESC
QUARTER_KEYQUARTER_KEY
MONTH_KEYMONTH_KEY
WEEK_KEY
ORDINAL_DATE
CALENDAR_MONTHCALENDAR_MONTH

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

No need to aggregate !

Post  naresh1234 on Wed Feb 12, 2014 12:10 pm

Thanks again Nick !

But after what we discsused abve I am not seeing a need to aggregate anything from a day to month or year level

APP_FACT
Device_ID...ex..D123
App_ID......ex..APP123
Start_Date...
end_Date
Counter...always will have the value 1


DATE_DIM..This table will have all dates with a column for last day or week and last day or month and last day of year...

DATE_VALUE...
WEEK_ID
Month
Year

and then with the sql that u proposed I can get snaphot for any date, month (last day of mont) or year (last day of year)....

Any day
SELECT
DATE_DIM.YDATE_VALUE, -- This will be the SK value in your month_dim
SUM (APP_FACT.COUNTER)
FROM
APP_FACT
INNER JOIN DATE_DIM ON APP_FACT.START_DATE <= DATE_DIM.DATE_VALUE AND APP_FACT.END_DATE >= DATE_DIM.DATE_VALUE
WHERE DATE_DIM.DATE_VALUE =

Any Month
SELECT
DATE_DIM.YDATE_VALUE, -- This will be the SK value in your month_dim
SUM (APP_FACT.COUNTER)
FROM
APP_FACT
INNER JOIN DATE_DIM ON APP_FACT.START_DATE <= DATE_DIM.DATE_VALUE AND APP_FACT.END_DATE >= DATE_DIM.DATE_VALUE
WHERE DATE_DIM.LAST_DAY_IN_MONTH_FLAG = 'Y' and year = and month =

If I want all the months in year
All Months in a year
SELECT
DATE_DIM.YDATE_VALUE, -- This will be the SK value in your month_dim
SUM (APP_FACT.COUNTER)
FROM
APP_FACT
INNER JOIN DATE_DIM ON APP_FACT.START_DATE <= DATE_DIM.DATE_VALUE AND APP_FACT.END_DATE >= DATE_DIM.DATE_VALUE
WHERE DATE_DIM.LAST_DAY_IN_MONTH_FLAG = 'Y' and year = and month =


At any point of time my FACT table will have only one row for each app on any particular device.
say for Device123 and App123 there will be only one row any time...
when the device is deleted the END_DATE will be updated....


Thanks a lot for all your valuable input !!

naresh1234

Posts : 6
Join date : 2014-01-30

View user profile

Back to top Go down

Re: Data mart for mobile devices and applications

Post  nick_white on Thu Feb 13, 2014 7:00 am

Hi - the reason for creating aggregates is to improve performance. If you hold the data at the day level and the queries that produce monthly, or yearly, summaries run quickly enough then there is no need to build aggregates - and if you don't have these aggregates fact tables then you don't need "aggregate" Dim tables such as Month or Year.
When and where to create aggregates is always a trade-off between the query performance gain of having them and the overhead of maintaining them

Regards,

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Re: Data mart for mobile devices and applications

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top


 
Permissions in this forum:
You cannot reply to topics in this forum