Denormalizing different grained data
Until now, I have followed the classic Kimball approach of building fact tables for data at the same, common grain. Lately, though, I keep coming across data sets that need to be denormalized, and I am still not sure how to handle them.
The source data is clickstream, but business users now want a simple way to slice it all the way down the downstream funnel.
The data comprises the following high-level activities:
1. Web activity (this comes with web dimensions, e.g. browser, device type, geo (derived from IP), ISP, IP, etc.)
Tells us what users did on our web properties: which pages they visited, how long they stayed on those pages, overall browsing, drop-offs, etc.
2. Merchandizing
Captures which offers were shown to users, the ranking of clients in those offers, and the various products (lead gen, click out, display, etc.)
3. User interaction
Which offers the user clicked on.
4. Monetization
Captures the various transactions that we were able to monetize.
And so on.
As you can see, these data sets have different grains. For easy, comprehensive reporting and analysis, is it OK to stitch all these data sources together by duplicating the dimensionality of the upstream activity onto the downstream facts?
For example, copy all the web/clickstream dimensions onto merchandizing, and keep duplicating dimensions at each step all the way down to monetization.
That way, users could easily run reports like:
1. Total revenue generated by consumers who used Samsung smartphones?
2. Revenue by product by device type?
3. Average time spent on site by device/browser?
4. Call center activity or conversion by device?
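To make the idea concrete, here is a minimal sketch of what I mean by duplicating upstream dimensions. It assumes every downstream row carries a `session_id` that links it back to the web activity; that key, the field names, and the sample rows are all hypothetical and only there for illustration. The monetization rows keep their own grain (one row per transaction) and simply inherit the web dimensions.

```python
from collections import defaultdict

# Hypothetical sample data; every name and value here is made up.
# Web activity: one row per session, carrying the web dimensions.
web_sessions = [
    {"session_id": "s1", "device_type": "smartphone", "browser": "Chrome"},
    {"session_id": "s2", "device_type": "desktop", "browser": "Firefox"},
]

# Monetization: one row per transaction, with no web dimensions of its own.
transactions = [
    {"txn_id": "t1", "session_id": "s1", "product": "lead gen", "revenue": 12.0},
    {"txn_id": "t2", "session_id": "s1", "product": "click out", "revenue": 3.0},
    {"txn_id": "t3", "session_id": "s2", "product": "lead gen", "revenue": 8.0},
]


def inherit_web_dimensions(facts, sessions):
    """Copy upstream web dimensions onto each downstream fact row.

    The output has exactly one row per input fact row, so the fact's
    grain is unchanged; only extra dimension columns are added.
    """
    by_session = {s["session_id"]: s for s in sessions}
    enriched = []
    for row in facts:
        session = by_session.get(row["session_id"], {})
        enriched.append({
            **row,
            # Fall back to "unknown" when no matching session exists.
            "device_type": session.get("device_type", "unknown"),
            "browser": session.get("browser", "unknown"),
        })
    return enriched


def revenue_by(rows, *keys):
    """Sum revenue grouped by the given dimension columns."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["revenue"]
    return dict(totals)


monetization_fact = inherit_web_dimensions(transactions, web_sessions)
revenue_by_device = revenue_by(monetization_fact, "device_type")
revenue_by_product_device = revenue_by(monetization_fact, "product", "device_type")
```

With the web dimensions copied down like this, reports 1 and 2 become simple group-bys on the monetization rows. Report 3 (time on site) would still have to come from the web activity data, since that measure lives at the session/page grain and not at the transaction grain.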
Thanks
ravibkulkarni- Posts : 2
Join date : 2013-08-07
