Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Time-Stamped transaction tracking not suitable, gets too big

4 posters

Go down

Time-Stamped transaction tracking not suitable, gets too big Empty Time-Stamped transaction tracking not suitable, gets too big

Post  Toby Mon Oct 03, 2011 12:21 pm

Hi

I am looking for a modelling solution to the following problem:

My base data is a large amount of residents, each of them with a large set of (clearly defined) attributes (name, gender, address, job, name of father, ...). As residents move, marry, die, and so on, their attribute data changes. These changes happen quite frequently. With each change, usually only a small set of attributes change. I need to answer questions about both the present and any point back in time.
Basically, this is the same thing as the time-stamped transaction tracking in a dimension approach that Kimball describes for employee transactions. However, due to the large amount of people (residents instead of employees) and the size of each resident entry, I know that this approach will not work, even though the kind of questions that can be answered by this model seems suitable.

What are my options here?

Thanks,
Toby


Toby

Posts : 3
Join date : 2011-10-03

Back to top Go down

Time-Stamped transaction tracking not suitable, gets too big Empty Re: Time-Stamped transaction tracking not suitable, gets too big

Post  ngalemmo Mon Oct 03, 2011 2:31 pm

I'm sure there are frequent updates, but I can't imagine any one person is updated very frequently at all. Most people don't move that often and you only die once (last time I checked). Why don't you think the approach won't work?
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Time-Stamped transaction tracking not suitable, gets too big Empty Re: Time-Stamped transaction tracking not suitable, gets too big

Post  Toby Tue Oct 04, 2011 3:45 am

OK, so I did the maths:

A person has about 3Kbyte worth of data which cannot easily be left out (things like 'name of father', which nobody will ever use for a query already removed). There are a lot of unicode varchar fields that are responsible for this size.
I start with 700'000 people in the largest area.
A typical week has about 10'000 updates in that area.

This leads to:
2GB worth of data to start with (700'000*3/1024/1024)
1.5GB worth of updates for a year (52*10000*3/1024/1024)

I agree that it's managable in terms of data storage. Will it be in terms of query performance? People plan to use PowerPivot to analize the data, which I guess will push their client machines to the limit (I don't know much about PowerPivot yet).

Thanks,
Toby

Toby

Posts : 3
Join date : 2011-10-03

Back to top Go down

Time-Stamped transaction tracking not suitable, gets too big Empty Re: Time-Stamped transaction tracking not suitable, gets too big

Post  John Simon Tue Oct 04, 2011 6:10 pm

A few points:
1. PowerPivot has a great compression engine, but if yur values are mostly distinct you won't get any gains
2. PowerPivot has an effective limit of 2Gb. PowerPivot itself does not have this limit, but SharePoint does, which means you cannot share this data in PowerPivot.
3. You may want to consider using a demographics mini-dimension if these attributes are changing often, and link this dimension into your fact table.

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

http://jsimonbi.wordpress.com

Back to top Go down

Time-Stamped transaction tracking not suitable, gets too big Empty Re: Time-Stamped transaction tracking not suitable, gets too big

Post  Mike Honey Tue Oct 04, 2011 7:11 pm

Hi Toby,

I'm curious about your statement: "700'000 people in the largest area". How many "areas" do you have to serve? If you have more than one or two, you may find the PowerPivot architecture a bit frustrating. AFAIK, it doesn't offer any built in support for cloning a model for different data slices, so maintenance and support of your models could get out of hand.

You might be better served building an SSAS model to cover all the "areas" in one pass, with a single model. Dimension security (on an "Areas" dimension) could split it for user consumption. The end-user interface could be Excel PivotTables, optionally published to Excel Web Services in SharePoint, so the same experience (actually a little better) to PowerPivot.

Good luck!
Mike
Mike Honey
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

http://www.mangasolutions.com

Back to top Go down

Time-Stamped transaction tracking not suitable, gets too big Empty Re: Time-Stamped transaction tracking not suitable, gets too big

Post  Toby Wed Oct 05, 2011 8:02 am

@John:
Thanks for this info, the compression engine will have a lot, since a lot of attributes are enum-like. The client currently wishes to not use sharepoint, so for now at least, the 2GB file size limit will not be a problem. I'll think about the demographics mini dimension, this seems to be a good approach. A lot of questions will be comparisons between different geographic locations for instance.

@Mike:
Thanks for this warning. In my case, it's one area = one customer, and the largest of all installations will be 700'000 people. There currently is one customer, with a good possibility of up to 10 other customers. These customers have quite similar needs. There is however no combined data available, and each customer runs his own database and his own clients with PowerPivot installed.
What I expect to happen is that I copy&paste the first few PowerPivot reports between customers (adjusting headers and such), and then each customer will then manage his reports himself. I will then only be called upon for new requirements that need the warehouse/etl to be adjusted or that require deeper PowerPivot knowledge (which I am not having yet).

Toby

Posts : 3
Join date : 2011-10-03

Back to top Go down

Time-Stamped transaction tracking not suitable, gets too big Empty Re: Time-Stamped transaction tracking not suitable, gets too big

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum