Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Bridge table and double counting issue

2 posters

Go down

Bridge table and double counting issue Empty Bridge table and double counting issue

Post  tiberghv Mon Oct 14, 2013 8:50 am

Hi everyone,

I am designing a datamart for a university that needs to measure how many students are applying to its courses.

When a student arrives, he can choose one or two courses he wants to apply to.

I am using a bridge table between the Fact table "application" and the dimension "Course" to deal with the many to many relation between application and course :

Dim Student -> Fact Application <- Bridgetable -> Dim Course

My Dimension Course has a "Course ID" and a "Course Type" attribute. "Course ID" is specific to each Course but one "Course Type" can be shared by many courses.
I need to measure the application number by "Course ID", and by "course Type".

To avoid the double counting issue introduced by the bridge table, i could be using a weight factor inside the bridge table, but in my case it would not work :
When a student has applied to two different courses, 1 application must be counted for each course (not 0,5 or other weight factor).
If the two courses the student has applied to are both part of the same "Course Type", then i should count just 1 application for this "Course Type", not 2.
The solution i have found to deal with the situation is to count the "application number" using this formula : count(distinct student_key).

It works, but i am not sure this is the correct way to deal with such a situation. Is it ?

- Supplementary questions :
I also have a second Fact table that is an agregated version of the first table : it is not linked to the student dimension, and it stores agregated number of applications by program. I cannot use the count(distinct student_key) anymore and i am facing the double count issue. What are the solutions to use in that case ?

Thank you for your help.

tiberghv

Posts : 6
Join date : 2013-10-14

Back to top Go down

Bridge table and double counting issue Empty Re: Bridge table and double counting issue

Post  ngalemmo Mon Oct 14, 2013 12:51 pm

Why the bridge? It is unnecessary. The fact table doesn't need to have one row per application... it should be one row per course applied.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Bridge table and double counting issue Empty Re: Bridge table and double counting issue

Post  tiberghv Mon Oct 14, 2013 3:42 pm

Well, i was hesitating with the solution you propose. If i choose this solution, should i also add the "course choice number" (first choice or second choice) inside the fact table (i was previously saving this information inside the bridge table) ?

Also, i still have the double count issue on my agregated table, as i cannot use the count(distinct student_ID) on this table (wich is agregated and not linked to the dim ).
My agregated fact table is only linked to "Dim Date" and "Course".

Thank you very much for your help.

tiberghv

Posts : 6
Join date : 2013-10-14

Back to top Go down

Bridge table and double counting issue Empty Re: Bridge table and double counting issue

Post  ngalemmo Mon Oct 14, 2013 5:36 pm

Trust me, you don't need a bridge.

Store the preference (first, second...) as a degenerate dimension in the fact.

And, if you don't want to double count, don't create the aggregate, or don't use the aggregate for that particular metric. If you want to count # students applying for a program, it is a simple count distinct off the detailed fact grouping by program (which I assume is an attribute of the course dimension... if it is not, include program as a dimension of the detailed fact).

ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Bridge table and double counting issue Empty Re: Bridge table and double counting issue

Post  tiberghv Tue Oct 15, 2013 6:34 am

Concerning the agregated table :
I need to visualize the evolution through time (each week) of the number of applications per course (and "course type").
But i cannot afford to store the status of each application every week (we have thousands and thousands of students applying), it takes too much disk space. That's why i use the agregated fact table, removing the dim student link from it.

So i link my agregated fact table to the "Dim course", inserting two lines, one for each course choosen.
But when i want to count the applications number per Course type, i get a double count if course 1 and course 2 both have the same "course type".
What do you recommend for that case ?
Should i make a new independent Dim for the Course Type, and a special agregated Fact table linked to that dimension (and not linked to the dim course) ?

tiberghv

Posts : 6
Join date : 2013-10-14

Back to top Go down

Bridge table and double counting issue Empty Re: Bridge table and double counting issue

Post  ngalemmo Tue Oct 15, 2013 8:25 am

You don't need to store every application every week... just every time it changes(if you need history, or just once if you don't). And 'thousands and thousands' is not much at all... a properly designed fact table would be hard-pressed to get much larger than a few MB. A photo takes up more space.

The thing is you would not be able to use a course level aggregate to produce program level totals. Rethink your model and keep it simple.



ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Bridge table and double counting issue Empty Re: Bridge table and double counting issue

Post  tiberghv Tue Oct 15, 2013 9:15 am

I have about 50 000 applications in the system. If i add two rows in the fact table for every application (one per course choice), that's 100 000 rows.
If i save this information every week(52 weeks) , that is 5.2 Million rows per year. It is quite too large for me.

If i just store the changes as you suggest (instead of once a week), how will i be able to have a sql query that outputs each week of the year and the related number of applications.


tiberghv

Posts : 6
Join date : 2013-10-14

Back to top Go down

Bridge table and double counting issue Empty Re: Bridge table and double counting issue

Post  ngalemmo Tue Oct 15, 2013 9:52 am

Put effective dates on the row.

The thing is, you keep the detail at all times.  If you want to create aggregates, fine... create all the aggregates you want.  As I stated before, and should be obvious to you, aggregates are not useable all the time... which is why you need to keep the detail, if for nothing else, to create another aggregate.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Bridge table and double counting issue Empty Re: Bridge table and double counting issue

Post  tiberghv Tue Oct 15, 2013 10:04 am

I am sorry i don't understand.

If i build a fact table as follow, not using one row each week, but only one row each time data changes :

Student_key | Effective Date | Course
1   |   2013-01-01(week 1)  |  CRSE_1
1   |   2013-01-30(week 4)  |  CRSE_2

How can i query this table to get the result :

week | number of applications on CRSE_1
1 |  1
2 |  1
3 |  1
4 |  0

tiberghv

Posts : 6
Join date : 2013-10-14

Back to top Go down

Bridge table and double counting issue Empty Re: Bridge table and double counting issue

Post  ngalemmo Tue Oct 15, 2013 10:54 am

You can either
A: store an expiration date
or
B: Use the lead/lag windowing function to get the effective date from the next row

Get weekly status by joining to a date dimension or list of week dates. Select one row per week from the calendar and find applications in effect for that date.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Bridge table and double counting issue Empty Re: Bridge table and double counting issue

Post  tiberghv Tue Oct 15, 2013 11:04 am

I will have to try this solution.
I was used to have effective start and end dates inside SCD dimensions, but i didn't know i could use them into Fact tables. that surprises me.

tiberghv

Posts : 6
Join date : 2013-10-14

Back to top Go down

Bridge table and double counting issue Empty Re: Bridge table and double counting issue

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum