how to set a fact tabel for email support system
+4
nick_white
manickam
hkandpal
friend4allmo
8 posters
Page 1 of 1
how to set a fact tabel for email support system
Hi ,
I'm into designing DW for email support system
i'm building a transaction fact for emails
i have set dimensions for the date and mailboxes and email addresses ,...
the issue is where to set the email subject and content
are they set as measures in the fact table ?
the second problem is
the support system is not just by email it may be also by calls
so i intend to put the emails with calls in the same fact but the issue also is about the different properties for them
like the metric of the handle time i have a value for it in the email part but i don't have a value for the call for such metric. sohow to deal with different properties from the calls and the emails ?
with the note that many calls and emails may be related to the same case and this is the reason that i intend to put them in one fact as they details of the header which is a case ?
Thanks in advance,
Fr
I'm into designing DW for email support system
i'm building a transaction fact for emails
i have set dimensions for the date and mailboxes and email addresses ,...
the issue is where to set the email subject and content
are they set as measures in the fact table ?
the second problem is
the support system is not just by email it may be also by calls
so i intend to put the emails with calls in the same fact but the issue also is about the different properties for them
like the metric of the handle time i have a value for it in the email part but i don't have a value for the call for such metric. sohow to deal with different properties from the calls and the emails ?
with the note that many calls and emails may be related to the same case and this is the reason that i intend to put them in one fact as they details of the header which is a case ?
Thanks in advance,
Fr
friend4allmo- Posts : 10
Join date : 2014-05-22
RE:how to set a fact tabel for email support system
Hi,
it looks like you will be tracking the email by subject, so it is better to put the email subject in a dimension table.
The contenet/body of the email should go into the fact table.
For email and calls you should have a customer id you can use the customer id to relate them.
thanks
Himanshu
it looks like you will be tracking the email by subject, so it is better to put the email subject in a dimension table.
The contenet/body of the email should go into the fact table.
For email and calls you should have a customer id you can use the customer id to relate them.
thanks
Himanshu
hkandpal- Posts : 113
Join date : 2010-08-16
Re: how to set a fact tabel for email support system
Make your fact table as a degenerated dimension table by having both subject and content information.
Fact table can also accomadate the support given by calls by using a column to distinguish the support given by calls or email.
Fact table can also accomadate the support given by calls by using a column to distinguish the support given by calls or email.
manickam- Posts : 27
Join date : 2013-04-26
Re: how to set a fact tabel for email support system
Thanks for your reply,
there is still an issue where the email have a subject and content but the call does not, so this means that these degenerate fields will contains nulls in the case of call record, is this right ?
there is still an issue where the email have a subject and content but the call does not, so this means that these degenerate fields will contains nulls in the case of call record, is this right ?
friend4allmo- Posts : 10
Join date : 2014-05-22
Re: how to set a fact tabel for email support system
Unless you can parse the subject and content fields and extract useful/actionable information from them - which you can then use as dimension attributes - these are basically freeform text comments. Kimball covers how to deal with these in Chapter 14 of The Data Warehouse Toolkit
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: how to set a fact tabel for email support system
Unstructured data in the dimensional model has limited value. What metrics are you trying to produce by storing email text?
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: how to set a fact tabel for email support system
it's not a metric, it's a transaction fact and the operation guys need to be able to see them in the reports.
the issue is not just for the subject and context .
the issue in general if i have the calls and the emails and i want to store them in the same fact as sometimes they should be counted as the same thing as they are activities per the same case or activities on the same product
so, regardless of the text metrics , if there are some different properties between the emails and the calls like the emails has the sender email and the mailbox name used and handle time which are not exist for the call
so is it OK to set nulls for these different metrics in the call records ?
or how to deal with them ?
Thanks
fr
the issue is not just for the subject and context .
the issue in general if i have the calls and the emails and i want to store them in the same fact as sometimes they should be counted as the same thing as they are activities per the same case or activities on the same product
so, regardless of the text metrics , if there are some different properties between the emails and the calls like the emails has the sender email and the mailbox name used and handle time which are not exist for the call
so is it OK to set nulls for these different metrics in the call records ?
or how to deal with them ?
Thanks
fr
friend4allmo- Posts : 10
Join date : 2014-05-22
Re: how to set a fact tabel for email support system
I would not store the content of the email in a fact table. It is much too large and of little value unless you process and compile it, which is a totally separate discussion.
If your intent is to store any and all interactions in a single fact, you need to accommodate varying characteristics of such interactions, such as: channel (email, phone, social media, etc…), person involved, touchpoint (email address, phone number…), employee involved, nature of the interaction (positive, negative, subject… there is a lot that can be done here), and so on.
It is not unusual that not all information is available, so you need to have 'n/a' or 'unknown' dimensional references. Wither you put phone, social account name, email address and other touch points into separate or one dimension is a judgement call. It really depends on how you plan to use it. If it is purely for reference, a single junk dimension with a type code and the value should be good enough. If you are performing marketing then you need to break it out so you can tie responses back to the means of contact.
If your intent is to store any and all interactions in a single fact, you need to accommodate varying characteristics of such interactions, such as: channel (email, phone, social media, etc…), person involved, touchpoint (email address, phone number…), employee involved, nature of the interaction (positive, negative, subject… there is a lot that can be done here), and so on.
It is not unusual that not all information is available, so you need to have 'n/a' or 'unknown' dimensional references. Wither you put phone, social account name, email address and other touch points into separate or one dimension is a judgement call. It really depends on how you plan to use it. If it is purely for reference, a single junk dimension with a type code and the value should be good enough. If you are performing marketing then you need to break it out so you can tie responses back to the means of contact.
Re: how to set a fact tabel for email support system
Agree with Ngalemmo and B&L that unstructured data (emails or possible call recordings) should not be stored in fact table and have little analytic value when treated as metrics, just like typical freeform comments which should also be out of fact table based on Kimball's methodology. I would store all these unstructured data in the file systems, maybe compressed, and only manage the metadata on the files and characteristics that Negalemmo metioned about these unstructured data in the dimensional model. If you are using SQL Server, the new "File Tables" feature could be leveraged to manage the unstructured data.
The subject, as one of the characteristics, should be stored in a dimension like freeform comments as they could be quite lengthy and shared by many emails. A Subject SK in the fact table can point to a nonexistent dimension record if it's not applicable for calls. However whether you can combine calls and emails into a single fact table depends on their common dimensionality, and it looks like it makes more sense to combine them.
The subject, as one of the characteristics, should be stored in a dimension like freeform comments as they could be quite lengthy and shared by many emails. A Subject SK in the fact table can point to a nonexistent dimension record if it's not applicable for calls. However whether you can combine calls and emails into a single fact table depends on their common dimensionality, and it looks like it makes more sense to combine them.
Last edited by hang on Sat May 24, 2014 3:13 am; edited 1 time in total
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: how to set a fact tabel for email support system
Thanks Ngalemmo!
Thanks Hang!
Thanks Hang!
friend4allmo- Posts : 10
Join date : 2014-05-22
Re: how to set a fact tabel for email support system
Hi Again,
i'm sorry to re post on this again.
but when i went onto the separating the free form text like comments and email contents into separate diemsnions, i received a bad performance.
i tried to aanalyze the reasons of this and i wish my analysis be correct- first i'm working on sql server-
the free form text like comments and email content are stored in LOB datatypes and those ones are stored in sql server in separate extents/pages from the other data in the table which means if the query doesn't need to show them then sql server won't read them at all and their size won't affect the read operations required to get the data
but if they are required then reading them from their separate extents is more faster than joining the fact with a very huge diemsnion as the dimenion should have - for the email contents dimension - around half the numbers of the records of the fact and then the join operation is taking too much time against the reading of these data from their separate extents which are owned by the fact table .
the strange point for me is that my fact nearly doesn't have any numeric metrics at all as it's a transactional level fact which have records for calls and emails so the only measure i have set until now is the handle time and i may add some like delay time and so on by most of the data is in text format.
this model is for getting a faster performance than getting a analytical data from it.
as most of the analytical data is set in other facts which is aggrregated over time intervals.
so please if you see my way is bad or my analysis for the reason of the problem has some issues please advice me , also is it acceptable to leave the email contents and case comments in the fact ?
Thanks,
Fr
i'm sorry to re post on this again.
but when i went onto the separating the free form text like comments and email contents into separate diemsnions, i received a bad performance.
i tried to aanalyze the reasons of this and i wish my analysis be correct- first i'm working on sql server-
the free form text like comments and email content are stored in LOB datatypes and those ones are stored in sql server in separate extents/pages from the other data in the table which means if the query doesn't need to show them then sql server won't read them at all and their size won't affect the read operations required to get the data
but if they are required then reading them from their separate extents is more faster than joining the fact with a very huge diemsnion as the dimenion should have - for the email contents dimension - around half the numbers of the records of the fact and then the join operation is taking too much time against the reading of these data from their separate extents which are owned by the fact table .
the strange point for me is that my fact nearly doesn't have any numeric metrics at all as it's a transactional level fact which have records for calls and emails so the only measure i have set until now is the handle time and i may add some like delay time and so on by most of the data is in text format.
this model is for getting a faster performance than getting a analytical data from it.
as most of the analytical data is set in other facts which is aggrregated over time intervals.
so please if you see my way is bad or my analysis for the reason of the problem has some issues please advice me , also is it acceptable to leave the email contents and case comments in the fact ?
Thanks,
Fr
friend4allmo- Posts : 10
Join date : 2014-05-22
Re: how to set a fact tabel for email support system
How are you using the free-form text in queries? Are you looking for keywords?
The problem with free-form text is their isn't any way to use it efficiently in a traditional database. If you are looking for text that contains certain words, the DB needs to scan every row and text every time. It is a very expensive operation.
The way to gain efficiency is to parse the text into individual words and create a phrase dimension, containing the text, a keyword dimension containing unique words (usually force to upper or lower case to avoid storing duplicates and making queries consistent) and a bridge between the keywords and phrases. You store the phrase key on the fact. This allows you to build indexes on the bridge and dimensions making the whole thing a lot faster.
There are a variety of techniques to reduce the size of the bridge, such as throwing away noise words, punctuation, vulgarities, and using dictionaries to correct common spelling errors. There are vendors that can provide such data. You only need to store each unique word in the text once in the bridge, it is also useful to have a count in the bridge. Having the keyword dimension also allows you to attach attributes to the word, based on topologies or other characteristics the business wishes to apply.
Since you are using SQL Server, you may also want to look into column based indexing on the bridge.
The problem with free-form text is their isn't any way to use it efficiently in a traditional database. If you are looking for text that contains certain words, the DB needs to scan every row and text every time. It is a very expensive operation.
The way to gain efficiency is to parse the text into individual words and create a phrase dimension, containing the text, a keyword dimension containing unique words (usually force to upper or lower case to avoid storing duplicates and making queries consistent) and a bridge between the keywords and phrases. You store the phrase key on the fact. This allows you to build indexes on the bridge and dimensions making the whole thing a lot faster.
There are a variety of techniques to reduce the size of the bridge, such as throwing away noise words, punctuation, vulgarities, and using dictionaries to correct common spelling errors. There are vendors that can provide such data. You only need to store each unique word in the text once in the bridge, it is also useful to have a count in the bridge. Having the keyword dimension also allows you to attach attributes to the word, based on topologies or other characteristics the business wishes to apply.
Since you are using SQL Server, you may also want to look into column based indexing on the bridge.
Re: how to set a fact tabel for email support system
Thanks Nick for your reply data.
actually i'm not doing any search in this free - form text i'm showing them as they are
say for example the user need to show the emails data including the date time and content
then i'm showing them as they are without any search through the emails content all filters are for dates and agents, and the status of the email or the case and so on.
and also the comments on each case they need show the data filtered the agent or the date and so on but i haven't see any any requirement about searching inside this free form text.
i retested the reason of the delay when separating the text into the dimension:
i found that it's taking to much time to make a clustered index scan for the dimension because for the dimension there is no filters applied to it
and the data filter is applied to fact dates and agent data. but if in the free - form text it has to make full index scan to get all the data and then join it with the fact table which take too much time because of the huge number of the records in the dimension.
actually i'm not doing any search in this free - form text i'm showing them as they are
say for example the user need to show the emails data including the date time and content
then i'm showing them as they are without any search through the emails content all filters are for dates and agents, and the status of the email or the case and so on.
and also the comments on each case they need show the data filtered the agent or the date and so on but i haven't see any any requirement about searching inside this free form text.
i retested the reason of the delay when separating the text into the dimension:
i found that it's taking to much time to make a clustered index scan for the dimension because for the dimension there is no filters applied to it
and the data filter is applied to fact dates and agent data. but if in the free - form text it has to make full index scan to get all the data and then join it with the fact table which take too much time because of the huge number of the records in the dimension.
friend4allmo- Posts : 10
Join date : 2014-05-22
Re: how to set a fact tabel for email support system
Are you sure that dimensional modelling is the right solution for your problem in this situation?
We know emails are big - many KB of text are being returned so this will obviously impact the return speed. To reduce this, if you need to do a report (i.e. something in MSRS or the like) then maybe you could trim the email to the first (say) 250 characters and hyperlink/join out to the full email text somewhere else (e.g., link to a report that is passing only the email ids into a indexed email content table that is hanging off the fact table that you are using for reporting).
Similarily, you may be modelling for a report when the better design is an application.
We know emails are big - many KB of text are being returned so this will obviously impact the return speed. To reduce this, if you need to do a report (i.e. something in MSRS or the like) then maybe you could trim the email to the first (say) 250 characters and hyperlink/join out to the full email text somewhere else (e.g., link to a report that is passing only the email ids into a indexed email content table that is hanging off the fact table that you are using for reporting).
Similarily, you may be modelling for a report when the better design is an application.
nathanjones77- Posts : 11
Join date : 2014-06-03
Location : Zurich
Re: how to set a fact tabel for email support system
Hi,
could you please send us some information on the table strucutre that you have created.
thanks
Himanshu
could you please send us some information on the table strucutre that you have created.
thanks
Himanshu
hkandpal- Posts : 113
Join date : 2010-08-16
Re: how to set a fact tabel for email support system
I agree with the previous comment about a Dimensional model possibly not being the right solution. How I would do it, based on my knowledge of OBIEE but I'm sure most other reporting tools implement similar functionality, is as follows:
1. Create your dimensional model without any of these large text fields
2. Create a table that holds the large text fields plus any keys necessary to link the records to records in your dimensional model. While this table is not formally part of your dimensional model you can hold it in the same DB - or anywhere else that you choose
3. Create whatever reports you require against your dimensional model
4. Create a 'report' against your text table
5. Create links from your dimensional model report to allow you to drill-through to your text table report - pulling through the parameters necessary to filter the text table report based on what is displayed in your dimensional model report
1. Create your dimensional model without any of these large text fields
2. Create a table that holds the large text fields plus any keys necessary to link the records to records in your dimensional model. While this table is not formally part of your dimensional model you can hold it in the same DB - or anywhere else that you choose
3. Create whatever reports you require against your dimensional model
4. Create a 'report' against your text table
5. Create links from your dimensional model report to allow you to drill-through to your text table report - pulling through the parameters necessary to filter the text table report based on what is displayed in your dimensional model report
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: how to set a fact tabel for email support system
i have set the design with separating the text in separate dimensions as the image below
where all text fields for email are set in one dimension
and all case text fields in one dimension as they are used as search or something like this. they are displayed as they are if their emails are part of the result set.
however i intend to separating the case subject into separate dimension as it's required many time in the reporting and it has 90 % less disctinct values against the distinct values for the other text fields . but i'm afraid of that the fact has so many keys to dimensions
so is there an advice about which is better to separate the sbject into separate diemsnion and how many keys to be suitable in the fact table.
the big problem i have now and i can't find any solution to model it is
most emails-out are tied to emails-in
but each record in the fact table is set to either emailin, emailout or call
so if i added a key anywhere in a dimenion or in the fact to the part emailin for the emailout this means that i will need to self join the fat to get the details of the emailin and emailout with each other, or is there any other solution to cover this ?
Thanks,
Fr
where all text fields for email are set in one dimension
and all case text fields in one dimension as they are used as search or something like this. they are displayed as they are if their emails are part of the result set.
however i intend to separating the case subject into separate dimension as it's required many time in the reporting and it has 90 % less disctinct values against the distinct values for the other text fields . but i'm afraid of that the fact has so many keys to dimensions
so is there an advice about which is better to separate the sbject into separate diemsnion and how many keys to be suitable in the fact table.
the big problem i have now and i can't find any solution to model it is
most emails-out are tied to emails-in
but each record in the fact table is set to either emailin, emailout or call
so if i added a key anywhere in a dimenion or in the fact to the part emailin for the emailout this means that i will need to self join the fat to get the details of the emailin and emailout with each other, or is there any other solution to cover this ?
Thanks,
Fr
friend4allmo- Posts : 10
Join date : 2014-05-22
Similar topics
» Using a dimension in multiple fact tables with different grain and support SCD
» Tracking Support Tickets: accumulating snapshot and transaction grain fact (a presentation question)
» Billing System Transaction fact table
» Primary Key of the Purchase Order (PO) Fact
» Dimensional model for support ticketing process
» Tracking Support Tickets: accumulating snapshot and transaction grain fact (a presentation question)
» Billing System Transaction fact table
» Primary Key of the Purchase Order (PO) Fact
» Dimensional model for support ticketing process
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum