Text Analytics and Dimensional Models
I have been doing some research in the Text Analytics / Big Data space. I was wondering if dimensional models have a role to play in this space?
If yes, how? What is the architecture? What kind of measures / facts would belong in the Fact table?
Is there a real world example of this?
sgrover3- Posts : 8
Join date : 2011-04-14
Re: Text Analytics and Dimensional Models
Yes and no... If you intend to structure the data in some manner, a dimensional model can perform well. For example, I implemented a dimensional model for a website which included clickstream analysis based on search phrases used to access the site. The phrase was represented as a multivalued dimension of keywords allowing users to analyze visitor behavior based on combinations of keywords used. It also included attributes to determine if access was through a paid link or generic search.
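To make the multivalued keyword dimension concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the actual model from that project: the table and column names (dim_phrase, dim_keyword, bridge_phrase_keyword, fact_visit, is_paid_link, page_views) and the sample rows are all assumptions made for illustration. The idea it shows is a bridge table linking each phrase to its keywords, so visits can be filtered by a combination of keywords.

```python
import sqlite3

# Hypothetical schema: every name below is illustrative, not from the original model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_phrase  (phrase_key  INTEGER PRIMARY KEY, phrase  TEXT UNIQUE);
CREATE TABLE dim_keyword (keyword_key INTEGER PRIMARY KEY, keyword TEXT UNIQUE);
-- Bridge table: one row per (phrase, keyword) pair.
CREATE TABLE bridge_phrase_keyword (
    phrase_key  INTEGER REFERENCES dim_phrase(phrase_key),
    keyword_key INTEGER REFERENCES dim_keyword(keyword_key),
    PRIMARY KEY (phrase_key, keyword_key)
);
-- Fact table: one row per visit; is_paid_link is an assumed flag (1 = paid link).
CREATE TABLE fact_visit (
    visit_id     INTEGER PRIMARY KEY,
    phrase_key   INTEGER REFERENCES dim_phrase(phrase_key),
    is_paid_link INTEGER,
    page_views   INTEGER
);
""")

conn.executemany("INSERT INTO dim_phrase VALUES (?, ?)",
                 [(1, "cheap red shoes"), (2, "red shoes"), (3, "running shoes")])
conn.executemany("INSERT INTO dim_keyword VALUES (?, ?)",
                 [(1, "cheap"), (2, "red"), (3, "shoes"), (4, "running")])
conn.executemany("INSERT INTO bridge_phrase_keyword VALUES (?, ?)",
                 [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3), (3, 4)])
conn.executemany("INSERT INTO fact_visit VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 5), (2, 2, 0, 3), (3, 2, 0, 2), (4, 3, 1, 7)])


def visits_with_all_keywords(conn, keywords):
    """Return (visit count, total page views) for phrases containing every keyword."""
    unique = sorted(set(keywords))
    placeholders = ",".join("?" for _ in unique)
    sql = f"""
        SELECT COUNT(*), SUM(f.page_views)
        FROM fact_visit f
        WHERE f.phrase_key IN (
            SELECT b.phrase_key
            FROM bridge_phrase_keyword b
            JOIN dim_keyword k ON k.keyword_key = b.keyword_key
            WHERE k.keyword IN ({placeholders})
            GROUP BY b.phrase_key
            HAVING COUNT(DISTINCT k.keyword) = ?
        )
    """
    return conn.execute(sql, (*unique, len(unique))).fetchone()


print(visits_with_all_keywords(conn, ["red", "shoes"]))
```

Resolving the keyword combination to a set of phrase keys first, and only then filtering the fact table, keeps each visit from being counted once per matching keyword, which is the usual pitfall when joining a fact directly through a bridge.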
If you need to deal with large volumes of free text, the issue isn't so much the model as it is the effort to parse large volumes of text for the purposes of structuring it. Depending on the resources available to you, parsing may become a bottleneck in the load process. However, parsing and reducing the text to a series of surrogate keys can significantly reduce the data storage requirements.
Re: Text Analytics and Dimensional Models
There are some tools out there that "convert" unstructured data into structured data (tools like Attensity). Do we then have the same problem or bottleneck?
In the model you built, did the dimensions store large columns such as free text, or just keywords?
sgrover3- Posts : 8
Join date : 2011-04-14
Re: Text Analytics and Dimensional Models
There was a phrase table and a keyword table. The phrase table was used to control surrogate key assignment for the multivalued dimension and to reduce the amount of parsing to be done. The source data was search phrases, not long text like documents... they rarely exceeded a few words and were often duplicated. The ETL process (including the parsing) was handled using Informatica. Only new phrases were parsed, and after a while the number of new phrases encountered represented a small fraction of all the phrases received.
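The "only new phrases were parsed" idea can be sketched roughly as follows. The real process ran in Informatica, so this Python version is only an illustration of the logic, not that implementation. The class name PhraseKeyAssigner, the in-memory dictionaries standing in for the phrase and keyword tables, and the normalization and tokenization rules (lowercase, collapse whitespace, split on non-alphanumerics) are all assumptions.

```python
import re


class PhraseKeyAssigner:
    """Assigns surrogate keys to search phrases, parsing each distinct phrase once."""

    def __init__(self):
        self.phrase_keys = {}   # normalized phrase -> phrase surrogate key
        self.keyword_keys = {}  # keyword -> keyword surrogate key
        self.bridge = {}        # phrase key -> sorted tuple of keyword keys
        self.parse_count = 0    # how many phrases actually had to be parsed

    def _keyword_key(self, word):
        # Reuse the existing key, or assign the next one for a new keyword.
        return self.keyword_keys.setdefault(word, len(self.keyword_keys) + 1)

    def phrase_key(self, raw_phrase):
        # Assumed normalization: lowercase and collapse runs of whitespace.
        phrase = " ".join(raw_phrase.lower().split())
        key = self.phrase_keys.get(phrase)
        if key is not None:
            return key  # phrase seen before: skip parsing entirely

        key = len(self.phrase_keys) + 1
        self.phrase_keys[phrase] = key
        self.parse_count += 1

        # Assumed tokenization: distinct alphanumeric words in the phrase.
        words = sorted(set(re.findall(r"[a-z0-9]+", phrase)))
        self.bridge[key] = tuple(sorted(self._keyword_key(w) for w in words))
        return key


assigner = PhraseKeyAssigner()
incoming = ["Red Shoes", "red  shoes ", "cheap red shoes", "Red Shoes"]
keys = [assigner.phrase_key(p) for p in incoming]
print(keys, assigner.parse_count)
```

Four phrases arrive here but only two distinct ones are parsed; the fact rows would carry just the phrase key. With short, heavily repeated search phrases the lookup hit rate climbs quickly, which is why parsing stops being the bottleneck in that kind of load.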
Re: Text Analytics and Dimensional Models
Wouldn't Google be an example of this, big data that is? I don't think they're using dimensional models. There was a nice article on big data a while back in Information Management, here's a link
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA