Wednesday, 30 September 2009

Sorting blogs is just a question of vocabulary

The classification method developed by the Swedish Institute of Computer Science analyzes the similarities between the words that blogs use. It should make it easier for SMEs to identify the communities that talk about them.

Content that circulates on social media such as blogs can be classified with simple methods, according to the Swedish Institute of Computer Science. As evidence, one of its teams ran a test based on a single criterion: vocabulary similarity. This method, which the researchers themselves describe as "naive", sorts blogs fairly relevantly according to their content. The benefit is twofold. Users can more easily find blogs on subjects that interest them. Small and medium-sized businesses get simple tools to keep better control of their image on the web, by identifying the communities of blogs likely to talk about them, and to uncover similarities between groups they were sometimes unaware of, so as to better target their marketing campaigns.
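The article does not say which similarity measure the team used. As a minimal sketch, assuming a plain bag-of-words comparison scored with cosine similarity (our illustrative choice, not necessarily the researchers'), two blogs can be compared by counting their words and measuring how much those counts overlap. The function names below are hypothetical.

```python
from collections import Counter
import math
import re


def word_counts(text: str) -> Counter:
    """Lowercase the text and count each word (naive tokenizer: ASCII letters only)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 = no shared words)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty blog is similar to nothing
    return dot / (norm_a * norm_b)


# Two music blogs share two of their three words; a politics blog shares none.
music_a = word_counts("guitar album tour")
music_b = word_counts("album tour concert")
politics = word_counts("vote election party")
print(cosine_similarity(music_a, music_b))   # about 0.667
print(cosine_similarity(music_a, politics))  # 0.0
```

Nothing here knows what a blog is "about": the score only reflects shared vocabulary, which is what makes the approach naive.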
Choosing the words to analyze
Obviously, not every word used in blogs is taken into account. Words considered too common, such as articles, are discarded, but so are words that are too rare: those that appear only a handful of times over several months are often spelling errors, which pollute the results. The analysis is carried out on the words in between. This speeds up the process and improves relevance. Removing these sets of words also de facto excludes a certain number of blogs from the analysis: those whose content is too thin to be classified, but also, and above all, most splogs (spam blogs). From the similarities observed, the software then sorts the blogs into groups.
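The filtering step can be sketched as follows, assuming frequency thresholds whose values the article does not give: `min_count`, `max_doc_share` and `min_words` are hypothetical parameters chosen for illustration, and the "several months" time window is ignored. Words found in too large a share of blogs are treated as too common, words with too few occurrences overall as too rare, and blogs left with too few words are dropped from the analysis.

```python
from collections import Counter


def filter_vocabulary(blogs, min_count=3, max_doc_share=0.5, min_words=3):
    """
    blogs: dict mapping a blog name to its list of (already tokenized) words.
    Returns the kept "in-between" vocabulary and the blogs that survive filtering.
    All three thresholds are hypothetical, for illustration only.
    """
    total = Counter()     # how many times each word occurs across all blogs
    doc_freq = Counter()  # in how many blogs each word occurs
    for words in blogs.values():
        total.update(words)
        doc_freq.update(set(words))

    n_blogs = len(blogs)
    kept_vocab = {
        w for w in total
        if total[w] >= min_count                    # too rare: likely typos or noise
        and doc_freq[w] / n_blogs <= max_doc_share  # too common: articles and the like
    }

    filtered = {}
    for name, words in blogs.items():
        kept = [w for w in words if w in kept_vocab]
        if len(kept) >= min_words:  # too little content left to classify
            filtered[name] = kept
    return kept_vocab, filtered


blogs = {
    "a": ["the", "guitar", "album", "tour", "guitar", "album"],
    "b": ["the", "album", "tour", "guitar", "concert", "tour"],
    "c": ["the", "vote", "election", "vote", "party", "election"],
    "d": ["the", "election", "party", "vote", "party", "teh"],
    "e": ["the", "xq"],  # almost empty, splog-like
}
vocab, kept = filter_vocabulary(blogs)
print(sorted(vocab))  # "the", "concert", "teh" and "xq" are gone
print(sorted(kept))   # blog "e" no longer has enough words to classify
```

In this toy run, "the" is dropped for being everywhere, the typo "teh" for being rare, and blog "e" falls out of scope because nothing meaningful remains.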
Revealing unsuspected communities
These communities are based on content, even if their authors are not necessarily connected. The categories are as diverse as the blogosphere itself: politics, books, technology, music... The researchers note that these groups in turn contain a number of subgroups, each with its own characteristics. The blogs in a group are more or less closely related to one another, so a hierarchy emerges, which the software displays graphically. The authors conclude that the simplicity of their method avoids forcing artificial links between all the blogs at any cost, a trap into which, they say, more complex methods easily fall.