Tuesday, 21 April 2009

Sentiment Detection and Opinion Mining

Sentiment detection and opinion mining, i.e. extracting subjective opinions and assertions from web portals, is currently a hot topic. A good survey is provided by Pang and Lee. It addresses several aspects of the field. The most important question from a knowledge discovery and text mining point of view is how algorithms for analysing unstructured texts written by web users differ from standard text mining tasks such as text classification or named entity recognition. From the survey I have taken the following points:
  • A smaller number of classes compared to text classification (e.g. positive, ambivalent, negative vs. topic hierarchies)
  • Higher dependency on subjective writing style (e.g. sarcasm)
  • Higher dependency on common sense knowledge: sentiments can be expressed without any sentiment words, by comparison with a very good or very bad situation ("it feels like driving a car at 360 km/h")
  • High degree of subjectivity: Given the above sentence, some people may like it, some may not
  • Order effects might override frequency effects
Sentiment Tasks

  • Polarity/Opinion Classification: Determine whether a piece of text expresses a good or bad opinion
  • Rating inference/ordinal regression: Determine the degree of goodness/badness on a scale
  • Subjectivity Detection: Detect whether a piece of text contains subjective/objective material
  • Joint Topic/Sentiment analysis
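
To make the polarity classification task concrete, here is a minimal sketch of a unigram classifier in plain Python. It is only an illustration under my own assumptions: the class name UnigramNaiveBayes, the whitespace tokenizer and the four toy training sentences are made up for this post and are not taken from the survey. It uses word presence (not frequency) as features with add-one smoothing.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; an assumption made for this sketch.
    return text.lower().split()


class UnigramNaiveBayes:
    """Toy Naive Bayes polarity classifier over unigram presence features."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of documents containing each word
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for word in set(tokenize(text)):  # presence, not frequency
                counts[word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        words = set(tokenize(text)) & self.vocab  # ignore unseen words
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for word in words:
                # Add-one smoothed likelihood of the word given the class.
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, invented for illustration only.
train_texts = [
    "a great and moving film",
    "wonderful acting great plot",
    "a boring and dull film",
    "terrible plot dull acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = UnigramNaiveBayes().fit(train_texts, train_labels)
print(model.predict("great film"))
print(model.predict("dull plot"))
```

A real system would of course need a proper tokenizer and far more training data; the point is only that the basic task reduces to ordinary text classification with very few classes.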

Facts
  • Machine learning using unigram models can achieve over 80% accuracy (Pang et al., "Thumbs up? Sentiment Classification using Machine Learning Techniques")
  • Templates are more stable across domains (compared to IE)
  • Finding the right keywords for expressing sentiment seems to be hard ("Go read the book" is negative in the movie domain but positive in the book domain)
  • Unclear whether bigrams help or not
  • POS tagging can be considered a rough form of word sense disambiguation (WSD)
  • Syntax has been found to be useful (e.g. dependency trees)
  • Negation matters (handled as a secondary feature, e.g. by transforming the negated words with a NOT marker, or by deeper modelling)
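
The negation point can be sketched as a simple preprocessing step: every token between a negation word and the next punctuation mark gets a NOT_ prefix, so that "like" and "NOT_like" become different unigram features. This is my own minimal reading of the "transforming words" idea; the function name tag_negation, the negation word list and the regular expressions are assumptions made for this sketch.

```python
import re

# Small, hand-picked negation list; an assumption, not an exhaustive lexicon.
NEGATIONS = {"not", "no", "never", "nothing", "nobody", "cannot"}
CLAUSE_END = re.compile(r"[.,;:!?]")
TOKEN = re.compile(r"[\w']+|[.,;:!?]")


def tag_negation(text):
    """Prefix tokens that follow a negation with NOT_, up to the next punctuation mark."""
    tagged = []
    negated = False
    for token in TOKEN.findall(text.lower()):
        if CLAUSE_END.fullmatch(token):
            negated = False          # punctuation closes the negation scope
            tagged.append(token)
        elif token in NEGATIONS or token.endswith("n't"):
            negated = True           # open a negation scope
            tagged.append(token)
        elif negated:
            tagged.append("NOT_" + token)
        else:
            tagged.append(token)
    return tagged


print(tag_negation("I did not like this movie, but the score was good."))
```

The transformed tokens can then be fed into any unigram model unchanged. Deeper modelling of negation scope (e.g. via a parse tree) would replace the "until the next punctuation mark" heuristic.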




