EVENT:

Pruning non-informative text through non-expert annotations to improve sentiment classification

Coling 2010 Workshop

28 August 2010
Beijing, China

 

description

Sentiment analysis attempts to extract an author's sentiments or opinions from unstructured text. Unlike rule-based approaches, a machine-learning approach holds the promise of learning robust, high-coverage sentiment classifiers from labeled examples. However, because natural language is so rich, people express the same sentiment in many different ways, so any given sentiment expression typically has few examples in the training corpus. Furthermore, sentences extracted from unstructured text (e.g., I filmed my daughter's ballet recital and could not believe how the auto focus kept blurring then focusing.) often contain both text that is informative about the author's sentiment toward a given topic (e.g., the auto focus kept blurring then focusing) and extraneous, non-informative text. When there are few examples of a given sentiment expression, the learning algorithm cannot identify the extraneous non-sentiment information as noise; that information can easily become correlated with the sentiment label and thereby confuse sentiment classifiers.

In this paper, we present a highly effective procedure that uses crowd-sourcing techniques to label the informative and non-informative text regarding the sentiment expressed in a sentence. We also show that pruning non-informative text during the training phase, based on these non-expert annotations, can yield classifiers that perform better even when the test data still includes non-informative text.
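
The pruning step might be sketched as follows. This is only an illustration under stated assumptions, not the procedure from the paper: it assumes each non-expert annotator marks the token positions they consider informative, and that annotations are combined by a simple majority vote. The function name `prune_sentence`, the `min_votes` parameter, and the three sample annotations are all hypothetical.

```python
from collections import Counter


def prune_sentence(tokens, annotations, min_votes=None):
    """Keep only the tokens that enough annotators marked as informative.

    tokens:      list of token strings for one training sentence.
    annotations: list of sets of token indices, one set per annotator
                 (hypothetical format; the source does not specify one).
    min_votes:   votes required to keep a token; defaults to a strict
                 majority of annotators (an assumption, not the paper's rule).
    """
    if not annotations:
        # No annotations available: leave the sentence unpruned.
        return list(tokens)
    if min_votes is None:
        min_votes = len(annotations) // 2 + 1
    # Count how many annotators marked each token position.
    votes = Counter(i for marked in annotations for i in marked)
    return [tok for i, tok in enumerate(tokens) if votes[i] >= min_votes]


# Example sentence from the description above, split on whitespace.
tokens = ("I filmed my daughter's ballet recital and could not believe "
          "how the auto focus kept blurring then focusing").split()

# Made-up annotations from three annotators (sets of token indices).
annotations = [
    set(range(11, 18)),  # "the auto focus ... focusing"
    set(range(12, 18)),  # "auto focus ... focusing"
    set(range(7, 18)),   # "could not believe ... focusing"
]

pruned = prune_sentence(tokens, annotations)
print(" ".join(pruned))
```

In a setup like this, only the training sentences would be pruned before feature extraction, while test sentences would be left as they are, matching the setting described above in which the test data still includes non-informative text.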

 
