Book Series -
Book Title - Agent and Multi-Agent Systems: Technologies and Applications
Chapter Title - A Comparative Study of Two Short Text Semantic Similarity Measures
First Page - 172
Last Page - 181
Copyright - 2008
Author - James O’Shea
Author - Zuhair Bandar
Author - Keeley Crockett
Author - David McLean
DOI - 10.1007/978-3-540-78582-8_18
Link - http://www.springerlink.com/content/v0867641u342pm2

James O’Shea (1), Zuhair Bandar (1), Keeley Crockett (1) and David McLean (1)
(1) Department of Computing and Mathematics, Manchester Metropolitan University, Chester St., Manchester, M1 5GD, United Kingdom
Abstract
This paper describes a comparative study of STASIS and LSA. These measures of semantic similarity can be applied to short texts for use in Conversational Agents (CAs). CAs are computer programs that interact with humans through natural language dialogue. Business organizations have spent large sums of money in recent years developing them for online customer self-service, but achievements have been limited to simple FAQ systems. We believe this is due to the labour-intensive process of scripting, which could be reduced radically by the use of short-text semantic similarity measures. “Short texts” are typically 10-20 words long but are not required to be grammatically correct sentences, for example spoken utterances and text messages. We also present a benchmark data set of 65 sentence pairs with human-derived similarity ratings. This data set is the first of its kind, specifically developed to evaluate such measures and we believe it will be valuable to future researchers.
Keywords: Natural Language, Semantic Similarity, Dialogue Management, User Modeling, Benchmark, Sentence
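A benchmark of sentence pairs with human similarity ratings is typically used by scoring each pair with the measure under test and checking how well those scores track the human ratings. The sketch below uses a Pearson correlation for that check. This is an assumption made for illustration and is not taken from the abstract, and the ratings and scores are invented placeholder values, not data from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical human ratings for five sentence pairs (invented values).
human_ratings = [0.1, 0.5, 0.9, 3.2, 3.9]
# Hypothetical scores from some similarity measure for the same pairs.
measure_scores = [0.20, 0.35, 0.30, 0.70, 0.95]

r = pearson(human_ratings, measure_scores)
print(f"correlation with human ratings: {r:.3f}")
```

A measure whose scores rise and fall with the human ratings gives a correlation close to 1, so two measures can be compared by running both over the same set of pairs.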
Important Points
- Discussion and summary of different kinds of similarity (taxonomic, related, goal-derived and radial)
- Introduction of a (small) test corpus and a description of how it was created. This includes some discussion of how humans rate similarity.
- Statement that co-occurrence measures also yield high similarity values for antonyms
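The last point is easy to reproduce on a toy scale. Antonyms such as "hot" and "cold" tend to appear in the same contexts, so a measure built purely on co-occurrence counts sees them as near-identical. The sketch below is a minimal illustration with an invented five-sentence corpus and plain sentence-level co-occurrence counts. It is neither LSA nor STASIS, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences):
    """Map each word to a Counter of the words it shares a sentence with."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented toy corpus: "hot" and "cold" occur in identical contexts.
corpus = [
    "the soup is hot today",
    "the soup is cold today",
    "the weather is hot today",
    "the weather is cold today",
    "the cat sat on the mat",
]

vectors = cooccurrence_vectors(corpus)
sim_antonyms = cosine(vectors["hot"], vectors["cold"])
sim_unrelated = cosine(vectors["hot"], vectors["mat"])
print(f"hot vs cold: {sim_antonyms:.2f}, hot vs mat: {sim_unrelated:.2f}")
```

In this corpus the antonym pair scores higher than the unrelated pair, because the counts capture only shared context and say nothing about opposite meaning.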
