Thursday, 26 February 2009

Recurrent Neural Networks for Robust Real-World Text Classification

Garen Arevian
2007 IEEE/WIC/ACM International Conference on Web Intelligence

ABSTRACT






This paper explores the application of recurrent neural networks for
the task of robust text classification of a real-world benchmarking
corpus. There are many well-established approaches which are used for
text classification, but they fail to address the challenge from a more
multi-disciplinary viewpoint such as natural language processing and
artificial intelligence. The results demonstrate that these recurrent
neural networks can be a viable addition to the many techniques used in
web intelligence for tasks such as context sensitive email
classification and web site indexing.

Noteworthy

  • Use of recurrent neural networks (Elman networks) with a context layer, which lets the model take word order into account
  • Further references on NNs in text mining
  • Title-based semantic representation (at least pointers to prior literature on the topic)
  • Word order was not important
  • The claim that NNs can outperform other classifiers is very strong and does not hold in general
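To make the context-layer idea concrete, here is a minimal sketch of an Elman-style forward pass in plain Python. It is not the configuration used in the paper: the weights, the three-word one-hot vocabulary, the layer sizes, and the function name `elman_forward` are all made up for illustration, and the network is untrained. It only shows how copying the hidden state into a context layer makes the output depend on the order of the input words.

```python
import math

def elman_forward(sequence, W_in, W_ctx, W_out):
    """Run a simple Elman network over a sequence of input vectors.

    The context layer holds a copy of the previous hidden state, so the
    final output depends on the order in which the inputs arrive.
    """
    hidden_size = len(W_ctx)
    context = [0.0] * hidden_size  # context layer starts at zero
    for x in sequence:
        hidden = []
        for j in range(hidden_size):
            total = sum(W_in[j][i] * x[i] for i in range(len(x)))
            total += sum(W_ctx[j][k] * context[k] for k in range(hidden_size))
            hidden.append(math.tanh(total))
        context = hidden  # copy the hidden state into the context layer
    # Softmax over the output units gives class probabilities.
    scores = [sum(W_out[c][j] * context[j] for j in range(hidden_size))
              for c in range(len(W_out))]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]

# Hypothetical toy setup: 3-word one-hot vocabulary, 2 hidden units, 2 classes.
W_in = [[0.5, -0.3, 0.8], [-0.6, 0.9, 0.1]]
W_ctx = [[0.2, -0.4], [0.7, 0.3]]
W_out = [[1.0, -1.0], [-1.0, 1.0]]
word_a, word_b = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]

probs_ab = elman_forward([word_a, word_b], W_in, W_ctx, W_out)
probs_ba = elman_forward([word_b, word_a], W_in, W_ctx, W_out)
print(probs_ab, probs_ba)  # same two words, different order, different output
```

With these arbitrary weights the two orderings end up favouring different classes. Whether order matters on real data is an empirical question, as the note on word order above suggests.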








Monday, 16 February 2009

Information Retrieval System Evaluation: Effort, Sensitivity, and Reliability
Mark Sanderson, Justin Zobel

The paper is excellent at comparing IR systems and at explaining what differences in MAP and other measures actually mean. A must-read for evaluation.


Abstract: The effectiveness of information retrieval systems is measured by comparing performance on a common set of queries and documents. Significance tests are often used to evaluate the reliability of such comparisons. Previous work has examined such tests, but produced results with limited application. Other work established an alternative benchmark for significance, but the resulting test was too stringent. In this paper, we revisit the question of how such tests should be used. We find that the t-test is highly reliable (more so than the sign or Wilcoxon test), and is far more reliable than simply showing a large percentage difference in effectiveness measures between IR systems. Our results show that past empirical work on significance tests over-estimated the error of such tests. We also re-consider comparisons between the reliability of precision at rank 10 and mean average precision, arguing that past comparisons did not consider the assessor effort required to compute such measures. This investigation shows that assessor effort would be better spent building test collections with more topics, each assessed in less detail.
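For reference, the two effectiveness measures named in the abstract can be sketched as follows. This uses the common textbook definitions with binary relevance judgments, not code from the paper; the function names and the two example rankings are invented for illustration.

```python
def precision_at_k(ranked_relevance, k=10):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(ranked_relevance[:k]) / k

def average_precision(ranked_relevance, total_relevant):
    """Average of the precision values at each rank holding a relevant document.

    Relevant documents that never appear in the ranking contribute zero,
    which is why the sum is divided by total_relevant.
    """
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant

def mean_average_precision(topics):
    """Mean of the per-topic average precision values."""
    return sum(average_precision(r, n) for r, n in topics) / len(topics)

# Hypothetical judgments: 1 = relevant, 0 = not relevant, in rank order.
topic_1 = ([1, 0, 1, 0, 0, 1, 0, 0, 0, 0], 4)  # 4 relevant docs exist, 3 retrieved
topic_2 = ([0, 1, 0, 0, 0, 0, 0, 0, 0, 0], 1)

p10 = precision_at_k(topic_1[0])          # 3 relevant in the top 10
ap_1 = average_precision(*topic_1)        # (1/1 + 2/3 + 3/6) / 4
map_score = mean_average_precision([topic_1, topic_2])
print(p10, ap_1, map_score)
```

P@10 only needs judgments for the top ten documents per topic, whereas average precision needs to know how many relevant documents exist in total. That difference in required judgments is the assessor-effort argument made in the abstract.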

Important Aspects Covered:

  • Brief introduction to statistical significance testing in IR (how and why)
  • Summary of results found by Zobel and Voorhees/Buckley: 8-9% MAP difference on 25 topics (conf = 95%), 5-6% MAP difference on 50 topics (conf = 95%)
  • Impact of significance testing on projecting MAP accuracy.
  • A large difference in MAP does not necessarily imply a statistically significant difference, especially with small topic sets (e.g. 25). In the worst case, the comparison must be significant and the MAP difference must be higher than 10%.
  • The t-test produces lower error rates than the sign and Wilcoxon tests
  • MAP is more reliable than P@10, but building a reliable P@10-only collection should be cheaper (from an assessor's point of view). However, the stability of shallow pools is unclear and not yet tested.
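The points above about significance versus percentage difference can be illustrated with a small standard-library sketch. The per-topic AP scores are invented, the function names are my own, and the Wilcoxon test is left out for brevity. The critical value 2.365 is the usual table value for a two-sided t-test at the 95% level with 7 degrees of freedom.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """t statistic of the per-topic score differences (paired t-test)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

def sign_test_p_value(scores_a, scores_b):
    """Exact two-sided sign test; tied topics are dropped."""
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    losses = sum(a < b for a, b in zip(scores_a, scores_b))
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def relative_difference(scores_a, scores_b):
    """Relative improvement in the mean score of system A over system B."""
    return (mean(scores_a) - mean(scores_b)) / mean(scores_b)

# Hypothetical per-topic average precision for two systems on 8 topics.
ap_a = [0.50, 0.42, 0.61, 0.33, 0.58, 0.47, 0.70, 0.39]
ap_b = [0.45, 0.40, 0.55, 0.35, 0.50, 0.41, 0.66, 0.36]

T_CRITICAL_DF7 = 2.365  # table value: two-sided, alpha = 0.05, 7 degrees of freedom

t_stat = paired_t_statistic(ap_a, ap_b)
sign_p = sign_test_p_value(ap_a, ap_b)
rel_diff = relative_difference(ap_a, ap_b)

print(f"relative MAP difference: {rel_diff:.1%}")
print(f"paired t statistic: {t_stat:.2f} (significant: {abs(t_stat) > T_CRITICAL_DF7})")
print(f"sign test p-value: {sign_p:.4f}")
```

On this toy data the t statistic exceeds the critical value while the sign test stays above 0.05, because the sign test ignores how large each per-topic difference is. Eight invented topics prove nothing about real collections; the sketch only shows how the quantities in the list are computed.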