1996 SALS-SIG Seminars
SALS-SIG Research Seminar
A Probabilistic Word Sense Disambiguation Algorithm
Dr How Khee Yin
Artificial Intelligence Lab, National Computer System of Singapore
When: Monday, 18th November 1996
Where: Room E6A357, Macquarie University
Word sense disambiguation (WSD) is an important problem in natural language processing. Recently there has been much work on statistical approaches to the problem (see Bruce and Wiebe's 94 paper "Word-sense disambiguation using decomposable models", Yarowsky's 92 paper "Word-sense disambiguation using statistical models of Roget's categories trained on large corpora", and Ng and Lee's 96 paper "Integrating multiple knowledge sources to disambiguate word sense: an exemplar-based approach").
Statistical approaches generally treat WSD as a classification problem in which the classes are the possible senses of the ambiguous word. Such an approach involves two main areas of work: the number and type of features used for classification, and the classification algorithm itself. Bruce and Wiebe and Yarowsky used probabilistic classification methods, while Ng and Lee used an exemplar-based method. All three use features such as the surface form, part of speech and morphological information of the surrounding words.
In our work, we have developed a simple probabilistic classification method that uses only the co-occurrences of surrounding words in their surface form. This minimalist approach to WSD is evaluated on Bruce and Wiebe's "interest" data set and on Ng and Lee's sense-tagged BC50 (7,119 occurrences of the 191 words in 50 selected files of the Brown Corpus) and WSJ6 (14,139 occurrences of the 191 words in 6 selected files of the Wall Street Journal corpus). When tested on a common data set, our method achieves disambiguation accuracy better than or comparable to that of other statistical approaches.
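The abstract does not spell out the method itself. To illustrate what "probabilistic classification using only surface-form co-occurrences" can look like, here is a minimal naive Bayes sketch in Python. It is an assumption, not the speaker's algorithm: the ±3-word context window, add-one (Laplace) smoothing, skipping of unseen words, the class name `NaiveBayesWSD` and the toy "interest" sentences and sense labels are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def context_words(tokens, position, window=3):
    """Lower-cased surface forms within `window` tokens of the target (target excluded)."""
    left = tokens[max(0, position - window):position]
    right = tokens[position + 1:position + 1 + window]
    return [w.lower() for w in left + right]


class NaiveBayesWSD:
    """Hypothetical naive Bayes sense classifier over bag-of-context-words features."""

    def __init__(self, window=3):
        self.window = window
        self.sense_counts = Counter()             # sense -> number of training examples
        self.word_counts = defaultdict(Counter)   # sense -> co-occurrence counts of context words
        self.vocab = set()

    def train(self, examples):
        """examples: iterable of (tokens, target_position, sense)."""
        for tokens, pos, sense in examples:
            self.sense_counts[sense] += 1
            for w in context_words(tokens, pos, self.window):
                self.word_counts[sense][w] += 1
                self.vocab.add(w)

    def classify(self, tokens, pos):
        """Return the sense maximising log P(sense) + sum of log P(word | sense)."""
        total = sum(self.sense_counts.values())
        v = len(self.vocab)
        # Words never seen in training carry no evidence, so they are skipped.
        context = [w for w in context_words(tokens, pos, self.window) if w in self.vocab]
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_counts.items():
            score = math.log(count / total)
            n = sum(self.word_counts[sense].values())
            for w in context:
                # Add-one smoothing so an unseen (word, sense) pair is not zero probability.
                score += math.log((self.word_counts[sense][w] + 1) / (n + v))
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


if __name__ == "__main__":
    # Toy, invented training data for two senses of "interest".
    training = [
        ("the bank raised the interest rate on loans".split(), 4, "money"),
        ("she paid interest on the loan".split(), 2, "money"),
        ("he showed little interest in music".split(), 3, "attention"),
        ("a keen interest in politics".split(), 2, "attention"),
    ]
    model = NaiveBayesWSD()
    model.train(training)
    print(model.classify("the interest rate on the loan".split(), 1))       # → money
    print(model.classify("an interest in music and politics".split(), 1))   # → attention
```

Naive Bayes is used here only because it is the simplest probabilistic classifier that relies on nothing but word co-occurrence counts; the talk's actual method, feature window and smoothing may differ.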
This talk will describe our method, the choice of features, the number of training sentences required and the results we obtained.
Biography of the Speaker:
Dr How Khee Yin obtained his PhD from the University of Edinburgh in 1993. His PhD work was on a temporal framework for understanding instructional text.
He is currently employed by National Computer System of Singapore, where he heads the Artificial Intelligence Lab in the Computer Research Division.
Last modified: July 1997