Collocations can also be modeled by assigning more than one variable to the agents or by adding a dummy agent which provides collocational information, but for the sake of simplicity we do not go into those details.
Topical word associations, semantic word associations, and selectional preferences can also be modeled in a similar way to collocations. Complex information involving more than two entities can be modeled using n-ary utility functions.
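As an illustrative sketch, such an n-ary utility can be built as a lookup over tuples of sense choices; the sense labels and the collocation bonus below are invented for illustration and are not part of our system:

```python
def nary_utility(table, default=0.0):
    """Build an n-ary utility function from a lookup table mapping
    tuples of sense labels to utility values (illustrative sketch)."""
    def utility(*sense_choices):
        return table.get(tuple(sense_choices), default)
    return utility

# Hypothetical ternary constraint rewarding a sense triple that
# forms a known collocation (labels are made up for illustration).
colloc = nary_utility({("kick.v.04", "the.det", "bucket.n.01"): 5.0})
```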
We carried out a simple experiment to test the effectiveness of the DCOP algorithm. We conducted our experiment in an all-words setting and used only WordNet-based (Fellbaum, 1998) relatedness measures as our knowledge source, so that results could be compared with earlier state-of-the-art knowledge-based WSD systems (Agirre and Soroa, 2009; Sinha and Mihalcea, 2007), which used knowledge sources similar to ours.
Our method performs disambiguation on a sentence-by-sentence basis. A utility function based on semantic relatedness is defined for every pair of words falling within a particular window size. Restricting utility functions to a window reduces the number of constraints. An objective function is defined as the sum of these restricted utility functions over the entire sentence, thus allowing information to flow across all the words. Hence, a DCOP algorithm which maximizes this objective function leads to a globally optimal solution.
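For concreteness, this construction can be sketched in a few lines of Python; the exhaustive search over sense assignments stands in for a proper DCOP solver and is intended only to make the objective explicit (the function names and the relatedness table are illustrative, not part of our implementation):

```python
from itertools import product

def disambiguate(words, senses, relatedness, window=4):
    """Pick one sense per word, maximizing the sum of pairwise
    relatedness scores over all word pairs within the window."""
    # One binary utility constraint per word pair within the window.
    pairs = [(i, j) for i in range(len(words))
             for j in range(i + 1, min(i + 1 + window, len(words)))]
    best_score, best_assign = float("-inf"), None
    # Brute force over the joint sense space; a DCOP solver would
    # find the same maximizer without full enumeration.
    for assign in product(*(senses[w] for w in words)):
        score = sum(relatedness[assign[i], assign[j]] for i, j in pairs)
        if score > best_score:
            best_score, best_assign = score, assign
    return dict(zip(words, best_assign)), best_score
```

On a toy sentence such as "bank money deposit", the financial senses reinforce one another through the pairwise scores, so the maximizer selects them jointly rather than word by word.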
In our experiments, we used the best similarity measure setting of (Sinha and Mihalcea, 2007), which is the sum of the normalized similarity measures jcn, lch, and lesk. We used the Distributed Pseudotree Optimization Procedure (DPOP) algorithm (Petcu and Faltings, 2005), which solves a DCOP using a linear number of messages among agents. We used the implementation provided with the open-source toolkit FRODO (Leaute et al., 2009).
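A rough sketch of this combination follows; the caps used to rescale each raw measure into [0, 1] are illustrative placeholders, not the exact normalization constants of Sinha and Mihalcea (2007):

```python
def normalize(raw, cap):
    """Clip a raw similarity score to [0, cap] and rescale to [0, 1].
    The caps are illustrative, not the paper's exact values."""
    return min(max(raw, 0.0), cap) / cap

def combined_relatedness(jcn, lch, lesk, caps=(12.0, 3.7, 240.0)):
    # Sum of the three normalized WordNet measures; the result
    # lies in [0, 3] and serves as the pairwise edge utility.
    jc, lc, le = caps
    return normalize(jcn, jc) + normalize(lch, lc) + normalize(lesk, le)
```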
To compare our results, we ran our experiments on the SENSEVAL-2 and SENSEVAL-3 English all-words data sets. Table 1 shows the results of our experiments. All these experiments are carried out using a window size of four. Ideally, precision and recall values are expected to be equal in our setting. However, in certain cases, the tool we used, FRODO, failed to find a solution with the available memory resources.
The results show that our system performs consistently better than (Sinha and Mihalcea, 2007), which uses exactly the same knowledge sources as ours (with the exception of adverbs in Senseval-2). This shows that the DCOP algorithm performs better than the page-rank algorithm used in their graph-based setting. Thus, for knowledge-based WSD, the DCOP framework is a potential alternative to graph-based models.
Table 1 also shows the system of (Agirre and Soroa, 2009), which obtained the best results for knowledge-based WSD. A direct comparison between this and our system is not a fair one, since they used additional knowledge such as extended WordNet relations (Mihalcea and Moldovan, 2001) and the sense-disambiguated glosses present in WordNet 3.0.
We conducted our experiment on a computer with two 2.94 GHz processors and 2 GB of memory. Our algorithm took just 5 minutes 31 seconds on the Senseval-2 data set and 5 minutes 19 seconds on the Senseval-3 data set. This is a sizeable reduction compared to the execution time of the page-rank algorithms employed in both Sinha07 and Agirre09. In Agirre09, it falls in the range of 30 to 180 minutes on a much more powerful system with 16 GB of memory and four 2.66 GHz processors. On our system, the time taken by the page-rank algorithm of (Sinha and Mihalcea, 2007) is 11 minutes when executed on the Senseval-2 data set.
Since DCOP algorithms are truly distributed in nature, the execution times can be further reduced by running them in parallel on multiple processors.
Earlier approaches to WSD which encoded information from a variety of knowledge sources can be classified as follows:
Supervised approaches: Most supervised systems (Yarowsky and Florian, 2002; Lee and Ng, 2002; Martinez et al., 2002; Stevenson and Wilks, 2001) rely on sense-tagged data. These are mainly discriminative or aggregative models, which essentially pose WSD as a classification problem. Discriminative models aim to identify the most informative features, while aggregative models make their decisions by combining all features. They disambiguate word by word rather than collectively disambiguating the whole context, and thereby fail to capture relationships (e.g., sense relatedness) among all the words. Further, they lack the ability to directly represent constraints like one sense per discourse.
Graph-based approaches: These approaches crucially rely on lexical knowledge bases. Graph-based WSD approaches (Agirre and Soroa, 2009; Sinha and Mihalcea, 2007) perform disambiguation over a graph composed of senses (nodes) and relations between pairs of senses (edges). The edge weights encode information from a lexical knowledge base but lack an efficient way of modeling information from other knowledge sources such as collocational information, selectional preferences, domain information, and discourse. Also, the edges represent binary utility functions defined over two entities, which lack the ability to encode ternary and, in general, n-ary utility functions.
This framework provides a convenient way of integrating information from various knowledge sources by defining their utility functions. Information from different knowledge sources can be weighted based on the setting at hand. For example, in a domain-specific WSD setting, sense distributions play a crucial role. The utility function corresponding to the sense distributions can be weighted higher in order to take advantage of domain information. Also, different combinations of weights can be tried out for a given setting. Thus, for a given WSD setting, this framework allows us to find (1) the impact of each knowledge source individually and (2) the best combination of knowledge sources.
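A minimal sketch of such a weighted combination follows; the utility functions and weights are hypothetical placeholders to be tuned per setting:

```python
def weighted_objective(utilities, weights):
    """Combine per-knowledge-source utility functions into one
    objective, each scaled by a setting-specific weight."""
    def objective(assignment):
        return sum(w * u(assignment) for u, w in zip(utilities, weights))
    return objective

# E.g. in a domain-specific setting, one might up-weight a
# sense-distribution utility relative to a relatedness utility:
# objective = weighted_objective([relatedness_u, sense_dist_u], [1.0, 3.0])
```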
Limitations of DCOP algorithms: Solving DCOPs is NP-hard. A variety of search algorithms have therefore been developed to solve them (Mailler and Lesser, 2004; Modi et al., 2004; Petcu and Faltings, 2005). As the number of constraints or words increases, the search space grows, thereby increasing the time and memory needed to solve the problem. Also, DCOP algorithms exhibit a trade-off between memory usage and the number of messages communicated between agents. DPOP (Petcu and Faltings, 2005) uses a linear number of messages but requires exponential memory, whereas ADOPT (Modi et al., 2004) exhibits linear memory complexity but exchanges an exponential number of messages. So it is crucial to choose a suitable algorithm based on the problem at hand.
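To make DPOP's linear message count concrete: on a chain-shaped pseudotree, its UTIL/VALUE phases reduce to dynamic programming, with one UTIL table passed per edge. The following toy sketch is our own illustration of that idea, not FRODO's implementation:

```python
def dpop_chain(domains, pairwise):
    """Max-sum over a chain x0 - x1 - ... - x{n-1}, with utility
    pairwise[i](a, b) on edge (x_i, x_{i+1}).  One UTIL table is
    passed per edge, i.e. n - 1 messages in total (DPOP's linear
    count), though each table is sized by the child's domain."""
    n = len(domains)
    # UTIL phase: pass the best attainable downstream utility upward.
    util = {a: 0.0 for a in domains[-1]}
    choice = []  # per edge: best child value for each parent value
    for i in range(n - 2, -1, -1):
        new_util, best = {}, {}
        for a in domains[i]:
            scored = {b: pairwise[i](a, b) + util[b] for b in domains[i + 1]}
            b_star = max(scored, key=scored.get)
            new_util[a], best[a] = scored[b_star], b_star
        util, choice = new_util, [best] + choice
    # VALUE phase: the root picks its best value, then propagates down.
    root = max(util, key=util.get)
    assignment = [root]
    for best in choice:
        assignment.append(best[assignment[-1]])
    return assignment, util[root]
```

The exponential-memory behavior appears as soon as the pseudotree is not a chain: UTIL tables must then range over all values of an agent's separator, not a single parent variable.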