Chen and Ji (2009a) applied the normalized spectral algorithm to event coreference resolution: partitioning a set of mentions into events. An event is a specific occurrence involving participants. An event mention is a textual reference to an event, which includes a distinguishing trigger (the word that most clearly expresses that an event occurs) and involves arguments (entities/temporal expressions that play certain roles in the event). A graph is constructed as in entity coreference resolution, except that it involves quite different feature engineering (most features relate to event triggers and arguments). The graph clustering approach yields competitive results compared with an agglomerative clustering algorithm proposed by Chen et al. (2009b); unfortunately, a scientific comparison between the algorithms remains unexplored.
Word clustering is the task of grouping a set of words (e.g., nouns, verbs) into clusters so that similar words are in the same cluster. Word clustering is a major technique that can benefit many NLP tasks, e.g., thesaurus construction, text classification, and word sense disambiguation. Word clustering can be carried out in a two-step procedure: (1) the classification step, which represents each word as a feature vector and computes the similarity between two words; (2) the clustering step, which applies some clustering algorithm, e.g., single-link, complete-link, or average-link clustering, so that similar words are grouped together.
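The two-step procedure above can be sketched as follows. The context-count vectors, the choice of cosine similarity, and the merge threshold are illustrative assumptions, not a prescription from the surveyed literature:

```python
import math
from itertools import combinations

def cosine(u, v):
    """Step 1: similarity between two sparse feature vectors (dicts)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def average_link_cluster(vectors, threshold=0.5):
    """Step 2: greedy average-link clustering. Merge the two clusters with
    the highest average pairwise similarity until it falls below threshold."""
    clusters = [[w] for w in vectors]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for (i, a), (j, b) in combinations(enumerate(clusters), 2):
            sims = [cosine(vectors[x], vectors[y]) for x in a for y in b]
            avg = sum(sims) / len(sims)
            if avg > best:
                best, pair = avg, (i, j)
        if best < threshold:
            break
        i, j = pair
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

# Toy context-count vectors for four words (hypothetical data).
vectors = {
    "cat":   {"purr": 3, "pet": 2},
    "dog":   {"bark": 3, "pet": 2},
    "car":   {"drive": 3, "road": 2},
    "truck": {"drive": 2, "road": 3},
}
print(average_link_cluster(vectors, threshold=0.3))
```

On this toy input, the animal words merge (they share the "pet" context) and the vehicle words merge (they share "drive" and "road"), while the two groups stay apart.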
Ichioka and Fukumoto (2008) applied an approach similar to that of Matsuo et al. (2006) to Japanese onomatopoetic word clustering, and showed that it outperforms k-means clustering by 16.2%.
Word sense disambiguation (WSD) is the problem of identifying which sense (meaning) of a polysemic word is conveyed in the context of a sentence. In contrast to supervised WSD, which relies on a pre-defined list of senses from dictionaries, unsupervised WSD induces word senses directly from the corpus. Among unsupervised WSD algorithms, graph-based clustering algorithms have been found to be competitive with supervised methods and, in many cases, to outperform most vector-based clustering methods.
Dorow and Widdows (2003) built a co-occurrence graph in which each node represents a noun and two nodes are connected by an edge if they co-occur more frequently than a given threshold. They then applied the Markov Clustering algorithm (MCL), which is surveyed in Section 2.5, but cleverly circumvented the problem of choosing the right parameters. Their algorithm not only recognizes senses of polysemic words, but also provides a readable high-level cluster name for each sense. Unfortunately, they neither discussed how to identify the sense of a word in a given context, nor compared their algorithm with other algorithms experimentally.
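A minimal sketch of such thresholded co-occurrence graph construction (using sentence-level co-occurrence and a raw count threshold for simplicity; Dorow and Widdows actually extract nouns from coordination patterns):

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence_graph(sentences, min_count=2):
    """Nodes are words; an edge links two words that co-occur in the same
    sentence at least `min_count` times (the co-occurrence threshold)."""
    counts = Counter()
    for words in sentences:
        for a, b in combinations(sorted(set(words)), 2):
            counts[(a, b)] += 1
    graph = {}
    for (a, b), c in counts.items():
        if c >= min_count:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph

# Hypothetical toy "sentences" (already tokenized and filtered to nouns).
sentences = [
    ["apple", "banana", "fruit"],
    ["apple", "banana", "market"],
    ["apple", "orange", "fruit"],
]
g = build_cooccurrence_graph(sentences, min_count=2)
```

Here only the pairs seen at least twice ("apple"/"banana" and "apple"/"fruit") survive the threshold; rarer pairs such as "apple"/"orange" produce no edge. A clustering algorithm such as MCL would then be run on the resulting graph.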
Véronis (2004) proposed a graph-based model named HyperLex, based on the small-world properties of co-occurrence graphs. Detecting the different senses (uses) of a word reduces to isolating the high-density components (hubs) in the co-occurrence graph. Those hubs are then used to perform WSD. To obtain the hubs, HyperLex finds the vertex with the highest relative frequency in the graph at each iteration and, if it meets certain criteria, selects it as a hub. Agirre (2007) proposed another method, based on PageRank, for finding hubs. HyperLex can detect low-frequency senses (as low as 1%) and, most importantly, offers excellent precision (97%, compared to 73% for the baseline). Agirre (2007) further conducted extensive experiments comparing the two graph-based models (HyperLex and PageRank) with other supervised and non-supervised graph methods and concluded that graph-based methods perform close to supervised systems in the lexical sample task and yield the second-best WSD systems for the Senseval-3 all-words task.
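The iterative hub selection can be illustrated with the following simplified sketch; the frequency/degree criteria and the toy graph are assumptions for illustration, not Véronis' actual thresholds:

```python
def find_hubs(graph, freq, min_freq=2, min_degree=2):
    """Simplified HyperLex-style hub detection: repeatedly take the most
    frequent vertex not yet assigned; if it passes the frequency and degree
    criteria it becomes a hub, and its neighbors are removed from further
    consideration (they belong to that hub's high-density component)."""
    remaining = set(graph)
    hubs = []
    for v in sorted(graph, key=lambda v: -freq[v]):
        if v not in remaining:
            continue
        if freq[v] >= min_freq and len(graph[v] & remaining) >= min_degree:
            hubs.append(v)
            remaining -= graph[v]
        remaining.discard(v)
    return hubs

# Hypothetical co-occurrence graph around the polysemic word "bar":
# a drinking-place component and a legal-profession component.
graph = {
    "pub":   {"beer", "drink"},
    "beer":  {"pub", "drink"},
    "drink": {"pub", "beer"},
    "court": {"law", "exam"},
    "law":   {"court", "exam"},
    "exam":  {"court", "law"},
}
freq = {"pub": 5, "beer": 4, "drink": 3, "court": 5, "law": 4, "exam": 2}
```

On this toy graph the procedure selects one hub per dense component ("pub" and "court"), each standing for one sense of the target word.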
In this survey, we organize the sparse related literature on graph clustering into a structured presentation and summarize the topic as a five-part story, namely, hypothesis, modeling, measure, algorithm, and evaluation: the hypothesis serves as a basis for the whole graph clustering methodology; quality measures and graph clustering algorithms construct the backbone of the methodology; modeling acts as the interface between the real application and the methodology; and evaluation deals with utility. We also survey several typical NLP problems in which graph-based clustering approaches have been successfully applied.
We have the following final comments on the strengths and weaknesses of graph clustering approaches:
(1) A graph is an elegant data structure that can model many real applications and rests on solid mathematical foundations, including spectral theory and Markov stochastic processes.
(2) Unlike many other clustering algorithms, which act greedily towards the final clustering and thus may miss the optimal clustering, graph clustering transforms the clustering problem into optimizing some quality measure. Unfortunately, these optimization problems are NP-hard; thus, all proposed graph clustering algorithms yield only approximately "optimal" clusterings.
(3) Graph clustering algorithms have been criticized for low speed when working on large-scale graphs (with millions of vertices). This may no longer be true, since new graph clustering algorithms have been proposed; e.g., the multilevel graph clustering algorithm (Karypis and Kumar, 1999) can partition a graph with one million vertices into 256 clusters in a few seconds on current-generation workstations and PCs. Nevertheless, the scalability of graph clustering algorithms, which is becoming more important in social network studies, still needs to be explored.
We envision that graph clustering methods can lead to promising solutions in the following emerging NLP problems:
(1) Detection of new entity types, relation types, and event types (IE area). For example, the eight event types defined in the ACE program may not be enough for wider usage, and more event types can be induced by graph clustering on verbs.
(2) Web people search (IR area). The main issue in web people search is the ambiguity of personal names. Thus, by extracting attributes (e.g., attended schools, spouse, children, friends) from returned web pages, constructing person graphs (involving those attributes), and applying graph clustering, we are optimistic about achieving a better person-search engine.
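As an illustration of this envisioned pipeline, the following sketch links pages that share extracted attributes and takes connected components as distinct persons. The attribute sets and the shared-attribute linking rule are hypothetical simplifications; a real system would weight attributes and apply a proper graph clustering algorithm rather than plain connected components:

```python
def cluster_pages(pages, min_shared=1):
    """Group web pages about the same personal name: link two pages if they
    share at least `min_shared` extracted attributes, then return the
    connected components (each component = one hypothesized person)."""
    n = len(pages)
    parent = list(range(n))  # union-find over page indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if len(pages[i] & pages[j]) >= min_shared:
                parent[find(i)] = find(j)  # union the two components

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Three pages returned for the query "John Smith" (hypothetical attributes):
# pages 0 and 1 share a school, so they are merged into one person.
pages = [
    {"school:MIT", "spouse:Ann"},
    {"school:MIT", "child:Bob"},
    {"school:Oxford", "spouse:Carol"},
]
```

Here `cluster_pages(pages)` yields two persons: pages 0 and 1 (linked via the shared school attribute) and page 2 on its own.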