Word Sense Disambiguation: Decision Trees on Bigrams
-
Pedersen (2001) exploits the "one sense per collocation" property, but also relies on features that
occur in the context of an ambiguous word
-
Word sense disambiguation within the SENSEVAL competition, starting with a manually sense-tagged corpus
(senses are drawn from WordNet, which tends to be quirky and make overly fine sense distinctions);
this is therefore supervised WSD
-
One goal: "establish clear upper bounds on the accuracy of disambiguation" without resorting to
"substantial pre-processing" (i.e., parsing and semantic interpretation)
-
In conjunction with this goal, discover good feature sets that can distinguish word senses
-
Issues to be dealt with:
- selecting bigrams that are good disambiguators; hence, choosing good statistical measures
of significance (the "power divergence" family, such as Chi-square, or the Dice coefficient); note that
the bigrams need not include the ambiguous word itself, though of course many useful ones do!
- designing the decision trees (sequences of decisions based on the bigrams) to choose a sense
for a new, untagged occurrence of an ambiguous word; some features count for more than others
in this model, unlike Bayesian classifier models; a machine-learning algorithm constructs the decision
trees, using the most informative bigrams for distinguishing among senses
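-
The two kinds of significance measure named above can be illustrated for a single bigram. This is a
minimal sketch, not Pedersen's implementation: Pearson's Chi-square stands in as one familiar member of
the power divergence family, and the function names and the counts in the demo are invented for
illustration

```python
def contingency(n11, n1p, np1, npp):
    """Build the 2x2 table for a bigram (w1, w2).

    n11: times w1 is immediately followed by w2
    n1p: bigrams whose first word is w1
    np1: bigrams whose second word is w2
    npp: total number of bigrams in the corpus
    """
    n12 = n1p - n11                 # w1 followed by something other than w2
    n21 = np1 - n11                 # w2 preceded by something other than w1
    n22 = npp - n1p - np1 + n11     # neither w1 first nor w2 second
    return [[n11, n12], [n21, n22]]


def pearson_chi_square(table):
    """Pearson's Chi-square: sum of (observed - expected)^2 / expected.

    Assumes every row and column total is non-zero.
    """
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    total = sum(row)
    score = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total   # count expected under independence
            score += (table[i][j] - expected) ** 2 / expected
    return score


def dice(n11, n1p, np1):
    """Dice coefficient: 2 * joint count / (sum of the two marginal counts)."""
    return 2 * n11 / (n1p + np1)


# Invented counts for one hypothetical bigram in a 1000-bigram corpus.
table = contingency(n11=10, n1p=20, np1=15, npp=1000)
print("chi-square:", pearson_chi_square(table))
print("dice:", dice(10, 20, 15))
```

A bigram whose two words co-occur exactly as often as independence predicts scores 0 on Chi-square;
larger scores on either measure mark the bigram as a stronger candidate feature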
-
Three benchmark methods for comparison:
- always choose the most frequent sense ("majority classifier")
- Bayesian classifiers, using every word in the tagged corpus as a feature
- decision stump: the simplest decision tree, with a single criterion (the most informative)
for making the choice among senses
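-
Two of the benchmarks above, the majority classifier and the decision stump, are simple enough to
sketch directly. The sketch assumes information gain as the measure of "most informative" (the notes
do not say which criterion the learning algorithm uses), treats each bigram as a present/absent
feature, and runs on invented toy data; all names are hypothetical

```python
from collections import Counter
from math import log2


def majority_sense(instances):
    """Majority classifier: the most frequent sense among tagged instances."""
    return Counter(sense for _, sense in instances).most_common(1)[0][0]


def entropy(instances):
    """Shannon entropy (in bits) of the sense distribution."""
    counts = Counter(sense for _, sense in instances)
    total = len(instances)
    return -sum(c / total * log2(c / total) for c in counts.values())


def train_stump(instances, features):
    """Pick the single bigram with the highest information gain.

    Returns (bigram, sense_if_present, sense_if_absent). If no bigram
    splits the data, falls back to the majority sense on both branches.
    """
    base = entropy(instances)
    best, best_gain = None, -1.0
    for f in features:
        present = [x for x in instances if f in x[0]]
        absent = [x for x in instances if f not in x[0]]
        if not present or not absent:
            continue  # feature does not split the data
        remainder = (len(present) * entropy(present)
                     + len(absent) * entropy(absent)) / len(instances)
        gain = base - remainder
        if gain > best_gain:
            best, best_gain = (f, present, absent), gain
    if best is None:
        fallback = majority_sense(instances)
        return (None, fallback, fallback)
    f, present, absent = best
    return (f, majority_sense(present), majority_sense(absent))


def classify(stump, context_bigrams):
    """Assign a sense to a new occurrence from the bigrams in its context."""
    feature, if_present, if_absent = stump
    return if_present if feature in context_bigrams else if_absent


# Invented toy data: (set of bigrams found in the context, sense tag).
TRAIN = [
    ({("interest", "rate")}, "finance"),
    ({("interest", "rate"), ("the", "bank")}, "finance"),
    ({("river", "bank"), ("the", "bank")}, "river"),
    ({("the", "bank")}, "finance"),
    ({("river", "bank")}, "river"),
]
FEATURES = [("interest", "rate"), ("river", "bank"), ("the", "bank")]

stump = train_stump(TRAIN, FEATURES)
print("majority sense:", majority_sense(TRAIN))
print("stump:", stump)
print("new instance:", classify(stump, {("river", "bank")}))
```

A full decision tree would apply the same selection step recursively to the instances on each branch
instead of stopping after one split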
-
Results and analysis:
-
Both statistical measures (power divergence and Dice coefficient) performed well in finding significant bigrams
-
Overall, decision trees outperformed the benchmark methods, but decision stumps, using just
the most informative bigram, did fairly well
-
Most of the features identified initially as statistically significant don't need to be used in the
decision trees (i.e., adding them wouldn't improve accuracy)
-
Decision trees are more easily interpreted by people than Bayesian classifiers