Course Projects
-
Project Requirements:
- A proposal of about two pages, due Monday, Nov. 4.
- A project write-up which, depending on what you do, may be much like
a 7-10 page course paper or thorough documentation of any program you write.
This will be due on Wednesday, Dec. 4.
- A brief presentation of your project (about 15 minutes) in one of the allotted
class sessions. Let me know if you will need equipment beyond what we normally use.
-
Project Ideas:
Your project should relate, at least indirectly, to the themes and techniques we
are studying in the course. The project might involve evaluating or comparing
existing tools (for example, document classification or
machine translation tools available on the web),
writing a research paper that extends beyond the material covered in class (for
example, on methods of probabilistic parsing), or designing and implementing a
small system.
Some project ideas:
-
Compare the types of syntactic ambiguity that are prevalent in two or three
languages. How does the grammar (and possibly the morphology and phonology)
affect which kinds of syntactic ambiguity are present and most common?
Will disambiguation methods that work for one language also work for the other(s)?
How can we test this? Test some of your hypotheses on small samples of text.
-
Manning and Schuetze (ch. 5) pose a problem regarding spotting the names of
companies and other organizations in text. What strategies can you develop for
this task, keeping in mind that an organization may be referred to in several
different ways (full name, shortened name, abbreviation, nickname, etc.)? Test
your algorithms and heuristics on a small corpus.
-
The notion of "semantic distance" comes up in several areas of computational
linguistics. Investigate some of the measures that have been proposed (you
should probably familiarize yourself a bit with WordNet first) and compare how
much each measure improves performance on a particular task (e.g., lexical
ambiguity resolution, a small IR task, or sorting e-mail messages by topic).
-
Investigate automatic document categorization (this is a big field so you'll
want to narrow this down!).
-
Investigate quantifier scope ambiguity (as in "Some tourists visited every
museum.") and methods for handling this ambiguity in NLP systems.
-
Prepositions (and particles) are notoriously difficult for foreign-language learners because
there is usually no neat correspondence between prepositions in one language and
those in another. Investigate strategies for automatic translation of
prepositions, focusing on two or three languages.
-
Investigate how NLP systems understand or generate tense, aspect, and temporal
relations (a big area, so you'll want to narrow this down too).
-
Investigate approaches to computational morphology for some types of morphology
we haven't focused on in class, examining the issues that arise in one or two
morphologically complex languages.
-
Choose a small domain (for example, "talking to your television" to find out
what programs are coming up and to have something recorded) and outline in
some detail the dialog structures that an NLP system handling this task would
need to represent.
-
We haven't discussed speech recognition, handwriting recognition, or sign
language recognition much, but many of the above topics could be addressed
with those in mind (particularly speech).
-
These are just examples. I've tried to suggest some topics that are somewhat
"off the path" we are following in the course lectures and readings, but it
is also fine to delve more deeply into something we have discussed in the course.
One thing you quickly learn in this business is that there are very few complete,
watertight solutions to these problems; progress is achieved in increments. It
can also be hard to judge in advance how difficult a problem is. That's why
I want to meet with each of you at least at once, so that we can set realistic
expectations for your project.