Linguistics 420
Statistical Natural Language Processing (NLP)
Spring 2006

Course goals

"Every time I fire a linguist, the performance of our speech recognition system goes up." - Fred Jelinek

Although Jelinek's quote may represent an extreme position, it is no exaggeration to say that anyone wishing to work in natural language processing (NLP) must have some understanding of the statistical methods in common practice. This course will introduce you to the fundamentals of statistical NLP. Statistical NLP builds on ideas from many fields, including linguistics, probability theory, information theory, programming, and computer science. We will see how these fields provide us with tools for part-of-speech (POS) tagging, parsing, word sense disambiguation, machine translation, and information retrieval.

The course is strongly data-driven: students will work with large corpora and learn how to handle data at that scale. Applying statistical techniques to large corpora will also allow us to examine collocations and n-grams, along with techniques for categorizing text.

We will study statistical methods in the context of particular tasks, e.g., parsing. The methods themselves, however, apply to a wide range of NLP tasks, so this course provides an essential foundation for finding one's way in the field.

Instructor: Markus Dickinson

Office: Intercultural Center (ICC) 452

Phone: 687-5753

E-mail: mad87 AT georgetown DOT edu

Office hours: (at least for the first week)

M 3:00-4:00pm
R 3:00-4:00pm
  or by appointment

Meeting time: R, 4:15-6:45pm

Classroom: Reiss Science Building (REI) 282

Course website: http://www9.georgetown.edu/faculty/mad87/06/420/

Course notes will be posted to this website.

Credits: 3

Course prerequisites: Introduction to NLP (Ling 362) or permission of instructor. Some programming experience is expected.

Textbook:

Course requirements:

Academic Misconduct:

As signatories to the Georgetown University Honor Pledge, and simply as good scholars and citizens, you are required to uphold academic honesty in all aspects of this course. You are expected to be familiar with the letter and spirit of the Standards of Conduct outlined in the Georgetown Honor System and on the Honor Council website. As faculty, I too am obligated to uphold the Honor System, and will report all suspected cases of academic dishonesty.

Students with Disabilities:

Students who need an accommodation based on the impact of a disability should contact me to arrange an appointment as soon as possible to discuss the course format, to anticipate needs, and to explore potential accommodations.

I rely on the Academic Resource Center for assistance in verifying the need for accommodations and developing accommodation strategies. Students who have not previously contacted the Academic Resource Center are encouraged to do so (202-687-8354; http://ldss.georgetown.edu/index.html).

Schedule:

I'm including both the main slides (.pdf) and the more printer-friendly versions (3x3.pdf). If you want a different size (e.g., 2x3), let me know.

Month  Week  Day  Date  Topic                                                Reading         Assignments
Jan.   1     R    12    Intro to class (.pdf, 3x3.pdf)                       M&S, ch. 1
       2     R    19    Probability Theory (.pdf, 3x3.pdf)                   M&S, ch. 2.1    HW1 due
                        Programming (1): Python (.pdf, 3x3.pdf)
       3     R    26    Collocations (.pdf, 3x3.pdf)                         M&S, ch. 5      HW2 due
Feb.   4     R    2     Information Theory (.pdf, 3x3.pdf)                   M&S, ch. 2.2    HW3 due
                        Programming (2): UNIX (.pdf, 3x3.pdf)
       5     R    9     POS tagging (.pdf, 3x3.pdf)                          M&S, ch. 9, 10  HW4 due
       6     R    16    Corpora and Linguistic Annotation (.pdf, 3x3.pdf)    M&S, ch. 3, 4   HW5 due
                        Programming (3): NLTK (.pdf, 3x3.pdf)
       7     R    23    Probabilistic Context-Free Grammars (.pdf, 3x3.pdf)  M&S, ch. 11     HW6 due
                        Programming (4): XML (.pdf, 3x3.pdf)
Mar.   8     R    2     Probabilistic Parsing (.pdf, 3x3.pdf)                M&S, ch. 12     HW7 due
       9     R    9     NO CLASS, SPRING BREAK
       10    R    16    Lexicalized Parsing
                        Programming (5): Practicum
       11    R    23    Text categorization (.pdf, 3x3.pdf)                  M&S, ch. 16     HW8 due
                        PP attachment (.pdf, 3x3.pdf)                        M&S, ch. 8
       12    R    30    Word Sense Disambiguation (.pdf, 3x3.pdf)            M&S, ch. 7
Apr.   13    R    6     Programming (6): Practicum 2                                         HW9 due
       14    R    13    NO CLASS, EASTER BREAK
       15    R    20    Statistical Machine Translation (.pdf, 3x3.pdf)      M&S, ch. 13
       16    R    27    General techniques (.pdf, 3x3.pdf)                                   HW10 due
May    17    R    11                                                                         Final HW due

Disclaimer

This syllabus is subject to change. All important changes will be made in writing, with ample time for adjustment.


Markus Dickinson 2005-10-26