Ling 361, Intro to Computational Linguistics
fall 2002

Homework assignment 2

due at beginning of class, 11:40 am, Monday, Sept. 30

Remember that you are encouraged to collaborate on homework problems, but each person should turn in a separate assignment.

If you wish, you may e-mail this assignment to me as an attachment in Microsoft Word or Rich Text Format (RTF) (a plain text file will be difficult because there's no easy way to do graphics).

1. Hausa, a Chadic language spoken widely in northern Nigeria and southern Niger, has many different ways of forming noun plurals. This exercise asks you to design finite state transducers for two Hausa plural classes. You can assume throughout that there are no morpheme boundaries in these words other than those between the stem and the singular and plural affixes.

a. One class of nouns--call it Class E--is common for terms denoting occupations and ethnicities. It has the following singular (masculine) and plural forms (the symbol ' stands for a glottal stop, and functions like other consonants in Hausa):

singular (masc.)             plural
ba'askare  "soldier"         'askarawa  "soldiers"
baduku     "leather worker"  dukawa     "leather workers"
bafada     "courtier"        fadawa     "courtiers"
bature     "European man"    turawa     "European people"
balarabe   "Arab man"        larabawa   "Arab people"
bayarabe   "Yoruba man"      yarabawa   "Yoruba people"

i) Design finite state transducers that relate the lexical representations of Class E singular masculine nouns (that is, the symbols on the lexical tape, as described in Jurafsky and Martin, ch. 3) to their respective surface forms. (If you use two transducers, one between lexical and intermediate levels, the other between intermediate and final levels, one of them should be very simple!)
ii) Design finite state transducers that relate the lexical representations of Class E plural nouns to their respective surface forms.
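
If you would like to check a transducer design by running it, the following is a minimal sketch of a deterministic FST in Python. It is not a solution to the exercise: the toy English example, the state names (q0, q1, q2), the tags (+N, +SG, +PL), and the function name transduce are all assumptions made for illustration. Each arc reads one symbol from the lexical tape and writes a (possibly empty) string to the surface tape.

```python
# Toy finite-state transducer. All names and the English example are
# illustrative assumptions, not part of the Hausa exercise.

EPS = ""  # empty output: the lexical symbol has no surface realization

# TRANSITIONS[state][lexical_symbol] = (next_state, surface_output)
TRANSITIONS = {
    "q0": {
        "c": ("q0", "c"),
        "a": ("q0", "a"),
        "t": ("q0", "t"),
        "+N": ("q1", EPS),   # the category tag is not pronounced
    },
    "q1": {
        "+SG": ("q2", EPS),  # singular: no suffix
        "+PL": ("q2", "s"),  # plural: write the suffix -s
    },
    "q2": {},
}
FINAL_STATES = {"q2"}


def transduce(lexical_symbols, transitions=TRANSITIONS,
              start="q0", finals=FINAL_STATES):
    """Map a list of lexical symbols to a surface string.

    Returns None if the machine gets stuck or stops in a non-final state.
    """
    state = start
    surface = []
    for symbol in lexical_symbols:
        arcs = transitions[state]
        if symbol not in arcs:
            return None  # no arc for this symbol: input rejected
        state, output = arcs[symbol]
        surface.append(output)
    return "".join(surface) if state in finals else None


print(transduce(["c", "a", "t", "+N", "+PL"]))  # cats
print(transduce(["c", "a", "t", "+N", "+SG"]))  # cat
```

A two-level design (lexical to intermediate, intermediate to surface) can be simulated by feeding the output of one such table into a second one.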

b. Another class of Hausa nouns--let's call it Class R--has the following pattern of singular and plural forms:

singular                        plural
azaba    "pain", "torture"      azabobi    "pains", "tortures"
kalma    "word"                 kalmomi    "words"
hanya    "way", "road"          hanyoyi    "ways", "roads"
tambaya  "question"             tambayoyi  "questions"
dila     "jackal"               diloli     "jackals"
sana'a   "trade", "occupation"  sana'o'i   "trades", "occupations"
gora     "bamboo", "cane"       gorori     "bamboo", "canes"

i) State in words how the plurals of Class R nouns are formed.
ii) Design finite state transducers that relate the lexical representations of Class R plural nouns to their respective surface forms. What difficulty do you encounter? Can you think of a modification to FSTs that would enable your FSTs to reflect the general rule you stated in part i)?
iii) Hausa has relatively few nouns ending in a consonant, but there are some, including some words borrowed from English. Some of these have plural forms like those in Class R:

singular          plural
tebur   "table"   teburori   "tables"
fensir  "pencil"  fensirori  "pencils"

What changes (if any) do you need to make in your FST for Class R plurals to handle these nouns?

2. There is a surprising number of Web search engines and closely related services out there. See, for example, Search Links or this Kansas City Library Introduction to Search Engines. In this exercise you'll do a comparison of three search engines of your choice.

a. Choose three queries for your comparison (you may want to experiment a bit with several queries on different search engines first). Try to make at least one of the queries "challenging" for the search engines you test. Your queries can all be about the same topic but should seek different information.
i) State each of your queries as a sentence. (For instance, a query might be very general, like "Tell me about sea otters," or more specific, like "What was the population of sea otters off the California coast in 1950?")
ii) State each query as you pose it to each of the three search engines you've selected, including keywords and any operators you may have used. Try to pose the same queries to all three search engines, as far as that is possible.

b. Examine your results. For this exercise, we'll just consider the top ten results returned by each engine you test (some studies claim that's all many users have the patience for anyway!).
i) For each result from each search engine, judge whether the result was successful in providing an answer to your query. What proportion of the top five and top ten results are successful (i.e., what is the precision at 5 and precision at 10)? What is the uninterpolated average precision (see Manning and Schütze, p. 535 for the definition of this quantity)? Does the performance of one of the search engines stand out from the others? Does performance on one of the queries stand out from the others?
ii) For each of your queries, provide the URL of the best result you got and the worst result you got. (Don't agonize over this if two or more results seem equally good or bad; just illustrate the range in quality.)
iii) How much overlap is there among the results returned by the three search engines?
iv) Try to reformulate the query that returned the worst results, and see how much your results improve.
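
The measures asked for in part b can be computed by hand, but a short script may help if you test several queries. The sketch below assumes you have recorded your relevance judgments for one query on one engine as a ranked list of True/False values. Two assumptions to note: uninterpolated average precision is computed here by averaging the precision at each rank where a successful result appears, over the successful results in your list only (the total number of relevant pages on the Web is unknown), and overlap is measured as shared URLs divided by all distinct URLs, which is just one reasonable choice. The function names and the sample data are hypothetical.

```python
def precision_at_k(judgments, k):
    """Fraction of the top k results judged successful.

    Divides by k even if fewer than k results were returned.
    """
    return sum(judgments[:k]) / k


def average_precision(judgments):
    """Uninterpolated average precision over the ranked list.

    Averages the precision measured at each rank where a successful
    result occurs; returns 0.0 if no result was successful.
    """
    hits = 0
    precisions = []
    for rank, relevant in enumerate(judgments, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def overlap(urls_a, urls_b):
    """Shared URLs divided by all distinct URLs in the two result lists."""
    a, b = set(urls_a), set(urls_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical judgments for the top ten results of one query.
judgments = [True, False, True, True, False,
             False, True, False, False, False]

print(precision_at_k(judgments, 5))           # 0.6
print(precision_at_k(judgments, 10))          # 0.4
print(round(average_precision(judgments), 3))  # 0.747
```

In the sample list, successful results appear at ranks 1, 3, 4, and 7, so the average precision is (1/1 + 2/3 + 3/4 + 4/7) / 4.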
