"Precision Content Retrieval": Ambroziak and Woods (1998)
- The goal here is to improve information retrieval by term expansion in a taxonomy
and by returning just the relevant passage of a document, rather than a whole document.
- Starting hypothesis: multiword phrases can be useful if we "do more with them than
simply look for exact matches".
- The "ConceptStore" program starts with a WordNet-like taxonomy, and
extracts phrases from documents to build a "phrasal taxonomy" on the fly.
- Queries are matched to phrases near them in this phrasal taxonomy. Some
rearrangements of words in a query are permitted when matching a phrase, but
the phrase then incurs a penalty score (better matches have smaller penalty scores).
- The user can view the taxonomy, and revise the query, making it more general
or more specific if appropriate.
- Results are claimed to be much better (on a task involving queries about the UNIX
operating system) using these methods than with garden-variety IR tools. But
the authors point out that the improvement is not "monotonic": every method gets some
results that the others miss.
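
To make the penalty idea concrete, here is a minimal sketch of taxonomy-aware phrase matching. It is not the ConceptStore implementation. The toy taxonomy, the function names, and the scoring scheme (one point per taxonomy step from a query word down to a more specific term, one point per pair of words that appear out of query order) are all assumptions chosen for illustration; the paper's actual penalties are not reproduced here.

```python
from collections import deque
from itertools import permutations

# Hypothetical toy taxonomy: each term maps to its more specific terms.
TAXONOMY = {
    "change": ["modify", "rename"],
    "file": ["directory"],
}


def specializations(term):
    """Map `term` and every term below it to its distance from `term`."""
    depths = {term: 0}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in TAXONOMY.get(current, []):
            if child not in depths:
                depths[child] = depths[current] + 1
                queue.append(child)
    return depths


def penalty(query, phrase):
    """Return the smallest penalty for matching `phrase` to `query`.

    Returns None when the phrase cannot be matched at all.
    Assumed scoring: 1 per taxonomy step, 1 per out-of-order word pair.
    """
    q_words = query.split()
    p_words = phrase.split()
    if len(q_words) != len(p_words):
        return None
    allowed = [specializations(word) for word in q_words]
    best = None
    # order[pos] is the index of the query word matched at phrase position pos.
    for order in permutations(range(len(q_words))):
        cost = 0
        for pos, q_index in enumerate(order):
            if p_words[pos] not in allowed[q_index]:
                break  # this alignment does not match
            cost += allowed[q_index][p_words[pos]]
        else:
            # Count pairs of words whose order differs from the query.
            cost += sum(
                1
                for i in range(len(order))
                for j in range(i + 1, len(order))
                if order[i] > order[j]
            )
            if best is None or cost < best:
                best = cost
    return best


def rank(query, phrases):
    """Return (penalty, phrase) pairs for matching phrases, best first."""
    scored = [(penalty(query, phrase), phrase) for phrase in phrases]
    return sorted((score, phrase) for score, phrase in scored if score is not None)
```

With this sketch, rank("change file", ["file rename", "change file", "modify file", "delete file"]) puts the exact match first, then "modify file" (one taxonomy step), then "file rename" (one taxonomy step plus one reordering), and drops "delete file" because "delete" is not under "change" in the toy taxonomy.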