WordNet and Other Lexical Resources: Applications, Extensions and Customizations


Held in Pittsburgh, Pennsylvania, USA, in conjunction with NAACL 2001

Endorsed by SIGLEX.

Lexical resources have become important basic tools within NLP and related fields. The range of resources available to the researcher is diverse and vast, from simple word lists to complex machine-readable dictionaries (MRDs) and thesauruses. These resources contain many different types of explicit linguistic information, presented in different formats and at various levels of granularity. Much information is also left implicit in the descriptions; for example, the definition of a lexical entry generally contains genus, encyclopaedic and usage information.

The majority of resources used by NLP researchers were not originally intended for computational use. For instance, MRDs are a by-product of the dictionary publishing industry, and WordNet began as an experiment in modelling the mental lexicon.

In particular, WordNet has become a valuable resource in human language technology and artificial intelligence. Due to its vast coverage of English words, WordNet provides general lexico-semantic information on which open-domain text processing is based. Furthermore, the development of WordNets in several other languages extends this capability to trans-lingual applications, enabling text mining across languages. For example, in Europe, WordNet has been used as the starting point for the development of a multilingual database for several European languages (the EuroWordNet project). Other resources, such as the Longman Dictionary of Contemporary English and Roget’s Thesaurus, have also been used for various NLP tasks.