Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications

http://wing.comp.nus.edu.sg/~antho/W/W97/#0800

Held in Madrid, Spain, in conjunction with ACL/EACL 1997

In recent years, the development of high-quality, comprehensive language resources has been the focus of many research groups. More recently, the corpus-based extraction of such resources has also attracted wider interest. EuroWordNet, Sparkle and Ecran try to package some of this know-how and expertise into state-of-the-art tools and resources that can be applied directly in NLP-based services. The EuroWordNet project is developing a multilingual database with wordnets for four European languages, linked to the existing Princeton WordNet (version 1.5). Such a database can be used in multilingual retrieval applications, but it can also serve as a starting point for automatic translation aids, inferencing systems, and information extraction systems. Sparkle and Ecran both address the creation of language resources and technologies for real-world NLP applications in parallel. They pursue this objective by developing software tools in the areas of shallow parsing and lexical acquisition. These tools are used to induce linguistic knowledge from text corpora and are progressively enriched by the information they acquire.

In all three projects, the current limits of language technology are being explored for their practical benefits. Whereas EuroWordNet aims to broaden and extend the Princeton WordNet into a generic multilingual resource, the first of its kind, Sparkle and Ecran aim to dynamically anchor resources and information to the data and corpora that are of interest to a user. The availability of these resources and tools is essential for the new generation of applications and products dealing with information in electronic form. The projects have finished their specification phase and are now producing results. In this workshop we want to discuss the scope and formats of semantic resources and information acquisition tools with scholars in the field and with researchers from commercial R&D departments who have experience in developing and using them. The main themes of the workshop are: