MITCH
Mining for information in texts from the cultural heritage: An NWO CATCH project
MITCH is part of the CATCH programme (Continuous Access To Cultural Heritage), which aims to expose knowledge hidden in the Dutch cultural heritage. As curator of numerous specimens, Naturalis is obviously a large contributor to that cultural heritage. Less evident, but of no less value, are the vast amounts of documents describing these specimens: logs, labels, registries, publications, taxonomies, etc. Combining these sources reveals Naturalis' value: the documents are the key to the information and knowledge about the collection.
A major obstacle in exposing these relations is the sheer quantity of the available data and the different media, formats and methods used to store the data. Many sources of information are available only as paper documents, and the digital sources are neither readily accessible nor uniform. Variations range from minor typing errors to the use of different taxonomies, as a result of progressing research and international conventions.
To harness the enormous quantity of existing and future data, the MITCH project will develop the technological utilities to "mine" these data. Research in text mining has advanced to a level at which language technology and information extraction modules can be used to structure large volumes of unstructured or semi-structured data. The project's goal is to provide the tools to extract, correct, normalize and link data, so that information from different sources can be combined, disclosed and put to better use.
Note that the focus of this project is the automation of knowledge enrichment and understanding of digital data in flat text and textual object databases. Other projects, such as sister project SCRATCH, will deal with the capture of paper documents and their transformation to flat text.
The MITCH research programme is a joint effort of Naturalis and Tilburg University under the umbrella of NWO/CATCH.
More information
See the MITCH website.
Participants
Research team
Piroska Lendvai
postdoc researcher
P.Lendvai (at) uvt.nl
Marieke van Erp
PhD student
M.G.J.vanErp (at) uvt.nl
Steve Hunt
scientific programmer
S.J.Hunt (at) uvt.nl
Coordination
Antal van den Bosch
coordinator Tilburg University
René Dekker
coordinator Naturalis
Former staff
Caroline Sporleder
postdoc researcher
Tijn Porcelijn
scientific programmer
Friday, August 27, 2010