Access to digital archives via metadata and lemmatization


The project will develop a tool for improved access to selected digital archives of historical texts. The tool will support queries for lemmata (headwords) and allow a corpus to be composed flexibly on the basis of metadata. The mechanisms and resources this requires (databases, lexica, morphological analyzers) will be made accessible via web services. The project focuses on Polish, a demanding test case because of its rich morphology and orthographic variation; methods that cope with Polish should therefore be transferable to other languages. Cooperation with the CLARIN-D centers in Saarbrücken, Tübingen, Nijmegen, Berlin and Leipzig is instrumental for the realization of this project.
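
As a rough illustration of the two access paths described above (composing a corpus by metadata, then querying it by lemma), consider the following minimal Python sketch. It is not the project's implementation: the `Document` fields, the toy `LEXICON`, and the functions `compose_corpus` and `find_lemma` are hypothetical names chosen for this example, and a real system would presumably call a morphological analyzer rather than a lookup table.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: maps inflected forms and spelling variants to a lemma.
# A real system would use a morphological analyzer instead of a fixed table.
LEXICON = {
    "pisał": "pisać",
    "piszę": "pisać",
    "pisze": "pisać",
    "króla": "król",
    "krola": "król",  # spelling variant without the diacritic
}


@dataclass
class Document:
    doc_id: str
    year: int
    genre: str
    tokens: list[str]


def lemmatize(token: str) -> str:
    """Look up the lemma; fall back to the lowercased token itself."""
    t = token.lower()
    return LEXICON.get(t, t)


def compose_corpus(docs, **criteria):
    """Keep the documents whose metadata match every given criterion."""
    return [d for d in docs
            if all(getattr(d, key) == value for key, value in criteria.items())]


def find_lemma(docs, lemma):
    """Return (doc_id, token position, surface form) for each hit."""
    return [(d.doc_id, i, tok)
            for d in docs
            for i, tok in enumerate(d.tokens)
            if lemmatize(tok) == lemma]


# Invented sample data, for illustration only.
DOCS = [
    Document("a", 1650, "letter", ["Krola", "pisał"]),
    Document("b", 1650, "sermon", ["piszę"]),
    Document("c", 1780, "letter", ["pisze", "króla"]),
]

letters = compose_corpus(DOCS, genre="letter")
hits = find_lemma(letters, "pisać")
```

In this sketch a query for the lemma "pisać" matches different surface forms across documents, and the metadata filter restricts the search to the selected subcorpus first.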


Principal investigators
Meyer, Roland Prof. Dr. (West Slavic Languages)

Funding body
BMBF - HU as subcontractor

Duration of project
Start date: 01/2013
End date: 08/2015

Last updated on 2022-08-09 at 03:09