A Minimal Infrastructure for the Sustainable Provision of Extensible Multi-Layer Annotation Software for Linguistic Corpora


The project's goal is the design, implementation, evaluation and documentation of a minimal infrastructure for the sustainable provision of research software. The working hypothesis is that an infrastructure can only be operated sustainably in an academic context if the technical and human resources that the respective academic institution has to provide are minimised for the long term from the outset. In a case study providing a multi-layer annotation software for linguistic corpora, the project illustrates that only four components are strictly necessary for such an infrastructure: a source code repository platform; a repository providing versions of the software for end users; a repository providing the software's dependencies (e.g., software libraries); and a maintainer who administers and publishes the infrastructure and research software, and manages the user and developer communities. Of these components, arguably only the maintainer needs to be funded by the respective academic institution. For all other components, potentially sustainable external infrastructure is available free of charge. A further requirement for the sustainable operation of the infrastructure developed in the project is the technical sustainability of the research software it provides. In the course of the project, the prototype "GraphAnno" will be developed into a stable product. The product, "Hexatomic", has strong potential for use across different linguistic disciplines, as it meets a verifiably high demand in these scientific communities. Additionally, "Hexatomic" will satisfy the requirement for technical sustainability early on in the project by implementing best practices of software engineering.
These include, for example, reproducible builds through an automated build system; comprehensive documentation of all aspects of the software; a permissive open source license; portability across, and the ability to run on, different operating systems; comprehensive test suites; public provision of the source code; extensibility and adaptability through modularisation and a generic data model; extensive compatibility with other tools and data standards; and well-structured community processes. The project evaluates and documents not only how far the software realises its use potential, but also its potential for long-term, project-independent development. The latter is partly achieved by acquiring external contributions of functional modules to "Hexatomic". Moreover, the minimal infrastructure model is not only implemented; its implementation is also documented and tested. The test results are, in turn, documented and condensed into best practices, which represent an important project goal in and of themselves. By combining a hypothesis-driven approach with a case study, the project makes an important contribution towards the evaluation of minimal requirements for sustainable infrastructures for research software.


Principal Investigators
Lüdeling, Anke Prof. Dr. phil. (German Linguistics / Corpus Linguistics/Morphology)

Duration of Project
Start date: 10/2018
End date: 09/2021

Research Areas
Applied Linguistics, Experimental Linguistics, Computational Linguistics, General and Comparative Linguistics, Typology, Non-European Languages, Security and Dependability, Software Engineering and Programming Languages

Last updated on 2021-07-29 at 18:29