Robert Munro and David Nathan, School of Oriental and African Studies, London

Towards Portability and Interoperability for Linguistic Annotation and Language-specific Ontologies

There are two core reasons for representing language-specific ontologies. The first is to support language speakers. The largest (and growing) user group for endangered-language materials is the speakers of those languages, and these speakers are rarely interested in looking at linguistic categories or navigating via them. The second motivation is linguistic: language-specific ontologies will help determine a language’s morphosyntactic structures. Even where the relationship between a language-specific ontology and that language’s morphosyntax is indirect, for many researchers the most interesting phenomena are those not found universally.

The central task in constructing a general linguistic ontology is mapping the phenomena of a given language to the concepts in the ontology, and extending the ontology where no viable mapping exists. Mapping independent features is fairly straightforward, but mapping a full language-specific ontology to a general model is a complicated problem. This is especially true when part of the meaning of a language-specific ontology derives from sociocultural phenomena. In such cases it may not be meaningful to store the data independently of its presentational context, so the context itself must be recorded, either in a structured format or as richly annotated photos, diagrams and/or videos.
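
The map-or-extend step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GOLD data model: the `Ontology` class, the `map_or_extend` method, and every concept and term name below are hypothetical placeholders, and the "context" is reduced to a plain string standing in for a structured record or an annotated media file.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy general ontology: a set of concept names plus recorded extensions."""
    concepts: set[str]
    # Maps each newly added concept to the presentational context stored with it.
    extensions: dict[str, str] = field(default_factory=dict)

    def map_or_extend(self, term: str, candidates: list[str], context: str = "") -> str:
        """Return the first candidate that already exists in the ontology.

        If no candidate is viable, add the language-specific term as a new
        concept and keep its context alongside it.
        """
        for candidate in candidates:
            if candidate in self.concepts:
                return candidate
        self.concepts.add(term)
        self.extensions[term] = context
        return term


# All names here are hypothetical placeholders, not actual GOLD concepts.
general = Ontology(concepts={"Noun", "Verb", "KinshipTerm"})

# An independent feature with an obvious counterpart maps directly.
mapped = general.map_or_extend("noun_class_1", ["Noun"])

# A category with no viable counterpart extends the ontology and keeps its context.
added = general.map_or_extend(
    "hypothetical_kin_category", [], context="video: hypothetical_recording_01"
)
```

The sketch covers only the easy case of independent features. Mapping a whole language-specific ontology would also have to preserve the relations among its categories, which is where the difficulty noted above arises.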

In this study, we survey a variety of linguistic documentation materials and report on the range of ontological categories and structures that may be required to integrate such phenomena into the General Ontology for Linguistic Description (GOLD) framework. The materials include Yolngu kinship terminology, Karaim geographical deixis, Betta Kurumba ethnobotany, and Rama lexical groupings.