Baden Hughes, University of Melbourne

A Folk Ontology for Linguistic Markup

GOLD is an emerging linguistic ontology and data category registry for morphosyntactic annotation of human language data. GOLD is intended to capture the knowledge of a well-trained linguist, as an attempt to codify the general knowledge of the field.

Linguistic intuition about legitimate combinations of analytical concepts (e.g., nouns take case, pronouns take number, verbs take transitive/intransitive) is well grounded, both theoretically and in established practice. However, these broad norms governing how various analytical classes can be used in combination are not explicitly documented.
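
As an informal illustration of what documenting such norms could look like, the three example combinations above can be written down as a small lookup table. This is a minimal sketch only: the class and feature names and the `is_legitimate` helper are hypothetical, chosen for this example, and are not GOLD identifiers or part of any GOLD tooling.

```python
# Hypothetical table of which feature each analytical class may take.
# The names are illustrative assumptions, not GOLD concept identifiers.
ALLOWED_FEATURES = {
    "Noun": {"Case"},
    "Pronoun": {"Number"},
    "Verb": {"Transitivity"},
}


def is_legitimate(word_class: str, feature: str) -> bool:
    """Return True if the feature is listed as combinable with the class."""
    # Classes missing from the table are treated as having no listed features.
    return feature in ALLOWED_FEATURES.get(word_class, set())


if __name__ == "__main__":
    for pair in [("Noun", "Case"), ("Noun", "Transitivity")]:
        print(pair, is_legitimate(*pair))
```

The table lists only the three combinations named in the text; a fuller inventory would need to be supplied by the community of practice.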

At a formal level, GOLD is founded on principles of ontological engineering and features rich axiomatization of classes and relations. However, the instantiation of GOLD allows an unconstrained combinatorial approach to analysis, which runs counter to widely held linguistic notions.

In this paper we explore the commonly held intuitions about the organisation of a linguistic system and instantiate these as a GOLD Community of Practice Extension (COPE). Having done this, we are able to provide a number of outputs immediately useful to documentary and descriptive linguistics, in the form of shallow hierarchies of features that extend core concepts and simplify access to GOLD.

Author Bio(s):
Baden Hughes is a Research Fellow in the Department of Computer Science and Software Engineering at the University of Melbourne.