Wallace Hooper, American Indian Studies Research Institute, Indiana U.

Abstract (with title):
Models for Integrated Text and Lexical Data at AISRI

Researchers at the American Indian Studies Research Institute (AISRI) have collected, digitized, and analyzed large amounts of language data in text, sound, and video formats, focusing on the Siouan and Caddoan language families. Since 1995 we have developed two software applications to integrate, manage, and exploit language data in different document structures and media formats: bilingual dictionaries, text corpora, and sound and video recordings. The Indiana Dictionary Database processor (IDD) builds bilingual dictionaries; the Annotated Text Processor (ATP) processes and manages large corpora of interlinear texts. Both applications have significant multimedia capabilities. ATP can link sound and video recordings to text and dictionary data at any level of granularity, and it supports transcription and analysis tasks.

We will discuss our experience with a central modeling problem: how to structure text corpora and dictionaries so that both are composed of the same kinds of component objects. Shared components make it easier to use dictionary resources when glossing texts and to import appropriate text examples into dictionary entries. It is now clear that the modeling strategies developed for ATP can be used equally effectively for dictionary entries.
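To make the idea of shared component objects concrete, here is a minimal sketch, assuming a simple model in which one hypothetical `Morpheme` type (a form plus a gloss) appears both in interlinear text lines and in dictionary headwords. It is not ATP's or IDD's actual data model. The names `Morpheme`, `TextLine`, `DictionaryEntry`, `gloss_forms`, and `import_examples` are all illustrative, and the forms and glosses are placeholders rather than real Siouan or Caddoan data. Because both structures use the same type, glossing a text becomes a dictionary lookup, and finding examples for an entry becomes a search for that same object in the corpus.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical shared component: the same Morpheme type is used
# inside interlinear text lines and inside dictionary entries.
@dataclass(frozen=True)
class Morpheme:
    form: str
    gloss: str


@dataclass
class TextLine:
    """One interlinear line: a sequence of morphemes plus a free translation."""
    morphemes: list[Morpheme]
    translation: str


@dataclass
class DictionaryEntry:
    """A headword built from the same Morpheme type, with attached text examples."""
    headword: Morpheme
    examples: list[TextLine] = field(default_factory=list)


def gloss_forms(forms: list[str], lexicon: dict[str, DictionaryEntry]) -> list[Morpheme]:
    """Text glossing: reuse the dictionary's Morpheme for each known form; unknown forms get '?'."""
    return [lexicon[f].headword if f in lexicon else Morpheme(f, "?") for f in forms]


def import_examples(entry: DictionaryEntry, corpus: list[TextLine]) -> None:
    """Attach to the entry every corpus line that contains its headword Morpheme."""
    entry.examples = [line for line in corpus if entry.headword in line.morphemes]


# Usage with placeholder forms (not real language data).
lexicon = {
    "formA": DictionaryEntry(Morpheme("formA", "GLOSS-A")),
    "formB": DictionaryEntry(Morpheme("formB", "GLOSS-B")),
}
line = TextLine(gloss_forms(["formA", "formB", "formC"], lexicon), "free translation")
import_examples(lexicon["formA"], [line])

print([m.gloss for m in line.morphemes])  # ['GLOSS-A', 'GLOSS-B', '?']
print(len(lexicon["formA"].examples))     # 1
```

Making `Morpheme` frozen gives it value equality and hashability, so the same component can be compared, indexed, or shared across texts and entries without copying.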

ATP had to support very flexible data models, allowing us to incorporate text, discourse, and lexical data in many different document structures within a single processing environment that also supports effective and interesting queries. It has since evolved into an XML processor that supports the metadata and ontology schemes of OLAC and EMELD.