Gary F. Simons, SIL International

Beyond the Brink: Realizing Interoperation through an RDF Database

The participants of the inaugural EMELD workshop in 2001 easily reached consensus on three points:

  • XML markup provides the best format for the interchange and archiving of endangered language data.
  • No single schema for XML markup can be imposed on all language resources.
  • Linguists need to be able to perform queries across multiple resources.

But herein lies a fundamental problem: How do we interoperate across resources when those resources use different markup schemas and the linguists have used different terminology in their analysis and description? At the heart of EMELD's solution to this problem lies GOLD, which we hope will one day embody a community-wide consensus on a shared ontology of linguistic concepts. That shared ontology provides the basis for interoperation.

The preceding talk by Will Lewis describes how language resources are brought to the brink of interoperation by transforming them to XML markup and using termsets and language profiles to map the linguist's terminology onto the shared concepts of GOLD. This talk picks up the thread and demonstrates how those resources are taken over the brink to enable queries across once-disparate resources.

The solution uses RDF (Resource Description Framework), the base technology developed by the World Wide Web Consortium for building the so-called Semantic Web. In RDF, the semantic representation of an information resource is expressed as a set of object-attribute-value triples, in which the value may be another object or a literal, and in which the attributes and common objects of the problem domain are defined in an ontology. In an RDF database, the semantic representations of multiple information resources are loaded into a single store, which is then queried through a special-purpose language.
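The idea of loading triples from several resources into one store and querying them together can be sketched in a few lines of Python. This is only an illustration, not the EMELD implementation: the TripleStore class, the resA:/resB:/ex: identifiers, the word forms, and the use of gold:Noun and gold:Verb as concept names are all hypothetical placeholders, and a real RDF database would use a dedicated query language rather than this simple pattern match.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# A triple is (object, attribute, value); the value may name another
# object or hold a literal. All identifiers below are hypothetical.
Triple = Tuple[str, str, str]


class TripleStore:
    """A toy in-memory store holding triples from many resources."""

    def __init__(self) -> None:
        self._triples: set = set()

    def load(self, triples: Iterable[Triple]) -> None:
        """Add the triples of one resource to the shared store."""
        self._triples.update(triples)

    def match(
        self,
        obj: Optional[str] = None,
        attribute: Optional[str] = None,
        value: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield every triple matching the pattern; None is a wildcard."""
        for o, a, v in sorted(self._triples):
            if obj is not None and o != obj:
                continue
            if attribute is not None and a != attribute:
                continue
            if value is not None and v != value:
                continue
            yield (o, a, v)


def forms_with_concept(store: TripleStore, concept: str) -> List[Tuple[str, str]]:
    """Return (object, form) pairs for every object typed with `concept`."""
    results = []
    for o, _, _ in store.match(attribute="rdf:type", value=concept):
        for _, _, form in store.match(obj=o, attribute="ex:form"):
            results.append((o, form))
    return sorted(results)


# Two resources that originally used different markup and terminology,
# assumed here to have already been mapped onto shared concepts.
resource_a = [
    ("resA:word1", "rdf:type", "gold:Noun"),
    ("resA:word1", "ex:form", "form-a1"),
]
resource_b = [
    ("resB:entry7", "rdf:type", "gold:Noun"),
    ("resB:entry7", "ex:form", "form-b7"),
    ("resB:entry8", "rdf:type", "gold:Verb"),
    ("resB:entry8", "ex:form", "form-b8"),
]

store = TripleStore()
store.load(resource_a)
store.load(resource_b)

# One query now spans both resources.
nouns = forms_with_concept(store, "gold:Noun")
print(nouns)
```

Because both resources point at the same concept identifier, a single pattern retrieves matching items from each of them, which is the effect the shared ontology is meant to make possible.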

After illustrating the RDF representation of GOLD-aware language resources, the talk will explain how the resources described by Lewis are translated to an RDF representation. This step is done with SIL (Semantic Interpretation Language), a metaschema technology developed within the EMELD project. Finally, results to date with RDF database queries over disparate language resources will be demonstrated.