K. David Harrison, Bernard Spolsky, William Rivers, & Richard Brecht, Swarthmore College, Max Planck Institute for Evolutionary Anthropology

Making Language Variation Visible

How should the study and documentation of languages deal with language and dialect variation? Although it occurs at all levels (phonetic, phonological, morphological, syntactic, lexical, prosodic), variation has not yet found a place within the emerging set of standards for mark-up, ontology and best practice. Among variation researchers, a wide range of project-specific practices exist side by side, but there is as yet no established set of conventions for tagging variation so that it may be quantified across or within speech samples. In this poster, we propose a taxonomy for analyzing variation and suggest ways in which it might be represented within an ontology and mark-up framework.

Variation occurs along two primary dimensions: characteristics that inhere in the speech event (register, code choice, formality, number of interlocutors, and so on) and characteristics that inhere in the speaker (age, sex, education, background, status). The former may change dynamically during a speech event that is being recorded or annotated. The latter tend to remain stable, at least at the timescale of the speech event, and might thus be coded as part of the metadata associated with it. Dynamic features, by contrast, must be coded according to a standard ontology if they are to provide any basis for search or comparison across speech samples. This presupposes an agreed tag set, decisions about annotation tiers, and in some cases a notion of the ‘standard’ against which variation is to be diagnosed. Variation may also be diagnosed in a bottom-up fashion once a phonetic and morpho-phonemic annotation is completed. For example, a tool such as ELAN allows users to compile all examples of a particular morpheme or tagged word and thus look for occurrences of allomorphy or lexical variation. These in turn must be linked back to the meta-variables (both static and dynamic) if we are to get beyond anecdotal evidence. Finding phonetic variation is more challenging, because it presupposes a norm or requires a very precise specification of the searched-for environment.
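
The split between static and dynamic variables can be made concrete with a small sketch. It is only an illustration under stated assumptions, not a proposed standard or an ELAN export format: the class names, field names, tag values (such as "register") and the sample tokens are all hypothetical and invented for the example. Speaker characteristics are stored once as metadata, dynamic features are attached to each time-aligned annotation, and every token of a given morpheme is then counted against both kinds of variable.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Speaker:
    """Static characteristics, stored once as speech-event metadata."""
    speaker_id: str
    age: int
    sex: str


@dataclass
class Annotation:
    """One time-aligned token on a hypothetical morpheme tier."""
    start_ms: int
    end_ms: int
    speaker_id: str
    morpheme: str   # abstract morpheme label, e.g. "PL"
    variant: str    # surface form observed for this token
    dynamic: dict = field(default_factory=dict)  # e.g. {"register": "formal"}


def variant_counts(annotations, speakers, morpheme, static_attr, dynamic_attr):
    """Count surface variants of one morpheme per (static, dynamic) value pair."""
    counts = Counter()
    for a in annotations:
        if a.morpheme != morpheme:
            continue
        # Link the token back to the speaker's static metadata.
        static_value = getattr(speakers[a.speaker_id], static_attr)
        # Dynamic features may be missing on some tokens.
        dynamic_value = a.dynamic.get(dynamic_attr, "unspecified")
        counts[(static_value, dynamic_value, a.variant)] += 1
    return counts


# Invented placeholder data, not drawn from any corpus.
speakers = {
    "S1": Speaker("S1", age=62, sex="F"),
    "S2": Speaker("S2", age=24, sex="M"),
}
annotations = [
    Annotation(0, 400, "S1", "PL", "form_A", {"register": "formal"}),
    Annotation(900, 1300, "S1", "PL", "form_B", {"register": "casual"}),
    Annotation(1500, 1900, "S2", "PL", "form_A", {"register": "casual"}),
    Annotation(2100, 2500, "S2", "PL", "form_A", {"register": "casual"}),
]

counts = variant_counts(annotations, speakers, "PL", "sex", "register")
for key, n in sorted(counts.items()):
    print(key, n)
```

Keeping the dynamic features on the annotation rather than on the recording as a whole is what allows a change of register partway through a speech event to be counted separately.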

We will present recent work in variation taxonomy that attempts to bridge a number of research traditions, ranging from the quantitative study of phonetic variation (Labov 2005), to social models (Fishman 2002), to perceptual models (Preston 1999). As the knowledge base of documented languages expands, the study of variation will become more pressing. We present some data from languages that form dialect continua (e.g., Gulf Arabic, Central Asian Turkic) and suggest how a rational mark-up procedure for variation might be instantiated with respect to any or all of the above variables.

