Robert Vann, Western Michigan University

Digitizing and transcribing field recordings of Catalonian Spanish

Advances in technology and falling prices have given ordinary computers the ability to digitize and preserve analog field recordings for later linguistic analysis. Nevertheless, standards for creating and archiving digitized texts are still emerging (Bird & Simons 2002). This paper describes the practices chosen to create and archive digital texts in an ongoing project to document the variety of Spanish spoken in Catalonia, Spain. This dialect has not been documented previously with textual representations of spontaneous speech such as published transcripts or digital audio recordings.

Specifically, I discuss the practices chosen to digitize and transcribe 20 hours of audiocassette field recordings economically, with maximal fidelity, portability, and potential access to the digital materials created. Digital recording freeware was configured to sample the analog signals at 16 bits and 44 kHz and to save them in AIFF format. Transcriptions were word-processed orthographically with minimal formatting to avoid idiosyncratic representation and to make the transcripts useful for a variety of linguistic purposes. Finally, the digital materials were archived on CD to be published alongside the transcripts in print.
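To give a rough sense of the storage these settings imply, the following minimal sketch estimates the size of uncompressed audio for 20 hours of recordings. Several values are assumptions not stated above: the sampling rate is taken to be the CD-standard 44,100 Hz, the recordings are treated as mono by default, a data CD is assumed to hold 700 MiB, and AIFF header overhead is ignored. The function names are hypothetical and chosen only for illustration.

```python
def pcm_bytes(hours, sample_rate_hz=44_100, bits_per_sample=16, channels=1):
    """Estimate the size in bytes of uncompressed PCM audio.

    Assumptions (not from the abstract): 44,100 Hz sampling and mono input.
    AIFF header and chunk overhead is ignored; it is negligible at this scale.
    """
    seconds = hours * 3600
    bytes_per_sample = bits_per_sample // 8
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


def discs_needed(total_bytes, disc_capacity_bytes=700 * 1024 * 1024):
    """Return how many discs are needed, rounding up.

    The 700 MiB capacity is an assumed figure for a typical data CD.
    """
    return -(-total_bytes // disc_capacity_bytes)  # ceiling division


if __name__ == "__main__":
    mono = pcm_bytes(20)
    stereo = pcm_bytes(20, channels=2)
    print(f"mono:   {mono / 1e9:.2f} GB on {discs_needed(mono)} CDs")
    print(f"stereo: {stereo / 1e9:.2f} GB on {discs_needed(stereo)} CDs")
```

Under these assumptions, 20 hours of mono audio come to roughly 6.35 GB, or about nine data CDs; stereo recordings would double that figure.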

In describing these practices and their rationale, my paper contributes a case study to the debate surrounding methods of digital text creation and archiving in linguistic research. The practices involved in creating and archiving digital texts for this particular project may be useful to others involved in similar endeavors. Accordingly, I hope to contribute to the conference goal of working towards "best practice" recommendations for digitizing and annotating texts and field recordings.