WinPitch Corpus, a Tool for Alignment and Analysis of Large Corpora
Philippe Martin, Université Paris 7  
Project / Software Title :       WinPitch Corpus  
Project / Software URL:  
Access / Availability:       This software is available at  

The description of endangered languages normally starts with the collection of speech data, which are then segmented into various phonological, prosodic, morphological and syntactic units. In this process, the (phonetic) transcription is the most critical part, and user-friendly tools are essential for completing any sizeable work in a reasonable amount of time.

The software program WinPitch Corpus addresses these concerns directly, offering two modes of operation for handling the data. In the first mode, no text is available, and the user generates it speech segment by speech segment (as was the case when only analog tape recorders were available). In the second mode, the speech has already been transcribed into text, but the text units are not aligned, i.e. a bi-univocal (one-to-one) relationship between units of text and units of speech has not been established.

Although some existing software programs operate in the first mode, establishing implicit text and speech alignment in the process, few can operate under commonly encountered difficult recording conditions, such as overlapping voices or the presence of noise. This paper briefly introduces some of the important features of WinPitch Corpus that make it an efficient tool for the transcription and analysis of speech data: slower speech rate for easier transcription, dynamic adjustment of segments with simultaneous display of spectrograms for precise alignment, etc.

Numerous speech analysis tools (fundamental frequency tracker, spectrogram, LPC formant analysis, etc.) are available, with quasi-instantaneous display of the results. Support for the simultaneous acoustical analysis of both channels of stereo recordings is also provided.

The program has already been used extensively for the analysis of large corpora of spontaneous speech in Romance languages (more than 1,200,000 words; C-ORAL-ROM, 2003), as well as for the phonetic and phonological description of Parkatêjê, an endangered language of the Amazon spoken by about 300 people (Araújo and Martin, 2003). WinPitch Corpus is available from the web site, under the name WinPitchPro.

Text-to-speech alignment can be done in two modes. In the first mode, no text exists; the user selects blocks of speech (which can be slowed down for playback) and enters the corresponding text (any Unicode font can be used directly). In this process, a database is built automatically, which can later be saved in XML or Excel® format.
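
To make the idea of a segment database concrete, the following is a minimal Python sketch of how user-selected blocks of speech and their transcriptions could be stored and written out as XML. It is an illustration only: the `Segment` class, the `segments_to_xml` function and the element and attribute names (`alignment`, `segment`, `start`, `end`) are hypothetical and do not reproduce the actual file format used by WinPitch Corpus.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Segment:
    """One block of speech and its transcription (hypothetical structure)."""
    start: float  # segment start, in seconds from the beginning of the recording
    end: float    # segment end, in seconds
    text: str     # transcription entered by the user (any Unicode text)


def segments_to_xml(segments):
    """Serialize segments to an XML string.

    The element and attribute names are assumptions made for this sketch,
    not the schema written by WinPitch Corpus.
    """
    root = ET.Element("alignment")
    for index, seg in enumerate(segments, start=1):
        element = ET.SubElement(
            root,
            "segment",
            id=str(index),
            start=f"{seg.start:.3f}",
            end=f"{seg.end:.3f}",
        )
        element.text = seg.text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    database = [
        Segment(0.0, 1.25, "first transcribed unit"),
        Segment(1.25, 2.9, "second transcribed unit"),
    ]
    print(segments_to_xml(database))
```

Because each segment carries its own start and end times, the one-to-one relationship between text units and speech units is recorded as a by-product of transcription.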

The second mode of text-to-speech alignment requires a preexisting text. The speech sound is played back at a reduced speed (dynamically programmable) while the user clicks on the part of the text corresponding to the perceived sound unit. A database of the dynamically defined segments is built automatically (table in the dialog box on the left).
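
The click-based procedure can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the program's actual algorithm: it assumes that each click marks the onset of a text unit, that the playback rate is constant (whereas the program allows it to be changed dynamically), and that click times are measured in seconds of slowed playback from the start of the recording. The function name `clicks_to_segments` and its parameters are hypothetical.

```python
def clicks_to_segments(units, click_times, rate, total_duration):
    """Turn click times recorded during slowed playback into aligned segments.

    units          -- text units, in the order they were clicked
    click_times    -- click times in seconds of slowed playback (assumed onsets)
    rate           -- playback speed factor, e.g. 0.5 for half speed (assumed constant)
    total_duration -- duration of the original recording, in seconds

    Returns a list of (unit, start, end) tuples on the original time scale.
    """
    if len(units) != len(click_times):
        raise ValueError("one click is expected per text unit")
    if rate <= 0:
        raise ValueError("rate must be positive")

    # At speed factor `rate`, t seconds of playback cover t * rate seconds of
    # the original recording.
    onsets = [t * rate for t in click_times]
    if any(later < earlier for earlier, later in zip(onsets, onsets[1:])):
        raise ValueError("click times must be in increasing order")

    # Each unit ends where the next one begins; the last one ends with the recording.
    ends = onsets[1:] + [total_duration]
    return list(zip(units, onsets, ends))


if __name__ == "__main__":
    for unit, start, end in clicks_to_segments(
        ["first", "second", "third"], [0.4, 2.0, 5.0], rate=0.5, total_duration=4.0
    ):
        print(f"{unit}: {start:.2f} s to {end:.2f} s")
```

Slowing the playback gives the user more time to react to each unit; rescaling the click times by the speed factor then places the segment boundaries on the original time scale, where they can be adjusted further.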

