Instructions for Working Groups

In general

Once again, we are asking E-MELD Workshop participants to take part in working groups that will help the E-MELD project fulfill its goals. As those of you who attended last year's workshop know, the mandate of the E-MELD project is to promote community consensus about best practices for the digitization of language documentation. Part of our mission is to develop an educational website designed to explain and illustrate recommended practices and to facilitate their adoption among linguists and language archivists.

In this context, "best practices" are those which make language documentation as widely available and as enduring as possible. In addition--to accommodate differences in objectives, traditions, and languages of study--"best practices" should be those which allow the individual linguist and archivist as much freedom as possible consistent with long-term data preservation and data interchange. Ideally, we would like to build a site which:
  • invites comments and suggestions, and fosters the sharing of knowledge
  • acknowledges and (if permitted) incorporates the work of other projects
  • motivates linguists to create archive-quality language documentation which will be available to future generations
That is where the E-MELD Working Groups--and you--come in. We need your help to design a website about the digitization of language documentation that is accurate, reasonably complete, and yet clear enough to interest and guide the linguist who does not have extensive technical training.

The design of the website

With a striking lack of originality, we have called our website the "School of Best Practices in the Digital Documentation of Language" (aka "The School"). It is to be found at: http://emeld.org/school/. In accordance with the school metaphor, the site now includes the following "rooms":
  • The Entrance Hall: This room explains what "best practice" is and why it is important, and includes aids to navigating the site, a glossary of terms, and credits. It also includes Case Studies, which detail where the documentation on display came from and how it was converted into "best practice" formats. The case studies are designed to aid the user in navigating the site--the user is urged to "follow the path of" various kinds of legacy data, e.g., the Biao Min data, which came to us on notecards, the Mocovi data, which was converted from Shoebox, and so on.
  • The Exhibit Hall: This room is designed to be an exhibition of what can be done by the consistent use of "best practice": it uses real language documentation to illustrate the distinction between archival and presentation formats, to show how to write metadata, to demonstrate the benefits of linking terminology to concepts in an ontology, and so on. Eventually, the Exhibit Hall will feature 10 typologically diverse languages and 5 different types of language documentation. We are seeking ways to display the data perspicuously while maintaining a focus on best practices. This is a difficult balance to strike, and we solicit your suggestions.
  • The Classroom: This room is designed to contain lessons and (eventually) online tutorials on aspects of "best practice", e.g., how to handle audio and video material, annotation, character conversion, and metadata. The Classroom is central to the website, so its clarity and accuracy are important. We would appreciate corrections and suggestions (especially about tutorials that we should develop).
  • The Reading Room: This room currently contains only a list of useful background reading and web links. We are aware that the bibliography is by no means comprehensive, and we hope that each working group will add relevant items.
  • The Workroom: This room is intended to be a place where linguists can work on their own documentation online, using facilities on the E-MELD servers. At present, we have only a few online facilities (ORE, FIELD, CharWrite, a fledgling terminology mapper); but we are eager for suggestions about other facilities which we might develop.
  • The Tool Room: In contrast to the Workroom, where work can actually be performed online, the Tool Room will provide links to client-side software available for download. We have set up databases and supporting interfaces for information on software and hardware. We are eager to add more information to the databases. We also want to collect more scripts and small "tools" for specific conversion tasks. We solicit your recommendations.
  • Search: There are several search facilities on the site, most notably OLAC search of the metadata of various archives and local search of the documentation in the Exhibit Hall. Although the long-term goal is intelligent searching via the Semantic Web, at present we are using a database search to illustrate an aspect of best practice: i.e., that systematic reference to the concepts in a general ontology enables unambiguous cross-language searching despite varying terminology.
  • Comments: We have provided a means by which users can send feedback to us and view others' comments. We want to encourage use of this facility and also interchanges among users; suggestions on how to do this are welcome.
  • Help: This is the beginning of what we hope will be a much more extensive help facility. By the time of the conference, we will have implemented an "Ask-an-Expert" facility, where linguists will be able to get help from experts in areas such as audio, video, conversion, etc. If you would be willing to serve on the Ask-an-Expert panel, be sure to let us know!
The site had barely been begun last year, when we asked our 2003 workshop participants for input. If you were part of that process, you'll see that we have listened to what you had to say, incorporating as much of it as we could and adding considerable material on the basis of your suggestions. However, we are aware that much still needs to be done. That is why we are once again asking you to review the site and help us clarify, update, and expand it.


This year we would like you to focus on the structure of the existing School and its "rooms." In terms of the objectives sketched above, what part of the site works, what doesn't, and what should be added?

Before the meeting, please review the areas assigned to your working group and come to the working group sessions ready to make suggestions. (The URLs to use as starting points are listed with each working group assignment.) We would particularly appreciate it if you would bring suggestions for relevant bibliography to add to the Reading Room and information on software to add to the Tool Room. We will ask the group leaders to assemble a bibliography and software list for us from these suggestions. And, finally, as in past years, we will ask each of the group leaders to submit a report 2-3 weeks after the workshop ends, summarizing the opinions and suggestions of their working group. We will use these reports to improve the site during the following year.

We list below each of the working groups, with the areas of the site that we would like them to review. We have assigned all currently registered workshop participants to what we hope is an appropriate working group. In many cases, however, we were unsure of your interests. If you would like to be in a different working group, or if we have missed someone, please let us know at: workshop@linguistlist.org

1. The Entrance Hall (including Glossary, Credits)/Reading Room


Chair: Jeff Good, U. of Pittsburgh
Members: Emily Bender, U. of Washington; Shauna Eggers, U. of Arizona; Jonathan Evans, Academia Sinica; Veronica Grondona, Eastern Michigan U; Ada Kovaci, Indiana U; Paul Kroeber, Indiana U; James Mason, The Rosetta Project & ALL Language Archive; Ron Zacharski, New Mexico State U
Liaison: Steve Moran, Eastern Michigan U (steve@linguistlist.org)

The "Entrance Hall" is in some ways the most important part of the site. This is the area that introduces best practice, explaining what it is and why it is important. The concept, as we have implemented it, is largely based on Steven Bird and Gary Simons's article "Seven Dimensions of Portability for Language Documentation and Description" (http://www.language-archives.org/documents/portability.pdf), published in Language in 2003. If you are unfamiliar with the paper, you may want to look at it. We want to know whether the Entrance Hall adequately represents the concepts in the paper. And, of course, we are also interested in any extensions or revisions you feel need to be made to the principles set forth in Bird and Simons, 2003.

2. The Exhibit Hall/Case Studies


Chair: Arienne Dwyer, U. of Kansas
Members: Laura Buszard-Welcher, Eastern Michigan U; Lyle Campbell, U. of Utah; Östen Dahl, Stockholm U; Naomi Fox, Wayne State U and Linguist List; John Lesko, Saginaw State; Johanna Nichols, U. of California at Berkeley; Udaya Singh, Central Institute of Indian Languages; Wallace Hooper, American Indian Studies Res. Inst. & Indiana U
Liaison: Stephanie Stoll, Eastern Michigan U (stephani@linguistlist.org)

The "Case Studies" are intended to aid users in navigating the site and also to motivate them to implement best practice by showing what other linguists have done. Because of their importance, we have asked several working groups to look at the Case Studies. The "Exhibit Hall" is also intended to motivate, primarily by offering examples of documentation digitized according to best practices. The languages chosen for this area are intentionally varied, and we have different types of documentation for the different languages. This poses some problems for how the data should be presented, and we solicit your suggestions.

3. The Classroom (I): Annotation, Unicode/The Workroom


Chair: Dafydd Gibbon, Universität Bielefeld
Members: Anthony Aristar, Wayne State U; Steven Bird, U. of Melbourne; Zhenwei Chen, Linguist List; Chin-Chuan Cheng, Academia Sinica; Artem Chebotko, Wayne State U; David Harrison, Yale; Will Lewis, CSU Fresno; Donald Salting, North Dakota U; Jozsef Szakos, Providence U. and National DongHua University; Dietmar Zaefferer, Ludwig-Maximilians Universität München
Liaison: Megan Zdrojkowski, Eastern Michigan U (megan@linguistlist.org)

The Workroom includes some online tools that would benefit from your review, e.g., the FIELD tool for lexical input; the OLAC Repository Editor; and CharWrite (http://emeld.org/tools/charwrite.cfm), a tool for inputting Unicode characters on the web. The Workroom will also house a terminology mapper that references GOLD, an ontology of linguistic concepts developed by the E-MELD team at the U. of Arizona. We would like suggestions about other tools that we could provide on our site, especially tools to facilitate use of the ontology. In addition, we are asking you to review the Unicode and Annotation sections of the Classroom. The Annotation section particularly needs additional content, and the Unicode section needs review by experts, since linguists are eager for information on Unicode.

4. The Classroom (II): Images, Audio, Video, Conversion


Chair: Peter Wittenberg, Max Planck Institute
Members: Michael Appleby, Eastern Michigan U and Linguist List; Hans-Jörg Bibiko, Max Planck Institute; Yu Deng, Wayne State University; John Lowe, UC Berkeley and The Rosetta Project; Hans Nelson, Brigham Young U; Barbara Need, U. of Chicago; Laurie Poulson, U. of Washington; Gary Simons, SIL; Helen Aristar-Dry, Eastern Michigan U
Liaison: Susan Hooyenga, Eastern Michigan U (susan@linguistlist.org)

This section of the Classroom is designed to inform linguists about archiving images and about recording and archiving audio and video. These are topics on which field linguists are hungry for information, yet helpful information is difficult to find. We are relying on this working group to review the information we have made available: Is it up-to-date? Is it complete? We would also like to know what advice you give colleagues who consult you about techniques and equipment for multimedia language documentation.

5. The Classroom (III): Archiving, Ethics, Metadata


Chair: Heidi Johnson, Archive of the Indigenous Languages of Latin America
Members: Steve Conley, Ohio State U; Ferdinand de Haan, U. of Arizona; Brian Fitzsimmons, U. of Arizona; Ulrike Glavitsch, Swiss Federal Institute of Technology; Joseph Grimes, SIL International and U. of Hawaii at Manoa; Douglas Parks, Indiana U; Kevin Roddy, U. of Hawaii-Manoa; Doug Whalen, Haskins Laboratories
Liaison: Sadie Williams, Eastern Michigan U (sadie@linguistlist.org)

Writing metadata is an important part of best practice: without metadata a resource is undiscoverable, and an undiscoverable resource is essentially a lost resource. We also want to encourage field linguists to think of themselves as creating a long-lasting collection of important (perhaps irreplaceable) materials and to deposit that collection in an established archive. Does the archiving area convey this? Naturally, we also want to encourage respect for intellectual property rights, especially the rights of speaker communities. However, surprisingly little information is available on certain Intellectual Property Rights (IPR) issues, probably because their legal ramifications have not yet been fully explored. We would appreciate any suggestions you can give us for making this area of the site comprehensible, sensible, and helpful. For example, do you know of IPR statements that we could link to as examples?

6. The Classroom (IV): Software, Stylesheets/The Tool Room/Case Studies


Chair: Baden Hughes, U. of Melbourne
Members: Ed Garrett, Eastern Michigan U; D. Terence Langendoen, U. of Arizona; Lori Levin, Carnegie Mellon; Mike Maxwell, Linguistic Data Consortium; Manuela Nosky, Microsoft; Steven Shen; Lameen Souag, The Rosetta Project; Ljuba Veselinova, Stockholm U
Liaison: Neil Salmond, Eastern Michigan U (neil@linguistlist.org)

Most linguists want help choosing software and hardware, and this issue is treated in several sections of the School. The Case Studies, for example, are meant to provide a snapshot of specific conversion processes and an introduction to the tools used. We have therefore asked this working group to review several areas of the site. We are aware that these areas are still thinly populated, and we hope for your suggestions about content. The "Tool Room" section, in particular, is designed to be an exhaustive listing of useful software, ideally annotated with its uses and shortcomings. It is nowhere near complete, and we hope you will help us flesh it out.

In addition to these specific assignments, we welcome any ideas you have about how to improve the School website, or otherwise fulfill the E-MELD mission. A clear and accessible reference site on digital language documentation will benefit the whole linguistics community. Any success we have in constructing one will be attributable primarily to E-MELD Advisors like you, who take part in the working groups and give us the benefit of your experience.

