LINGUIST List 35.1160

Mon Apr 08 2024

Review: Corpus-Assisted Discourse Studies: Gillings, Mautner, Baker (2023)

Editor for this issue: Justin Fuller <justin@linguistlist.org>

LINGUIST List is hosted by Indiana University College of Arts and Sciences.



Date: 08-Apr-2024
From: Aleksandra Uttenweiler <aleksandra.uttenweiler@gmail.com>
Subject: Applied Linguistics: Gillings, Mautner, Baker (2023)

Book announced at https://linguistlist.org/issues/34.2968

AUTHOR: Mathew Gillings
AUTHOR: Gerlinde Mautner
AUTHOR: Paul Baker
TITLE: Corpus-Assisted Discourse Studies
SERIES TITLE: Elements in Corpus Linguistics
PUBLISHER: Cambridge University Press
YEAR: 2023

REVIEWER: Aleksandra Uttenweiler

SUMMARY

‘Corpus-Assisted Discourse Studies’ is part of the ‘Elements in Corpus Linguistics’ series published by Cambridge University Press and edited by Susan Hunston. The series provides compact introductions to the main areas of the field. In their element, Gillings, Mautner, and Baker focus on the use of corpus tools in discourse studies and address researchers interested in working in this area. The book comprises seven sections that provide an overview of methods in Corpus-Assisted Discourse Studies (CADS) and reflect on their practical application.

The first section opens with a brief definition of CADS and identifies the target audience as master’s and PhD students, as well as teachers who wish to introduce their students to the method. While the authors note that ‘discourse’ is a “notoriously fuzzy notion” (p.1), they use the term in a broad sense to include all naturally occurring longer stretches of language that perform social functions. CADS is likewise defined broadly: it refers to a research approach that focuses on social phenomena rather than purely linguistic ones.

The aim of this element is to provide a ‘how-to’ guide (p.2) for CADS that explains how corpus tools can be implemented in discourse studies and critically reflects on the methodology. The book is structured according to these goals: Sections 2–5 guide the reader through the process of a model CADS study, and Sections 6 and 7 form a reflective part. The authors acknowledge potential shortcomings of the book, highlighting the brevity of Cambridge Elements and the space limitations this imposes. They also transparently explain their focus on the English language and on the tradition of British linguistics.

Section 2 discusses the advantages of using corpora in discourse studies (DS). It begins with a historical note on CADS, and the authors emphasize meta-discussions of CADS as a methodology. They briefly consider the distinction between ‘corpus-based’ and ‘corpus-driven’, as well as other labels ascribed to corpus linguistic approaches applied in DS, and opt for ‘corpus-assisted’ as an “umbrella term” (p.5). The section then discusses the limitations of traditional DS methods, particularly their poor scalability, which results from their reliance on close reading. Corpus tools, by contrast, scale better and focus on uncovering and systematically describing patterns. These patterns are then interpreted using qualitative discourse analytical approaches that take into account the sociopolitical context in which they appear. According to the authors, the main advantage of CADS is that it achieves “a useful synergy between CL [Corpus Linguistics] and DS [Discourse Studies]” (p.7) through triangulation, that is, the combination of corpus tools with discourse analytical methods.

Section 3 begins the practical ‘how-to’ part by discussing corpus building. The authors adopt a definition of a corpus that includes representativeness of a particular language variety. The discussion is divided between two types of corpora: reference corpora and specialized corpora. The first part critically examines reference corpora, questioning whether they are in fact representative of the language variety they aim to depict, particularly large English corpora such as COCA or the BNC2014. The second part considers self-compiled specialized corpora and how to select the data needed to answer a specific research question. It raises key questions to consider when building or choosing a corpus, including ethical concerns, working with oral texts, the size of a self-compiled corpus, and the inclusion of markup.

Section 4 introduces key corpus tools that are useful for CADS. Each subsection is devoted to one tool: frequency, concordance analysis, collocation analysis, and keyword analysis. The frequency subsection covers creating wordlists, various techniques for calculating word frequencies, exploring frequent n-grams, visualizing dispersion, and the distinction between raw and relative frequency. The keyword analysis subsection emphasizes the comparison of corpora. In addition, the subsections introduce basic terms relevant to preparing data, such as tokenization, part-of-speech tagging, and semantic tagging. The importance of co-text is also highlighted, along with the problems that may arise if a corpus program provides too little co-text for interpretation.
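To make the distinction between raw and relative frequency, and the idea of a concordance, more concrete, the following minimal Python sketch may help. It is not taken from the book, which works with established corpus programs rather than custom code. The invented toy sentence, the naive regular-expression tokenizer, and the function names are illustrative assumptions only.

```python
import re
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an illustrative assumption): lowercased word forms,
    # keeping internal apostrophes and hyphens. Real corpus tools do far more.
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

def relative_frequency(raw_count, corpus_size, per=1_000_000):
    # Normalize a raw count to a rate per `per` tokens (per million by default),
    # so that counts from corpora of different sizes become comparable.
    return raw_count * per / corpus_size

def concordance(tokens, node, width=3):
    # Key Word In Context (KWIC) lines: every hit of `node`, shown with
    # `width` tokens of co-text on each side.
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines

# Invented toy text, not data from the book.
text = ("The committee rejected the proposal. The proposal was then revised, "
        "and the committee accepted the revised proposal.")
tokens = tokenize(text)
freqs = Counter(tokens)  # a simple wordlist: raw frequency per word form

print(len(tokens), freqs.most_common(3))
# Raw counts can mislead: 50 hits in 100,000 tokens is a higher rate
# than 200 hits in 1,000,000 tokens.
print(relative_frequency(50, 100_000), relative_frequency(200, 1_000_000))
for line in concordance(tokens, "proposal"):
    print(line)
```

With a hypothetical 3-token window, the co-text shown is deliberately narrow, which also illustrates the authors’ point that too little co-text can make concordance lines hard to interpret.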

While the use of these established tools has been described before in textbooks on Corpus Linguistics (e.g. Stefanowitsch 2020), the authors focus on their use in discourse studies. They provide multiple examples of how each tool has been used in previous research, demonstrating various ways in which corpus methods can be implemented. Additionally, they indicate which corpus programs offer each tool.

After introducing each tool individually, Section 5 shows how the tools can be used together in an ongoing project analyzing court dissents. The process begins by identifying ‘lexical hooks’, that is, specific lexical elements that can be used to query a corpus program. Concordance and collocation analyses can then be used to uncover further politeness markers. The example shows that not every tool described earlier is useful in every project and that some may lead to irrelevant findings. The authors emphasize the importance of balancing orderly progression with the messiness of the analytical process. They stress, however, that this is only one example: other projects may use the tools in a different way and in a different order.

While the previous sections offer a practical account of conducting CADS research, from selecting a corpus to interpreting findings, the final two sections critically reflect on the methodology. Section 6 outlines the most important limitations and pitfalls to consider when undertaking a CADS project. In the first subsection, the authors discuss cases where CADS may not be the optimal choice, as well as some best practices for working with corpora. The second subsection addresses challenges that may arise at different stages of the research process.

Section 7 reflects further on triangulation, which lies at the centre of CADS. The authors also discuss the future of CADS, including new ways of measuring keyness and categorizing keywords. Furthermore, they provide some insight into work in the field on languages other than English. The authors also address the accessibility of corpus research methods, literature, and skills, as well as political limitations. The final subsection reflects on dealing with messiness in CADS, which, as Section 5 shows, is inevitable. The authors suggest that the best way to deal with this mess is to accept it, embracing flexibility and non-linearity while seeking balance through a carefully reflected and well-documented process.

The element closes with a useful appendix that briefly describes some of the most common corpus programs.

EVALUATION

While the book is aimed at researchers and students new to the use of corpus linguistic tools in discourse studies, it assumes that readers already have some knowledge of traditional discourse studies methods and a basic understanding of Corpus Linguistics. It is therefore best suited to readers who already possess theoretical knowledge in both fields but want to develop practical skills in using corpus tools and programs in a combined approach. As for the systematic use of the tools demonstrated, early-career researchers may find it difficult to work out how, and in which order, to apply them in their own research. As the authors note about the order in which they apply the tools, “[t]here is nothing canonical about these choices, but experience suggested that they would be reasonable for the project at hand” (p.40). The choices made in the examples appear to be guided largely by intuition, so it would have been helpful to suggest a starting point for readers who lack this experience.

The broad definition of discourse makes the element applicable to a wide range of uses in discourse studies. The definition of CADS, however, gives the methodology a narrower scope than one might anticipate. The authors define CADS projects as having “a social question at their center rather than a purely linguistic one” (p.1), especially concerning power structures and social hierarchies tied to linguistic choices. Given the present state of research, this is not necessarily true, and it is inconsistent with the broad definition of ‘discourse’ stated earlier (see Flowerdew 2023: 126–127). In fact, the demo project discussed in Section 5 has a linguistic question at its core (what role “language plays in ‘doing’ dissent”, p.39) and is merely informed by the social context. Although the definition of CADS theoretically limits the scope of the element, the tools and methods presented can still be applied to a broad range of research projects, including those that do not focus on social questions.

Considering the space limitations, it is worth questioning the inclusion of certain topics that might be less relevant to the overarching scope of the element, for instance the discussion of the representativeness of large reference corpora in Section 3 or the discussion of Critical Discourse Analysis in Section 2. These are important aspects that would merit discussion in a full-length book. Given the practical aim of the element, however, it would have been more advantageous to use the space to address methodological questions in more depth, for example by elaborating on the label ‘corpus-assisted’.

The element builds on previous work on technical aspects of CADS, most importantly Baker (2006). While it covers the same topics and tools as Baker’s monograph, it is more up to date, reflecting developments in the availability of corpus tools and in technical solutions. Because the authors draw on many examples from recent CADS research, the solutions they present are applicable to a variety of research questions. The authors bridge the 17-year gap between the two publications by providing an engaging and concise introduction, especially for readers new to CADS. The text introduces the technicalities of CADS and invites readers to reflect on the possibilities, potential, and limitations of the methodology, regarding both theory (by reflecting on the approach to discourse one takes) and practice (by reflecting on the balance between close reading of discursive texts and the use of statistical tools).

Most importantly, readers should reflect carefully on how they define ‘discourse’ and CADS for the purposes of their own research, and on what they understand by ‘corpus-assisted’. As shown above, the breadth of these terms is an advantage in a compact element such as this one, bringing together various perspectives and approaches and making the tools demonstrated accessible to a broad audience. With this in mind, the element provides a comprehensive ‘how-to’ guide to CADS, much needed given developments in recent years, and fills a gap in the literature on methodology. Another important strength of the book is its extensive and transparent reflection on best practices in CADS research. The element is therefore an excellent starting point for anyone interested in engaging not only with the practical use of corpus tools in discourse studies but also with broader theoretical reflections on the methodology.

REFERENCES

Baker, Paul. 2006. Using corpora in discourse analysis (Continuum Discourse Series). London, New York: Continuum.

Flowerdew, Lynne. 2023. Corpus-based discourse analysis. In Michael Handford & James Paul Gee (eds.), The Routledge Handbook of Discourse Analysis, 2nd edn., 126–138. Routledge.

Stefanowitsch, Anatol. 2020. Corpus linguistics: A guide to the methodology. Zenodo.

ABOUT THE REVIEWER

Aleksandra Uttenweiler holds a master's degree in German Linguistics from Leipzig University. She is a PhD student at Leipzig University and Leiden University. Her research interests include Positioning Theory, Discourse Analysis, and Corpus Pragmatics.





