The Meaning of Corpus Linguistics

Wolfgang Teubert

University of Birmingham

It is content that sets language apart from other immaterial structures. A musical score or a mathematical notation consists of indexical signs, not arbitrary ones. They are devoid of content. What they stand for is not open to negotiation. Language signs, on the other hand, are arbitrary. We can and do argue about what a sign means. A language sign does not refer to a discourse-external ‘thing’; it refers to other signs. This is why linguistics cannot be a strict science. There is no true meaning, for meaning is not reducible to statistics or any other formal algorithm. Our quest for meaning has to turn to fuzzy textual evidence, and it is only corpus linguistics that can help us. It holds the key to the hermeneutical enterprise, this greatest of human achievements, initiated by Aristotle in the west and Zhuangzi in the east. It is what makes sense of the social, spiritual and natural world.

For me, the corpus-driven approach to meaning implies that we have to listen closely to what utterances tell us about the meaning of other utterances. The meaning of a singular text, or of one of the recurrent text segments of which it is composed, is the entirety of what has been said about it, of how it has been paraphrased. What we need is a tool that picks up candidates for such paraphrases. Another tool that would come in handy would detect the intertextual links tying any new utterance to the earlier utterances to which it is a reaction. Discourse is plurivocal. It speaks in many voices. We need software that shows which voice infects subsequent texts, while leaving other texts untouched. We have to learn to what extent one network of texts, held together by a set of lexical items defining it, isolates itself from other such networks, or undermines them. This is why we have to study the diachronic dimension of discourse. The meaning of a recurrent text segment, of a lexical item, is never stable; it evolves, and it evolves differently in different networks. Statistics are, of course, an indispensable tool for extracting promising candidates from the corpus. But statistical findings are not a substitute for meaning. What constitutes the meaning of a lexical item depends entirely on our research question. Only humans can sort through the candidates picked up by the program and decide what is relevant and what is not. The fullness of meaning is never available to us, just as a map is never an accurate representation of the land it shows, but an interpretation of it. When we query the meaning of a text or a recurrent text segment, we aim for an interpretation, one that will be biased by the categories underlying our research question. Corpus linguistics, as I imagine it, provides the practical foundation for the programme of hermeneutics, i.e. the art of interpretation.

It is we, the discourse participants, and not the lexicographers, who make meaning, and we do it together. As we interpret what has previously been said, remarking and commenting on it, new ideas emerge. If they are picked up in subsequent utterances, they will have an impact on discourse. Making sense of the world is a collective enterprise, and we can all take part in it. Together we have the power to change the reality confronting us in what we are told. Discourse is, in principle at least, democratic. Every person has a voice. It is, however, a freedom we must take care to guard. The current battle over the internet might easily do away with it. We are increasingly swamped with narratives that aren’t ours, and if those in charge don’t like our narratives, they call them fake news and edit them out. They want only their reality to confront us, depriving us of our voice. Corpus linguistics, as I envisage it, has what it needs to take stock of the subtle changes of meaning that are designed to force us to accept their reality, not ours. It is up to us to fight back.

