Earlier this year, I compiled two small corpora of article abstracts from the most prominent journals in the American fields of rhetoric and writing studies: Rhetoric Society Quarterly and College Composition and Communication, respectively. The RSQ abstracts stretch from Winter 2000 (30.1) to Fall 2011 (41.5), for a total of 220 abstracts. The CCC abstracts stretch from February 2000 (51.3) to September 2011 (63.1), for a total of 261 abstracts. I think that article abstracts are a good vantage point for looking at disciplinary trends, because (in the humanities, anyway) researchers tend to write abstracts that function like movie previews. Designed to appeal to a specific disciplinary audience, abstracts signal that their articles ‘belong’ in the field by using all the right buzzwords, name-dropping all the right researchers, and making the stylistic moves that entice other researchers to read on.
Using Python and the Natural Language Toolkit to explore these two corpora of abstracts, I’ve discovered both interesting and unsurprising things about how rhetoric and writing studies have taken shape, over the last decade, as separate but ambivalently related disciplines. One of the more interesting pieces of capta to emerge from the corpora is that the words ‘writing’ and ‘rhetoric’ share grammatical contexts with very different lexical items, suggesting that each word means something different in each journal.
Before I get to the details, though, here’s a bit about my methodology:
With Python and NLTK, you can chart how a word is used similarly or differently in two corpora. For instance, a concordance of the word ‘monstrous’ in Moby Dick reveals contexts such as ‘the monstrous size’ and ‘the monstrous pictures’. A few more commands show that words such as ‘impalpable’, ‘imperial’, and ‘lamentable’ are also used in these same contexts. Running an identical search on Sense and Sensibility, however, reveals that ‘monstrous’ shares contexts with quite different terms: ‘very’, ‘exceedingly’, and ‘remarkably’. These dissimilar contexts point to different connotations for ‘monstrous’ in each novel: positive or neutral in Austen, negative in Melville. This, basically, was the method I applied to mine the usage of ‘rhetoric’ and ‘writing’ in the abstracts corpora (more details below the tables, though).
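In NLTK itself, the Moby Dick example comes from calls such as `text1.concordance("monstrous")` and `text1.similar("monstrous")` after `from nltk.book import *` (which needs the book data downloaded). The idea behind `similar()` can be sketched in plain Python: index every word by its (left neighbour, right neighbour) pairs, then rank other words by how many distinct pairs they share with the target. This is a minimal sketch assuming a one-word context on each side; it approximates what NLTK reports but is not NLTK's implementation (NLTK, for instance, also skips non-alphabetic tokens). The function names and the toy text are hypothetical illustrations, not the code used for this study.

```python
from collections import Counter, defaultdict


def context_index(tokens):
    """Map each lowercased word to a Counter of its (left, right) neighbour pairs."""
    words = [t.lower() for t in tokens]
    index = defaultdict(Counter)
    for i, word in enumerate(words):
        # Pad the edges so the first and last tokens still get a context.
        left = words[i - 1] if i > 0 else "*START*"
        right = words[i + 1] if i < len(words) - 1 else "*END*"
        index[word][(left, right)] += 1
    return index


def similar_words(tokens, target, n=20):
    """Rank other words by how many distinct contexts they share with `target`."""
    index = context_index(tokens)
    target = target.lower()
    target_contexts = set(index.get(target, ()))
    scores = Counter()
    for word, contexts in index.items():
        if word == target:
            continue
        shared = target_contexts & set(contexts)
        if shared:
            scores[word] = len(shared)
    return scores.most_common(n)


# A made-up toy text, not a quotation from Moby Dick.
tokens = ("the monstrous size of the whale . the imperial size of the crown . "
          "the lamentable pictures and the lamentable size . "
          "the monstrous pictures .").split()

print(similar_words(tokens, "monstrous"))
# [('lamentable', 2), ('imperial', 1)]
```

‘Lamentable’ ranks first because it turns up in both of the contexts where ‘monstrous’ appears (‘the _ size’ and ‘the _ pictures’), while ‘imperial’ matches only one.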
‘Rhetoric’ occurred 244 times in RSQ abstracts and 69 times in CCC abstracts. ‘Writing’ occurred 22 times in RSQ and 251 times in CCC. I compiled common grammatical contexts for each term in each corpus. Each context took the form
N x N
where N was any term and x was either ‘rhetoric’ or ‘writing’.
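Counting a term and collecting its contexts (one word N on each side of the term x) might look like the sketch below. It assumes the abstracts are already tokenized into a list of strings and that matching is case-insensitive; `term_frequency` and `contexts_of` are hypothetical helper names, and the sentence is invented rather than taken from either journal.

```python
from collections import Counter


def term_frequency(tokens, term):
    """Count case-insensitive occurrences of `term` in a token list."""
    term = term.lower()
    return sum(1 for t in tokens if t.lower() == term)


def contexts_of(tokens, term):
    """Collect the contexts of `term`: the word immediately to its left and
    the word immediately to its right, with a count for each pair.
    Occurrences at the very start or end of the list have no full context
    and are skipped."""
    words = [t.lower() for t in tokens]
    term = term.lower()
    found = Counter()
    for i in range(1, len(words) - 1):
        if words[i] == term:
            found[(words[i - 1], words[i + 1])] += 1
    return found


# Invented example sentence, not drawn from the RSQ or CCC abstracts.
tokens = ("This essay reads rhetoric as theory and treats writing "
          "as practice and rhetoric as art").split()

print(term_frequency(tokens, "rhetoric"))  # 2
print(contexts_of(tokens, "rhetoric"))
```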
The more contexts two terms share, the more likely it is that they are used interchangeably within that corpus. One way to get your head around this idea is to look at grammatical contexts with the operative term left blank:
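Counting the contexts two terms share is a set intersection. NLTK's `Text.common_contexts()` prints a comparable list for a pair of words; the version below is a hypothetical plain-Python sketch using invented tokens, in which ‘writing’ and ‘composition’ are constructed to share both of their contexts and ‘rhetoric’ shares none.

```python
def context_set(tokens, term):
    """Distinct (left, right) neighbour pairs around each occurrence of `term`."""
    words = [t.lower() for t in tokens]
    term = term.lower()
    return {
        (words[i - 1], words[i + 1])
        for i in range(1, len(words) - 1)
        if words[i] == term
    }


def shared_contexts(tokens, a, b):
    """Contexts in which both `a` and `b` appear; more shared contexts
    suggest the two words are more interchangeable in this corpus."""
    return context_set(tokens, a) & context_set(tokens, b)


# Toy, invented tokens for illustration only.
tokens = ("teaching writing in college . teaching composition in college . "
          "first-year writing courses . first-year composition courses . "
          "the rhetoric of science").split()

print(sorted(shared_contexts(tokens, "writing", "composition")))
# [('first-year', 'courses'), ('teaching', 'in')]
print(shared_contexts(tokens, "writing", "rhetoric"))
# set()
```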
(1) I _ you
In an English corpus, the words that fill the blank in a context like this will be limited. Hundreds, if not thousands, of words will fit it, but even a list that long will share some distinguishing grammatical and semantic property: for example, nearly all the words that can appear in (1) will be transitive or ditransitive verbs, and none can be third-person singular present forms (no ‘I loves you’). Right off the bat, this context limits its possible terms to a fraction of all the words in the English lexicogrammar. Throw in a second context, and the list of terms shrinks even further:
(2) is _ by
Given the rules of English grammar, most of the words that appear in this context will be past participles of action or emotion verbs (e.g., loved, felt, killed, written, eaten, trapped). Terms that can appear in both (1) and (2) are quite limited: only transitive or ditransitive verbs, no third-person singular presents, and now, no verbs whose simple past and past participle differ (e.g., wrote/written, ate/eaten), because the same form has to serve as a past tense in (1) and as a participle in (2).
If we start using contexts that include a content word instead of only semantically light stopwords, it’s easy to see how the list of terms can grow very short very quickly:
(3) I _ girls
What kinds of words can appear in (1), (2), and (3)? No verbs whose past tense and past participle differ, no third-person singular present verbs, and now, probably no ditransitive verbs, given the lack of a definite article before ‘girls’ (compare ‘I gave the girls books’). Words that can appear in all three of these contexts would likely be words that are easily grouped together in some meaningful way.
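The narrowing from (1) to (3) can be simulated by intersecting the fillers of each frame in turn. This sketch runs on a tiny invented corpus and treats a frame as a single word on each side of the blank; `fillers` and `narrow` are hypothetical names. Note that ‘told’ survives frame (2) because its past tense and past participle are the same form, while ‘wrote’ drops out.

```python
# Each frame is a (left word, right word) pair with a blank in the middle.
FRAMES = {
    "(1) I _ you": ("i", "you"),
    "(2) is _ by": ("is", "by"),
    "(3) I _ girls": ("i", "girls"),
}


def fillers(tokens, left, right):
    """Every word that appears between `left` and `right` in the token list."""
    words = [t.lower() for t in tokens]
    return {
        words[i]
        for i in range(1, len(words) - 1)
        if words[i - 1] == left and words[i + 1] == right
    }


def narrow(tokens, frames):
    """Intersect the fillers of each frame in turn, yielding the survivors."""
    candidates = None
    for label, (left, right) in frames.items():
        found = fillers(tokens, left, right)
        candidates = found if candidates is None else candidates & found
        yield label, sorted(candidates)


# A tiny invented corpus; a real one would yield far longer lists.
tokens = ("I loved you . I wrote you . I love you . I told you . "
          "she is loved by many . it is written by hand . he is told by them . "
          "I loved girls . I like girls .").split()

for label, survivors in narrow(tokens, FRAMES):
    print(label, survivors)
# (1) I _ you ['love', 'loved', 'told', 'wrote']
# (2) is _ by ['loved', 'told']
# (3) I _ girls ['loved']
```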
So, when a corpus analysis tells us that two words share half a dozen or more contexts in a specific corpus, it is reasonable to suspect that they share not only grammatical but also semantic and definitional attributes within that corpus. The simple example of ‘monstrous’, ‘lamentable’, and ‘imperial’ in Melville illustrates the pattern. It is also borne out by the large number of contexts (20!) shared by ‘writing’ and ‘composition’ in the CCC abstracts, two words that I knew, a priori, to be synonymous in the American field of writing studies. The analysis recovers this a priori knowledge, which gives me some confidence in the methodology.
While the terms sharing 2 or 3 contexts in the tables above are interesting, our attention should be focused on the terms near the top of the lists. In RSQ, ‘language’, ‘discourse’, ‘art’, ‘persuasion’, ‘theory’, and ‘texts’ tell us indirectly what the word ‘rhetoric’ means in that journal; in CCC, ‘writing’, ‘composition’, ‘education’, ‘place’, and ‘theory’ provide the same information.
The highlighted terms are those that appear in both journals’ lists for the same word. The overlap is minimal. For ‘rhetoric’, only a single word (‘theory’) appears in both lists and surfaces in more than 3 distinct contexts in each journal; for ‘writing’, no shared word surfaces in more than 3 distinct contexts in each journal. More telling is that ‘writing’ and ‘rhetoric’ themselves are highly interchangeable in CCC, sharing 7 distinct contexts, but barely interchangeable in RSQ, sharing only 2. In other words, these capta suggest that ‘writing’ and ‘rhetoric’ mean nearly the same thing in CCC but do not mean the same thing at all in RSQ.
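The overlap check amounts to intersecting two tables. A minimal sketch, assuming each journal's table is a dict mapping a word to the number of distinct contexts it shares with the target word; `overlapping_terms` is a hypothetical name, and the words and counts below are placeholders, not the RSQ or CCC figures.

```python
def overlapping_terms(scores_a, scores_b, min_contexts=3):
    """Compare two similar-word tables built for the same target word.

    Each table maps a word to the number of distinct contexts it shares
    with the target in one corpus. Returns a pair of sorted lists: every
    word present in both tables, and the subset that shares MORE than
    `min_contexts` contexts with the target in both corpora.
    """
    both = scores_a.keys() & scores_b.keys()
    strong = {
        w for w in both
        if scores_a[w] > min_contexts and scores_b[w] > min_contexts
    }
    return sorted(both), sorted(strong)


# Placeholder words and invented counts, not the RSQ or CCC results.
corpus_a = {"alpha": 5, "beta": 4, "gamma": 2}
corpus_b = {"alpha": 6, "gamma": 9, "delta": 3}

print(overlapping_terms(corpus_a, corpus_b))
# (['alpha', 'gamma'], ['alpha'])
```

The first list corresponds to the highlighted terms; the second applies the stricter "more than 3 distinct contexts in each journal" test.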