риторика in Soviet Russia, according to Google’s Ngram Viewer

(File this post under “Low Hanging Fruit” or “Things That Are Really Obvious, so Why Haven’t I Seen Anyone Mention It?”)

It’s well known in rhetorical theory that rhetoric, after a centuries-long hiatus, was finally reclaimed as a legitimate topic of inquiry in the middle parts of the 20th century. Using Google’s multilingual N-gram corpora, we can watch this interest in rhetoric rekindle and rise not only in America but across the Western world . . .

One corpus, however, displays a much more interesting trend and suggests that its country was about, oh, 20 years late to the rhetorical party:

In the 1850s, there was an early spark in printed Russian materials discussing риторика (rhetoric). A quick scan of Russian history shows that those years correspond to the rule of Alexander II and the lead-up to the emancipation of the serfs. However, there’s a sharp drop-off around 1922. I wonder what happened in Russia circa 1922 . . .

To be fair, any term in the Russian N-grams corpus would drop off around 1922. Nevertheless, while Europe and America rediscovered the rhetorical tradition in the 1960s and 70s, the rhetorical rekindling didn’t spread to Russia until the late 80s and 90s. I wonder what happened in Russia circa 1991 . . .

I guess Communism isn’t the most fertile soil for a discussion of language, persuasion, and the ways in which words, as George Campbell put it, “enlighten the understanding, please the imagination, move the passion, and influence the will.” Otherwise, why the delay?

Of course, I know very little about the rhetorical tradition in Russia. This is just a trend I noticed while twiddling drunkenly with the N-gram tool. Am I missing something? Did Soviet-era Russians print books about rhetorical issues without using the precise term? What about Russian scholars in the 60s and 70s, when the rest of the Western world was rediscovering rhetoric? I do know that one of the greatest rhetoricians and semioticians of the 20th century, Mikhail Bakhtin, wrote during the early Soviet era.

Then again, he was also exiled to Kazakhstan. His works remained generally unknown or simply unprinted until after his death in 1975.

Text networks of two States of the Union, 1790 and 2012

For my final presidential-themed post, I decided to use AutoMap and Gephi to create text networks of State of the Union addresses. I’d planned to post more than I have below. I went back and forth, creating networks out of first older and then more contemporary addresses, but after a dozen or so, I began to notice an obvious trend . . .

When visualized as a network of lexical communities (or clusters), older addresses—circa 18th and 19th century—tend to display much more cohesion. Certain words are used to discuss certain concepts and in connection with certain additional words; and they aren’t used to discuss other concepts or in connection with other additional words.

In contrast, newer addresses are a mess. They have far more communities but they aren’t as distinct as in the older addresses. Terms are more malleable.**

Text network of Obama’s SOTU address, 24 Jan 2012

Text network of Washington’s SOTU report, 8 Jan. 1790

Take a close look at the text networks. It’s much easier to chart the connections in Washington’s address.

Length is one reason for the greater cohesion in older SOTU addresses. The written reports of Washington, Adams, Jefferson, and Madison are simply much shorter than the delivered speeches of our contemporary leaders. Nevertheless, I think there’s a deeper stylistic preference at work here. The style is tied, perhaps, to the less ambiguous exigencies of a new nation (Washington wasn’t saddled with a centuries-old bureaucracy or a constituency made up of 300 million individuals from varied racial, religious, and class backgrounds). But the stylistic preference is also tied, I think, to the rhetorical ideals that imbued the writings and orations of centuries past.

In his superb essay, “The Spaciousness of Old Rhetoric,” Richard Weaver attempts to explain those old ideals, writing that “the archaic formalism of the old orator was a structure imparted to his speech [or writing] by a logic, an aesthetic, and an epistemology” (185). His essay explores that logic, aesthetic, and epistemology in detail. For my purposes here, it’s enough to note that the ‘archaic rhetor’ was a Platonic rhetor who knew what words to use in what places because all words partook in a larger Truth, a priori deduced. A slippage of signifiers was out of the question. The old rhetor did not equivocate through his lexical choices. 

In contrast, our contemporary leaders let their signifiers slip freely. So, we are left with a much messier network, in which a node that has high betweenness centrality, such as TAX, can be only a short distance from nodes as disparate as CUTS, SUBSIDIES, REFORM, and BIGGER . . .

“Tax” connectivity, Obama’s SOTU

. . . And completely disconnected from nodes that seem much more apropos:

“Help” connectivity, Obama’s SOTU
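For anyone who wants to poke at the idea without AutoMap or Gephi, here is a minimal sketch of what "a short distance" and "completely disconnected" mean in a text network. It is not the pipeline I used for the images above. It assumes a simple co-occurrence rule (two words are linked if they sit next to each other in a sentence), the function names are hypothetical, and the toy sentences are invented from the node labels mentioned above. It measures path length only and does not compute betweenness centrality.

```python
from collections import defaultdict, deque

def cooccurrence_graph(sentences, window=1):
    """Link each token to the tokens that follow it within `window` positions."""
    graph = defaultdict(set)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            graph[word]  # register the node even if it gains no edges
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    graph[word].add(other)
                    graph[other].add(word)
    return graph

def distance(graph, start, goal):
    """Breadth-first search: edges on the shortest path, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

# Invented toy sentences, already tokenized; not taken from any address.
toy_sentences = [
    ["tax", "cuts", "reform"],
    ["tax", "subsidies"],
    ["help", "families"],
]
toy_graph = cooccurrence_graph(toy_sentences)

print(distance(toy_graph, "tax", "subsidies"))
print(distance(toy_graph, "tax", "reform"))
print(distance(toy_graph, "tax", "help"))
```

In this toy graph, "tax" reaches "subsidies" and "reform" in a hop or two, while "help" lives in a separate component and cannot be reached at all. A real network would need stopword removal and a more careful linking rule.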

**Methods note: I left “will” in the network, even though it’s typically a stopword and hence removed. I just found it interesting how often modals such as “will” appear in SOTU addresses, though one would think such addresses were neither the time nor place for discussing the future. The state of the Union up until now . . . but no. How do we go forth? A willingness to go forth seems to define the nation’s state of strength.

Language convergence and divergence

In Graphs, Maps, Trees, Franco Moretti writes the following:

Divergence prepares the ground for convergence, which unleashes further divergence: this seems to be the typical pattern. Moreover, the force of the two mechanisms varies widely from field to field, ranging from the pole of technology, where convergence is particularly strong, to the opposite extreme of language, where divergence is clearly the dominant factor; while the specific position of literature—this technology-of-language—within the whole spectrum remains to be determined (80).

I’m taking this quote out of its context, but I want to zero in on Moretti’s assumption that the rule of linguistic evolution is divergence rather than convergence.

The assumption is true in many respects, but it’s not the whole picture. It’s the whole picture only if we idealize the data . . . and idealize it beyond acceptable measures, in my opinion.

(Note: I can’t say enough good things about Moretti; read his stuff if you haven’t already. I’m just using him as a foil here.)

I’ve written elsewhere on how English itself is a converged language . . . it was birthed in the convergence of Old French and Old English. But the same linguistic convergence occurs at different levels all the time. We call it ‘borrowing’, ‘creoles’, ‘pidgins’, et cetera. As if giving these occurrences a special name obviated the need to deal with them in our linguistic phylogenies and our theories of language change.

Even if most whole languages aren’t always the result of convergence, there is nevertheless convergence on smaller scales. Phonology, morphology, syntax, the lexicon . . . there are many levels at which convergence can occur.


I think there’s much insight to be had from comparing human populations and human languages. Human races do not have distinct boundaries (they seem to form fuzzy clusters); that doesn’t mean they don’t exist, just that they don’t have distinct boundaries. Likewise, ‘languages’ obviously exist; they just don’t have distinct boundaries. In human populations, genotypes and phenotypes separate the clusters to a certain extent; in linguistics, we have phonology, morphology, and syntax.

Fuzzy boundaries . . . i.e., certain dialects (populations) of a language will have received some input from some other linguistic population at the phonological, morphological, syntactic, or lexical level.

faxear . . . an English lexeme (fax) has been absorbed at the level of Spanish verb morphology.

Along the Mexican border, if you show a gringo a made-up word that has two adjacent l’s in it (e.g., kohilla), they’re going to pronounce those l’s with Spanish phonology [j].

German has changed at the lexical, syntactic, and idiomatic level thanks to the popularity of English. Try finding heiße used to mean “popular” before 1970. (Or Booty shaken kann, for that matter.)

Languages can be seen as networks of influence at different scales, with more or fewer edges depending on the kind of contact two linguistic populations have had. Not always, of course, but more often than most people think. And there are obvious diachronic implications. Does this mean we should throw out the notion of standard languages? Of course not. It just means that even the purest standard will likely have some admixed influence from another language at some level or in some lexical entry.

Research Soundtrack

I thought I’d start posting the music to which I think, write, read, code, plot, chart, compile, and calculate. I like electronic music for working because it has no lyrics to derail my thoughts.

I still don’t think anyone has discovered a good way to incorporate music and webpages. (“Where’s the STOP button?!”). But embedding some clips won’t hurt. This one dropped last week.


Lexical trends in State of the Union addresses, 1945-2006

Here are some lexical trends in State of the Union addresses from 1945 to 2006. I won’t comment too much on these, which show obvious trends alongside some unexpected ones. Pay attention to more than the rise and fall along the y-axes. Also pay attention to the dates along the x-axes. The more dates listed, the more often the issue is evoked in a SotU address. A term appearing a few times each year can be as interesting as a term that appears a lot every few decades. As always, I ran these frequency distributions using a startswith command in Python. So a search for ‘religio’ counts ‘religious’ and ‘religion’ together, and a search for ‘German’ counts both ‘German’ and ‘Germany’.
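For the curious, here is a minimal sketch of that kind of prefix count, using only the standard library. The function names are hypothetical and the three toy "speeches" are invented for illustration; a real run would read the actual addresses instead. Years with no matches are dropped, which mirrors the point about how many dates show up on the x-axis.

```python
import re

# Invented stand-ins for speeches, keyed by year; not real address text.
toy_speeches = {
    1946: "Germany and Japan have surrendered. German industry must be rebuilt.",
    1984: "Religious liberty matters. Our religion and our faith sustain us.",
    2002: "Freedom of religion is a right we defend.",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def prefix_counts(speeches, prefix):
    """Return {year: number of tokens starting with prefix}, skipping zero years."""
    prefix = prefix.lower()
    counts = {}
    for year, text in sorted(speeches.items()):
        n = sum(1 for token in tokenize(text) if token.startswith(prefix))
        if n:
            counts[year] = n
    return counts

religio = prefix_counts(toy_speeches, "religio")
print(religio)                 # per-year counts
print(len(religio))            # how many years mention the term at all
print(sum(religio.values()))   # total mentions across all years
```

The two summary numbers at the end are the two things worth reading off each chart: how many years a term shows up in, and how often it shows up overall.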

1. What nations earn a presidential shout-out throughout the decades?

Apparently, we’re far more invested in countries far away from us than countries that share our borders. I think it shows how solid our relationships have been with Canada and Mexico for the past six decades. And why not? Mexico and America are in a symbiotic relationship: Mexico gets to avoid revolution by sending their starving proletariat across the border and American business gets an endless supply of cheap labor in the process. And Canada . . . well, what can you say about Canada? Not a lot, according to SotU speeches.

2. Other issues . . .

No one cares about going to space, anymore . . .

The 90s. Bills for LBJ’s Great Society programs start rolling in.

I have no idea what’s up with this trend. Fewer marriages? Dissolution of the nuclear family?

3. Terrorism: the new Communism. 

4. Religious references . . . Reagan really liked to make them.

6. Gender and race . . .

Post-1970s equality of the sexes (defined as equality of reference in SotU speeches). Gender-inclusivity is the new norm in these speeches. This is probably the most striking trend I’ve uncovered.

The spikes are obviously in response to the 1964 Civil Rights Act.

That’s all for now. If anyone has any search requests (either in the SotU corpus or the Inaugural Address corpus), leave a comment and I’ll post them.

Searching lexical trends in Inaugural Addresses, 1789-2009

[Methods note: Following the advice in Natural Language Processing with Python, I ran the following frequency distributions using the startswith command, so that, e.g., when I search for the distribution of America, it includes words such as American, Americans, and America’s.]
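One caveat with prefix matching is that it counts everything that begins with the search string, wanted or not. A quick way to check is to list the distinct word forms a prefix actually catches before trusting the totals. The sketch below is a hypothetical helper using only the standard library, and both sample strings are invented for illustration.

```python
from collections import Counter
import re

def matched_forms(text, prefix):
    """Count each distinct word form in text that starts with prefix."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(token for token in tokens if token.startswith(prefix.lower()))

# Invented sample text, not taken from any address.
sample = "America's promise belongs to every American. Americans built America."
print(matched_forms(sample, "america"))

# The same check on a shorter prefix shows unrelated words slipping in.
noisy = "The war ended; a warm welcome and a warning followed."
print(matched_forms(noisy, "war"))
```

The first search catches the four forms I want. The second also catches "warm" and "warning", so short prefixes deserve a look at the matched forms before any conclusions are drawn.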

The Natural Language Toolkit libraries come with two presidential corpora: the Inaugural Address corpus and the State of the Union corpus, both courtesy of C-SPAN. The Inaugural Address corpus contains every speech from George Washington (1789) to Barack Obama (2009). The SOTU corpus contains all speeches from Truman (1945) to George W. Bush (2006).

The queries we can run on these corpora are almost endless. Downloading NLTK and learning a wee bit of corpus analysis are worth it, if for no other reason than to mine these two collections, which allow us to chart the American zeitgeist across the decades and centuries*. Ours is not an oratorical culture (alas!), but when our presidents deliver these addresses, we listen in a way that, I think, recaptures the importance of our rhetorical past. In general, we don’t listen when presidents take the mic at a press conference or speak in front of rose bushes about some particular matter. And rightly so. Unscripted, presidents can sound very stupid. But when they deliver the Inaugural and the State of the Union, they know the nation and the world are listening. In SOTU speeches, the leader of the free world speaks to the great concerns of his time; in the inaugural, he issues forth a rhetoric designed to appeal to as many Americans as possible. So, again, I think we can spot some interesting trends in our history by mining the lexical choices in these corpora.

*Now, we have to be careful about how we go about this kind of mining operation. For example, while discussing frequency distributions, the authors of Natural Language Processing with Python point out that, in the Inaugural Address corpus, the words duty and duties become much less frequent in 20th century addresses compared to earlier addresses:

However, (and with all due respect to the authors of that wonderful textbook), it would be misguided to draw from this single lexical fact too many conclusions about our country’s lack of responsibility . . .

Mining these corpora may turn up patterns that allow us to draw conclusions about a general, abstracted American worldview; more often, however, we will discover things about usage and tone. JFK once said that posterity will judge past generations solely on their literary tone, and we should primarily think of these corpora as a means by which we can track the evolution of that tone. Other conclusions should be drawn only after the most rigorous of pattern discovery and detection.

For example, based on the following search, I might be tempted to conclude that our nation’s leaders, contra pundits on the religious Right, were decidedly secular until halfway through the 19th century. There’s not a single use of God in inaugural addresses prior to 1821, and multiple references don’t become a regular occurrence until the 20th century.

However, a wider search of synonymous terms would muddle that grand conclusion, and force me to admit that what I’m tracking is not the zeitgeist so much as the style of the zeitgeist:

Many other religious words are flying beneath the radar, as well. George Washington’s First Inaugural references the deity with terms like Invisible Hand and Great Author. So, our early leaders were not entirely secular; they simply had a much wider lexicon for referencing deity. Caution, then, is required when mining corpora for interesting patterns. Patterns must be checked against each other, and they must be robust across many different terms before we rely on them for making wider claims.
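Here is a rough sketch of that kind of cross-check: count a whole family of terms rather than one word, and compare the single-term result with the family total. The term list is a hypothetical starting point (the two phrases come from the Washington example, the single words are my own assumptions), the function name is made up, and the two sample texts are invented rather than quoted from any address.

```python
import re

# Hypothetical term family; extend or prune it for a real search.
DEITY_TERMS = ["god", "almighty", "providence", "invisible hand", "great author"]

def family_counts(speeches, terms):
    """Return {label: {term: count}}, matching whole words or phrases, ignoring case."""
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    return {
        label: {term: len(pattern.findall(text)) for term, pattern in patterns.items()}
        for label, text in speeches.items()
    }

# Invented sample texts, labeled generically rather than by president or year.
samples = {
    "speech_a": "We acknowledge the Invisible Hand and the Great Author of every good.",
    "speech_b": "God bless you, and may God guide us.",
}

counts = family_counts(samples, DEITY_TERMS)
totals = {label: sum(per_term.values()) for label, per_term in counts.items()}
print(counts["speech_a"]["god"], counts["speech_b"]["god"])
print(totals)
```

Searching for God alone makes the first sample look secular (zero hits against two), while the family totals come out even. That is the style-versus-zeitgeist problem in miniature.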

Another example: war undergoes an almost exponential decrease in inaugural addresses after World War I.

Obviously, this referential decrease has nothing to do with a decrease in wars to refer to . . . so is it a simple shift in style, in reference, like we saw with God?

Hmm . . . not in this case, at least not with the terms I’ve chosen to test against war. Like Washington’s Great Author, there may be war synonyms flying beneath the surface here. Or there might not be. Maybe we’ve stumbled upon a genuine pattern that speaks—really and robustly—to a new inaugural trend: not talking about our foreign conflicts, either at all or in anything that resembles conflict terminology. War is Peace.

Here’s another interesting pattern in the Inaugural Address corpus: the use of America.

Early presidents didn’t evoke America as often as their 20th century counterparts. Again, there may be a clue here to larger trends and worldviews related to the evolution of an American identity. Or it may simply be that, until the 20th century, presidents evoked “Americanness” in a different way.

Prior to the 20th century, America was less preferred than nation, government of and for the people, and union. The question to ask now: is this trend a matter of style or substance or both? The grand theory I’d love to spin is that America has become a catch-all term in a nation-state no longer unified by a common ethnic and religious heritage, no longer bound in a common union of peoplehood. The highest peak usage for America corresponds to the lowest peak usage for people . . . both of which correspond to the last two decades, when non-European immigration has been at an all-time high and white births have been at an all-time low. Or perhaps the popularity of the phrase “government of the people” has waned as cynicism in the political process has waxed.

Of course, my grand theories are probably wrong. The point is, a pattern does seem to exist here, and it’s worth checking out further. Is it just noise after all? Or does it correspond with the rise and fall of similar terms? And how probable is it that presidents would start rallying a nation around a single titular term (like America) as the nation becomes more ethnically and religiously pluralized? Did Roman statesmen start using Roman more and more often as they extended citizenship to the Celts and Germans?

More to come as we go into the election . . .