Searching lexical trends in Inaugural Addresses, 1789-2009

[Methods note: Following the advice in Natural Language Processing with Python, I ran the following frequency distributions using Python’s startswith string method, so that, e.g., when I search for the distribution of America, the count includes words such as American, Americans, and America’s.]
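For readers who want to try the prefix approach described in the methods note, here is a minimal sketch in plain Python. It is not the exact code behind the charts in this post: the function name prefix_counts and the toy speeches are made up for illustration, and the tokens are assumed to be already split into words.

```python
from collections import Counter

def prefix_counts(speeches, prefixes):
    """Count, per speech label, the tokens that start with each prefix.

    speeches: iterable of (label, tokens) pairs, e.g. ("1789", ["Fellow", ...]).
    prefixes: lowercase stems such as "america".
    Returns a dict mapping each prefix to a Counter of {label: count}.
    """
    counts = {p: Counter() for p in prefixes}
    for label, tokens in speeches:
        for tok in tokens:
            low = tok.lower()  # match case-insensitively
            for p in prefixes:
                if low.startswith(p):
                    counts[p][label] += 1
    return counts

# Toy data standing in for the corpus (NOT real inaugural text).
toy_speeches = [
    ("1789", "The American people and the citizens of America".split()),
    ("2009", "America , Americans , and America's promise".split()),
]

result = prefix_counts(toy_speeches, ["america", "citizen"])
for prefix, by_year in result.items():
    print(prefix, dict(by_year))
```

With NLTK and its corpus data installed, the same function should accept pairs built from the Inaugural Address corpus, for example (fileid[:4], inaugural.words(fileid)) for each fileid in inaugural.fileids(), where inaugural is imported from nltk.corpus.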

The Natural Language Toolkit libraries come with two presidential corpora: the Inaugural Address corpus and the State of the Union corpus, both courtesy of C-SPAN. The Inaugural Address corpus contains every speech from George Washington (1789) to Barack Obama (2009). The SOTU corpus contains all speeches from Truman (1945) to George W. Bush (2005).

The queries we can run on these corpora are almost endless. Downloading NLTK and learning a wee bit of corpus analysis are worth it, if for no other reason than to mine these two collections, which allow us to chart the American zeitgeist across the decades and centuries*. Ours is not an oratorical culture (alas!), but when our presidents deliver these addresses, we listen in a way that, I think, recaptures the importance of our rhetorical past. In general, we don’t listen when presidents take the mic at a press conference or speak in front of rose bushes about some particular matter. And rightly so. Unscripted, presidents can sound very stupid. But when they deliver the inaugural and the State of the Union, they know the nation and the world are listening. In SOTU speeches, the leader of the free world speaks to the great concerns of his time; in the inaugural, he issues forth a rhetoric designed to appeal to as many Americans as possible. So, again, I think we can spot some interesting trends in our history by mining the lexical choices in these corpora.

*Now, we have to be careful about how we go about this kind of mining operation. For example, while discussing frequency distributions, the authors of Natural Language Processing with Python point out that, in the Inaugural Address corpus, the words duty and duties become much less frequent in 20th-century addresses compared to earlier addresses:

However (and with all due respect to the authors of that wonderful textbook), it would be misguided to draw from this single lexical fact too many conclusions about our country’s lack of responsibility . . .

In mining these corpora, we may find patterns that allow us to draw conclusions about a general, abstracted American worldview; more often, however, we will discover things about usage and tone. JFK once said that posterity will judge past generations solely on their literary tone, and we should primarily think of these corpora as a means by which we can track the evolution of that tone. Other conclusions should be drawn only after the most rigorous pattern discovery and detection.

For example, based on the following search, I might be tempted to conclude that our nation’s leaders, contra pundits on the religious Right, were decidedly secular until halfway through the 19th century. There’s not a single use of God in inaugural addresses prior to 1821, and multiple references don’t become a regular occurrence until the 20th century.

However, a wider search of synonymous terms would muddle that grand conclusion, and force me to admit that what I’m tracking is not the zeitgeist so much as the style of the zeitgeist:

Many other religious words are flying beneath the radar, as well. George Washington’s First Inaugural references the deity with terms like Invisible Hand and Great Author. So, our early leaders were not entirely secular; they simply had a much wider lexicon for referencing the deity. Caution, then, is required when mining corpora for interesting patterns. Patterns must be checked against each other, and they must be robust across many different terms before we rely on them for making wider claims.
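Multi-word terms like Invisible Hand and Great Author will never show up in a single-word startswith search, so a wider check needs to count phrases too. The following is a rough sketch of one way to do that; count_phrase, concept_count, the DEITY_TERMS list, and the sample sentence are all illustrative assumptions, not a vetted lexicon or real corpus text.

```python
def count_phrase(tokens, phrase):
    """Count case-insensitive occurrences of a (possibly multi-word) phrase."""
    target = [w.lower() for w in phrase.split()]
    if not target:
        return 0
    low = [t.lower() for t in tokens]
    n = len(target)
    # Slide a window of n tokens across the text and compare each window.
    return sum(1 for i in range(len(low) - n + 1) if low[i:i + n] == target)

def concept_count(tokens, phrases):
    """Total occurrences of any phrase in a hand-built list of related terms."""
    return sum(count_phrase(tokens, p) for p in phrases)

# Hypothetical term list; a real study would need a much longer one.
DEITY_TERMS = ["God", "Great Author", "Invisible Hand"]

# Made-up sentence for demonstration, not a quotation from any address.
sample = "We thank the Great Author and trust the Invisible Hand , for God is not named".split()

god_only = count_phrase(sample, "God")
all_terms = concept_count(sample, DEITY_TERMS)
print(god_only, all_terms)
```

Comparing the single-term count with the grouped count, speech by speech, is one way to see whether a trend belongs to the concept or only to one word for it.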

Another example: war undergoes an almost exponential decrease in inaugural addresses after World War I.

Obviously, this referential decrease has nothing to do with a decrease in wars to refer to . . . so is it a simple shift in style, in reference, like we saw with God?

Hmm . . . not in this case, at least not with the terms I’ve chosen to test against war. Like Washington’s Great Author, there may be war synonyms lurking beneath the surface here. Or there might not be. Maybe we’ve stumbled upon a genuine pattern that speaks, really and robustly, to a new inaugural trend: not talking about our foreign conflicts, either at all or in anything that resembles conflict terminology. War is Peace.
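Before leaning too hard on the war pattern, it may be worth auditing what a startswith search actually catches, since a short prefix like war also matches unrelated words such as warm or warned. Here is a small sketch of such an audit; matched_forms, the WAR_FORMS allow-list, and the toy sentence are hypothetical and only meant to show the idea.

```python
from collections import Counter

def matched_forms(tokens, prefix):
    """Return a Counter of the distinct lowercase word forms a prefix matches."""
    prefix = prefix.lower()
    return Counter(t.lower() for t in tokens if t.lower().startswith(prefix))

# Made-up sentence for demonstration, not corpus text.
toy = "War and wars leave warm hearts wary ; the warrior warned us".split()

forms = matched_forms(toy, "war")
print(sorted(forms))

# Hypothetical allow-list of the forms we actually mean by "war".
WAR_FORMS = {"war", "wars", "warrior", "warriors", "warfare"}
filtered_total = sum(n for form, n in forms.items() if form in WAR_FORMS)
print(sum(forms.values()), filtered_total)
```

If the filtered total is much smaller than the raw prefix total in a real run, the prefix search is probably inflating the counts and the chart deserves a second look.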

Here’s another interesting pattern in the Inaugural Address corpus: the use of America.

Early presidents didn’t invoke America as often as their 20th-century counterparts. Again, there may be a clue here to larger trends and worldviews related to the evolution of an American identity. Or it may simply be that, until the 20th century, presidents evoked “Americanness” in a different way.

Prior to the 20th century, America was less preferred than nation, government of and for the people, and union. The question to ask now: is this trend a matter of style or substance or both? The grand theory I’d love to spin is that America has become a catch-all term in a nation-state no longer unified by a common ethnic and religious heritage, no longer bound in a common union of peoplehood. The peak usage of America corresponds to the lowest usage of people . . . both of which correspond to the last two decades, when non-European immigration has been at an all-time high and white births have been at an all-time low. Or perhaps the popularity of the phrase “government of the people” has waned as cynicism about the political process has waxed.

Of course, my grand theories are probably wrong. The point is, a pattern does seem to exist here, and it’s worth checking out further. Is it just noise after all? Or does it correspond with the rise and fall of similar terms? And how probable is it that presidents would start rallying a nation around a single titular term (like America) as the nation becomes more ethnically and religiously pluralistic? Did Roman statesmen start using Roman more and more often as they extended citizenship to the Celts and the Germanic peoples?
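One crude way to ask whether two terms rise and fall together is to line up their per-speech counts and compute a correlation coefficient. The sketch below uses a hand-rolled Pearson correlation; the pearson helper and the two count series are invented for illustration and are not results from the corpus. A strong coefficient on real data would still only be a hint, not proof of anything.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

# Invented counts for four hypothetical speeches, in chronological order.
america_counts = [1, 2, 4, 8]
people_counts = [8, 4, 2, 1]

r = pearson(america_counts, people_counts)
print(round(r, 3))  # negative here: one series rises while the other falls
```

Because raw counts also depend on how long each speech is, dividing each count by the speech’s total word count before correlating would probably give a fairer comparison.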

More to come as we go into the election . . .


