Technology and the empirical study of writing

A materialist theory of literary form will ultimately have to concern itself with the organic processes of reading and composition, but the way to do this is through empirical study of readers and writers, not more interpretation of texts, or armchair ruminations. –Cosma Shalizi in a response to Franco Moretti’s Graphs, Maps, Trees (128)


Janet Emig initiated the writing process movement when she published The Composing Processes of Twelfth Graders, an attempt to study writing in an empirical way (lower case e; no Lockean baggage implied) by closely observing and polling several high school seniors as they wrote essays. Today, the shortcomings of her study are obvious—the sample size was small, and she had no way to track granular textual changes as they were made in real time. However, despite its limitations, Emig’s work introduced an important assumption to the field of writing studies:

Writing is a natural, organic phenomenon that can be studied empirically.

Unfortunately for Emig and the process movement, writing studies was and is situated in an academic context that requires a sharp pedagogical focus, and the empirical study of writing has little to no educational value. Studying writers can tell us how people write, but it doesn’t necessarily tell us how to teach writing, academic or creative or otherwise.

Intimately tied to the pedagogical critique of the process movement was the political critique. In early studies of writing processes, certain contextual elements (read: race, gender, class) were ignored. Emig, for example, did not deeply address the racial differences of her subjects. Critics claimed that the study of writing processes would not pay enough attention to relevant cultural factors that affect how, where, and why people write. This critique was weak, however, because all empirical pursuits must by design bracket out certain contextual elements in their early stages. As the pursuit progresses and gathers knowledge, the causes and effects (if any) of various contextual factors can be coded and controlled for. Race, class, and gender are such factors—important ones at that—but we needn’t stop there (cross-linguistic differences would be first and foremost on my mind). Dozens of material factors must be taken into consideration when studying writing. Had the process movement not been abandoned, researchers would have gotten around to controlling for all of them.

Then there was the philosophical critique. The goal of studying writing is to build evidence-based theories about this unique human practice from a variety of angles—stylistic, material, cognitive, neuronal, linguistic—and eventually to see how these levels interact. (E.g., What areas of the brain are operational at various stages of writing and re-writing? Do small stylistic changes or large organizational changes more often influence a text’s shape? How does textual cohesion emerge? What roles do vision and memory play in the way writers work with their texts on word processors? Do writers across languages and writing systems have completely different processes?) However, like many humanities disciplines, writing studies has been influenced by postmodernism and is thus averse to data-driven, quantitative, empirical methods, and not interested in questions—like the ones above—that require these methods. Gary Olson typifies the philosophical critique when he writes that the process movement attempts to “systematize something [writing] that simply is not susceptible to systematization.”

Of course, no evidence is provided for this claim—but then, none is needed. It isn’t a claim at all. It is an a priori assumption, a statement of faith, designed to obviate any empirical work on the subject.

Ironically, the most valid critique of the process movement in writing studies was one that no one ever made: the technological critique. In the 1980s, when the process movement was jostling for academic legitimacy, researchers simply did not have access to the technology that could enable a more robust inquiry into the material, organic processes of writing.

Today, we do have access to that technology. What’s more, the postmodern zeitgeist has waned in its influence, and the political critique was never quite valid. The time seems right for a return, not necessarily to process theory, but to the assumptions it made about data-driven, quantitative, empirical studies of the way humans compose.

Scholars like Chris Anson, Richard Haswell, and Chuck Bazerman are leading the way. Haswell’s call for “RAD” research—replicable, aggregable, data-supported—is essentially a call for empiricism in writing studies. And Anson’s recent report on the use of eye-tracking devices to study writers at work demonstrates how new technologies will enable and benefit this empirical endeavor. Anson’s line of research could lead to major insights into the ways writers access their ‘textual memory’ in order to manage the many semantic strands that comprise any written text. Indeed, this is a perfect example of how technology and RAD methods can test old ideas in writing studies, confirm or complicate those ideas, and fill them in with data-driven details. In 1999, for example, Christina Haas wrote about the way writers manage their texts in Writing Technology: Studies in the Materiality of Literacy:

Clearly, writers interact constantly, closely, and in complex ways with their own texts. Through these interactions, they develop some understanding—some representation—of the text they have created or are creating . . . As the text gets longer and more ideas are introduced and developed, it becomes more difficult to hold an adequate representation in memory of that text, which is out of sight. (117, 121, qtd in Brooke’s Lingua Fracta)

In 2012, enter the eye-tracking software, which can show us where writers look, physically, to develop representations of their texts as they are constructed: What kinds of words or phrases do writers reference most often, as ‘anchors’ for their intellectual wanderings? What are the outside limits of textual vision? Where do writers focus their vision at different stages in the writing process—choosing words, writing sentences, organizing paragraphs, et cetera? Do accomplished writers use their eyes differently than novice writers? Do high-IQ individuals use their eyes less or more while writing; are their visual memories more robust, requiring less visual tracking to make sense of their texts’ cohesion?

Without eye-tracking devices and empirical methodologies, researchers could never hope to answer these and other questions. They would never even think to formulate them.

Outside the field of writing studies, researchers are already using technology to capture and study authorial processes. An IBM study used an application called history flow to study contributions in Wikipedia articles and how numerous contributions endure, change, or are phased out entirely over time. Ben Fry built an amazing visualization of the multiple changes Darwin made to On the Origin of Species across six editions. And on a lesser scale, Timothy Weninger created a time-lapse video that shows the writing of a research paper in various stages (I’m in the process of figuring out how to do something similar, using track changes in Word).

The interest in organic authorial processes extends beyond writing studies, so it boggles my mind that writing studies scholars aren’t at the forefront of this research, which, I grant, is in its early stages. Luckily, things are looking up for RAD, empirical research in writing studies. Now that we can start grounding our theories of composition in real data, it’s only a matter of time before we start gaining empirical insight into this strange, relatively recent human behavior that we call ‘writing’.

Robot Economy


Exiting the Womb is Messy

Hayek and Hazlitt assure us we needn’t worry about the loss of jobs to technological advances because said losses translate to newer jobs elsewhere, specifically in the manufacture and servicing of the advanced technology. This is true, but not the whole story.

Often, the newer jobs are more scarce and harder to obtain—Ford needs only a few engineers to oversee the computers doing the work once undertaken by dozens of assemblers. Efficiency, efficiency, efficiency. What’s more, new technologies are rarely as ‘in-demand’ as the old ones—Ford produces more cars than KUKA produces industrial robots used by Ford to replace workers. Lastly, the manufacture of new technologies rarely occurs where the old technologies were manufactured. Even if total job numbers are equalized, laid-off assembly workers can’t be expected to move from Detroit to South Asia.

The result: today, there aren’t many jobs that allow high school and college graduates to build or create things of value. Luckily, however, the sheer efficiency of the nascent robot economy and the blessed cheapness of outsourced, non-Western labor mean that costs are kept low on the products we love in the West. Low costs (buttressed by the welfare state) have fostered the development of the service industry—people of all classes can afford to buy things, and by God, someone has to be there to ship them, retail them, exchange them, install them, repair them, and update them. Following the death of the ‘making things’ economy, the service economy has single-handedly staved off widespread economic depression and mass revolution in the West.

What happens, then, when robots and other advanced, efficient, human-redundant technologies are introduced to the service industry? Surely, it’s cheaper and more efficient to robot-ize certain jobs in retail, wholesale, and every other node in the service network? It is. We already see it happening.

Post robot-ization of the service economy, will some other economy rise and busy the masses with labor? If not, things will get interesting. And messy.

Union busters

The service and welfare economies absorb surplus labor with ease, but not without cost. The entire point of the robot economy is to save money and increase efficiency by replacing the surplus laborers with machines who don’t claim disability or need a 401k. The new rich in this scenario will be the designers and programmers of the machines, people who can complete a Computer Sci degree while everyone else bails for Communication Studies; the new poor in this scenario will be the erstwhile service laborers and middle managers, people with low to middling IQs who didn’t even learn HTML and now can’t find a job managing a rental car agency or teaching community college because—Yeah, There’s A Robot For That ™.

This scenario bodes well for high achievement populations—who, however, will begin to gate themselves off in high security neighborhoods as the Rest of Us slowly realize that the Employers of the Next Economy are only looking for advanced robots and the nerds who can make them. (Attacking the robots only has so much symbolic value.)

But it doesn’t stop there, even after the messy revolts of the redundant laborers. If Kurzweil is even half right, the IQ of AI will advance to the point at which robots can replace more than service workers. What happens, then, when even high-skilled, high IQ positions are taken over by more efficient and more perfect machines?

Out of the Womb: Deep Space Employment

When Apollo 11 reached lunar orbit, only three men were aboard the spacecraft. But the safety—indeed, the possibility—of the mission relied on a well-staffed control center. Even today, the comparatively dull, low-earth orbit labors of the symbolic sapiens aboard the International Space Station are enabled and monitored by dozens of engineers and scientists on the ground.

However, when the U.S.S. Enterprise exits low-earth orbit, it does so on the assumption that a mission control center is no longer needed. Technology has advanced. The pink slips have been sent out in Florida and Houston. Now the only person on NASA payroll in those old places is the nice lady at the front desk of the Mission Control Center Museum. Everyone required for a safe, successful mission is on board the Enterprise. Granted, in Roddenberry’s optimistic vision, the numbers on board are quite large: depending on the series, the Enterprise boasts upwards of thirty crew members. The mission control centers experienced some attrition, but in the end, many of the geeks in suits simply became astronauts.

Humans need not apply

Ridley Scott provides a less sanguine vision of the robot economy as seen in deep space. The Nostromo only needed 7 people aboard—no, forgive me, 6 people on board. Ash, the science officer, was an android, and if Order 937 was any indication, he was really the only necessary crew member. The humans were expendable. An android and an intelligent Auto Pilot are all you need to explore and mine the stars.

The Nostromo, of course, was a cargo ship, hauling not only hundreds of thousands of tons of raw materials bound for earth but also a refinery for processing those materials en route. How many workers were needed in the refinery? Zero.

Aboard the Prometheus is an equally minimalist crew as well as a machine that epitomizes—even more than the most advanced Auto Pilot—the success of the robot economy: the MedPod.



The MedPod diagnoses, treats illness, and performs fine-tuned surgeries with laser-like precision. How many doctors are needed aboard the Prometheus to perform an emergency C-section? Zero. How many nurses? Zero. How many techs? None. The MedPod even sutures. And naturally, it’s self-contained and self-cleaning, so low-wage orderlies are certainly unnecessary. A single, high-tech machine has entirely obviated the need for humans to fill the most human-centric of careers.

The Post-Scarcity Endgame

In this vision, the robot economy has advanced so far that even deep space missions—from the most mundane (the Nostromo) to the most profoundly important (the Prometheus)—can be undertaken with minimal human employment because the bulk of the labor is undertaken by technology, machines, and androids.

What, then, can we assume about the comparably low-tech missions and jobs back on earth? We can assume that the surplus labor has long since died off, which probably only took a few generations given that it was quickly weaned into far-below-replacement birth rates with the help of mass welfare, entertainment, and liberal arts education. The earth is now inhabited by robots—who do the hard labor, the service labor, and even, as we have seen, some of the medical labor, at no cost to the beneficiaries of that labor—and the high-IQ individuals who design, build, and maintain the robots. Put another way: the earth is inhabited by an intelligent upper class and their laboring machines. We can assume, in other words, that the robot economy has stabilized into something like a Post-Scarcity Society. The human population has shrunk and stabilized; the robot population (probably larger than the human one) is entirely low-impact and efficient: they don’t eat, they don’t breed, they can be recycled . . . And as the MedPod demonstrates, many of them don’t even look like us, so any anxiety about Android Rights has proven to be needless.

Back to today . . .

How soon does all of this begin to unfold? we may ask. How quickly will the robot economy rise and advance?

As T.S. Eliot once observed,

The robot economy arrives not with a bang but a burger-making machine.


Text network of Christopher Dorner’s manifesto

I explored the Unabomber manifesto in my last post because another (much less interesting) manifesto has been in the news lately. Here’s a text network of Christopher Dorner’s manifesto. The largest nodes represent words with the highest levels of betweenness centrality.


I don’t think the network shows anything we wouldn’t expect. I was mildly surprised to see that friend does so much work in the text, but it’s likely due to the sections in which Dorner gives (for lack of a better term) “shout outs” to various individuals. A particularly amusing meaning cluster is located toward the center on the left-hand side: “fucking-Christian-awesome-person.”

Otherwise, the network is exactly what we might expect from a text that is part manifesto, part life story, and part “whistleblowing.” (I put that word in quotes because the truth behind Dorner’s accusations is by no means certain.) I had hoped the visualization would uncover something more interesting.

(A side note about news coverage of Christopher Dorner: This article from CNN is fairly typical. It discusses the way Dorner plays on the LAPD’s corrupt past in order to bolster his claims about present corruption.  In doing so, the article lumps together the Rodney King beating and the “Rampart scandal” as representative examples of that corrupt, racist past. However, as all L.A. area denizens (such as myself) know, the Rampart scandal had very little to do with racist officers. The main players in the CRASH unit were black, and that entire scandal was about gang influence in the LAPD. So, it’s interesting to watch that incident being represented as comparable to the Rodney King beating, even though the two incidents pointed to very different problems in the LAPD.)

Text Network and Corpus Analysis of the Unabomber Manifesto


The Unabomber Manifesto—Industrial Society and its Future—was sent to major newspapers in 1995, with an accompanying promise from its author, Ted Kaczynski, to stop exploding things if someone printed the 35,000 word text in full. The New York Times and the Washington Post obliged in September of that year. The manifesto became a major clue in the hunt for the Unabomber, but only a few forensic linguists concluded that Kaczynski, a suspect at the time, had written it. The majority failed to see a connection between the manifesto and other writings by Kaczynski (these are the same people, I can only guess, who remain skeptical about who wrote Romeo and Juliet). In the end, none of it mattered anyway. Evidence found in Kaczynski’s cabin was far more damning than forensic linguistic analyses of the manifesto.

The Manifesto

You expect the manifesto of a domestic terrorist to be insane. Kaczynski is not your average domestic terrorist. A former Berkeley professor of mathematics with a Michigan PhD, Kaczynski could have feasibly published the essay with a legitimate press or magazine and gained a wide academic audience had he not retreated into the woods and his own head. The manifesto is a real argument that, minus its calls for violence, could have been inserted into a legitimate discourse, albeit one that would have resulted in criticism coming Ted’s way.

Ostensibly, the manifesto is a strong critique of contemporary techno-capitalist society. However, if you took a knife to the text, divided it into little passages, you would discover that half of them bend far leftward and could be read aloud without protest in Harvard Yard, while the other half bend far rightward and could only be read aloud without protest at Hillsdale College.

So, there are passages such as this one, which would send heads nodding in every humanities department in America:

The Industrial Revolution and its consequences have been a disaster for the human race. They have greatly increased the life-expectancy of those of us who live in “advanced” countries, but they have destabilized society, have made life unfulfilling, have subjected human beings to indignities, have led to widespread psychological suffering (in the Third World to physical suffering as well) and have inflicted severe damage on the natural world.

Then comes this curveball:

One of the most widespread manifestations of the craziness of our world is leftism, so a discussion of the psychology of leftism can serve as an introduction to the discussion of the problems of modern society in general.

Like many on the left, Kaczynski blames technology and The System for the sad state of the earth and its inhabitants, yet he suggests that the contemporary left (the “oversocialized” left, as Ted puts it) is in fact The System’s most malformed, though logical outgrowth.

At first, I couldn’t recognize the motive behind the manifesto. Its politics seemed too conflicted. Then I noticed a brief mention in Kaczynski’s Wikipedia article that ties him to the anarcho-primitivist tradition, and suddenly the text became more philosophically cohesive.

The Manifesto’s Motive

There are two types of anarcho-primitivists: the Rousseau types and the Hobbes types (my own ad hoc terms). The former are human-centric and collectivist. They believe that dismantling techno-capitalist society will usher in an era of equality and harmony between men and women of all races. The latter are earth-centric and individualistic. They believe that dismantling techno-capitalist society will put a halt to overpopulation and environmental degradation, and allow individuals to live more spiritually and physically fulfilled lives.

The goals aren’t mutually exclusive, but nor are they necessarily aligned. (When it comes to immigration, they are outright opposed.) The Hobbesian primitivists tend to believe that nature, for all its beauty and desirability, isn’t a progressive utopia. Who are these Hobbesians? They are the Monkey Wrench Gang radicals, the Edward Abbeys and Doug Peacocks of the environmental movement, the Garrett Hardins of ecology, the survivalists, the Timothy Treadwells, the (typically) men who love nature more than humanity but harbor no romanticism about either. Kaczynski would have gotten along well in the Monkey Wrench Gang, who held no love for humans or community or society in the aggregate because, to them, human communities are precisely the problem.

Let’s put these categorizations aside for now and look to the text of the manifesto itself. A text network analysis and an analysis with the Natural Language Toolkit (NLTK) can provide us with grounded data about Kaczynski’s motives as they appear in his manifesto. The motives of all authors—or at least their traces—are always left behind in the lexical choices of their texts. Deliberate, written language is like a rhetorical fingerprint.

Text Network Analysis

As I’ve discussed in other posts, a text network analysis proceeds in the following way: a text is copied into a .txt file; it is imported into some analytic tool (I use AutoMap) in order to remove stop words and to lightly stem the text; then, using the same tool, the text—which has now been expunged of all but significant content words—is run through an algorithm that treats the content words like a network and creates a co-occurrence list in .csv format. What words are connected to what other words, and how often? (In this analysis, I used a two word gap and a five word gap.) The .csv file is then opened in a network analysis tool (I use Gephi) in order to visualize these semantic connections. Each word is visualized as a node in the network, and the connections between words that appear near each other—again, within a certain word gap—appear as edges.
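The pipeline above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm the analytic tool actually runs: the stop-word list is a tiny hypothetical one, stemming is skipped, the function names are invented for this example, and a "gap" of 2 is interpreted as "within two content words of each other." The Source/Target/Weight header is the column layout Gephi's spreadsheet import typically reads for edge lists.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is",
              "that", "it", "for", "on", "as", "by", "its"}


def content_words(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def cooccurrence_edges(words, gap=2):
    """Count unordered word pairs that fall within `gap` positions of each other."""
    edges = Counter()
    for i, source in enumerate(words):
        for target in words[i + 1 : i + 1 + gap]:
            if source != target:  # ignore a word paired with itself
                edges[tuple(sorted((source, target)))] += 1
    return edges


def edges_to_csv(edges):
    """Serialize the edge counts as Source,Target,Weight rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


# Made-up sample sentence, not a quotation from the manifesto.
sample = "The industrial system controls society, and society serves the industrial system."
edges = cooccurrence_edges(content_words(sample), gap=2)
print(edges_to_csv(edges))
```

Widening `gap` from 2 to 5 adds more, weaker connections, which is one reason the five word gap network is denser than the two word gap one.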

The two most important network visualizations, in my opinion, show nodes with the highest levels of Betweenness Centrality and the highest levels of Degree Centrality. The latter measures how many total connections a node has to other individual nodes; so, a node with high degree centrality will simply be connected to many other nodes. The former measures how often a node lies on the shortest paths between other pairs of nodes; so, a node with high betweenness centrality will in essence be an important ‘passageway’ between communities within the network. (Here’s an excellent visual description of the concepts.)

In a textual network, a word with high degree centrality is a word used in connection with myriad other words. This simply tells you that a word is used frequently in a text and in a variety of contexts. A word with high betweenness centrality is a word that links clusters of words that would otherwise be only loosely connected to one another. This tells you that a word is not only used in many contexts but also that it bridges the different community clusters that do semantic work in the text. A word with high betweenness centrality is a word through which many meanings in a text circulate.
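To make the difference between the two measures concrete, here is a small sketch in plain Python. It is not what Gephi runs internally; it uses unnormalized degree and a textbook version of Brandes' algorithm for betweenness on an unweighted, undirected graph. The edge list is an invented toy network (the word pairings are hypothetical, not taken from the manifesto), arranged so that one word sits inside a tight cluster and another bridges two clusters.

```python
from collections import deque


def build_graph(edges):
    """Turn (word, word) pairs into an undirected adjacency map."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Number of distinct neighbors per node (unnormalized)."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def betweenness_centrality(graph):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    scores = dict.fromkeys(graph, 0.0)
    for source in graph:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        predecessors = {node: [] for node in graph}
        path_count = dict.fromkeys(graph, 0)
        distance = dict.fromkeys(graph, -1)
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in graph[node]:
                if distance[neighbor] < 0:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    path_count[neighbor] += path_count[node]
                    predecessors[neighbor].append(node)
        # Walk back from the farthest nodes, crediting nodes on shortest paths.
        dependency = dict.fromkeys(graph, 0.0)
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                scores[node] += dependency[node]
    # Each undirected path was counted once from each endpoint.
    return {node: score / 2 for node, score in scores.items()}


# Hypothetical toy network: a tight cluster, a second cluster, and one bridging word.
toy_edges = [
    ("psychological", "techniques"), ("psychological", "methods"),
    ("psychological", "pressure"), ("techniques", "methods"),
    ("techniques", "pressure"), ("methods", "pressure"),
    ("society", "techniques"), ("society", "freedom"), ("society", "people"),
    ("freedom", "power"), ("freedom", "control"), ("power", "control"),
]
graph = build_graph(toy_edges)
degree = degree_centrality(graph)
betweenness = betweenness_centrality(graph)
for word in ("psychological", "society"):
    print(word, degree[word], betweenness[word])
```

In this toy network the two words have the same degree, but only the bridging word has any betweenness, because every shortest path between the two clusters has to pass through it.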

For example, as you see below, psychological has a high degree centrality in the Unabomber Manifesto but not a high betweenness centrality. This lexical item was therefore used frequently and connected to many different words, such as:

psychological techniques

psychological methods

However, the words to which psychological is connected (techniques and methods) do not themselves perform a lot of semantic work elsewhere in the text. Words like psychological are essentially productive creators of bigrams but not pathways of meaning.

Society, on the other hand, not only has a high degree centrality but also a high betweenness centrality. So, the words that it connects to also have further connections and thus do perform semantic work elsewhere in the text.

Here are the text network visualizations:

Nodes with the highest Degree Centrality in the manifesto

Nodes with the highest Betweenness Centrality in the manifesto

The text is long, so its network is messy. In the 5-word gap network, the manifesto had over 200 separate meaning clusters. In the 2-word gap (seen above), it still had over 150 clusters.

Social, society, people, and human are the words with the highest levels of degree centrality in Kaczynski’s manifesto. Also visible in this network are technology and its derivations, psychological, system, freedom, physical, power, leftist, and modern.

Social, society, and people are the words with the highest levels of betweenness centrality in Kaczynski’s manifesto. Also visible in the network are human, problems, system, change, and natural.

As I mentioned earlier, most commentary on the Unabomber manifesto focuses on a) its attack on technology, and b) its attack on leftism. However, as these text networks demonstrate, the words that do the most semantic work in the text—the words through which most meanings flow—suggest that Kaczynski’s sights were set on society as a whole—its people, its systems. Three other words with relatively many connections—psychological, power, and freedom—further suggest that the ostensible screed against leftism and technology masks a deeper motive that circulates in a diffuse, though nonetheless salient way throughout the text. And in the light of Kaczynski’s possible connection to an anarcho-primitivist tradition, these particularly noticeable nodes make much more sense than they would if we tried to paint him as a madman or, worse, a bitter, conservative academic. If he were only that, we might expect other terms to be more noticeable in the network (e.g., the various derivations of leftism).

One thing a text network does, beyond providing an interesting visualization, is to point the researcher in the direction of terms and n-grams that might be explored more granularly in a corpus analysis tool, such as the NLTK. It provides a map of a text’s semantic circulation, a map that can be followed when we return to the world of pure textuality.

Corpus Analysis

Here is a raw count of the most frequent words in the manifesto:


Certain words weren’t visually important nodes in the text network but were nonetheless used frequently (e.g., goal/s, individual/s, process, industrial, way, work, man, behavior, control); these words were deployed often but in conjunction with a limited number of other terms. Nevertheless, the 20 most frequent words signify a dual emphasis that makes sense if Kaczynski is a certain kind of primitivist: there is the left-wing emphasis on the ills of society, the system, technology, and control; but there is also the right-wing emphasis on individuals and freedom.
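A raw frequency count of this kind takes very little code. The sketch below uses only the standard library's Counter rather than the NLTK itself (NLTK's FreqDist offers a similar most_common method); the stop-word list, the function name, and the sample text are all hypothetical stand-ins, so the numbers it prints say nothing about the manifesto.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "that", "it"}


def top_words(text, n=20):
    """Return the n most frequent content words as (word, count) pairs."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    return Counter(words).most_common(n)


# Made-up sample text, not a quotation from the manifesto.
sample = "Society shapes people. People resist society. The system controls society."
for word, count in top_words(sample, n=2):
    print(word, count)
```

Note that a raw count treats goal and goals as different words unless the text has been stemmed first, which is why the list above groups them by hand.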

The NLTK can also generate a dispersion plot, which shows where in a text individual words fall. Here is a dispersion plot of the 10 most frequent words:


A striking pattern emerges. Although much has been made of the manifesto’s condemnation of the left, the dispersion plot demonstrates that anti-leftism is not a continuous theme in the text but rather forms the bookends: the manifesto opens and closes with references to leftists, but the bulk of the text does not mention them at all. The focus is elsewhere.

The dispersion of technology and technological provides another striking pattern. More than a third of the text passes before Kaczynski begins to deploy these words in earnest, even though a surface reading of the text leaves the reader with the impression that technological anxieties anchor every aspect of the manifesto.

But compare the dispersion of these supposedly central terms—leftist/s, technology and technological—with the dispersion of other terms in the list. Society, system, people, power, human, and, to a lesser extent, modern all have much more uniform dispersions throughout the manifesto. In other words, these concepts appear more regularly and consistently in each of the manifesto’s 232 numbered paragraphs, and that is precisely what we should expect if Kaczynski is indeed a primitivist who loves nature more than humanity. His ire is most obviously directed at leftists, but more subtly, the motivated energy of his manifesto is pointed in all directions at all society in its malformed, destructive development.
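The contrast between "bookend" terms and uniformly dispersed terms can also be checked numerically. The NLTK draws the plot itself through the dispersion_plot method on its Text objects (which needs matplotlib); the sketch below is a dependency-free stand-in that cuts a token list into equal segments and counts a word's occurrences in each. The function names and the token list are invented for illustration, built so that one word appears only at the start and end and another appears throughout.

```python
def segment_counts(tokens, target, segments=10):
    """Count occurrences of `target` in each of `segments` equal slices of the text."""
    counts = [0] * segments
    for position, token in enumerate(tokens):
        if token == target:
            counts[position * segments // len(tokens)] += 1
    return counts


def text_dispersion(tokens, targets, segments=10):
    """Render one row per word: '|' where the word occurs in a segment, '.' where it does not."""
    width = max(len(target) for target in targets)
    rows = []
    for target in targets:
        marks = "".join("|" if c else "." for c in segment_counts(tokens, target, segments))
        rows.append(f"{target:>{width}} {marks}")
    return "\n".join(rows)


# Synthetic tokens (hypothetical): 'leftist' only at the edges, 'society' everywhere.
tokens = ["leftist", "society"] * 2 + ["society", "system"] * 6 + ["leftist", "society"] * 2
print(text_dispersion(tokens, ["leftist", "society"], segments=5))
```

Run against the real token list, with the number of segments raised to something like the paragraph count, the same per-segment counts would show whether a word clusters at the opening and close of the text or recurs throughout.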