All Your Data Are Belong To Us

In the blink of an eye, sci-fi dystopia becomes reality becomes the reality we take for granted becomes the legally enshrined status quo:

“One of our top priorities in Congress must be to promote the sharing of cyber threat data among the private sector and the federal government to defend against cyberattacks and encourage better coordination,” said Carper, ranking member of the Senate Homeland Security and Governmental Affairs Committee.

Of course, the pols are promising that data analyzed by the state will remain nameless:

The measure — known as the Cyber Threat Intelligence Sharing Act — would give companies legal liability protections when sharing cyber threat data with the DHS’s cyber info hub, known as the National Cybersecurity and Communications Integration Center (NCCIC). Companies would have to make “reasonable efforts” to remove personally identifiable information before sharing any data.

The bill also lays out a rubric for how the NCCIC can share that data with other federal agencies, requiring it to minimize identifying information and limiting government uses for the data. Transparency reports and a five-year sunset clause would attempt to ensure the program maintains its civil liberties protections and effectiveness.

Obama seems to suggest that third-party “cyber-info hubs”—some strange vivisection of private and public power—will be in charge of de-personalizing data in between Facebook and the NSA or DHS:

These industry organizations, known as Information Sharing and Analysis Organizations (ISAOs), don’t yet exist, and the White House’s legislative proposal was short on details. It left some wondering what exactly the administration was suggesting.

In the executive order coming Friday, the White House will clarify that it envisions ISAOs as membership organizations or single companies “that share information across a region or in response to a specific emerging cyber threat,” the administration said.

Already existing industry-specific cyber info hubs can qualify as ISAOs, but will be encouraged to adopt a set of voluntary security and privacy protocols that would apply to all such information-sharing centers. The executive order will direct DHS to create those protocols for all ISAOs.

These protocols will let companies “look at [an ISAO] and make judgments about whether those are good organizations and will be beneficial to them and also protect their information properly,” Daniel said.

In theory, separating powers or multiplying agencies accords with the vision of the men who wrote the Federalist Papers, the idea being to make power so diffuse that no individual, branch, or agency can do much harm on its own. However, as Yogi Berra said, “In theory there is no difference between theory and practice, but in practice there is.” Mark Zuckerberg and a few other CEOs know the difference, too. They decided not to attend Obama’s “cyber defense” summit in Silicon Valley last week.

The attacks on Target, Sony, and Home Depot (the attacks invoked by the state to prove the need for more state oversight) are criminal matters, to be sure, and since private companies can’t arrest people, the state will need to get involved somehow. But theft in the private sector is not a new thing. When a Target store is robbed, someone calls the police. No one suggests that every Target in the nation should have its own dedicated police officer monitoring the store 24/7. So why does the state need a massive data sharing program with the private sector? It’s the digital equivalent of putting police officers in every aisle of every Target store in the nation—which is likely the whole point.

Target, of course, does monitor every aisle in each of its stores 24/7. But this is a private, internal decision, and the information captured by closed circuit cameras is shared with the state only after a crime has been committed. There is no room of men watching these tapes, no IT army paid to track Target movements on a massive scale, to determine who is a possible threat, to mark and file away even the smallest infraction on the chance that it is needed to make a case against someone at a later date.

What Obama and the DHS are suggesting is that the state should do exactly that: to enter every private digital space and erect its own closed circuit cameras, so that men in suits can monitor movement in these spaces whether a crime has been committed or not. (State agencies are already doing it, of course, but now the Obama Administration is attempting to increase the state’s reach and to enshrine the practice in law.)

“As long as you aren’t doing anything wrong, what do you care?”

In the short term, that’s a practical answer. In the future, however, a state-run system of closed circuit cameras watching digital space 24/7 may not always be used for justified criminal prosecution.

The next great technological revolution, in my view, will be the creation of an entirely new internet protocol suite that enables some semblance of truly “invisible” networking, or perhaps the widespread adoption of personal cloud computing. The idea will be to exit the glare of the watchers.

Hindi 101

I’m taking Hindi 101 this semester. The Devanagari script feels mildly ornate in my hand compared to the angularity of alphabets descended from the Phoenician script (including the English alphabet), but it is quite lovely and not as challenging as I had imagined. It is still an alphabet, after all, with a much closer sound-grapheme correspondence than one finds in English, where each letter—particularly vowels—can correspond to multiple phonemes. (English grammar is absurdly simple compared to most other major languages, but our spelling system must be a nightmare for foreign learners. There’s something to be said for language academies that control the drift between pronunciation and spelling.) Devanagari does, however, omit some vowel sounds and uses secondary or “dependent” vowel forms in most contexts, so it has something of the syllabary about it. In fact, the biggest mistake I make in class is to confuse two dependent vowels, ी and ो. The former is long “ee”, the latter is “o”, but in certain fonts (including my own handwriting), they look nearly identical.

The script’s biconsonantal conjuncts are mostly intuitive, though a few bizarre ones need to be memorized as separate graphemes. We have conjuncts in English, but I believe they are a relatively new innovation with limited usage. One example is the city logo of Huntington Beach, California. Hindi has a lot of these, and they are quite common.

[Image: the city logo of Huntington Beach, California]

An English biconsonantal conjunct.

Apart from learning a new script, the most enjoyable part of Hindi class has been coming across Romance or Germanic cognates. At an intellectual level, I know and have long known that Hindi and English, both Indo-European languages, share a genetic ancestry, which means that at some point in the distant past all Indo-European speakers spoke the same language. It’s easy to get a handle on the concept when talking about Romance languages: Spanish, Italian, and French all used to be Latin. There, we have a well-documented history, stretching back through the Renaissance and middle ages to the familiar world of Rome. However, when it comes to Proto-Indo-European, we are faced with a deeper and wider canyon of time and an ancient world that is mostly unknown to us. The PIE speakers were probably living in the Pontic-Caspian steppe lands, but some evidence suggests that they may have been living in the greater Anatolian region; perhaps the most direct descendants of the Proto-Indo-Europeans are today’s Armenians, Turks, and Persians. They apparently kicked ass and took names, because Indo-European now stretches from the Pacific to the Indian Oceans.

But whoever they were, the PIE speakers are remote in a way that the Romans or Germanic tribes are not. Yet while doing my Hindi homework, every now and again I come across a word that clearly indicates the ancient linguistic (and genetic) connectedness between the Romans, the Germans, and the Hindi speakers. Kamiz for shirt; mez for table; kamra for room; mata for mother; pita for father; nam for name; darvaza for door . . . In Hindi class, when I say a word out loud that is clearly related to a European word, I am intoning sounds close to the ones that came from the lips of those ancient Indo-Europeans before they split eastward and westward to conquer Eurasia. To language nerds like me, it’s a chilling sensation.

Distorting time to deny inevitability

The latest issue of Rhetoric Society Quarterly has its authors engaging with “untimely historiography,” which, as near as I can tell, is an attempt to complicate the notion of time as a one-way river of cause and effect. Most of the essays (I’ve read two and skimmed the others) seem to share a common distrust of grand narratives and a distaste for histories that look beyond the contingency of particular events. Cause and effect, linear time—these are human constructs that make sense of, and distort, an otherwise irreducibly complex mess of events.

The chronological anxiety in these essays is of the sort recently addressed by Ted Underwood in Why Literary Periods Mattered. There is of course good reason to be skeptical about grand narratives and historical theories, so I’m sympathetic to much of what is said in these new essays, and I find value in taking a critical look at constructions of linearity in history. However, as genetics blogger Razib Khan notes, acknowledging the dangers of over-generalization presents us with “problems to be grappled with, not a ‘get out of jail’ card to be thrown at any attempts to construct a formal system of interpretation.” Khan’s post is aptly entitled “Human History is Both Contingent and Inevitable,” and I think this both/and worldview is intellectually useful. It makes room for the radical contingency argued for by Michelle Ballif and others without foreclosing on legitimate linear interpretations of history. Thinking about history as both contingent and inevitable leads us to ask where it’s one or the other, to disentangle where it’s more one than the other.

Not everyone would agree with my sentiment, to put it mildly. As an example, I’ll quote from Hans Kellner’s essay “Is History Ever Timely?”*, in which he recounts a talk given by Hayden White:

In 1967, Hayden White . . . journeyed to Colorado to deliver a talk at a conference on biology. At this conference he spoke on the topic “What is a Historical System?” in which he contrasted a historical system with a biological system. In effect, he said that biological—that is, genetic—systems are timely. By this he meant that one’s biological state had been determined in the past by genetic ancestral code. Today we would speak of DNA. But is this true of historical, cultural ancestry? Are we historically determined in the matter of who we are? Is our historical identity as fixed by the timeliness of time and genetic logic as our biological identity is? At that conference, White said, “no.”

A resounding answer, one that, I believe, many scholars in the humanities would echo. It also rejects my olive branch to both sides of the question. It implicitly denies the possibility that culture and history might exhibit large-scale patterns or processes due to the influence of biology, geography, demographics, economics, and so on.

Kellner continues with an example that White used to prove his point: the Christianization of Europe as a culturally created event that needn’t have occurred:

Cultural communities are constituted on the basis of a shared agreement about the choice of historical ancestors. There are times, however, when people lose faith in their chosen identities . . . The example White cited at the time was the crisis of the seventh and eighth centuries in Northern Europe, when a Romanized world saw that the source of their identity had been changed beyond recognition, and a new candidate for that identity had emerged in the teachings of Christian missionaries. As White put it, when the Germanic peoples of northern Europe decided that they were no longer the cultural descendants of ancient Romans or of pagan barbarians, and that their cultural ancestors were Palestinian Jews with whom they had no biological connection at all, a new culture was formed. Backwards. This did not need to happen. Just as the pin on which one sat might have never been noticed if the pain had not caused it to exist for us, so the “Christianization” might have never happened . . .

But is it true that Northern Europe switched identities and cultures as effortlessly as Kellner’s gloss implies? It seems to me a highly contested statement. The Holy Roman Empire was a hegemon among Europe’s warring monarchs and tribes for a time, and, as White describes, the Church Fathers went to great lengths to adopt for themselves and for Europe a foreign Jewish culture and history, but to suggest that the Scots, the Anglos, the Franks, and the Iberians stopped being Scots, Anglos, Franks, and Iberians just because they became Christian is a gross overstatement belied by the constant warfare and power-plays that constitute European history (you’d think White and Kellner would be more careful about hasty generalizations!). It’s like saying the Persians stopped being Persian when they were conquered by the Muslims. Culture runs deep, precisely, I think, because it is tied to and influenced by processes much more intransigent than individual human whim. I don’t believe culture is a costume ready to be changed in a generation or two, and any attempts to do so often result in backlashes or corrections. One might even argue that during the middle ages Europe was just waiting for its monarchs to re-assert their power over Rome so they could all go back to fighting one another again. And indeed they did.

Now, I’m sympathetic to the political sensibility from which I think all this emerges—the idea that if history is not inevitable then the future is, to some extent, in our hands, ready to be constructed in a more just and moral way. On the other hand, if the movement of history is inevitable, then humans can have no agency over their (often unjust) cultures and behaviors, no more agency than they have over their genetics. Such is the “Cormac McCarthy” view of the world, McCarthy having famously said that wishing the species could be “improved in some way . . . will make your life vacuous.” It is an antipathy to this view that brings out the poststructuralist and postmodern tendencies in these RSQ essays, whose authors deny inevitability to history by denying the linear shape of time altogether. Get rid of linear time and any notion of inevitability disappears with it.

I grew up watching wildlife documentaries, so I was inured from a young age to the McCarthy view. It probably didn’t help that I read Blood Meridian in tenth grade. Nevertheless, I try not to err in extremes, so although my default position on culture is determinism of all types—genetic, geographic, demographic, historical—I enjoy challenging and often replacing my default assumptions. I think those who err on the other side—no determinism of any type, history is always contingent—should likewise challenge their default assumption. Hopefully we can meet in the middle.

Hayden White asked: Are we historically determined in the matter of who we are? Is our historical identity as fixed by the timeliness of time and genetic logic as our biological identity is? He answered no, but I think we should answer, Sometimes yes and sometimes no. It depends on what you’re talking about. The intellectual challenge is to figure out what is (or was) contingent and what is (or was) inevitable. Does history exhibit patterns and cycles? What are the large-scale processes which stand outside of but influence cultural expressions? Do certain cultural expressions change according to broadly identifiable patterns, while others exhibit no patterned changes whatsoever? How do irreducibly contingent moments interact with larger historical processes? Interesting questions, in my opinion, ones that the cliodynamicists are trying to answer mathematically. Will they be successful? Maybe, maybe not. But before the fact, I don’t think we should, to quote Khan again, “throw our hands up in the air and assume that all of history is a contingent darkness from which we can’t infer general patterns.”

 

*Kellner’s essay is a sensible discussion of the ways that texts, films, and images create connections across great gaps of time to re-figure the past in terms of the present. It’s an excellent piece, and I’m simply using these carefully extracted quotes as a foil.

Elliot Rodger’s Manifesto: Text Networks and Corpus Features

Analyzing manifestos is becoming a theme at this blog. Click here for Chris Dorner’s manifesto and here for the Unabomber manifesto.

Manifestos are interesting because they are the most deliberately written and deliberately personal of genres. It’s tenuous to make claims about a person’s psyche based on the linguistic features of his personal emails; it’s far less tenuous to make claims about a person’s psyche based on the linguistic features of his manifesto—especially one written right before he goes on a killing rampage. This one—“My Twisted World,” written by omega male Elliot Rodger—is 140 pages long, and is part manifesto, part autobiography.

I’ve made a lot of text networks over the years—of manifestos, of novels, of poems. Never before have I seen such a long text exhibit this kind of stark, binary division:

[Figure: text network of the manifesto, showing the nodes with the highest betweenness centrality]

This network visualizes the nodes with the highest betweenness centrality. The lower, light blue cluster is Elliot’s domestic language; this is where you’ll find words like “friends,” “school,” “house,” et cetera . . . words describing his life in general. The higher, red cluster is Elliot’s sexually frustrated language; this is where you’ll find words like “girls,” “women,” “sex,” “experience,” “beautiful,” “never” . . . words describing his relationships (or lack thereof) with the feminine half of our species.

It’s quite startling. Although this text is part manifesto and part autobiography, I wasn’t expecting such a clear division: the language Elliot uses to describe his sexually frustrated life is almost wholly severed from the language he uses to describe his life apart from the sex and the “girls” (Elliot uses “girls” far more frequently than he uses “women”—see below). It’s as though Elliot had completely compartmentalized his sexual frustration, and was keeping it at bay. Or trying to. I don’t know how this plays out in individual sections of the manifesto. Nor do I know what it says about Elliot’s mental health more generally. I’ve always believed that compartmentalizing frustrations is, contra popular advice, a rather healthy thing to do. I expected a very, very tortuous and conflicted network to emerge here, indicating that each aspect of Elliot’s life was dripping with sexual angst and misogyny. Not so, it turns out.
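For the curious, here is a minimal sketch of the general technique in Python with networkx. This is not AutoMap’s pipeline: the sliding-window co-occurrence model, the window size, and the filename are illustrative assumptions of mine.

```python
# A minimal sketch: build a word co-occurrence network from a text and
# rank nodes by betweenness centrality. The window size and tokenizer
# are illustrative choices, not AutoMap's settings.
import re
from itertools import combinations

import networkx as nx

def cooccurrence_network(text, window=5):
    """Link every pair of words that appear within the same window."""
    tokens = re.findall(r"[a-z']+", text.lower())
    graph = nx.Graph()
    for i in range(len(tokens) - window + 1):
        for a, b in combinations(set(tokens[i:i + window]), 2):
            prior = graph.get_edge_data(a, b, default={"weight": 0})["weight"]
            graph.add_edge(a, b, weight=prior + 1)
    return graph

with open("my_twisted_world.txt") as f:  # hypothetical filename
    g = cooccurrence_network(f.read())

# Words that bridge otherwise separate regions of the network score
# highest on betweenness centrality. (Slow on very large graphs.)
centrality = nx.betweenness_centrality(g)
for word, score in sorted(centrality.items(), key=lambda kv: -kv[1])[:20]:
    print(f"{word}\t{score:.4f}")
```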

Here’s a brief “zoom” on each section:

[Figure: zoomed view of the domestic-language cluster (degree centrality)]

[Figure: zoomed view of the sexually frustrated cluster (degree centrality)]

In the large, zoomed-out network—the first one in the post—notice that the most central nodes are “me” and “my.” I processed the text using AutoMap but decided to retain the pronouns, curious how the feminine, masculine, and personal pronouns would play out in the networks and the dispersion plots. Feminine, masculine, personal—not just pronouns in this particular text. And what emerges when the pronouns are retained is an obvious image of the Personal. Rodger’s manifesto is brimming with self-reference:

[Figure: lexical dispersion plot of personal pronouns]

Take that with a grain of salt, of course. In making claims about any text with these methods, one should compare features with the features of general text corpora and with texts of a similar type. The Brown Corpus provides some perspective: “It” is the most frequent pronoun in that corpus; “I” is second; “me” is far down the list, past the third-person pronouns.
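A quick way to check the Brown Corpus baseline yourself, assuming NLTK and its Brown Corpus data are installed:

```python
# Where do pronouns rank among all words in the Brown Corpus?
# Requires: import nltk; nltk.download("brown")
from collections import Counter

from nltk.corpus import brown

counts = Counter(w.lower() for w in brown.words())
pronouns = {"it", "i", "he", "she", "you", "we", "they", "me", "my", "him", "her"}
for rank, (word, n) in enumerate(counts.most_common(), start=1):
    if word in pronouns:
        print(f"{rank:>4}  {word:<4} {n}")
```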

Here’s another narcissistic twist, found in the most frequent words in the text. Again, pronouns have been retained.

[Figure: most frequent words in the manifesto]

“I” is the most frequent word in the entire text, coming before even the basic functional workhorses of the English language. The Brown Corpus once more provides perspective: “I” is the 11th most frequent word in that general corpus. Of course, as noted, there is an autobiographical ethos to this manifesto, so it would be worth checking whether or not other autobiographies bump “I” to the number one spot. Perhaps. But I would be surprised if “I,” “me,” and “my” all clustered in the top 10 in a typical autobiography—a narcissistic genre by design, yet I imagine that self-aware authors attempt to balance the “I” with a pro-social dose of “thou.” Maybe I’m wrong. It would be worth checking.

More lexical dispersion plots . . .

Much more negation is seen below than is typically found in texts. According to Michael Halliday, most text corpora will exhibit 10% negative polarity and 90% positive polarity. Elliot’s manifesto, however, bursts with negation. Also notice, below, the constant references to “mother” and “father”—his parents are central characters. But not “mom” and “dad.” I’m from Southern California, born and raised, with social experience across the races and classes, but I’ve never heard a single English-only speaker refer to parents as “mother” and “father” instead of “mom” and “dad.” Was Elliot bilingual? Finally, note that Elliot prefers “girl/s” to “woman/en.”

[Figures: lexical dispersion plots for “girls”/“guys,” “mother”/“father,” negation, and sex-related terms]
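Plots like these are easy to reproduce with NLTK; a minimal sketch, with an assumed filename and illustrative word lists:

```python
# Lexical dispersion: plot where selected words occur across the
# running text. Requires NLTK and matplotlib.
import re

from nltk.text import Text

with open("my_twisted_world.txt") as f:  # hypothetical filename
    tokens = re.findall(r"[A-Za-z']+", f.read())

manifesto = Text(tokens)
manifesto.dispersion_plot(["girls", "women", "mother", "father", "never", "not"])
```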

Until I discover that auto-biographical texts always drip with personal pronouns, I would argue that Elliot’s manifesto is the product of an especially narcissistic personality. The boy couldn’t go two sentences without referencing himself in some way.

And what about the misogyny? He uses masculine pronouns as often as he uses feminine pronouns; he refers to his father as often as he refers to his mother—although, it is true, the references to mother become more frequent, relative to father, as Elliot pushes toward his misogynistic climax. Overall, however, the rhetorical energy in the text is not expended on females in particular. This is not an anti-woman screed from beginning to end. Also, recall, the preferred term is “girls,” not “women.” Elliot hated girls. Women—middle-aged, old, married, ensconced in careers, not apt to wear bikinis on the Santa Barbara beach—are hardly on Elliot’s radar. (This ageism also comes through in his YouTube videos.) Despite the “I hate all women” rhetorical flourishes at the very beginning and the very end of his manifesto, Elliot prefers to write about girls—young, blonde, unmarried, pre-career, in sororities, apt to wear bikinis on the Santa Barbara beach.

I noticed something similar in the Unabomber manifesto. Not about the girls. About the beginning and ending: what we remember most from that manifesto is its anti-PC bookends, even though the bulk of the manifesto devotes itself to very different subject matter. The quotes pulled from manifestos (including this one) and published by news outlets are a few subjective anecdotes, not the totality of the text.

Anyway. Pieces of writing that sally forth from such diseased individuals always call to mind what Kenneth Burke said about Mein Kampf:

[Hitler] was helpful enough to put his cards face up on the table, that we might examine his hands. Let us, then, for God’s sake, examine them.

 

Demographic distribution: Gender of citations in CCC, RSQ, and RR abstracts

This post follows up on my discussion of citation frequencies in abstracts in rhetoric and composition journals. To reiterate, a safe assumption to make is that citations in abstracts are “central” to the arguments presented and the research undertaken in the articles themselves; they are particularly informative about overall trends. The genre of the humanities article demands more citations than a core argument actually requires, so looking at citations in abstracts should control for that genre requirement, distilling down all citations to the most vital ones.

The journals: College Composition and Communication (CCC), Rhetoric Society Quarterly (RSQ), and Rhetoric Review (RR). The CCC abstracts run from February 2000 (51.3) to September 2011 (63.1), a total of 261 abstracts. The RSQ abstracts run from Winter 2000 (30.1) to Fall 2011 (41.5), a total of 220 abstracts. The RR abstracts run from 2002 (21.3) to 2011 (30.4), a total of 154 abstracts.

The previous post discussed the “long tail” distribution that emerged from the citation frequencies and what it means for disciplinary identity. This post presents information on the gender of the sources cited in the abstracts, then makes a few comments about demographic distributions in general.

There are 79 unique citations in the CCC abstracts; 159 unique citations in the RSQ abstracts; and 121 unique citations in the RR abstracts. (See previous post for .xls data files.) Here’s how the gender distribution falls: in CCC, 23 out of the 79 sources are female; in RSQ, 39 out of the 159 sources are female; in RR, 36 out of the 121 sources are female.
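In other words (a sketch of the arithmetic, using the counts just given):

```python
# Female share of unique citations in each journal's abstracts.
counts = {"CCC": (23, 79), "RSQ": (39, 159), "RR": (36, 121)}
for journal, (female, total) in counts.items():
    print(f"{journal}: {female}/{total} = {female / total:.0%} female")
# CCC: 23/79 = 29% female
# RSQ: 39/159 = 25% female
# RR: 36/121 = 30% female
```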

And here are graphs of the raw counts and of the percentages:

[Graph: abstract citations by gender (raw count)]

[Graph: abstract citations by gender (percentage)]

In Authoring a Discipline, Maureen Daly Goggin has shown that by 1990 total contributors to 9 of rhetoric and composition’s major journals—including the 3 analyzed here—had equalized to a nearly 50/50 split between males and females. I imagine this trend has continued into the new millennium, but it would be worthwhile to determine whether or not that’s the case.

What has not equalized, however, is the gender contribution in terms of citations. Odds are, counting all citations in the articles themselves would narrow the large gap seen in the graphs above. But insofar as we accept that abstract citations represent the most vital sources in each journal, an obvious gender gap still exists in CCC, RSQ, and RR citations.

In RSQ and RR, this gap, in part, likely has something to do with these journals’ tendencies to publish work on rhetorical history. I pointed this out in the last post: 27 (or 22%) of the RR citations are sources from the 17th century or earlier. 26 (or 16%) of RSQ citations are from the same period. Those numbers would grow if they included figures from the 18th and 19th centuries, as well. The reality is, most of these historical sources are male: Plato, Cicero, Aristotle, Quintilian, et cetera.

I have no ready explanation for why CCC citations should have as large a gender gap as the other journals’ citations, given that CCC builds most of its scholarship on sources from the middle part of the 20th century or later. If we look at the 102 most cited figures in CCC between 1987 and 2011 (Mueller, “Grasping”), we discover that 43/102 (42%) of the sources are female: a gender imbalance, but one not nearly as pronounced as the imbalance that surfaces in abstract citations. I’d be curious to see the gender distribution in Mueller’s entire data set. Is there a nearly 50/50 split between male and female sources across all citations in CCC between 1987 and 2011? If so, we could model the gender imbalance in this journal’s citations as an emergent feature: 50/50 across the entire data set; 58/42 in the most popular citations between 1987 and 2011; 71/29 in abstracts between 2000 and 2011. It’s unfortunate that CCC did not publish abstracts until the late 1990s; had it begun earlier, the dates of the abstracts and the articles could have been uniform.

The question of demographic balance is one that spills a lot of digital ink. Just this morning, Scott Weingart visualized the gender (im)balance of Digital Humanities Conference attendees: about a 70/30 split that favors males. And Google recently released the demographic characteristics of its workforce: 30% of its employees are women; 17% of its technical employees are women. 60% of its employees are white; 30% of its employees are Asian (read: East Asian and Indian); and only 3% of its employees are Non-Asian Minorities.

I asked Scott why our default assumption should be uniform demographic distribution. When looking at statistical trends that emerge at large scales, we shouldn’t be surprised to discover that human populations cluster differently. At least, that’s my default assumption. The DH Conference draws more males, but then, an Early Childhood Education conference will draw more females. (I once attended a conference on speech and behavior therapy for autistic children; there were no more than three or four males amid about seventy females.) Or take a look at the National Association for the Education of Young Children. Although we often hear about the male-ness of executive boards, the NAEYC’s executive team is entirely female, and its 17-member governing board boasts 13 females and only 4 males. Looking at all the Early Childhood Education associations and organizations in the country, what gender trends would we expect to find?

The first question to ask about demographic distribution in any particular population (like Google’s workforce or citations in abstracts) is this: What are the characteristics of the larger population from which this particular population is drawing? As long as rhetorical scholars continue to look at rhetorical history, where most of the figures are male, then we can continue to expect many citations in these historical journals to be male. (This may change, however, as more and more rhetorical historians re-discover the history of female oratory.) Or, in Google’s case, if we take the American population as the baseline, assuming a 50/50 gender split, then clearly there is a gender imbalance. But in terms of race and ethnicity, its white workforce is in fact under-represented. Raising the percentage of blacks and Hispanics at Google would mean firing a lot of the Chinese and Indians, unless we want to make whites more under-represented than they already are. (A fairer baseline population would be the percentage of working-age adults in America, or, better yet, the percentage of working-age adults with college degrees; however, those stats are much harder to come by. Total population is a decent but imperfect proxy.)

The point is that we do not always find particular populations boasting a uniform or near-uniform demographic distribution. Why is this? A complex question. Given the totality of the human population (or, more humbly, the totality of any total population in a given geographic area), why do we find the smaller population clusters clustering the way they do around different practices? Why are there more males in CCC citations? Why are there more males at the DH Conference? Why are East Asians and Indians so over-represented at Google? Why are there so few East Asians and Indians in the NFL and the NBA? That populations cluster differently around different practices seems to be a statistical fact. Is it also a future inevitability?

A possible explanation for the emergence of quotative “like” in American English

So Monica was like, “What are you doing here, Chandler?” and Chandler was like, “Uhh nothing” and then Monica was like, “Why are you here with Phoebe?” and Chandler was like, “I don’t know,” and Monica was like, “Whatever!”

Quotative “be like” probably gets on your nerves. Unfortunately for you, it spread like wildfire in the latter half of the 20th century and today is used by native and non-native speakers alike as often as they use traditional say-type quotatives. What is its structure, when did it arise, and why did it spread so quickly? This post offers a possible explanation, based on evidence dragged up from the depths of the Google Books Corpus. To appreciate that evidence, however, we need to start with some discussion of this quotative’s formal properties.

1

One interesting property of quotative “be like” is its ambiguous semantics. In some contexts, it is a stative predicate that denotes internal speech, i.e., thoughts reflexive of an attitude. In other contexts, it is an eventive predicate denoting an actual speech act. Sometimes, the denotation is ambiguous, as in (1):

(1) Monica was like, “Oh my God!”

. . . Did Monica literally say “Oh my God!” or did she just think or feel it?

Another interesting property of quotative “be like” is that it disallows indirect speech.

(2a) Monica was like, “I should go to the mall.”

(2b) *Monica was like that she should go to the mall.

(2c) *Monica was like she should go to the mall.

Quotative say of course allows indirect speech:

(3a) Monica said, “I should go to the mall.”

(3b) Monica said that she should go to the mall.

(3c) Monica said she should go to the mall.

Haddican et al. (2012) recognize that quotative “be like” is immune to indirect speech due to its mimetic implicature. (2b) cannot be allowed because quotative “be like” always means something more along these lines:

(4) Monica was like: QUOTE

Given the implied mimesis of this construction, it makes no sense, as in (2b) and (2c), to add an overt complementizer and to change person/tense to produce an indirect, third person report. This property is shared by all uses of quotative “be like,” whether in their stative or eventive readings.

But there’s more to it than a mimetic implicature. Schourup (1982) points out that quotative “go” also shares this mimetic property (although he does not frame it as such). As expected of a quotative with a mimetic implicature, quotative “go” likewise does not allow an indirect speech interpretation via addition of an overt complementizer and shifts in person/tense:

(5a) Monica goes, “I should go to the mall.”

(5b) *Monica goes that she should go to the mall.

Why should these innovative quotatives be so immune to indirect speech and so committed to direct quote marking? Schourup suggests that quotative “go” (and, by extension, quotative “be like”) arose precisely to meet English’s need for a mimetic, unambiguous direct quotation marker. Prior to the occurrence of these new quotatives, English lacked such a marker. Consider (6a) and (6b) below:

(6a) When I talked to him yesterday, Chandler said that you should go to the doctor.

(6b) When I talked to him yesterday, Chandler said you should go to the doctor.

There is no ambiguity in (6a). The speaker of this utterance clearly intends to convey to his interlocutor that Chandler said the interlocutor should go to the doctor. (6b), however, introduces ambiguity. The utterance in (6b) can be interpreted in two ways: a) Chandler said the speaker of the utterance (i.e., I) should go to the doctor; b) Chandler said the speaker’s interlocutor (i.e., you) should go to the doctor. With orthographic conventions, of course, this ambiguity disappears:

(6c) When I talked to him yesterday, Chandler said, “You should go to the doctor.” (So I went.)

However, unlike many other languages, spoken English has no “quoting” conventions—no overt markers of direct quotation. It is unclear if (6b) is a true quotative or merely an indirect report of speech with a null complementizer.

[Diagram: quotative vs. indirect-report readings]

We can imagine speakers needing to clarify this ambiguity:

JOEY: When I talked to him yesterday, Chandler said you should go to the doctor.

ROSS: Wait, he said I should go or you should go?

This ambiguity arises with say-type verbs whenever the complementizer “that” is omitted. It is traditionally understood that English differentiates between direct quotatives and indirectly reported speech via shifts in person and/or tense. However, the overt complementizer is really the central feature of this differentiation. Without an overt complementizer, it is never entirely clear if the embedded clause is a direct quote or an indirect report of speech. Here’s another example:

(7) JOEY: Chandler said I will be responsible for the cat’s funeral.

Without the aid of quote marks, we cannot know whether Chandler or Joey is responsible for the cat’s funeral, even though the embedded clause contains a shift in both person and tense. Of course, if Joey wants to convey that Joey himself will be responsible for the cat’s funeral, he can simply add the overt complementizer: “Chandler said that I will be responsible . . .” However, if Joey wants to convey that Chandler has decided to be responsible, Joey has no way to convey it unambiguously with say-type verbs. He must resort to an indirect speech construction with an overt complementizer. Alternatively, he can resort to non-structural signals: a short pause, a change in intonation, or a mimicry of Chandler’s voice. Or he must abandon say-type constructions altogether and convey his meaning some other way.

Quotative “go” and quotative “be like” solve this ambiguity. These innovative quotatives always signal that the following clause is mimetic, a direct quote of speech or thought. Many languages—Russian, Japanese, Georgian, Ancient Greek, to name just a few—have overt markers to ensure that interior clauses are understood as being directly quoted material, whether or not those quoted clauses contain grammatical shifts (though of course they often do). The quotatives “go” and “be like” serve this same purpose. They are structural, unambiguous markers for direct speech, which is why one cannot use them for indirect speech, and which is also why they have spread so widely and quickly: they have met a real need in the language.

Quotative “go,” however, is attested long before quotative “be like.” The Oxford English Dictionary puts the earliest usage in the early 19th century, initially as a way to mime sounds people made, then later as a way to report on actual speech. Here’s an example from Dickens’ Pickwick Papers:

[Image: an example of quotative “go” from Dickens’ Pickwick Papers]

So, although I have said that both quotative “be like” and quotative “go” met a need in English for an unambiguous direct quotation marker, it was “go” that in fact met the need first, by at least a century. This historical fact leads me to suspect that quotative “be like” met a slightly different need: while quotative “go” became a direct quotation marker for speech acts, quotative “be like” became a direct quotation marker for thoughts. As Haddican et al. rightly note, an innovative feature of these quotatives is that they allow direct quotes to be descriptors of states. In other words, the directly marked quotes of “go” denote external speech; the directly marked quotes of “be like” primarily denote internal speech, i.e., thoughts or attitudes. I believe this hypothesis is supported by the earliest uses of quotative “be like,” to which we now turn:

2

Today, young native and non-native speakers of English frequently use “like” as a versatile discourse marker or interjection in addition to its use as a quotative (D’Arcy 2005). D’Arcy provides two extreme examples of discourse marker “like.” Both are taken from a large corpus of spoken English:

(8) I love Carrie. LIKE, Carrie’s LIKE a little LIKE out-of-it but LIKE she’s the funniest, LIKE she’s a space-cadet. Anyways, so she’s LIKE taking shots, she’s LIKE talking away to me, and she’s LIKE, “What’s wrong with you?”

(9) Well you just cut out LIKE a girl figure and a boy figure and then you’d cut out LIKE a dress or a skirt or a coat, and LIKE you’d colour it.

This usage does not become noticeable in available corpora until the 1980s, so nearly all papers that I have read assume that discourse marker “like” and quotative “be like” arose more or less in tandem during the 1970s, becoming common by the 1980s. However, using the Google Books Corpus, I was able to find an early use of “like” that presages quotative “be like.” This early use also seems to set the stage for the versatile discursive uses of “like” seen in (8) and (9). This early use is the expression “like wow.” It seems to have arisen during the 1950s (though perhaps earlier) in the early rock ’n’ roll scenes in the Southern United States. Here are some examples.

The first is from 1957: a line from a rock ’n’ roll song by Tommy Sands:

(10) When you walk down the street, my heart skips a beat—man, like wow!

The second is from a 1960 issue of Business Education World:

(11) Like, wow! I’m taking a real cool course called general business. It’s the most.

[Image: excerpt from Business Education World, 1960]

The third is from a novel called The Fugitive Pigeon, published in 1965:

(12) But all of a sudden you’re like wow, you know what I mean?

And by 1971, we have a full example of quotative “be like”—note that this early occurrence uses an expletive as the subject:

(13) But to me it was like, “Oh, why can’t you say, ‘Gee that’s wonderful . . .’”

[Image: excerpt from Life magazine, 1971]

These early uses of “like wow” in (10) and (11) denote a stative feeling or attitude rather than any kind of eventive speech act. This is especially clear in (11), where the expression is a direct response to a question about how the speaker is feeling. The quotative in (13) likewise seems to be a stative predicate rather than an eventive one. In fact, in nearly all of the earliest uses of quotative “be like”—from the 1970s and early 1980s, as reported in the Google Books Corpus—the intention is to denote a feeling or attitude, not a direct quote of a speech act. Such eventive predications don’t become common until the 1990s and 2000s.
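For those who want to hunt for attestations themselves, here is a minimal sketch of the kind of pattern-matching involved; the regex and the corpus file are my own illustrative stand-ins, not the Google Books interface:

```python
# Flag candidate quotative "be like" hits in a plain-text corpus file.
# The pattern (a form of "be" + "like" + the opening of quoted material)
# is a rough approximation; real hits still need hand-checking for
# stative vs. eventive use.
import re

BE_LIKE = re.compile(
    r"\b(?:was|were|is|am|are|be|been)\s+like,?\s*[\"\u201c]",
    re.IGNORECASE,
)

def find_candidates(path):
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if BE_LIKE.search(line):
                print(f"{lineno}: {line.strip()}")

find_candidates("corpus_1970s.txt")  # hypothetical corpus file
```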

“Like wow,” then, arose in 1950s slang as a stative description. However, the sentence in (14) below suggests that wow was not interpreted as a structurally independent interjection but as an adjective. This is from a 1960 edition of Road and Track magazine:

(14) Man, that crate would look like wow with a Merc grille.

[Image: excerpt from Road and Track, 1960]

It is possible that like is an adverb here, but in my estimation it is most likely still a garden variety manner preposition that has innovatively selected for a bare adjective. Typically, like as a preposition only selects NPs as its complement. However, with the advent of “like wow,” it loosened its selection requirements and began to select for adjectives as well. And not just adjectives. The bottom line in this advertisement from Billboard magazine in May 1960 demonstrates that it also began to select for adverbs:

[Image: the bottom line of a Billboard advertisement, May 1960]

Apparently, in the 1950s and early 1960s, like became a popular and versatile manner preposition. Once like loosened its requirements to select AP complements, it’s easy to see how it could start selecting quotes, thus becoming a new direct quote marker (like narrative “go”); and given the stative denotation of the original phrase “like wow,” it’s also easy to see why stative to be would become the verbal element in this quotative rather than a lexical verb like act or go. Indeed, it appears that the first uses of quotative “be like” were entirely restricted to the phrase “like wow,” ensuring that subsequent uses would likewise have stative readings. (The ad above also shows how easy it would be for like to become an all-around discourse marker once it began to select for a wider range of phrases.)

So, based on the timeline of evidence in the corpus, I posit the following evolution:

[Diagram: stages in the development of quotative “like”]

The emergence of quotative “like”

I follow Haddican et al. in assuming that like in quotative “be like” is still a manner preposition. However, while they assume the preposition did not undergo any change, I argue that like became more versatile in its selection restrictions. This versatility allowed it first to select APs, then to select quotes. Initially, this quotative construction was just an extension of the phrase “like wow,” but it soon began to select any quoted material. And from the beginning, this quotative possessed two features: a) it had an obvious mimetic implicature, ensuring that it would be a direct quote marker, similar to narrative “go”; and b) it had a stative denotation, due to the stative denotation of the original phrase “like wow,” ensuring that the directly marked quotes were reflective of internal speech, i.e., thoughts or attitudes.

A corpus analysis by Buchstaller (2001) has shown that, even today, quotative “go” is much more likely than quotative “be like” to frame “real, occurring speech” (pp. 10); in other words, “be like” continues to be used more often as a stative rather than eventive predicate. As I mentioned earlier, Haddican et al. are correct that one innovative aspect of quotative “be like” is that quotes are now able to be descriptors of states; however, I believe they overstate the eventive vs. stative ambiguity that arises in these quotatives. Most of the time, in real contexts, they are as unambiguously stative as they are unambiguously mimetic of the state. Haddican et al. themselves note that even these eventive readings are open to clarification. Asking whether or not someone “literally” said something sounds much odder following a say-type quotative than a “be like” quotative with a putatively eventive reading.

3

Nevertheless, as I showed at the very beginning of this post, there are instances where quotative “be like” seems to denote an eventive speech act. Linguistically, this is odder than it sounds at first. A single verbal construction—like quotative “be like”—should not have both a stative and an eventive reading. This ambiguity can only arise for one of two reasons: either there is some special semantic function at work in this construction, or there are in fact two separate quotative constructions, each with its own syntactic structure.

It is tempting to see a correlation between this ambiguity and the putative ambiguity between stative be and eventive be, also known as the be of activity. Consider the following sentences:

(15) Joey was silly.

(16) Rachel asked Joey to be silly.

Both forms of be select an adjective; however, (16), unlike (15), can be taken to mean that Joey performed some silly action. In other words, the small clause in (16) seems to be an eventive predication, not a stative one. It has been argued (Parsons 1990) that this eventive be is not the usual copular form but a completely different verb that means something like “to act”—in other words, English to be is actually a homophonous pair of verbs, similar to auxiliary have and possessive have. Perhaps this lexical ambiguity in be is related to the eventive vs. stative ambiguity in quotative “be like.” The stative reading arises when stative be is involved; the eventive reading arises when the eventive, lexical be is involved.

Haddican et al. argue against this line of thought. Diachronically, we know that quotative “be like” has arisen rapidly in many varieties of English, and that in all of these varieties, the semantics are ambiguous. But if there are in fact two be verbs that underwent this quotative innovation, then we would need to posit two unrelated channels of change: one in which like+QUOTE became a possible complement of stative be and one in which like+QUOTE became a possible complement of eventive be.

This is actually a problematic claim, given that, presumably, stative and eventive be have different structures. The former undergoes its typical V to T movement in English; the latter, given its eventive semantics, would be expected to remain in the VP like any other lexical verb. These underlying structures would demand that we devise different processes by which quotative “be like” arose. However, given the rapidity with which it did in fact arise, it is more probable that it arose via a single process—and the inevitable conclusion is that there is a single, stative verb to be that underwent the process. This conclusion is also supported by the auxiliary-like behavior of be in quotatives involving adverbs and questions:

(17) Ross was totally like, “I don’t care!”

(18) Was Ross like, “I don’t care”?

Although the ambiguous stative vs. eventive reading still occurs here, (17) exhibits raising above AdvP, and (18) exhibits subject-aux inversion. In other words, be in these quotatives behaves like an ordinary copular auxiliary, not a lexical verb. We therefore should not posit a separate, eventive be verb. We need another way to explain the semantic ambiguity of these quotatives.

Haddican et al. explain this ambiguity with Davidsonian semantics. Briefly stated, they argue that there is a single stative be verb—both in these quotative constructions and in English more generally. However, be has a semantic LOCALE function that, in certain contexts, can localize the state in a short-term event, and this localization of an event can force an agentive role onto the subject, even when an adjective has been selected by be. So, in a sentence such as (19), be will have a denotation as in (20):

(19) Joey is being silly.

(20) [[be]] = λS λe λx. ∃s ∈ S [e = LOCALE(s) & ARGUMENT(x, e)]

(20) takes a property of state S and localizes it into an event (a moment in which Joey was silly); in the right context, it is not a great leap to coerce this experiencer event into an agentive one. The application of these semantics to “(be) like” quotatives is straightforward:

In the state reading, be like is simply a stage level use of the copula, localised to the event in which the subject of be exhibited the relevant behaviour. The eventive reading arises when the event mapped to is an agentive one, where the most plausible event of an agent behaving in a quotative manner is the relevant speech act. (Haddican et al. 2012, pp. 85)

In short, the ambiguity between stative and eventive “be like” arises from a semantic property that forces certain “states of being” to be processed as localized events whereby the experiencer of the event takes on an agentive role. In certain quotative contexts, the embedded quote is processed as an event, and the subject is understood as having caused that event, i.e., as actually saying something rather than just experiencing an attitude.
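To make (20) concrete, apply it to (19), assuming (my notation, not Haddican et al.’s) that [[silly]] denotes the set SILLY of silly states:

(20′) [[be]]([[silly]]) = λe λx. ∃s ∈ SILLY [e = LOCALE(s) & ARGUMENT(x, e)]

“Joey is being silly” then asserts that some silly state s has been localized into an event e with Joey as its argument; when context invites an agentive reading of e, Joey’s silliness is understood as something he did rather than something he merely experienced.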

I agree that it would be better not to posit two homophonous verbs (stative be vs. be of activity) to account for the ambiguous stative vs. eventive denotations of quotative “be like.” Doing so requires two separate analyses and two separate channels of diffusion, which seems unlikely given the rapidity with which this quotative did in fact spread across many varieties of English. However, Haddican et al.’s application of Davidsonian semantics to explain the ambiguous readings runs into a problem in sentences like (21) below, as well as in the earlier example in (13):

(21) It was like, “Oh Mom, Can I film a movie in the house, it won’t be any problem at all.”

This is clearly an eventive predication of quotative “be like.” But instead of an agentive subject we have expletive it. Recall that Haddican et al.’s analysis relies on the notion that stative be has a LOCALE function that locates the state into a temporary moment or event. This localization can coerce an experiencer subject into the role of an agentive subject when the most likely reading (as above) suggests that the temporary event was an actual speech act. As Haddican et al. say themselves, “this event assigns an agentive role to the subject” (pp. 85). However, by definition, the expletive in (21) receives no theta role and can therefore be neither the experiencer of a state nor the agent of an event. And yet (21) clearly denotes an eventive reading: the speaker actually spoke the words, or something like them.

The fact that “be like” quotatives can take an eventive (or even a stative) reading when an expletive surfaces in spec-TP suggests that Davidsonian semantics do not explain the ambiguous eventive vs. stative readings associated with these quotatives. (The fact that “be like” quotatives exhibit both experiencer subjects and expletive subjects also suggests that the quote CP is the only obligatory argument assigned by “be like.”)

The only alternative seems to be that there are in fact two homophonous be verbs, and quotative “be like” makes use of both. Maybe this isn’t such a big deal. If I’m right about the diachronic process by which quotative “be like” arose, then we can at least see a two-step process: quotative “be like” was solely a stative predicate in its early use and for most of its early history; only later did it begin to be used as an eventive predicate. And if there are in fact two be verbs, the eventive sounds exactly like the stative and is in fact much rarer than the stative, so I suppose one can see how these facts laid the groundwork for the eventual use of stative “be like” as an eventive predicate.

Lying with Data Visualizations: Is it Misleading to Truncate the Y-Axis?

Making the rounds on Twitter today is a post by Ravi Parikh entitled “How to lie with data visualization.” It falls neatly into the “how to lie with statistics” genre because data visualization is nothing more than the visual representation of numerical information.

At least one graph provided by Parikh does seem like a deliberate attempt to obfuscate information, i.e., to lie:

[Graph: firearm homicides in Florida, plotted with an inverted y-axis]

Inverting the y-axis so that zero starts at the top is very bad form, as Parikh rightly notes. It is especially bad form given that this graph delivers information about a politically sensitive subject (firearm homicides before and after the enactment of Stand Your Ground legislation).

Other graphs Parikh provides don’t seem like deliberate obfuscations so much as exercises in stupidity:

[Graph: pie chart whose segments total more than 100%]

Pie charts whose divisions are broken down by % need to add up to 100%. No one in Fox Chicago’s newsroom knows how to add. WTF Visualizations—a great site—provides many examples of pie charts like this one.

So, yes, data visualizations can be deliberately misleading; they can be carelessly designed and therefore uninformative. These are problems with visualization proper, and may or may not reflect problems with the numerical data itself or the methods used to collect the data.

However, one of Parikh’s “visual lies” is more complicated: the truncated y-axis:

[Graphs: Bush tax cut and knuckleball-speed charts with truncated y-axes]

About these graphs, Parikh writes the following:

One of the easiest ways to misrepresent your data is by messing with the y-axis of a bar graph, line graph, or scatter plot. In most cases, the y-axis ranges from 0 to a maximum value that encompasses the range of the data. However, sometimes we change the range to better highlight the differences. Taken to an extreme, this technique can make differences in data seem much larger than they are.

Truncating the y-axis “can make differences in data seem much larger than they are.” Whether or not differences in data are large or small, however, depends entirely on the context of the data. We can’t know, one way or the other, if a difference of .001% is a major or insignificant difference unless we have some knowledge of the field for which that statistic was compiled.

Take the Bush Tax Cut graph above. This graph visualizes a tax raise for those in the top bracket, from a 35% rate to a 39.6% rate. This difference is put into a graph with a y-axis that extends from 34% to 42%, which makes the difference seem quite significant. However, if we put this difference into a graph with a y-axis that extends from 0% to 40%—the range of income tax rates—the difference seems much less significant:

[Graph: the same tax difference plotted with a y-axis running from 0% to 40%]
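Both versions take only a few lines to generate; a sketch with matplotlib, using the two rates from the graphs above:

```python
# The same two numbers plotted twice: once on a truncated y-axis,
# once on an axis spanning the full range of income tax rates.
import matplotlib.pyplot as plt

labels = ["Current", "Jan. 1, 2013"]
rates = [35.0, 39.6]

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, (lo, hi), title in zip(
    axes, [(34, 42), (0, 40)], ["Truncated y-axis", "Full y-axis"]
):
    ax.bar(labels, rates)
    ax.set_ylim(lo, hi)
    ax.set_ylabel("Top income tax rate (%)")
    ax.set_title(title)
plt.tight_layout()
plt.show()
```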

So which graph is more accurate? The one with a truncated y-axis or the one without it? The one in which the percentage difference seems significant or the one in which it seems insignificant?

Here’s where context-specific knowledge becomes vital. What is actually being measured here? Taxes on income. Is a 39.6% tax on income really that much greater than a 35% tax? According to the current U.S. tax code, this highest bracket affects individual earnings over $400,000/year and, for married couples, earnings over $450,000/year. Let’s go with the single rate. Let’s say someone makes $800,000 per year in income, meaning that $400,000 of that income will be taxed at the highest rate:

35% of 400,000 = 0.35(400,000) = 140,000

39.6% of 400,000 = 0.396(400,000) = 158,400

158,400 – 140,000 = 18,400
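The same arithmetic as a function, for any income over the threshold (single-filer numbers from above; this ignores every bracket below the top one):

```python
# Extra annual tax from the top-bracket increase (35% -> 39.6%),
# counting only income above the $400,000 single-filer threshold.
def top_bracket_increase(income, threshold=400_000, old=0.35, new=0.396):
    return max(0, income - threshold) * (new - old)

print(round(top_bracket_increase(800_000)))    # 18400
print(round(top_bracket_increase(1_400_000)))  # 46000
```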

So, in real numbers, not percent, the tax rate hike will equal $18,400 to someone making 800k each year. It would equal more $$$ for those earning over a million. So, the question posed a moment ago (which graph is more accurate?) can also be posed in the following way: is an extra eighteen grand lost annually to taxes a significant or insignificant amount?

And this of course is a subjective question. Ravi Parikh thinks it’s not a significant difference, which is why he used the truncated graph as an example in a post titled “How to lie with data visualization.” (And as a graduate student, my response is also, “Boo-freaking-hoo.”) However, imagine a wealthy couple, owners of a successful car dealership, being taxed at this rate (based on a combined income of ~800k). They have four kids. Over 18 years, the money lost to this tax raise will equal what could have been college tuition for two of their kids. I believe they would think the difference between 35% and 39.6% is significant. (Note that the “semi-rich” favor Republicans, while the super rich, the 1%, favor Democrats.)

What about the baseball graph? It shows a pitcher’s average knuckleball speed from one year to the next. When measuring pitch speed, how significant is the difference between 77.3 mph and 75.3 mph? Is the truncated y-axis making a minor change more significant than it really is? As averages across an entire season, a drop of 2 mph does seem pretty significant to me. If Dickey were a fastball pitcher, a decline in average speed from 92 mph to 90 mph would mean fewer pitches over 90 mph, which could lead to a higher ERA, fewer starts, and a truncated career. For young pitchers being scouted, the difference between an 84 mph pitch and an 86 mph pitch can apparently mean the difference between getting signed and not getting signed. Granted, there are very few knuckleballers in baseball, so whether or not this average difference is significant in the context of the knuckleball is difficult to ascertain. However, in the context of baseball more generally, a 2 mph average decline in pitch speed is worth visualizing as a notable decline.

So, do truncated y-axes qualify as the same sort of data-viz problem as pie charts that don’t add up to 100%? It depends on the context. And there are plenty of contexts in which tiny differences are in fact significant. In these contexts, not truncating the y-axis would mean creating a misleading visualization.