Some Quick Text Mining of the 2015 CCCC Program

During CCCC last week, Freddie deBoer made a couple of comments about the conference: first, that there weren’t as many panels on the actual work of teaching writing as there were on sexier topics, like [insert stereotypical humanities topic here]; and second, that not much empirical research was being presented at the conference.

https://twitter.com/freddiedeboer/status/578562988677857280

https://twitter.com/freddiedeboer/status/578571207412334594

https://twitter.com/freddiedeboer/status/578552201431232512

Testing these claims isn’t easy, but as a first stab, here’s a list of the most frequent unigrams and bigrams in the conference’s full list of presentation titles, as found in the official program. Make of these lists what you will. It’s pretty obvious to me that the conference wasn’t bursting at the seams with quantitative data. Sure, research appears at the head of the distribution, but I’ll leave it to you to concordance the word and figure out how often it denotes empirical research into writers while writing.

Then again, big data was a relatively popular term this year. It was used in titles more often than case studies, though case studies was used more often than digital humanities.

To Freddie’s point, the word empirical only appears 11 times in the CCCC program; the word essay appears only 16 times. Is it therefore fair to say there weren’t many empirical studies on essay writing presented this year? Maybe. Maybe not.
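For anyone who wants to reproduce counts like these, here is a minimal sketch of unigram and bigram counting over presentation titles. The three titles are made-up placeholders, not lines from the program, and the tokenizer is a simple assumption of mine (lowercase, letters only, hyphens kept). It does not remove stopwords, so on real data the top of the list would be dominated by function words unless you filter them.

```python
import re
from collections import Counter

# Hypothetical sample titles; the real input would be the titles
# extracted from the CCCC program file.
titles = [
    "Big Data and the Teaching of Writing",
    "Case Studies in First-Year Writing",
    "Big Data, Small Classrooms",
]

def tokenize(title):
    """Lowercase a title and split it into word tokens (hyphens kept)."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", title.lower())

def ngram_counts(titles, n):
    """Count n-grams title by title, so no n-gram spans two titles."""
    counts = Counter()
    for title in titles:
        tokens = tokenize(title)
        # zip over n shifted copies of the token list to form n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts

unigrams = ngram_counts(titles, 1)
bigrams = ngram_counts(titles, 2)

print(unigrams.most_common(5))
print(bigrams.most_common(5))
```

Counting title by title matters here: if you count over one long string, the last word of one title and the first word of the next get counted as a bigram that never actually appeared.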

CCCCUnigrams

CCCCBigrams

One way to get a flavor for the contexts and connotations of individual words and bigrams is of course to create a text network. I’ve begun to think of text networks as visual concordances.

Here is a text network of the tokens writing, write, writer, writers, writing_courses, classroom, and classrooms in the CCCC program. One thing to notice here is that these words are semantically related to one another, but in the panel and presentation titles, they exist in clusters of relatively unrelated words. I had expected to discover a messy, overlapping network with these terms, but they’re rather distinct, as judged by the company they keep in the CCCC program. Even the singular and plural forms of the same noun (e.g., classroom and classrooms, writer and writers) form distinct clusters.
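The idea behind a network like this can be sketched in a few lines: link each seed term to every other word that shares a title with it, then compare the neighborhoods. This is an illustration only, not the tool or settings used to draw the graph. The titles, the stopword list, and the function name `neighbors` are all hypothetical.

```python
import re
from collections import defaultdict

# Seed terms whose neighborhoods we want to compare.
SEEDS = {"writing", "writer", "writers", "classroom", "classrooms"}
# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "of", "and", "in", "a", "as", "to", "for"}

# Hypothetical titles, not taken from the program.
titles = [
    "The Writer as Reader in FYC",
    "Writers and Virtual Identity",
    "Ownership and the Student Writer",
]

def neighbors(titles, seeds):
    """Map each seed to the words it co-occurs with, with edge weights."""
    graph = defaultdict(lambda: defaultdict(int))
    for title in titles:
        tokens = set(re.findall(r"[a-z]+", title.lower())) - STOPWORDS
        for seed in tokens & seeds:
            for other in tokens - {seed}:
                graph[seed][other] += 1
    return graph

network = neighbors(titles, SEEDS)

# Words that appear in the neighborhoods of both forms of the noun.
shared = set(network["writer"]) & set(network["writers"])
print(dict(network["writer"]), dict(network["writers"]), shared)
```

An empty `shared` set for a singular and plural pair is the toy version of the distinct clusters described above. The edge weights could be exported to a graph tool for drawing.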

CCCCProgramNetwork

In relation to Freddie’s point, this network demonstrates that words or bigrams that are prima facie good proxies for “teaching writing” often do lead us to presentations that are pedagogical in nature. However, just as often, they lead us to presentations that are only tangentially or not at all related to the teaching of writing and to the empirical study of writers while writing.

Thus, writer forms a cluster with FYC, student, and reader but also with identity, ownership, and virtual. The same thing occurs with the other terms, though writing by far occurs alongside the most diverse range of lexical items.

CCCCWriter

CCCCWriters

CCCCClassroom

CCCCWriting

This is about as much work as I’m interested in doing on the CCCC program for now. In my last post, I put a download link for a .doc version of the program, for anyone interested in doing a more thorough analysis, whether to test Freddie’s claims or to test your own ideas about the field’s zeitgeist.

However, it’s always important to keep in mind that a conference program might tell us more about the influence of conference themes than about the field itself.

ADDED: Here is a list of all names listed at the end of the CCCC program (CCCCProgramNames). The problem is that it’s a list of FIRST and LAST names, with each given its own entry. If you’re so inclined, you can go through this list and delete the last names, which will leave you with a file that can be run through a gender recognition algorithm to see what the gender split of CCCC presenters was.
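If the entries strictly alternate (first name, last name, first name, last name), dropping the last names is a one-liner. That alternation is an assumption on my part, not something I have checked against the file. Middle names, initials, or two-word surnames would throw the pattern off, so spot-check the output. The sample names and the function name `first_names` are placeholders.

```python
def first_names(entries):
    """Keep every other entry, assuming strict first/last alternation."""
    cleaned = [e.strip() for e in entries if e.strip()]  # drop blank lines
    return cleaned[0::2]

# Placeholder entries, one name per entry, not taken from the program.
sample = ["Ada", "Lovelace", "", "Alan", "Turing"]
print(first_names(sample))
```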

University representation at CCCC

Here’s a list of the universities and colleges best represented at the 2015 CCCC conference. I used NLTK to locate named entities in the CCCC program, so the graph simply represents a raw count of each time a university’s name appears in the program. Some counts might be inflated, but in general, each mention of a school corresponds to a panel with a representative from that school.

The graph shows only those schools that were named at least 10 times in the program (i.e., the schools that had at least 10 individual panels). Even in this truncated list, Michigan State dominates. Explanations for this gross inequality in representation are welcome in the comments.
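The counting and the cutoff can be sketched as follows. Note that this is not the NLTK named-entity approach used for the graph: it swaps in a rough regular expression for school names so the example runs without any model downloads. The pattern, the sample text, and the function name `frequent` are all hypothetical, and the pattern will miss plenty of real names (abbreviations, campus suffixes, "Institute", and so on).

```python
import re
from collections import Counter

# Rough stand-in for named-entity recognition: matches "University of X"
# or "X [Y ...] University/College". An assumption, not the NLTK method.
PATTERN = re.compile(
    r"\b(?:University of [A-Z][a-z]+(?: [A-Z][a-z]+)*"
    r"|[A-Z][a-z]+(?: [A-Z][a-z]+)* (?:University|College))\b"
)

# Hypothetical program lines; presenter names are placeholders.
text = (
    "Jane Doe, Michigan State University\n"
    "John Roe, University of Texas\n"
    "Ann Poe, Michigan State University\n"
)

# Raw count of every time a school's name appears.
counts = Counter(PATTERN.findall(text))

def frequent(counts, minimum):
    """Keep only schools named at least `minimum` times."""
    return {name: n for name, n in counts.most_common() if n >= minimum}

print(frequent(counts, 2))  # the graph above uses a cutoff of 10
```

Like the graph, this counts mentions rather than panels, so a school named twice within a single panel listing is counted twice.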

CCCCColleges

Program (in .docx form because WordPress doesn’t allow .txt files)