Jonathan Cheng

We don’t have a good language for describing the relationship between numbers and literary impressions.


Image 1. Bechdel Test of American Cinema: the gender distribution in international cinema according to four sociological studies. Courtesy of Wikipedia.

In May of 2017, PMLA published Johanna Drucker’s article “Why Distant Reading Isn’t.” Echoing various critics of digital humanities (ex. Stanley Fish & Timothy Brennan), Drucker argues that using text mining practices for literary research, often referred to as “distant reading,” is fundamentally incompatible with humanities inquiry. Literary scholars who currently use these text mining practices in their research broadly do so in two ways: 1) In a structuralist approach to literary history, they use computational methods to scrape macroscopic samples from large digital libraries, macroscopic in the sense that the samples will ideally demonstrate literary patterns visible across a broad swath of texts (ex. Can we model changes in gendered characterization in 104,000 works of fiction?). 2) In a more sociological vein, they use quantitative methods to model and test literary hypotheses (ex. Do students describe their instructors using gendered language?). Pushing back against both approaches to literary study, Drucker asserts that computation and quantification are fundamentally at odds with developing the rich anecdotal impressions that the humanities, as a discipline, tends to value.
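For readers who haven’t seen what that first kind of “pattern extraction” looks like in practice, here is a minimal sketch, not anyone’s actual method: count a handful of gendered pronouns across a folder of plain-text novels and group the counts by year. The corpus folder, the filename convention, and the tiny word lists are all assumptions for the sake of illustration.

import re
from collections import defaultdict
from pathlib import Path

# Hypothetical word lists; a real study would model characterization far more richly.
FEMININE = {"she", "her", "hers", "herself"}
MASCULINE = {"he", "him", "his", "himself"}

counts = defaultdict(lambda: {"fem": 0, "masc": 0})
# Assumes a folder of plain-text novels named like "1853_bleak_house.txt".
for path in Path("corpus").glob("*.txt"):
    year = int(path.stem.split("_")[0])
    words = re.findall(r"[a-z']+", path.read_text(encoding="utf-8").lower())
    counts[year]["fem"] += sum(w in FEMININE for w in words)
    counts[year]["masc"] += sum(w in MASCULINE for w in words)

for year in sorted(counts):
    fem, masc = counts[year]["fem"], counts[year]["masc"]
    share = fem / (fem + masc) if (fem + masc) else 0.0
    print(f"{year}: feminine share of gendered pronouns = {share:.1%}")

What the sketch makes obvious is how little of a reader’s experience it touches; the gap between those totals and an actual encounter with a novel is exactly what Drucker is pointing at.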

Drucker’s disagreement with computational approaches stems from a pretty classic premise: machines don’t read like humans. The essay repeatedly configures distant reading with an ominously industrial sense of automation. For Drucker, when scholars use computers to extract language patterns from large digital collections, they’re mechanically reducing rich narrative experiences into linguistic trends. The fear, as the story goes, is that literary scholars will exclusively privilege these linguistic repetitions over personal engagements with the text. And this quantitative shift will overlook a variety of literary experiences not reflected in a text’s literal registers. In short, Drucker sees text mining as a radical break from human reading where “the distinction between mechanical and hermeneutic reading, between the automatic and the interpretive, between unmotivated and motivated encounters with texts, is essential. Processing is not reading” (630). If we treat reading as mechanical, then we’ll erroneously conflate literal and figurative language. If we treat reading as automatic or unmotivated, then we’ll lose the ability to critically discuss complex interactions between active readers and the text. Drucker’s concern is well founded too. Ethnic studies and feminist scholars have repeatedly needed to remind high theorists not to abstract various readers into erasure. But Drucker’s premise suggests that computational work necessarily results in that erasure. Drucker understands that computational processes can extract linguistic patterns. She also acknowledges that quantification can lead scholars to macroscopic historical questions. She just doesn’t think that numbers are at all related to texturing one’s narrative experiences. There is a salient assumption here about the nature of literary interpretation: it has very little to do with managing quantitative impressions.

If I understand that premise correctly, Drucker’s argument is less motivated by a narrow fear of computers than by a narrow notion of what counts as reading. And if we consider how literary disciplines teach students to refine their anecdotal impressions, Drucker’s assumptions about reading are arguably well grounded. Broadly speaking, the discipline does not configure literary interpretation with an eye for obvious frequencies or repetitions. At the undergraduate level, we strongly encourage students to develop an eye for textual subtleties. And we train them to articulate the significance of those details within a particular context (often, the narrative itself). At the graduate level, we learn that refining one’s literary impressions involves expanding our contextual awareness (historical, racial, sexual, theoretical, etc.). This allows us to unpack that significance beyond the text itself. Your impressions are not supposed to come from frequencies, proportions, or probabilities of particular textual details; those quantities purportedly won’t tell you much about culture or how readers experience narrative. Impressions come, instead, from carefully “unpacking” discrete details and their significance within specific cultural contexts.

Defining reading along these lines has profound implications for how we handle other forms of textual engagement. In this vein, quantitative observations will seem too heavy-handed for treating subtlety. Computational models will seem too contextually agnostic to render any meaningful interactions. That’s why, and I fall into this trap too, literary folk will often say that their favorite pieces of computational research are ones that carefully “balance” quantification and close reading. There’s just such a powerful notion that literary and quantitative impressions are on opposite ends of a spectrum and can only emerge at the other’s cost. Like Drucker, they know that numbers can provide perspective on a structural register. They just don’t think that rich engagements with a text have any relationship to quantification—hence the need for balance.

Image 2. Google Ngram graph of ‘balance’ x ‘imbalance’: the distribution of the words ‘balance’ and ‘imbalance’ in Google Books’ English fiction collection. Apparently books describe a lot more in terms of balance instead of its nonexistence. Curious.

Up until very recently, I think I may have been in the same boat. Not that I felt quantification was at odds with literary scholarship, but there was a sense that reading by numbers was categorically removed from my anecdotal impressions. After all, the 2000s and 2010s were marked by computational scholarship that had a polemical ethos of structuralism. The hope, as with most forms of structuralism, was that quantitatively rendering these structures would figure into individual encounters with the literature itself. Nonetheless, there was a sense that the work was far removed from a personal engagement with a text. Franco Moretti’s Graphs, Maps, Trees (2005) and Distant Reading (2013) certainly fit the bill. Matthew Jockers’s Macroanalysis: Digital Methods and Literary History (2013) felt not too far removed. Not to say that these aren’t invigorating pieces of scholarship. They just seemed to reinforce a close-distant/micro-macro binary and placed computational methods on the latter end of the spectrum.

But a couple of things have changed since then, and I’m fairly convinced that numbers can and, dare I say it, always have figured into our individual engagement with textual materials...we’re just deciding what language best communicates that relationship. Ted Underwood wrote a very different “Genealogy of Distant Reading” (2017) than what I was used to. Instead of attributing distant reading to Moretti’s brand of structuralism, he shows how quantification was central to Janice Radway’s Reading the Romance (1984). Radway’s numbers were integral to her monograph becoming “a monument of feminist scholarship by challenging the widespread premise that popular literature simply transmitted ideology… Studying a community of women linked by a particular bookstore, Radway concluded that readers have more control over the meaning of stories than critics assume” (15). At the same time, while Andrew Piper’s Enumerations: Data and Literary Study (2018) signals a structural focus, it investigates the notion of literary quantity. In particular, his chapter on punctuation changes in a large sample of novels from 1790-1990 raises several questions about how readers manage narrative time.
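Piper’s models are considerably more involved, but the basic gesture of the punctuation chapter can be sketched in a few lines: tally punctuation marks per thousand words for each novel and watch how the rate drifts across publication years. The folder and the filename convention below are assumptions for illustration, not Piper’s pipeline.

import re
from pathlib import Path

MARKS = ",.;:!?"

# Assumes plain-text novels named like "1794_mysteries_of_udolpho.txt".
for path in sorted(Path("novels").glob("*.txt")):
    year = path.stem.split("_")[0]
    text = path.read_text(encoding="utf-8")
    n_words = len(re.findall(r"[A-Za-z']+", text))
    n_marks = sum(text.count(m) for m in MARKS)
    rate = 1000 * n_marks / n_words if n_words else 0.0
    print(f"{year}: {rate:.0f} punctuation marks per 1,000 words")

Whether that rate rises or falls, the interesting question is Piper’s: what does a change in punctuation density do to the pace at which a reader moves through narrative time?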

Aside from changing tones in computational scholarship, two of my experiences as a graduate student have also played key roles. On the one hand, I’ve just finished my field portfolio, for which I was told to read forty or so texts. And I now have the pleasure of reading another list for the focus portfolio. One of my committee members teasingly claimed, “It’s good for you.” Let’s assume that they genuinely believe that statement. Reading lists, if they are good for anything, seem to be predicated on the notion of literary quantity: the idea that exposure instills particular quantitative impressions from which we begin to recognize ideological themes or historical frameworks (ex. As a Victorian reader, how often would you find yourself reading a marriage plot? Often enough. But then more of this other kind of marriage plot emerged.). On the other hand, I’ve begun teaching Introduction to Fiction. We recently finished reading Agatha Christie’s The Murder of Roger Ackroyd (1926), and I’ve given my students the basics about point of view and characterization. At first, I get into a bit of a huff when they don’t structure their impressions around these subtleties. After all, I’m busily trying to wrench them out of their quantitative impressions and get them to elaborate the significance of x-figurative device when it comes to y-affect. But it clashes with their meaningfully quantitative engagements with the text: “How much of this is the narrator and Detective Poirot talking to each other?” “How many rooms are in these people’s houses?” “Why does Poirot only say one French thing per chapter?” They are masters of these sorts of impressions, and those impressions can tell us a lot about how we’re interacting with these texts (and help us imagine how other readers might perceive these details as well). I have a hunch we’ve avoided this approach simply because we don’t have good answers to those questions yet (if you asked me how many rooms, on average, houses have in literature, and how that changes over time, I could not tell you, even though it is probably a significant detail).
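For what it’s worth, the students’ questions are answerable with very modest tooling. Here is a hypothetical sketch of the first one, assuming a plain-text copy of the novel saved as "ackroyd.txt" with chapter headings like "CHAPTER I" and straight double quotes around speech; both assumptions depend entirely on the edition you happen to have.

import re

text = open("ackroyd.txt", encoding="utf-8").read()
# Split on chapter headings; everything before the first heading is front matter.
chapters = re.split(r"\nCHAPTER [IVXLC]+\b", text)[1:]

for i, chapter in enumerate(chapters, start=1):
    quoted = sum(len(q) for q in re.findall(r'"[^"]*"', chapter))
    share = quoted / len(chapter) if chapter else 0.0
    print(f"Chapter {i}: roughly {share:.0%} of the text sits inside quotation marks")

That isn’t quite “how much of this is the narrator and Poirot talking to each other,” since it counts every speaker, but it is enough to start arguing about whether the chatty chapters feel different from the narrated ones.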

It’s these experiences that make me rethink the unclear relationship between numbers and the kinds of reading literary disciplines tend to value. It’s not just about counting repetitions, but about developing meaningful impressions through our automatic interactions with a text. Like many, I’m curious and undecided about whether computational work should remain nested in literary departments or whether it should migrate to information and library sciences. What I’m more decided on is that numbers do have a place in building anecdotal impressions of texts. As I linked above, Feminist Frequency and the Bechdel Test show how it is entirely possible to react to media in a way that involves numbers. The bigger question is: whose language do we want to adopt to have these conversations? Underwood points towards the social sciences, because those disciplines are predicated on building persuasive impressions through quantitative surveys and polls. Piper advocates for computer and information sciences, because those fields are often debating what can and can’t be learned through statistical impressions. Berenike Herrmann has even suggested that the language of cognitive science is best suited for this task, because its psychological tradition specializes in describing the significance of seemingly automatic and repetitive experiences. In reality, we’ll need a bit of each, because they provide different intellectual perspectives and traditions about quantitative impressions.

But this is where I go off the deep end.

I think we’re also going to need some language that captures the hermeneutic playfulness associated with literary interpretation. A language that presents literary quantities and their affective properties while preserving a sense that we’re still operating on an anecdotal register. The problem with solely using the terms of computer, social, or cognitive science is that there is still a very real sense that their quantifications are far removed from individual impressions. That’s not to say that these fields successfully achieve objectivity. Just that their use of numbers, for various reasons, still feels “distant” from individual perception. But where can we find a language that negotiates quantitative and social information while maintaining an interpretative, anecdotal decorum?

This is where I lose you: I think we might find that language in the realm of gambling and, more specifically, poker. This is, after all, a game where players must make persuasive choices by handling social and quantitative forms of evidence while being cognizant of their own personal engagement with that evidence. And there is a robust vocabulary for navigating that evidence (character ranges, positional awareness, etc.). But the language for handling these different forms of evidence does not elevate the practice into an objective science, as it is still situated within an anecdotal context (you are, after all, making calculated, yet impressionistic gambles). I get that, in this metaphor, literary interpretation involves adopting a gambling hermeneutic. But I’m insanely curious: what would it look like if we treated literary interpretation as a series of calculated wagers, where the calculations were based on both social and quantitative forms of evidence? Throughout its history, the discipline of literary studies has transformed itself by imagining its scholars as detectives, historians, psychoanalysts, activists, environmentalists, etc. And with each role-play our perception of reading has adjusted slightly. How would reading change if we were gamblers in addition to our existing roles?
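And, as a closing illustration of what “calculated, yet impressionistic” means at the table, here is the most basic wager arithmetic a player runs: expected value against a guessed probability. The numbers are invented for the example; the probability is the anecdotal part, an impression of the opponent rather than a measurement.

def call_is_profitable(pot: float, bet_to_call: float, p_win: float) -> bool:
    """Return True if calling has positive expected value: you win the pot
    with probability p_win and lose the call the rest of the time."""
    return p_win * pot - (1 - p_win) * bet_to_call > 0

# You read your opponent as bluffing about a third of the time (a hunch),
# the pot holds 80, and the call costs 20.
print(call_is_profitable(pot=80, bet_to_call=20, p_win=0.33))  # True

Everything hermeneutic in that little function lives inside p_win; the arithmetic just keeps the impression honest.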