Monday, October 31, 2011

Language Log » On the front lines of Twitter linguistics

In this post on Language Log, Ben Zimmer elaborates on this op-ed piece that appeared in Sunday’s New York Times.  The Language Log post links to and expands on some of the research that the op-ed mentioned only briefly.  Here’s a charming example of the elaboration:

  • Eisenstein and Bamman are currently conducting research with Tyler Schnoebelen of Stanford University that looks at how gender plays a role in language variation on Twitter. But they're going well beyond simply analyzing which language forms are associated with women and which are associated with men. Using information on people's Twitter followers, they can also take into consideration the gender makeup of people's networks. Thus, a man with a predominantly female network may show different linguistic patterns compared to a man with a male or mixed network. Earlier today, at NWAV 40, Schnoebelen presented some of his research on one aspect of Twitter discourse, emoticons. The abstract of his paper includes this great line: "Emoticons with noses are historically older." It's true! Not only that, but emoticons with noses, like :-), show distinctly different patterns of distribution than the noseless kind, like :) . Noseless emoticons tend to be used by younger Twitter users and are associated with more informal discourse. Women use them more than men, too, but women use more of all types of emoticons. I'll be looking forward to the definitive study of emoticon nosedness.

    Thursday, October 27, 2011

    Hugh’s on First: How Claude Shannon Could Help Baseball Managers

    Game six of the World Series between the Texas Rangers and the St. Louis Cardinals was postponed because of rain. Cardinals manager Tony La Russa said he would use the night off to see “Moneyball,” the movie about data analytics in the major leagues. 

    Moneyball, Schmoneyball.  What La Russa really needs is a primer on information theory à la Claude Shannon.  Some reporters covering the series could benefit too.

    During game five of the series, some costly telephonic miscommunications occurred between La Russa or pitching coach Dave Duncan (in the dugout) and bullpen coach Derek Lilliquist (in the bullpen).  At various times during two phone conversations, three pitchers’ surnames were mentioned or heard or both.  Those surnames are Motte, Lynn, and Rzepczynski (zep-CHIN-ski).  The miscommunications yielded the farcical situation of a pitcher entering the game to issue an intentional walk to the only batter he would face.  (A chronology of this comedy of errors appears at the end of this post.)

    Much has already been written about these events, including commentary on La Russa’s aggressiveness in deploying relief pitchers, the quaintness of land lines that connect dugouts to bullpens in major-league ball parks, and the difficulty of hearing a phone call while 50,000 nearby fans are screaming.

    Because game six was postponed, sportswriters continue to discuss the game-five debacle.  Thus we get this story in Thursday’s New York Times, accompanied by the following graphic:


    The story and the caption in the graphic above both suggest that of the three names, Rzepczynski’s is the most likely to be miscommunicated because it is the hardest to say.  This analysis does not comport with what we know about information theory.  It is thoroughly un-amazing that Rzepczynski’s name was conveyed successfully.

    Likewise, it is not terribly surprising that Motte’s name was misheard.  You’re just asking for trouble if your name rhymes with “not.”  Lynn’s name rhymes with “win,” “in,” and “I cannot hear your instructions over the stadium din.”

    But it’s not really about rhyming; it’s about information distance.  The problem with Motte’s name is that it is a short information distance from any number of other words that might reasonably be uttered during a conversation about baseball: not, hot, got, dot.  Lynn’s name has the same problem.  Rzepczynski’s name emphatically does not.

    If the bullpen coach mishears a single consonant of Motte’s name or Lynn’s name, he might reasonably believe that he heard another word; miscommunication can result.  But if he mishears a consonant or other small portion of Rzepcnynski’s name, so what?  He’s very likely to get the message anyway.

    Case in point:  Did you even notice that I misspelled Rzepczynski’s name at the end of the last paragraph?
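    For readers who want rough numbers behind the information-distance argument, here is a minimal Python sketch.  It is an illustration under stated assumptions, not a real model of speech: it swaps in Levenshtein edit distance (the fewest single-letter insertions, deletions, or substitutions that turn one string into another) as a stand-in for acoustic distance, and it uses hypothetical crude phonetic spellings ("mot" for Motte, "lin" for Lynn, "zepchinski" for Rzepczynski, following zep-CHIN-ski) plus a hypothetical handful of other words that might come up on the bullpen phone.  The idea mirrors coding theory: the farther a codeword sits from every other codeword, the more errors it can absorb before being decoded as something else.

```python
# Hypothetical rough phonetic spellings (not official transcriptions):
# "mot" for Motte, "lin" for Lynn, "zepchinski" for Rzepczynski (zep-CHIN-ski).
PITCHERS = ["mot", "lin", "zepchinski"]
# Hypothetical words that might plausibly be said during a baseball phone call.
OTHER_WORDS = ["not", "hot", "got", "dot", "win", "in", "din"]
VOCABULARY = PITCHERS + OTHER_WORDS


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if letters match)
            ))
        prev = curr
    return prev[-1]


def safety_margin(word: str, vocabulary: list[str]) -> int:
    """Fewest edits that turn `word` into some *other* vocabulary word."""
    return min(edit_distance(word, w) for w in vocabulary if w != word)


def decode(heard: str, vocabulary: list[str]) -> str:
    """Guess the intended word: the vocabulary entry closest to what was heard
    (ties broken alphabetically)."""
    return min(vocabulary, key=lambda w: (edit_distance(heard, w), w))


if __name__ == "__main__":
    for name in PITCHERS:
        print(name, safety_margin(name, VOCABULARY))
    # One misheard sound in each of two names:
    print(decode("not", VOCABULARY))         # a real word, so the error goes unnoticed
    print(decode("zepchimski", VOCABULARY))  # still closest to the intended name
```

    Under these assumptions, "mot" and "lin" each have a safety margin of 1: a single misheard sound produces another plausible word, and nearest-word decoding happily returns the wrong one.  "zepchinski" has a margin of at least 7, since every other entry is at most three letters long, so one garbled consonant still decodes to the right pitcher.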

    Background for baseball nerds:

    And now, as promised, here is the reported chronology of some events during the eighth inning of game five:

    1. The Rangers are batting, facing Cardinals pitcher Octavio Dotel.  From the Cardinals’ dugout, La Russa or Duncan calls the bullpen and requests that two pitchers start warming up:  Rzepczynski and Motte.
    2. In the bullpen, coach Lilliquist hears only Rzepczynski’s name. (La Russa later said that Lilliquist had hung up too early, before he said Motte’s name.)
    3. Rzepczynski comes into the game to face the left-handed-hitting David Murphy, who hits an infield single.
    4. La Russa notices that Motte is not warming up in the bullpen, and calls again, reiterating that request.
    5. In the bullpen, coach Lilliquist mishears La Russa’s words as a request for Lynn to begin warming up.
    6. Because Motte is not ready, Rzepczynski remains in the game to face right-handed-hitting Mike Napoli, who hits a two-run double.  (La Russa had wanted Motte to pitch to Napoli.)  Rzepczynski then strikes out the next batter, Mitch Moreland.
    7. After Moreland strikes out, La Russa removes Rzepczynski from the game and requests the next relief pitcher.  La Russa believes he is summoning Motte, and is surprised when Lynn walks to the pitcher’s mound from the bullpen.
    8. Because the rules of baseball require that any pitcher who enters the game must face at least one batter and because Lynn is supposed to be resting his arm that day, La Russa instructs Lynn to intentionally walk the next batter with four gentle, low-stress pitches.
    9. Motte finally enters the game, three batters later than La Russa had envisioned.

    Thursday, October 6, 2011

    Don’t Get Mad, Get Lucid

    A recent kerfuffle has reminded me of a not-so-recent one.

    Recent Kerfuffle:

    Last month, upon hearing her professor cite a noxious, indefensible anti-Semitic opinion as an example of a noxious, indefensible opinion, York University student Sarah Grunfeld heedlessly stormed out of class and accused her professor of anti-Semitism.  Later, Grunfeld doubled down on her mistake, saying “The words, ‘Jews should be sterilized’ still came out of his mouth, so regardless of the context I still think that’s pretty serious.”

    Not-So-Recent Kerfuffle:

    In 2002, the creators of the New York State Regents exams in high-school English were found to have sanitized literary excerpts used in those exams. Removed from the passages were any references to race, religion, ethnicity, sex, nudity, alcohol, and even words like “fat” and “skinny.” These modifications were made without the authors’ knowledge or permission, and were not acknowledged on the exams.


    When students are not required to think seriously about disturbing ideas, they end up incapable of thinking clearly and responsibly about disturbing ideas. Is this what happened to Sarah Grunfeld?

    If your reading comprehension nosedives whenever you get agitated about something, you are insufficiently equipped to participate in civil society. You probably make a lousy employee too, and I’m unlikely to enjoy sitting near you at a dinner party.

    Note to high school English students: The New York State Board of Regents is saying that the thing of it is, if you can keep your head while all about you are losing theirs, like, so what?

    Note to Sarah Grunfeld:  If you insist on blaming educators for your mistake (“The words … came out of his mouth...”), don’t blame your current educator whose words made you, like, lose your head; blame your previous educators who failed to teach you, like, how to keep it.

    The remainder of this post contains background information on the recent kerfuffle and the old one.

    Background (from 2002) on the Regents Exams:

    The New York Times covered the story here.  Some excerpts:

    In a feat of literary sleuth work, Ms. Heifetz, the mother of a high school senior and a weaver from Brooklyn, inspected 10 high school English exams from the past three years and discovered that the vast majority of the passages -- drawn from the works of Isaac Bashevis Singer, Anton Chekhov and William Maxwell, among others -- had been sanitized of virtually any reference to race, religion, ethnicity, sex, nudity, alcohol, even the mildest profanity and just about anything that might offend someone for some reason. Students had to write essays and answer questions based on these doctored versions -- versions that were clearly marked as the work of the widely known authors.

    In an excerpt from the work of Mr. Singer, for instance, all mention of Judaism is eliminated, even though it is so much the essence of his writing. His reference to ''Most Jewish women'' becomes ''Most women'' on the Regents, and ''even the Polish schools were closed'' becomes ''even the schools were closed.'' Out entirely goes the line ''Jews are Jews and Gentiles are Gentiles.'' In a passage from Annie Dillard's memoir, ''An American Childhood,'' racial references are edited out of a description of her childhood trips to a library in the black section of town where she is almost the only white visitor, even though the point of the passage is to emphasize race and the insights she learned about blacks.

    The modifications to the passages ranged widely. In the Chekhov story ''The Upheaval,'' the exam takes out the portion in which a wealthy woman looking for a missing brooch strip-searches all of the house's staff members. Students are then asked to use the story to write an essay on the meaning of human dignity.

    A paragraph in John Holt's ''Learning All the Time'' is truncated to eliminate some of the reasons Suzuki violin instruction differs in Japan and the United States, apparently not to offend anyone who might find the particulars somehow insulting. Students are nonetheless then asked to answer questions about those differences.

    One passage was derived from Frank Conroy's memoir, ''Stop-Time.'' The changes include replacing ''hell'' with ''heck'' in one sentence and excising references to sex, religion, nudity and potential violence (in the form of the declared intent of two boys to kill a snake) that are essential to an understanding of the passage.

    ''I was just completely shocked,'' Mr. Conroy said. ''It's going through and taking out the flavor of the month. It's terrible.''

    A number of the writers and scholars Ms. Heifetz contacted have written indignant letters that have also been submitted to the education commissioner. Mr. Conroy wrote in part: ''Who are these people who think they have a right to 'tidy up' my prose? The New York State Political Police? The Correct Theme Authority?''

    Background on the Bogus Anti-Semitism Claim:

    Excerpt from the Toronto Star (full story here):

    A half-listening student, a hypersensitive campus and the speed at which gossip travels on the Internet conspired to create a very damaging game of broken telephone for one York University professor this week.

    Cameron Johnston, who has been teaching at York for more than 30 years, has been forced to respond to allegations that he made anti-Semitic remarks in a lecture on Monday afternoon after a student misunderstood his comments and began sending emails to Jewish groups and the media.

    Johnston was giving his introductory lecture to Social Sciences 1140: “Self, Culture and Society,” when he explained to the nearly 500 students that the course was going to focus on texts, not opinions, and despite what they may have heard elsewhere, everyone is not entitled to their opinion.

    “All Jews should be sterilized” would be an example of an unacceptable and dangerous opinion, Johnston told the students.

    He didn’t notice Sarah Grunfeld storm out. Grunfeld, a 22-year-old in her final year at York, understood Johnston’s example to be his personal opinion.

    Excerpt from commentary by the Atlantic Monthly (full post here):

    Grunfeld, who is not the world's best listener, was off to complain to Jewish groups about the outrageous opinion that her professor didn't have. Soon campus Jewish groups contacted Johnston for an explanation, and the whole thing turned into a very big deal on campus. Johnston, understandably, says he's "terribly upset." Grunfeld, meanwhile, is sticking to her guns: “The words, ‘Jews should be sterilized’ still came out of his mouth, so regardless of the context I still think that’s pretty serious," were words that came out of her mouth.

    Excerpt from commentary on Language Log, which points out some of the technical aspects of the “use/mention distinction,” upon which this whole mess hinges:

    It's a dangerous path one treads when one tries to give examples of obnoxious propositions in a classroom where not all the students have a firm grasp of the fundamental distinction between the use and the mention of a linguistic expression.

    Lest readers of the “Information Pleas” blog think I’m bashing Sarah Grunfeld because she is a young adult (everything I’ve read seems to mention her age), here is an excerpt from comments by Jonathan Kay of the National Post about how older internet users can be more credulous than younger ones, and are often implicated in propagating internet falsehoods such as the bogus claim of anti-Semitism at York University:

    Putting Sarah herself to one side, I found it interesting how quickly the story made the rounds of the internet. Within the space of a few hours on Monday night, five different middle-aged or senior-citizen Jewish correspondents sent me variations on this story. “York U [prof] makes anti-Semetic remark. Verifiable,” read one subject line. Another woman asked: “Do we think he’d say ‘All Muslims are terrorists,’ or ‘All blacks should be slaves’?” With every cycle of mass email forwarding, the story was getting more sensational.

    This is part of a trend. When I started this job in 1998, most of the bogus stories I got by email were from younger correspondents — because there just weren’t that many older people online. But then two things happened.

    First, young web surfers taught themselves how to check facts, by using Wikipedia and Snopes and other reputable sites. To avoid making reply-all fools of themselves, they stopped mass-forwarding bogus stories of the York U variety.

    Second, when those young adults started going off to college, or moving away — their parents had to figure out email and Facebook and Webcams in order to communicate with their kids and view pictures of their grandchildren. But these 50-, 60-, and 70-year old Internauts, having grown up in the age of print, never figured out that most of what you read online is made up. So when their sister-in-law’s hairdresser sends them something shocking, they uncritically pass it on to their friends.

    This explains why many middle-aged people and senior citizens I meet are actually more misinformed and radicalized than their children. Many Tea Party fanatics, in particular, are older white people who have cobbled a political philosophy together from nonsense Internet stories claiming that Barack Obama is Muslim, that global warming has been “debunked” or that universal health care means sending grandma to a “death panel.”