Thursday, December 6, 2012

The Rosetta Stone: Bad at Languages « Our Mechanical Brain

Embarrassing collapse of information quality…

What she means is that they’ve taken the noun form of snow in all three languages, rather than the verb. Which is embarrassing enough for a company trying to sell you language learning software. But it gets better: the press release for the campaign has now been corrected for one word (“sneeuw” > “sneeuwen”), but not the other two:


The Rosetta Stone: Bad at Languages « Our Mechanical Brain

Wednesday, December 5, 2012

Karl Rove Is in Time Out - Politics - The Atlantic Wire

A step toward information responsibility.

Less than a month after his strange, dramatic and sort of sad election night meltdown on air, Fox News is distancing itself from Karl Rove. New network rules require producers to get special permission before booking Rove and staunch anti-Obama commentator Dick Morris on their shows. New York Magazine's Gabriel Sherman got the scoop on Tuesday evening. "Multiple sources say that Ailes was angry at Rove's election-night tantrum when he disputed the network's call for Obama," Sherman reports. "While the moment made for riveting television -- it was Ailes's decision to have Kelly confront the statisticians on air -- in the end, it provided another data point for Fox's critics."

Then again, …

This will hardly be the end of old Karl, though… he's not been kicked out of the Fox News family yet. He's just in time out.

Karl Rove Is in Time Out - Politics - The Atlantic Wire

Thursday, November 29, 2012

Reverse-Anthropomorphic, High-Tech Metaphors On Parade

Follow the link below for an interesting take on the future of Big Data. Lots of good stuff there, but also this gem:

“You and I are streaming data engines.”

Uh huh. And during the early history of the computer age you and I were information processors. Likewise, during the industrial revolution you and I were thinking machines. It seems that regardless of the particular moment in intellectual history, the dominant set of metaphors gets applied to explain the essence of humanness.

In fact, it says here that you can recognize a new era of intellectual history by noticing when a new metaphor of humanness takes root.  When someone starts characterizing humans as “meme vectors,” that’s strong evidence that we’re in the genetics age.


Jeff Hawkins Develops a Brainy Big Data Company -

Saturday, November 24, 2012

Neuroscience - Under Attack -

A recurring theme on this blog: the principles of information responsibility demand that information consumers and producers know the ways their own brains can let them down. Awareness of confirmation bias is part of scientific ethics, but it is increasingly part of responsible consumption of mainstream journalism as well. (Confirmation bias is hardly the only issue. For posts about others, see here, here, and here.)

Now we have this perspective:

A team of British scientists recently analyzed nearly 3,000 neuroscientific articles published in the British press between 2000 and 2010 and found that the media regularly distorts and embellishes the findings of scientific studies. Writing in the journal Neuron, the researchers concluded that “logically irrelevant neuroscience information imbues an argument with authoritative, scientific credibility.” Another way of saying this is that bogus science gives vague, undisciplined thinking the look of seriousness and truth.

Wicked meta. Yes, information responsibility still demands that we understand how our brains can let us down. And yes, most non-neuroscientists learn about that (if at all) from the mainstream media—the same media that the principles of information responsibility demand we consume skeptically. But don't give up. Skeptical consumption of mainstream media—or skepticism about anything—does not mean blunt, rhetorical scowling about things you don't want to believe. (Exhibit A: politically motivated dismissals of Nate Silver's work.) It means developing a finely calibrated bullshit detector, built up from numeracy, logic, awareness of the shortcomings of metaphor and other rhetorical figures, and knowledge of the strengths and weaknesses of various media sources.

Neuroscience - Under Attack -

Saturday, November 10, 2012

Why political journalists can’t stand Nate Silver: The limits of journalistic knowledge | Mark Coddington

Follow the link and read this entire post.  Worth it.  

The more I think about the rift between political journalism and Nate Silver, the more it seems that it’s one that’s fundamentally an issue of epistemology — how journalists know what they know. Here’s why I think that’s the case.

When we talk about the epistemology of journalism, it all eventually ties into objectivity. The journalistic norm of objectivity is more than just a careful neutrality or attempt to appear unbiased; for journalists, it’s the grounds on which they claim the authority to describe reality to us. And the authority of objectivity is rooted in a particular process.

That process is very roughly this: Journalists get access to privileged information from official sources, then evaluate, filter, and order it through the rather ineffable quality alternatively known as “news judgment,” “news sense,” or “savvy.” This norm of objectivity is how political journalists say to the public (and to themselves), “This is why you can trust what we say we know — because we found it out through this process.” (This is far from a new observation – there are decades of sociological research on this.)

Silver’s process — his epistemology — is almost exactly the opposite of this:

Where political journalists’ information is privileged, his is public, coming from poll results that all the rest of us see, too.

Where political journalists’ information is evaluated through a subjective and nebulous professional/cultural sense of judgment, his evaluation is systematic and scientifically based. It involves judgment, too, but because it’s based in a scientific process, we can trace how he applied that judgment to reach his conclusions.
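
One way to see the contrast: a systematic evaluation is one whose judgment calls are written down and rerunnable. The sketch below applies an explicit, invented weighting scheme (larger samples count more, older polls decay); it is an illustration of traceability only, not Silver's actual model.

```python
# A minimal sketch of "systematic and traceable" poll evaluation.
# The weighting scheme here is hypothetical, chosen for illustration.

def weighted_poll_average(polls):
    """polls: list of (margin, sample_size, days_old) tuples."""
    def weight(sample_size, days_old):
        # Explicit, inspectable judgment: bigger samples count more,
        # and a poll's weight halves for every 7 days of age.
        return sample_size * 0.5 ** (days_old / 7)

    total = sum(weight(n, d) for _, n, d in polls)
    return sum(m * weight(n, d) for m, n, d in polls) / total

# Three hypothetical polls: the candidate leads by 2, 4, and 1 points.
polls = [(2.0, 1000, 0), (4.0, 500, 7), (1.0, 800, 14)]
print(f"weighted average lead: {weighted_poll_average(polls):+.2f}")
```

Because the weights are explicit, anyone can rerun the calculation, quarrel with the decay constant, and trace exactly how the judgment produced the conclusion—which is the epistemological point.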

Why political journalists can’t stand Nate Silver: The limits of journalistic knowledge | Mark Coddington

Framing Political Messages with Grammar and Metaphor » American Scientist

One reason why it is so difficult to be a responsible information consumer: we are easily manipulated by linguistic subtleties. 

A few years ago, I began exploring the idea of grammatical framing. In an article with Caitlin Fausey, “Can Grammar Win Elections?” published in Political Psychology, we explored the consequences of tweaking grammatical information in political messages. We discovered that altering nothing more than grammatical aspect in a message about a political candidate could affect impressions of that candidate’s past actions, and ultimately influence attitudes about whether he would be re-elected. Participants in our study read a passage about a fictitious politician named Mark Johnson. Mark was a Senator who was seeking reelection. The passage described Mark’s educational background, and reported some things he did while he was in office, including an affair with an assistant and hush money from a prominent constituent. Some participants read a sentence about actions framed with past progressive (was VERB+ing): “Last year, Mark was having an affair with his assistant and was taking money from a prominent constituent.” Others read a sentence about actions framed with simple past (VERB+ed): “Last year, Mark had an affair with his assistant and took money from a prominent constituent.” Everything else was the same. After the participants read the passage about Mark Johnson, they answered questions. In analyzing their responses, we discovered differences. Those who read the phrases “having an affair” and “accepting hush money” were quite confident that the Senator would not be reelected. In contrast, people who read the phrases “had an affair,” and “accepted hush money” were less confident. 
What’s more, when queried about how much hush money they thought could be involved, those who read about “accepting hush money” gave reliably higher dollar estimates than people who read that Mark “accepted hush money.” From these results, we concluded that information framed with past progressive caused people to reflect more on the action details in a given time period than did information framed with simple past.

Framing Political Messages with Grammar and Metaphor » American Scientist

Empathy represses analytic thought, and vice versa: Brain physiology limits simultaneous use of both networks

Read it and weep, but not simultaneously.

New research shows a simple reason why even the most intelligent, complex brains can be taken by a swindler's story -- one that upon a second look offers clues it was false.

When the brain fires up the network of neurons that allows us to empathize, it suppresses the network used for analysis, a pivotal study led by a Case Western Reserve University researcher shows.

Empathy represses analytic thought, and vice versa: Brain physiology limits simultaneous use of both networks

Cassette tapes are the future of big data storage - tech - 19 October 2012 - New Scientist

I have all my data on 8-track because that's the grooviest.

THE cassette tape is about to make a comeback, in a big way. From the updates posted by Facebook's 1 billion users to the medical images shared by healthcare organisations worldwide and the rise of high-definition video streaming, the need for something to store huge tranches of data is greater than ever. And while hard drives have traditionally been the workhorse of large storage operations, a new wave of ultra-dense tape drives that pack in information at much higher densities, while using less energy, is set to replace them.

Cassette tapes are the future of big data storage - tech - 19 October 2012 - New Scientist

Language Log » Ignorance about ignorance

There are many reasons to be discouraged about the state of personal “information responsibility,” including confirmation bias, the internet echo chamber, the malign effect that Texas has on the contents of American textbooks, et cetera. So it is nice to read some responsible, measured discourse indicating that things are sometimes not quite as horrible as typically reported.  Not a cause for rejoicing, mind you, but you take what you can get.

And I recently heard a talk by Arthur Lupia ("Challenges and Opportunities in Open-Ended Coding", presented at a workshop on The Future of Survey Research) that made me even less willing to accept  at face value claims of the form "Fewer than X% of Americans Know Y". Arthur reported on some forensic analysis, so to speak, of the internal records of the American National Election Study.  He learned that the standard methodology, used in this and other surveys for asking, recording, and scoring open-ended questions (and especially open-ended recall questions), systematically underestimates respondents' knowledge.

Language Log » Ignorance about ignorance

Sunday, October 7, 2012

Open-access deal for particle physics : Nature News & Comment

This is good news, but it still doesn’t make me want to be a particle physicist.

After six years of negotiation, the Sponsoring Consortium for Open Access Publishing in Particle Physics (SCOAP3) is now close to ensuring that nearly all particle-physics articles — about 7,000 publications last year — are made immediately free on journal websites. Upfront payments from libraries will fund the access.

Open-access deal for particle physics : Nature News & Comment

Google Spanner: I told you so.

The pendulum swings back (toward relational-style features).

Some authors have claimed that general two-phase commit is too expensive to support, because of the performance or availability problems that it brings [9, 10, 19]. We believe it is better to have application programmers deal with performance  problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.

You can read the technical paper here.
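
The trade-off the Spanner authors defend is easier to see with the protocol in front of you. Below is a minimal sketch of a two-phase commit coordinator, in illustrative Python with invented class and method names; it is not Spanner's implementation, which layers the protocol over replicated groups.

```python
# Minimal two-phase commit sketch. Participant is a hypothetical
# resource manager that can tentatively apply, then finalize, a change.

class Participant:
    def __init__(self, name, will_succeed=True):
        self.name = name
        self.will_succeed = will_succeed
        self.state = "idle"

    def prepare(self, txn):
        # Phase 1: vote yes only if the change can definitely be committed.
        self.state = "prepared" if self.will_succeed else "aborted"
        return self.will_succeed

    def commit(self, txn):
        self.state = "committed"

    def abort(self, txn):
        self.state = "aborted"


def two_phase_commit(txn, participants):
    # Phase 1 (voting): every participant must vote yes.
    votes = [p.prepare(txn) for p in participants]
    if all(votes):
        # Phase 2 (completion): unanimous yes, so commit everywhere.
        for p in participants:
            p.commit(txn)
        return "committed"
    # Any no-vote (or, in a real system, a timeout) aborts everywhere.
    for p in participants:
        p.abort(txn)
    return "aborted"
```

The expense the critics cite lives in phase 1: every participant must durably promise before anyone commits, so one slow or failed node stalls the whole transaction—exactly the cost the Spanner authors argue is worth paying rather than making every application code around missing transactions.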

Research fraud exploded over the last decade | Ars Technica

Sigh.  Information quality and information responsibility in the news.

A number of studies have spotted a worrisome trend: although the number of scientific journals and articles published is increasing each year, the rate of papers being retracted as invalid is increasing even faster. Some of these are being retracted due to obvious ethical lapses—fraudulent data or plagiarism—but some past studies have suggested errors and technical problems were the cause of the majority of problems.

A new analysis, released by PNAS, shows this rosy picture probably isn't true. Researchers like to portray their retractions as being the result of errors, but a lot of these same papers turn out to be fraudulent when fully investigated.

Research fraud exploded over the last decade | Ars Technica

Twitter, PayPal reveal database performance - Software - Technology - News -

This is why the definition of Big Data as “data too big for traditional technologies like RDBMS” is a Big Load.

Jeremy Cole, database administration team manager at Twitter, told attendees that the micro-blogging network uses a commercial instance of MySQL because there are "some features we desperately need to manage the scale we have and to respond to problems in production".

Cole revealed that Twitter's MySQL database handles some huge numbers — three million new rows per day, the storage of 400 million tweets per day replicated four times over — but it is managed by a team of only six full-time administrators and a sole MySQL developer.
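
For a sense of scale, here is a back-of-envelope calculation from the quoted figures—a rough sustained average, not a peak rate, and only as good as the numbers in the article:

```python
# Rough arithmetic on the quoted Twitter numbers: 400 million tweets
# per day, each stored four times over. Illustrative only.

TWEETS_PER_DAY = 400_000_000
REPLICATION = 4
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

writes_per_second = TWEETS_PER_DAY * REPLICATION / SECONDS_PER_DAY
print(f"{writes_per_second:,.0f} replicated writes per second")
```

Call it roughly 18,500 replicated writes per second, sustained—handled by six administrators and one developer, on MySQL.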

Twitter, PayPal reveal database performance - Software - Technology - News -

E-Health Insider :: NHS staff should code - Kelsey

Uh oh. This reminds me of the "democratization of data" phenomenon of the late '80s, when "power users" of tools like FoxPro, dBase, and Lotus 1-2-3 went to town building departmental applications. The effects on enterprise data quality were disastrous then, and there's no reason for optimism now.

Tim Kelsey, the NHS Commissioning Board’s first national director of patients and information, is to encourage doctors and nurses and other front-line staff to learn how to program.

The new NHS information chief, and former Cabinet Office transparency tsar, says encouraging NHS staff to code will give them the skills to work with data and help unleash a powerful and disruptive wave of innovation.

E-Health Insider :: NHS staff should code - Kelsey

Sunday, September 30, 2012

Poll Averages Have No History of Consistent Partisan Bias -

Information Irresponsibility 101: If you don’t like the data, claim bias in its collection.

Presidential elections are high-stakes affairs. So perhaps it is no surprise that when supporters of one candidate do not like the message they are hearing from the polls they tend to blame the messenger.

In 2004, Democratic Web sites were convinced that the polls were biased toward George W. Bush, asserting that they showed an implausible gain in the number of voters identifying as Republicans. But in fact, the polls were very near the actual result. Mr. Bush defeated John Kerry by 2.5 percentage points, close to (in fact just slightly better than) the 1- or 2-point lead that he had on average in the final polls. Exit polls that year found an equal number of voters describing themselves as Democrats and Republicans, also close to what the polls had predicted.

Since President Obama gained ground in the polls after the Democrats’ convention, it has been the Republicans’ turn to make the same accusations.

Poll Averages Have No History of Consistent Partisan Bias -

Saturday, September 29, 2012

Overkill Analytics

An alternative to underkill analytics, I guess.

And therein lies the beauty of overkill analytics, a term that Carter might have coined, but that appears to be catching on — especially in the world of web companies and big data. Carter says he doesn’t want to spend a lot of time fine-tuning models, writing complex algorithms or pre-analyzing data to make it work for his purposes. Rather, he wants to utilize some simple models, reduce things to numbers and process the heck out of the data set on as much hardware as is possible.

It’s not about big data so much as it is about big computing power, he said. There’s still work to be done on smaller data sets like the majority of the world deals with, but Hadoop clusters and other architectural advances let you do more to that data in a faster time than was previously possible. Now, Carter said, as long as you account for the effects of overprocessing data, you can create a black-box-like system and run every combination of simple techniques on data until you get the most-accurate answer.
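
The recipe Carter describes, simple models plus lots of compute, can be sketched in a few lines. Here the "simple techniques" are one-feature threshold rules brute-forced over a toy dataset; the names and data are invented for illustration, and a real overkill pipeline would run something like this across a cluster with a holdout set to guard against overprocessing.

```python
# Toy rendering of the "overkill" idea: rather than hand-tuning one
# model, try every combination of simple rules and keep the best.

from itertools import product

# Toy dataset: (feature1, feature2) -> label
data = [((1.0, 0.2), 0), ((2.0, 0.1), 0), ((3.0, 0.9), 1),
        ((4.0, 0.8), 1), ((2.5, 0.7), 1), ((1.5, 0.3), 0)]

def threshold_rule(feature_index, threshold):
    """A deliberately simple model: predict 1 if one feature exceeds a cutoff."""
    return lambda x: int(x[feature_index] > threshold)

def accuracy(model):
    return sum(model(x) == y for x, y in data) / len(data)

# "Process the heck out of it": every feature/threshold combination.
candidates = [threshold_rule(i, t)
              for i, t in product(range(2), [v / 10 for v in range(50)])]
best = max(candidates, key=accuracy)
print(f"best simple rule scores {accuracy(best):.0%} on the toy data")
```

No clever algorithm, no fine-tuning—just an exhaustive sweep that hardware makes cheap, which is precisely Carter's wager.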

W.T.F.M: Write The Freaking Manual - Floopsy

WTF, Man?

It seems that nowadays, the original phrase R.T.F.M. is also quickly becoming the need to W.T.F.M.

Developers: You spend hours, days, months, perhaps years refining your masterpiece.  It is an expression of your life’s work, heart and soul.  Why, then, would you shortchange yourself by providing poor or no documentation for the rest of us?

W.T.F.M: Write The Freaking Manual - Floopsy

Saturday, September 22, 2012

Exploring Local » Blog Archive » Google Maps announces a 400 year advantage over Apple Maps

Of the vast commentary generated by the high-profile failure of Apple Maps, this bit stands out as highly perceptive.  Again, the human factor in data quality (see the previous post) makes itself known.

Perhaps the most egregious error is that Apple’s team relied on quality control by algorithm and not a process partially vetted by informed human analysis. You cannot read about the errors in Apple Maps without realizing that these maps were being visually examined and used for the first time by Apple’s customers and not by Apple’s QC teams. If Apple thought that the results were going to be any different than they are, I would be surprised. Of course, hubris is a powerful emotion.

If you go back over this blog and follow my recounting of the history of Google’s attempts at developing a quality mapping service, you will notice that they initially tried to automate the entire process and failed miserably, as has Apple. Google learned that you cannot take the human out of the equation. While the mathematics of mapping appear relatively straight forward, I can assure you that if you take the informed human observer who possesses local and cartographic knowledge out of the equation that you will produce exactly what Apple has produced – A failed system.

The issue plaguing Apple Maps is not mathematics or algorithms, it is data quality and there can be little doubt about the types of errors that are plaguing the system. What is happening to Apple is that their users are measuring data quality. Users look for familiar places they know on maps and use these as methods of orienting themselves, as well as for testing the goodness of maps. They compare maps with reality to determine their location. They query local businesses to provide local services. When these actions fail, the map has failed and this is the source of Apple’s most significant problems. Apple’s maps are incomplete, illogical, positionally erroneous, out of date, and suffer from thematic inaccuracies.

Exploring Local » Blog Archive » Google Maps announces a 400 year advantage over Apple Maps

DARPA combines human brains and 120-megapixel cameras to create the ultimate military threat detection system | ExtremeTech

Talk about "The Human Side of Data Quality," which, by the way, will be a theme of the 2013 MIT Chief Data Officer & Information Quality Conference.

There are two discrete parts to the system: The 120-megapixel camera, which is tripod-mounted and looks over the battlefield (pictured below); and the computer system, where a soldier sits in front of a computer monitor with an EEG strapped to his head (pictured above). Images from the camera are fed into the computer system, which runs cognitive visual processing algorithms to detect possible threats (enemy combatants, sniper nests, IEDs). These possible threats are then shown to a soldier whose brain then works out if they're real threats — or a false alarm (a tree branch, a shadow thrown by an overhead bird).

DARPA combines human brains and 120-megapixel cameras to create the ultimate military threat detection system | ExtremeTech

Data Scientist: The Sexiest Job of the 21st Century - Harvard Business Review

If thought leaders believe the most basic, universal skill for data scientists is the ability to write code, data science is at risk of repeating the narrative already experienced in more conventional information management. We now know that in designing information systems, it is a fool's errand to favor technical prowess over technology-neutral skills like requirements analysis and conceptual modeling. Yes, big data tools are young and a little rough around the edges, so folks who can master the technology will be needed. But the basic, universal skills for data scientists must be acknowledged: understanding how data (both structured and unstructured) work and how humans experience them.

Data scientists’ most basic, universal skill is the ability to write code. This may be less true in five years’ time, when many more people will have the title “data scientist” on their business cards. More enduring will be the need for data scientists to communicate in language that all their stakeholders understand—and to demonstrate the special skills involved in storytelling with data, whether verbally, visually, or—ideally—both.

Data Scientist: The Sexiest Job of the 21st Century - Harvard Business Review

Saturday, September 15, 2012

Become Data Literate in 3 Simple Steps - The Data Journalism Handbook

While we’re on the topic of data quality and journalism…

Just as literacy refers to “the ability to read for knowledge, write coherently and think critically about printed material” data-literacy is the ability to consume for knowledge, produce coherently and think critically about data. Data literacy includes statistical literacy but also understanding how to work with large data sets, how they were produced, how to connect various data sets and how to interpret them.

Become Data Literate in 3 Simple Steps - The Data Journalism Handbook

He Said, She Said, and the Truth -

File this under “Naive notions of information quality, formalized in our cultural and civic institutions.”  

But while balance may be necessary to mediating a dispute between teenage siblings, a different kind of balance — some call it “false equivalency” — has come under increasing fire. The firing squad is the public: readers and viewers who rely on accurate news reporting to make them informed citizens.

Simply put, false balance is the journalistic practice of giving equal weight to both sides of a story, regardless of an established truth on one side. And many people are fed up with it. They don’t want to hear lies or half-truths given credence on one side, and shot down on the other. They want some real answers.

He Said, She Said, and the Truth -

Friday, September 14, 2012

Language Log » They cut me out

A lesson about the human side of information quality… sigh.

While Victor Mair sweats over sheets of Chinese characters and Mark Liberman generates graphs to see if the results of refereed papers can be replicated from reprocessed raw data, I just play. There's no linguistics at all in a piece like "I Wish I'd Said That", though it is sort of basically about language; and something similar is true for quite a few other posts listed on my reference page of Lingua Franca posts. But in today's piece, for once, everything I say is completely true, and I actually try to teach a tiny bit about syntactic ambiguity. And my reward was swift and cold: the compilers of the daily email newsletter through which The Chronicle points its subscribers to what they can find today on the web refused to include a pointer to my piece.

Language Log » They cut me out

Thursday, September 13, 2012

Write Good. Code Gooder.

Not surprising, but it confirms what years of experience have already shown: computer scientists are generally less articulate than we might hope. The link leads to a table showing scores on the GRE exam, broken out by expected field of study. Those who plan to study computer and information sciences scored miserably in both the verbal and the analytical writing sections. Could this be a factor in the Agile movement's aversion to formal specifications? (Ya think?)

An Open Letter to Wikipedia About Anatole Broyard and "The Human Stain" : The New Yorker

Not the first time Wikipedia’s preference for—insistence on, actually—low-quality data has drawn attention. (See also here.)

Yet when, through an official interlocutor, I recently petitioned Wikipedia to delete this misstatement, along with two others, my interlocutor was told by the “English Wikipedia Administrator”—in a letter dated August 25th and addressed to my interlocutor—that I, Roth, was not a credible source: “I understand your point that the author is the greatest authority on their own work,” writes the Wikipedia Administrator—“but we require secondary sources.”

An Open Letter to Wikipedia About Anatole Broyard and "The Human Stain" : The New Yorker

United States Patent: 8254902

Is that a threat as in “We should turn off this fellow’s cell phone because he is texting about his autobiographical screenplay while driving,” or as in “Stop him—he is organizing a protest march against the politically powerful?”

Moreover, in certain situations, the communications capability that the wireless device accords to its user may be what poses the threat.

United States Patent: 8254902

Wednesday, September 5, 2012

SQL vs. NoSQL | Linux Journal

I am well acquainted with the benefits of non-SQL approaches to data management. And I honor those who speak responsibly about the differences between relational and non-relational approaches, including one of my current clients, a vendor of a highly scalable non-SQL DBMS.

I’ve also noticed some irresponsible prattle on this topic, as has the author of this excellent article in Linux Journal.

This scaling myth is perpetuated and given credence every time popular Web sites announce that such-and-such RDBMS doesn't meet their needs, and so they are moving to NoSQL database X. The opinion of some in the RDBMS world is that many of these moves are not so much because the database they were using is deficient in some fundamental way, but because it was being used in a way for which it wasn't designed. To make an analogy, it's like people using flat-head screwdrivers to tighten Phillips-head screws, because it worked well enough to get the job done, but now they've discovered it is better to tighten Phillips screws with an actual Phillips screwdriver, and isn't it wonderful, and we should throw away all flat-head screwdrivers, because their time is past, and Phillips is the future.

One recent SQL-to-NoSQL move involved moving from MySQL to Cassandra. As part of the move, Digg folks blogged about how they were using MySQL and why it didn't meet their needs. Others were skeptical. Dennis Forbes, in a series of posts on his site (see Resources), questioned whether Digg needed to use a NoSQL solution like Cassandra at all. His claims centered on what he considered very poor database usage on the part of Digg combined with inadequate hardware. In his mind, if Digg had just designed its database properly or switched to using SSDs in its servers, it would have had no problems. His best quote is this: "The way that many are using NoSQL is like discovering the buggy whip at the beginning of the automotive era." Ouch.
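
Forbes' point, that the trouble is often usage rather than the relational model, is easy to demonstrate in miniature. In this toy sqlite3 example (table and column names invented for illustration), the very same query moves from a full table scan to an index lookup once the obvious index exists:

```python
# Same query, before and after adding the obvious index.
# EXPLAIN QUERY PLAN reports whether SQLite scans or uses an index.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diggs (story_id INTEGER, user_id INTEGER)")
conn.executemany("INSERT INTO diggs VALUES (?, ?)",
                 [(s, u) for s in range(100) for u in range(10)])

def plan(sql):
    # The fourth column of each EXPLAIN QUERY PLAN row is the detail text.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM diggs WHERE story_id = 42"
before = plan(query)  # reports a full-table scan
conn.execute("CREATE INDEX idx_story ON diggs (story_id)")
after = plan(query)   # reports a search using idx_story

print("before:", before)
print("after: ", after)
```

A flat-head screwdriver on a Phillips screw, in ten lines: the engine wasn't deficient, the schema was.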

SQL vs. NoSQL | Linux Journal

With Rise of Gene Sequencing, Ethical Puzzles -

Reminds me of the "Fruit of the Poisonous Tree" metaphor from criminal law, which dictates that certain evidence, and all evidence that flows from it, must be ignored if it was collected illegitimately. That case and this one shed some light on a human aspect of information quality: our civic institutions sometimes demand (legitimately, in my opinion) that high-quality data be ignored.

Dr. Arul Chinnaiyan stared at a printout of gene sequences from a man with cancer, a subject in one of his studies. There, along with the man’s cancer genes, was something unexpected — genes of the virus that causes AIDS.

It could have been a sign that the man was infected with H.I.V.; the only way to tell was further testing. But Dr. Chinnaiyan, who leads the Center for Translational Pathology at the University of Michigan, was not able to suggest that to the patient, who had donated his cells on the condition that he remain anonymous.

With Rise of Gene Sequencing, Ethical Puzzles -

Friday, August 24, 2012

Day4 - How we screwed (almost) the whole Apple community (updated)

A chilling—not terribly surprising—story about using the internet echo chamber against itself.

With each step further away from the source, the perception that this would be true increased. On Reddit, where the original entry was made, we saw it as a 0 mode: the image was posted, nothing more or less. Newspapers and blogs who drew attention to the whole thing (Yahoo, Macworld, Wired) took it with a grain of salt, so the truth factor goes down a bit. The commentators on the articles, however, took it almost as 100% truth, raising the truth-factor bar. The commentators / readers who took it further in their own social media (Twitter, G+, Facebook) defined it as the truth; all doubt is gone. In what segment do you pick up your information, and which one affects people the most?

Day4 - How we screwed (almost) the whole Apple community (updated)

Language Log » One little adjective

More on data quality in the public sphere. A shame here is that even those who have the rhetorical wherewithal to create nuanced criticism can opt for sound-bite-ready snarkiness instead. Our friends at Language Log illuminate and clarify once again:

But in the meantime, a warning to candidates for elective office: watch every word, every little attributive adjective. You may not have meant it the way it sounds; they may not even believe you meant it; but if you utter even a two-word phrase that sounds outrageous, that'll be enough rope for them to hang you with. They are playing linguistic Gotcha, and the game is deadly serious, and losers don't get elected. Be careful out there. Get your adjectives checked out by a linguist up front. And stay away from TV studios if you aren't a master of on-the-fly self-editing.

Language Log » One little adjective

Botched Restoration of Ecce Homo Fresco Shocks Spain -

Yes, data quality is everyone's responsibility, but that doesn't obviate the need for professionals.

A case of suspected vandalism in a church in a northeastern village in Spain has turned out to be probably the worst art restoration project of all time.

An elderly woman stepped forward this week to claim responsibility for disfiguring a century-old “ecce homo” fresco of Jesus crowned with thorns, in Santuario de la Misericordia, a Roman Catholic church in Borja, near the city of Zaragoza.

Botched Restoration of Ecce Homo Fresco Shocks Spain -

Friday, April 6, 2012

A Mistake Too Amusing To Let Stand

Imagine my glee when I encountered the first paragraph of a story in the dead-tree version of the New York Review of Books:

Tony Judt had a thing about railway trains.  We even know from his last book, a brilliant compilation of his ideas on history and politics, distilled just before his untimely death from a series of conversations with Timothy Snyder, that he had wanted to write a history of trains, entitled Locomotion.

I decided then and there to write a blog post, which is not what you are reading now because my plans changed when I sought out the on-line version of the same article and found this:

Tony Judt had a thing about railway trains. We even know from his last book, a brilliant compilation of his ideas on history and politics, distilled from a series of conversations with Timothy Snyder just before his untimely death, that he had wanted to write a history of trains, entitled Locomotion.

Okay, that’s a bit better, now we know that talking to Timothy Snyder was not fatal.  But who died?  The dead-tree draft at least made it clear that Tony Judt was dead.  The on-line version suggests that Timothy Snyder is the dead guy:

“…a series of conversations with Timothy Snyder just before his untimely death…”

The writer (Ian Buruma) seems unable to get out of his own way.  So, the snarky blog entry I had intended to write—about an apparently fatal conversation with Timothy Snyder—has expanded. 

The expanded scope includes:

  • Data quality for narrative data.
  • Clarity: A rule of thumb for amateur writers.
  • Obviousness in examples—expect stupid rebuttals.
  • The ethics of changing already-published material.

I’ll cover these topics in upcoming posts.  I’ll add the links to this post as they become available.  Stay tuned.

Friday, March 23, 2012

The Anatomy of Media Bias: Trayvon Martin, Mike Daisey, and the Press - Megan McArdle - Business - The Atlantic

The “right proportions of the truth.”  Too bad “fair and balanced” has already been co-opted. 

The peculiar problem of the information age is that we now have access to far more true stories than any one brain -- evolved for life in groups of a few hundred -- can possibly process. Our natural tendency to extrapolate from the subset we're exposed to means we can wind up with wildly inaccurate views of the world as a whole, even when all the stories we hear are true. For people with a storytelling gift as powerful as Mike Daisey's, or a job that empowers them to choose which of a hundred newsworthy tales makes the evening broadcast, that implies a responsibility beyond the traditional obligation to speak the truth. What we need today are the right proportions of truth.

The Anatomy of Media Bias: Trayvon Martin, Mike Daisey, and the Press - Megan McArdle - Business - The Atlantic

Tuesday, March 20, 2012

Enough to Make You Retch

Embarrassed by the gag-inducing realities of industrial meat production, the industry fights back by criminalizing investigative journalism. Information Irresponsibility, Legislative Division.

From The Atlantic:

Earlier this month, politicians in Iowa bowed to corporate pressure when they passed a law designed to stifle public debate and keep consumers in the dark. Instead of confronting animal cruelty on factory farms, the top egg- and pork-producing state is now in the business of covering it up. As one of the people this new law is designed to silence, I'm concerned that Iowa is shooting the messenger while letting the real criminals go unpunished.

HF 589 (PDF), better known as the "Ag Gag" law, criminalizes investigative journalists and animal protection advocates who take entry-level jobs at factory farms in order to document the rampant food safety and animal welfare abuses within. In recent years, these undercover videos have spurred changes in our food system by showing consumers the disturbing truth about where most of today's meat, eggs, and dairy is produced. Undercover investigations have directly led to America's largest meat recalls, as well as to the closure of several slaughterhouses that had egregiously cruel animal handling practices. Iowa's Ag Gag law -- along with similar bills pending in other states -- illustrates just how desperate these industries are to keep this information from getting out.

The Ag Gag Laws: Hiding Factory Farm Abuses From Public Scrutiny - Cody Carlson - Health - The Atlantic

Thursday, March 1, 2012

Data Quality is King. The King Is Dead. Long Live The King.

Twitter can spread false information just as quickly as true.  An essay on the website of the American Journalism Review suggests that this is why Twitter cannot be said to “break news.”  For starters:

While they might not mean it literally, bloggers and news organizations that credit Twitter and other social networks with "reporting" or "breaking" news are implying a contest between social networks and the press, in which lumbering news organizations are smacked down by a faster and more agile rival. And what journalist with healthy competitive instincts wouldn't feel a bit goaded or threatened by that? The far less provocative truth is that the media are working through Twitter, not racing against it.

What’s more, it appears that the pattern of tweets about Whitney Houston’s death—and actual death in this case—supports that view:

While nearly an hour passed between the first known mention of Houston's death and the AP's report, Twitter's timeline clearly shows that the story flatlined until the AP tweet. It was that properly attributed post by a credible news organization with a broad following that broke through the noise.

The relatively few people who saw the initial Whitney Houston tweets had reason to be extremely skeptical. Social media death hoaxes have befallen countless very-much-alive public figures, including President Obama, Lady Gaga, Eddie Murphy, Jon Bon Jovi and Chuck Norris (who, as fans noted, is invincible and cannot be killed). "Twitter Death" has become a near-daily occurrence, prompting a great many users to respond with caution when they hear that Madonna, Jackie Chan or Snookie has gone to the great beyond.

In that context, it's tough to make the case that a handful of dubiously sourced Twitter posts by unknown individuals with relatively small followings broke the news of Houston's death in any meaningful way. Those early tweets were indistinguishable from other celebrity death rumors, except that they turned out to be true.

And information consumers might be learning something about the differences between electronically supported rumor-mongering and actual journalism:

Rather than marginalizing the news media, Twitter and other social networks may be reinforcing their value. A generation of social media users is learning―through debunked reports and hoaxes―the difference between saying something and reporting it. The ways in which people first hear information may have changed, but their reliance on reporters to separate fact from fiction and provide depth and context to the news has not.

The Twitter Death Epidemic | American Journalism Review

Sunday, February 26, 2012

‘The Lifespan of a Fact,’ by John D’Agata and Jim Fingal -

A good review of a weird take on information responsibility, reporting, and the potential for non-fiction to attain the exalted status of “art.”

This brings us to D’Agata’s other outrageous proposition — that one needn’t concern oneself with facts because rarely are facts reliable, and that belief alone should be considered as muscular as fact, even when the belief has been proved to be based on invention. As long as a story “is believed by somebody,” he writes, “I consider it a legitimate potential history.” Hogwash.

‘The Lifespan of a Fact,’ by John D’Agata and Jim Fingal -

Friday, February 17, 2012

The 'Undue Weight' of Truth on Wikipedia - The Chronicle Review - The Chronicle of Higher Education

Another opportunity to recall that “A lie can travel halfway around the world before the truth can get its boots on.”

A couple of years ago, on a slow day at the office, I decided to experiment with editing one particularly misleading assertion chiseled into the Wikipedia article. The description of the trial stated, "The prosecution, led by Julius Grinnell, did not offer evidence connecting any of the defendants with the bombing. ... "

Coincidentally, that is the claim that initially hooked me on the topic. In 2001 I was teaching a labor-history course, and our textbook contained nearly the same wording that appeared on Wikipedia. One of my students raised her hand: "If the trial went on for six weeks and no evidence was presented, what did they talk about all those days?" I've been working to answer her question ever since.

I have not resolved all the mysteries that surround the bombing, but I have dug deeply enough to be sure that the claim that the trial was bereft of evidence is flatly wrong. One hundred and eighteen witnesses were called to testify, many of them unindicted co-conspirators who detailed secret meetings where plans to attack police stations were mapped out, coded messages were placed in radical newspapers, and bombs were assembled in one of the defendants' rooms.

So I removed the line about there being "no evidence" and provided a full explanation in Wikipedia's behind-the-scenes editing log. Within minutes my changes were reversed. The explanation: "You must provide reliable sources for your assertions to make changes along these lines to the article."

That was curious, as I had cited the documents that proved my point, including verbatim testimony from the trial published online by the Library of Congress. I also noted one of my own peer-reviewed articles. One of the people who had assumed the role of keeper of this bit of history for Wikipedia quoted the Web site's "undue weight" policy, which states that "articles should not give minority views as much or as detailed a description as more popular views." He then scolded me. "You should not delete information supported by the majority of sources to replace it with a minority view."

"Explain to me, then, how a 'minority' source with facts on its side would ever appear against a wrong 'majority' one?" I asked the Wiki-gatekeeper. He responded, "You're more than welcome to discuss reliable sources here, that's what the talk page is for. However, you might want to have a quick look at Wikipedia's civility policy."

I tried to edit the page again. Within 10 seconds I was informed that my citations to the primary documents were insufficient, as Wikipedia requires its contributors to rely on secondary sources, or, as my critic informed me, "published books." Another editor cheerfully tutored me in what this means: "Wikipedia is not 'truth,' Wikipedia is 'verifiability' of reliable sources. Hence, if most secondary sources which are taken as reliable happen to repeat a flawed account or description of something, Wikipedia will echo that."

So I waited two years, until my book on the trial was published. "Now, at last, I have a proper Wikipedia leg to stand on," I thought as I opened the page and found at least a dozen statements that were factual errors, including some that contradicted their own cited sources. I found myself hesitant to write, eerily aware that the self-deputized protectors of the page were reading over my shoulder, itching to revert my edits and tutor me in Wiki-decorum. I made a small edit, testing the waters.

My improvement lasted five minutes before a Wiki-cop scolded me, "I hope you will familiarize yourself with some of Wikipedia's policies, such as verifiability and undue weight. If all historians save one say that the sky was green in 1888, our policies require that we write 'Most historians write that the sky was green, but one says the sky was blue.' ... As individual editors, we're not in the business of weighing claims, just reporting what reliable sources write."

The 'Undue Weight' of Truth on Wikipedia - The Chronicle Review - The Chronicle of Higher Education

Wednesday, February 15, 2012

Apple: App Access to Contact Data Will Require User Permission - John Paczkowski - Mobile - AllThingsD


After a week of silence, Apple has finally responded to reports that dozens of iOS applications have been accessing, transmitting and storing user contact data without explicit permission. Path was the first to be flagged for this, and others, including Twitter, Yelp and Foursquare, have since tidied up the way they ask for address book data. Apple has faced growing criticism that it has given iOS developers far too much access to address book information without requiring a user prompt.

Today, the company agreed with that assessment, and said that soon, apps that use address book data will require explicit user permission to do so.

Apple: App Access to Contact Data Will Require User Permission - John Paczkowski - Mobile - AllThingsD

Your address book is mine: Many iPhone apps take your data | VentureBeat


Path got caught red-handed uploading users’ address books to its servers and had to apologize. But the relatively obscure journaling app is not alone. In fact, Path was crucified for a practice that has become an unspoken industry standard.

Facebook, Twitter, Instagram, Foursquare, Foodspotting, Yelp, and Gowalla are among a smattering of iOS applications that have been sending the actual names, email addresses and/or phone numbers from your device’s internal address book to their servers, VentureBeat has learned. Several do so without first asking permission, and Instagram and Foursquare only added permissions prompts after the Path flare-up.

Some of these companies deny storing the personal data, as Path was doing, but the transmission alone makes the private data susceptible to would-be interceptors.

Perhaps most concerning, however, is that these app makers could mask the real names, phone numbers, and email addresses during the transmission process, protecting your privacy in the process, but choose not to.

Your address book is mine: Many iPhone apps take your data | VentureBeat

Sunday, February 12, 2012

Serendipitous Juxtaposition

Big data that’s wrong is still wrong.

In the opinion pages of the Sunday New York Times—Sunday Review in the dead-tree version—we have one article that mentions on-line dating sites as effective users of Big Data techniques, and another suggesting that the matching algorithms used by these sites don’t work.

The first article, “The Age of Big Data” includes this:

Online dating services, like, constantly sift through their Web listings of personal characteristics, reactions and communications to improve the algorithms for matching men and women on dates.

The second, “The Dubious Science of Online Dating” has this:

One major problem is that these sites fail to collect a lot of crucial information. Because they gather data from singles who have never met, the sites have no way of knowing how two people will interact once they have been matched. Yet our review of the literature reveals that aspects of relationships that emerge only after two people meet and get to know each other — things like communication patterns, problem-solving tendencies and sexual compatibility — are crucial for predicting the success or failure of relationships. For example, study after study has shown that the way that couples discuss and attempt to resolve disagreements predicts their future satisfaction and whether or not the relationship is likely to dissolve.

When it comes to data, size isn’t everything.

Online Dating Sites Don’t Match Hype -

“One of the most pernicious uses of data”

A nod to information responsibility in the mainstream press:

Big Data has its perils, to be sure. With huge data sets and fine-grained measurement, statisticians and computer scientists note, there is increased risk of “false discoveries.” The trouble with seeking a meaningful needle in massive haystacks of data, says Trevor Hastie, a statistics professor at Stanford, is that “many bits of straw look like needles.”

Big Data also supplies more raw material for statistical shenanigans and biased fact-finding excursions. It offers a high-tech twist on an old trick: I know the facts, now let’s find ’em. That is, says Rebecca Goldin, a mathematician at George Mason University, “one of the most pernicious uses of data.”

Big Data’s Impact in the World -

Thursday, February 2, 2012

Networks Resort to Trickery in an Attempt to Lift Ratings -

For those who follow information responsibility, here’s a story that has a little bit of everything:

  • A widespread tolerance of cynical data distortion, all to manipulate ratings and rankings.
  • Distortion of category boundaries to exclude unwanted data points.
  • Distortion of category boundaries to include desirable data points.
  • Manipulation of idiosyncratic metrics.
  • Abuse of the news cycle.
  • Anemic enforcement of ethical standards.
  • An exclusive set of informed consumers who are not hoodwinked.

Details on each of these bullets follow.  All quoted excerpts are from this article in today’s New York Times.

  • A widespread tolerance of cynical data distortion, all to manipulate ratings and rankings.

“This is the kind of programming sleight of hand that executives seize on as they seek to gain every possible edge in the television ratings game, at a time when each tenth of a point or two enhances their standing in the nightly ratings and the ability to pitch to advertisers who spend billions of dollars a year.”

  • Distortion of category boundaries to exclude unwanted data points:

“But as far as Nielsen ratings were concerned, four of the shows that week weren’t ‘Good Morning America’ at all. They were labeled ‘special’ programming by ABC, which told Nielsen that it would be called ‘Good Morning Amer.’

“ABC made the switch so that the final week of the year — typically the lowest rated of the year because of the holidays — would be ignored in the national ratings. The change allowed the network to claim — and it did — that ‘Good Morning America’ finished the year closer to NBC’s ‘Today’ show than it had in 16 years.”

  • Distortion of category boundaries to include desirable data points:

“NBC took the opposite path of ABC with the use of the term ‘special’ in its presentation of the Republican primary debate on Jan. 23. Careful viewers noticed that the debate was labeled a regular edition of the network’s ratings-challenged newsmagazine program, ‘Rock Center with Brian Williams’ — one that, as it turned out, just happened to double the show’s usual audience to just over 7.1 million viewers.”

  • Manipulation of idiosyncratic metrics:

“The manipulation of where national commercials are placed in a show has become one of the favorite shell games networks use to try to enhance their numbers. Shows receive national ratings from Nielsen only up to the point when the last national commercial is broadcast — after that, the numbers simply do not count.”

  • Abuse of the news cycle:

“Another ratings tactic that is now routine involves extending the duration of more popular shows, allowing them to run a minute or two past their scheduled end time. That means the show that follows — usually one a network wants to enhance with the best possible introduction — gets a ratings lift in early national ratings reports that are often widely reported by news outlets.

In December, for example, Fox ran its singing competition ‘The X-Factor’ a minute long to provide a strong entry for ‘I Hate My Teenage Daughter,’ the low-rated comedy that followed it.

In the initial national ratings that Nielsen reports every morning, these later-starting shows receive inflated numbers in their first half hour. Those numbers are corrected by the late afternoon, but by then media reporters intent on getting news up as fast as possible, have often bestowed some measure of success on the tagalong show.”

  • Anemic enforcement of ethical standards:

“Unless the gimmick results in something egregiously false, Nielsen does not step in. The worst that might happen would be a sternly worded letter.”

  • An exclusive set of informed consumers who are not hoodwinked:

“The tricks themselves are familiar to most in the business: smart commercial buyers know when the ratings are being spun for a better story in the media or a claim in a print ad, and they insist on paying for the real ratings, not the artificially enhanced versions.”

Networks Resort to Trickery in an Attempt to Lift Ratings -

Wednesday, February 1, 2012

Data Quality, Degrading Before Our Eyes

A high-profile example of a commonplace occurrence: Reality changes and data quality suffers.  Sir Fred is now just “Mr. Fred.”  Databases must be updated.

Fred Goodwin joined the ranks of Robert Mugabe and Nicolae Ceausescu last night when his knighthood was removed by order of the Queen.

The loss of his title is immediate and he will now have to return the official medal and ribbon that went with his knighthood to Buckingham Palace. His wife Joyce, formerly Lady Goodwin, becomes plain Mrs Goodwin.

Of course, this happens all the time—people are born, die, change their names when they marry, etc.  Poor data quality is not necessarily a sign of bad software, or sloppy data entry, or any other kink in the process.  Poor data quality can come from the gradual divergence between stored data and the reality it purports to describe. 

That’s why data quality is not a project. It is a permanent program, a process, a matter of eternal vigilance.

Fred Goodwin knighthood shredded 4 years after biggest banking disaster in British history | Mail Online

The Mainstream Press: Covering What You Thought You Didn’t Care About

Why we need a mainstream press, unwittingly illustrated by three comments in—of all places—the blogosphere.

Situation:  New York Times columnist Joe Nocera is scrutinizing the many ways that students are abused at the hands of the NCAA.  In the past few weeks he’s devoted quite a few columns to the topic. He has also blogged about it, including in this post today.

The post has induced some noteworthy comments, some of which contemplate whether Mr. Nocera should relegate his outrage to his blog, where people who are interested in such matters can find it but where people who are not can avoid it.  Here’s one from Adrienne in Scarsdale, NY:

Could Mr. Nocera continue to comment on this issue in his blog, and save the precious column inches in the Times for news of broader and deeper impact? I understand that the NCAA is doing repellent things and harming some innocent people, but at a time when our nation faces so many dreadful problems (not least the current debased state of politics), let's save this particular hobby horse for the bloggers who care most about it.

Adrienne’s comment yields this direct response from Jimmy in Long Island, NY:

I disagree. Joe needs to continue his NCAA critiques in his NYT columns as well as his blog. Not everyone who reads his columns reads his blog.

Scroll down a bit and you find this earlier comment from LT in Boston:

Mr. Nocera, I have little interest in sports of any kind but have been reading your series on the NCAA religiously with a growing sense of outrage. The bullying mistreatment of the poor and less fortunate at the hands of the NCAA is abhorrent. For me, as a mother of two young children, the story of the woman and team being penalized because she wanted to nurse her infant is an outrage too far. Please let us know what we can do. Who should I write? Who can I call? Where do I send my donation?

And there you have it.  For my money, LT in Boston takes the day.  An essential purpose of the mainstream press is to show us things that we should be concerned about, but didn’t realize it. Information responsibility dictates that we not fall into the trap of the internet echo chamber—that we not spend all our time reading about and contemplating an increasingly circumscribed set of issues.  Read the mainstream press to appreciate what you didn’t even know you weren’t appreciating.

Submit Your Stupid N.C.A.A. Rule and Win -

Tuesday, January 31, 2012

Claremont McKenna College Says It Exaggerated SAT Figures -

Lots of lessons here in information responsibility and naivety.

Claremont McKenna College, a small, prestigious California school, said Monday that for the past six years, it has submitted false SAT scores to publications like U.S. News & World Report that use the data in widely followed college rankings.

In a message e-mailed to college staff members and students, Claremont McKenna’s president since 1999, Pamela B. Gann, wrote that “a senior administrator” had taken sole responsibility for falsifying the scores, admitted doing so since 2005, and resigned his post.

The lessons?  First and most obviously, the senior administrator has utterly failed to live up to the principles of information responsibility.

Second, publishers of books of college rankings deserve our scorn for simplifying a highly textured phenomenon (quality of colleges and universities) into ordinal rankings.

Third, American consumers suffer from what might be called “rank frenzy.”  This appears in many guises, including:

  • Folks running around insisting that “We’re number one,” as if being second-best at something is a cause for shame.
  • Nike advertising campaigns that suggest “You don’t win silver; you lose gold.”
  • Credulous parents and students actually believing that it is better to go to the #6 school than the #7.

And finally, a happy lesson.  Kudos to the president of Claremont McKenna College for getting ahead of the story, going public when the malfeasance was discovered.

Claremont McKenna College Says It Exaggerated SAT Figures -

Saturday, January 21, 2012

Kudos to The Retraction Watch

From the Ministry of Information Responsibility, Division of Scientific Research, Departments of Quality Control, Conflict of Interest Disclosure, and Academic Misconduct:

The Retraction Watch is a website that tracks retractions of previously published scientific research “as a window into the scientific process.” 

Three cheers.

False Memory and Concocted Identity

For students of information responsibility: an artist’s take on false memory syndrome, and a snarky project about concocting more glamorous identities for ourselves.

For Hopwood, examining the ways we deceive ourselves through memory is perhaps a natural progression. He has worked with fellow artists as part of the WITH Collective on projects that expose and poke fun at the many ways we style our public selves. “Identity is not fixed,” he says. Instead, it shifts depending on the company we are in, and even the format of the interaction - be it social media or in person.

We’re extraordinarily preoccupied with sculpting our identities, as the glut of self-help books and pseudoscientific methods for personal development demonstrates. Through the WITH Collective, Hopwood has pushed this to the preposterous in a series of whimsical, biting and often hilarious “solutions” offering people alternate realities to claim as their own. In these fictitious scenarios, people can avail themselves of “traumaformer” for example, a “product” that conjures up a more traumatic past for the purchaser, or shift the blame to someone else with “scapegoad”. For the sexually curious but timid, there’s also “homoflexible”: “We perform your fantasies/fears for you, as you, so you don’t have to,” the site boasts.

These past projects have all been gleefully tongue in cheek, “cheerful antagonism” as Hopwood describes it. Yet these satirical takes on modern living have been cast in new light as his understanding of memory has grown, and with it his fascination for false memory in particular.  

To add your own false memories, go to

CultureLab: Remembering things that never happened

Friday, January 20, 2012

We are desperate for something metric

In news coverage, numbers connote authority merely by dint of their not being mere words.

But all this raises the question of why we overcover Iowa and New Hampshire in the first place. And the answer, as ever, is that we, the political reporting claque, cannot resist anything that looks like a scoreboard. We are desperate for something metric, and we are desperate for early returns. It hardly matters how many or from where. Or how they are counted.

Iowa Republicans To The GOP: Please Don't Ask Us Who Won : It's All Politics : NPR

Thursday, January 19, 2012

SOPA Boycotts and the False Ideals of the Web -

Jaron Lanier makes a point.

Our melodrama is driven by a vision of an open Internet that has already been distorted, though not by the old industries that fear piracy.

For instance, until a year ago, I enjoyed a certain kind of user-generated content very much: I participated in forums in which musicians talked about musical instruments.

For years, I was warned that old-fashioned control freaks like media moguls might separate me from my beloved forums. Perhaps a forum would be shut down because it was hosted on some server with pirated content.

While acknowledging that this is a possible scenario, a very different factor — proprietary social networking — is ending my freedom to participate in the forums I used to love, at least on terms I accept. Like many other forms of contact, the musical conversations are moving into private sites, particularly Facebook. To continue to participate, I’d have to accept Facebook’s philosophy, under which it analyzes me, and is searching for new ways to charge third parties for the use of that analysis.

And it’s not Facebook’s fault! We, the idealists, insisted that information be able to flow freely online, which meant that services relating to information, instead of the information itself, would be the main profit centers. Some businesses do sell content, but that doesn’t address the business side of everyday user-generated content.

The adulation of “free content” inevitably meant that “advertising” would become the biggest business in the open part of the information economy. Furthermore, that system isn’t so welcoming to new competitors. Once networks are established, it is hard to reduce their power. Google’s advertisers, for instance, know what will happen if they move away. The next-highest bidder for each position in Google’s auction-based model for selling ads will inherit that position if the top bidder goes elsewhere. So Google’s advertisers tend to stay put because the consequences of leaving are obvious to them, whereas the opportunities they might gain by leaving are not.

The obvious strategy in the fight for a piece of the advertising pie is to close off substantial parts of the Internet so Google doesn’t see it all anymore. That’s how Facebook hopes to make money, by sealing off a huge amount of user-generated information into a separate, non-Google world. Networks lock in their users, whether it is Facebook’s members or Google’s advertisers.

SOPA Boycotts and the False Ideals of the Web -