Sunday, February 26, 2012

‘The Lifespan of a Fact,’ by John D’Agata and Jim Fingal - NYTimes.com

A good review of a weird take on information responsibility, reporting, and the potential for non-fiction to attain the exalted status of “art.”

This brings us to D’Agata’s other outrageous proposition — that one needn’t concern oneself with facts because rarely are facts reliable, and that belief alone should be considered as muscular as fact, even when the belief has been proved to be based on invention. As long as a story “is believed by somebody,” he writes, “I consider it a legitimate potential history.” Hogwash.

‘The Lifespan of a Fact,’ by John D’Agata and Jim Fingal - NYTimes.com

Friday, February 17, 2012

The 'Undue Weight' of Truth on Wikipedia - The Chronicle Review - The Chronicle of Higher Education

Another opportunity to recall that “A lie can travel halfway around the world before the truth can get its boots on.”

A couple of years ago, on a slow day at the office, I decided to experiment with editing one particularly misleading assertion chiseled into the Wikipedia article. The description of the trial stated, "The prosecution, led by Julius Grinnell, did not offer evidence connecting any of the defendants with the bombing. ... "

Coincidentally, that is the claim that initially hooked me on the topic. In 2001 I was teaching a labor-history course, and our textbook contained nearly the same wording that appeared on Wikipedia. One of my students raised her hand: "If the trial went on for six weeks and no evidence was presented, what did they talk about all those days?" I've been working to answer her question ever since.

I have not resolved all the mysteries that surround the bombing, but I have dug deeply enough to be sure that the claim that the trial was bereft of evidence is flatly wrong. One hundred and eighteen witnesses were called to testify, many of them unindicted co-conspirators who detailed secret meetings where plans to attack police stations were mapped out, coded messages were placed in radical newspapers, and bombs were assembled in one of the defendants' rooms.

So I removed the line about there being "no evidence" and provided a full explanation in Wikipedia's behind-the-scenes editing log. Within minutes my changes were reversed. The explanation: "You must provide reliable sources for your assertions to make changes along these lines to the article."

That was curious, as I had cited the documents that proved my point, including verbatim testimony from the trial published online by the Library of Congress. I also noted one of my own peer-reviewed articles. One of the people who had assumed the role of keeper of this bit of history for Wikipedia quoted the Web site's "undue weight" policy, which states that "articles should not give minority views as much or as detailed a description as more popular views." He then scolded me. "You should not delete information supported by the majority of sources to replace it with a minority view."

"Explain to me, then, how a 'minority' source with facts on its side would ever appear against a wrong 'majority' one?" I asked the Wiki-gatekeeper. He responded, "You're more than welcome to discuss reliable sources here, that's what the talk page is for. However, you might want to have a quick look at Wikipedia's civility policy."

I tried to edit the page again. Within 10 seconds I was informed that my citations to the primary documents were insufficient, as Wikipedia requires its contributors to rely on secondary sources, or, as my critic informed me, "published books." Another editor cheerfully tutored me in what this means: "Wikipedia is not 'truth,' Wikipedia is 'verifiability' of reliable sources. Hence, if most secondary sources which are taken as reliable happen to repeat a flawed account or description of something, Wikipedia will echo that."

So I waited two years, until my book on the trial was published. "Now, at last, I have a proper Wikipedia leg to stand on," I thought as I opened the page and found at least a dozen statements that were factual errors, including some that contradicted their own cited sources. I found myself hesitant to write, eerily aware that the self-deputized protectors of the page were reading over my shoulder, itching to revert my edits and tutor me in Wiki-decorum. I made a small edit, testing the waters.

My improvement lasted five minutes before a Wiki-cop scolded me, "I hope you will familiarize yourself with some of Wikipedia's policies, such as verifiability and undue weight. If all historians save one say that the sky was green in 1888, our policies require that we write 'Most historians write that the sky was green, but one says the sky was blue.' ... As individual editors, we're not in the business of weighing claims, just reporting what reliable sources write."

The 'Undue Weight' of Truth on Wikipedia - The Chronicle Review - The Chronicle of Higher Education

Wednesday, February 15, 2012

Apple: App Access to Contact Data Will Require User Permission - John Paczkowski - Mobile - AllThingsD

Ah…

After a week of silence, Apple has finally responded to reports that dozens of iOS applications have been accessing, transmitting and storing user contact data without explicit permission. Path was the first to be flagged for this, and others, including Twitter, Yelp and Foursquare, have since tidied up the way they ask for address book data. Apple has faced growing criticism that it has given iOS developers far too much access to address book information without requiring a user prompt.

Today, the company agreed with that assessment, and said that soon, apps that use address book data will require explicit user permission to do so.

Apple: App Access to Contact Data Will Require User Permission - John Paczkowski - Mobile - AllThingsD

Your address book is mine: Many iPhone apps take your data | VentureBeat

Ew…

Path got caught red-handed uploading users’ address books to its servers and had to apologize. But the relatively obscure journaling app is not alone. In fact, Path was crucified for a practice that has become an unspoken industry standard.

Facebook, Twitter, Instagram, Foursquare, Foodspotting, Yelp, and Gowalla are among a smattering of iOS applications that have been sending the actual names, email addresses and/or phone numbers from your device’s internal address book to their servers, VentureBeat has learned. Several do so without first asking permission, and Instagram and Foursquare only added permissions prompts after the Path flare-up.

Some of these companies deny storing the personal data, as Path was doing, but the transmission alone makes the private data susceptible to would-be intercepters.

Perhaps most concerning, however, is that these app makers could mask the real names, phone numbers, and email addresses during the transmission process, protecting your privacy in the process, but choose not to.

Your address book is mine: Many iPhone apps take your data | VentureBeat
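For the curious, here is a minimal sketch of what the masking mentioned in the excerpt might look like. It assumes the app only needs to match contacts across users, so it normalizes each email address and phone number and sends a SHA-256 digest instead of the raw value. The function names and the contact format are hypothetical, not anything these apps are known to use. One caveat: phone numbers come from a small enough space that a plain hash can be reversed by brute force, so this obscures the data in transit rather than fully protecting it.

```python
import hashlib


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so equal addresses hash identically."""
    return email.strip().lower()


def normalize_phone(phone: str) -> str:
    """Keep digits only, so '(555) 010-9999' and '5550109999' match."""
    return "".join(ch for ch in phone if ch.isdigit())


def mask(value: str) -> str:
    """Return the hex SHA-256 digest of a normalized value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask_contacts(contacts):
    """Replace raw identifiers with digests before anything leaves the device.

    `contacts` is assumed to be a list of dicts with 'email' and 'phone' keys
    (a hypothetical format chosen for illustration).
    """
    return [
        {
            "email_hash": mask(normalize_email(c["email"])),
            "phone_hash": mask(normalize_phone(c["phone"])),
        }
        for c in contacts
    ]


if __name__ == "__main__":
    sample = [{"email": " Ann@Example.com ", "phone": "(555) 010-9999"}]
    print(mask_contacts(sample))
```

The server can still find mutual contacts by comparing digests, but it never receives the names, addresses, or numbers themselves.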

Sunday, February 12, 2012

Serendipitous Juxtaposition

Big data that’s wrong is still wrong.

In the opinion pages of the Sunday New York Times—Sunday Review in the dead-tree version—we have one article that mentions on-line dating sites as effective users of Big Data techniques, and another suggesting that the matching algorithms used by these sites don’t work.

The first article, “The Age of Big Data,” includes this:

Online dating services, like Match.com, constantly sift through their Web listings of personal characteristics, reactions and communications to improve the algorithms for matching men and women on dates.

The second, “The Dubious Science of Online Dating,” has this:

One major problem is that these sites fail to collect a lot of crucial information. Because they gather data from singles who have never met, the sites have no way of knowing how two people will interact once they have been matched. Yet our review of the literature reveals that aspects of relationships that emerge only after two people meet and get to know each other — things like communication patterns, problem-solving tendencies and sexual compatibility — are crucial for predicting the success or failure of relationships. For example, study after study has shown that the way that couples discuss and attempt to resolve disagreements predicts their future satisfaction and whether or not the relationship is likely to dissolve.

When it comes to data, size isn’t everything.

Online Dating Sites Don’t Match Hype - NYTimes.com

“One of the most pernicious uses of data”

A nod to information responsibility in the mainstream press:

Big Data has its perils, to be sure. With huge data sets and fine-grained measurement, statisticians and computer scientists note, there is increased risk of “false discoveries.” The trouble with seeking a meaningful needle in massive haystacks of data, says Trevor Hastie, a statistics professor at Stanford, is that “many bits of straw look like needles.”

Big Data also supplies more raw material for statistical shenanigans and biased fact-finding excursions. It offers a high-tech twist on an old trick: I know the facts, now let’s find ’em. That is, says Rebecca Goldin, a mathematician at George Mason University, “one of the most pernicious uses of data.”

Big Data’s Impact in the World - NYTimes.com

Thursday, February 2, 2012

Networks Resort to Trickery in an Attempt to Lift Ratings - NYTimes.com

For those who follow information responsibility, here’s a story that has a little bit of everything:

  • A widespread tolerance of cynical data distortion, all to manipulate ratings and rankings.
  • Distortion of category boundaries to exclude unwanted data points.
  • Distortion of category boundaries to include desirable data points.
  • Manipulation of idiosyncratic metrics.
  • Abuse of the news cycle.
  • Anemic enforcement of ethical standards.
  • An exclusive set of informed consumers who are not hoodwinked.

Details on each of these bullets follow.  All quoted excerpts are from this article in today’s New York Times.

  • A widespread tolerance of cynical data distortion, all to manipulate ratings and rankings:

“This is the kind of programming sleight of hand that executives seize on as they seek to gain every possible edge in the television ratings game, at a time when each tenth of a point or two enhances their standing in the nightly ratings and the ability to pitch to advertisers who spend billions of dollars a year.”

  • Distortion of category boundaries to exclude unwanted data points:

“But as far as Nielsen ratings were concerned, four of the shows that week weren’t ‘Good Morning America’ at all. They were labeled ‘special’ programming by ABC, which told Nielsen that it would be called ‘Good Morning Amer.’

“ABC made the switch so that the final week of the year — typically the lowest rated of the year because of the holidays — would be ignored in the national ratings. The change allowed the network to claim — and it did — that ‘Good Morning America’ finished the year closer to NBC’s ‘Today’ show than it had in 16 years.”

  • Distortion of category boundaries to include desirable data points:

“NBC took the opposite path of ABC with the use of the term ‘special’ in its presentation of the Republican primary debate on Jan. 23. Careful viewers noticed that the debate was labeled a regular edition of the network’s ratings-challenged newsmagazine program, ‘Rock Center with Brian Williams’ — one that, as it turned out, just happened to double the show’s usual audience to just over 7.1 million viewers.”

  • Manipulation of idiosyncratic metrics:

“The manipulation of where national commercials are placed in a show has become one of the favorite shell games networks use to try to enhance their numbers. Shows receive national ratings from Nielsen only up to the point when the last national commercial is broadcast — after that, the numbers simply do not count.”

  • Abuse of the news cycle:

“Another ratings tactic that is now routine involves extending the duration of more popular shows, allowing them to run a minute or two past their scheduled end time. That means the show that follows — usually one a network wants to enhance with the best possible introduction — gets a ratings lift in early national ratings reports that are often widely reported by news outlets.

“In December, for example, Fox ran its singing competition ‘The X-Factor’ a minute long to provide a strong entry for ‘I Hate My Teenage Daughter,’ the low-rated comedy that followed it.

“In the initial national ratings that Nielsen reports every morning, these later-starting shows receive inflated numbers in their first half hour. Those numbers are corrected by the late afternoon, but by then media reporters intent on getting news up as fast as possible, have often bestowed some measure of success on the tagalong show.”

  • Anemic enforcement of ethical standards:

“Unless the gimmick results in something egregiously false, Nielsen does not step in. The worst that might happen would be a sternly worded letter.”

  • An exclusive set of informed consumers who are not hoodwinked:

“The tricks themselves are familiar to most in the business: smart commercial buyers know when the ratings are being spun for a better story in the media or a claim in a print ad, and they insist on paying for the real ratings, not the artificially enhanced versions.”

Networks Resort to Trickery in an Attempt to Lift Ratings - NYTimes.com

Wednesday, February 1, 2012

Data Quality, Degrading Before Our Eyes

A high-profile example of a commonplace occurrence: Reality changes and data quality suffers.  Sir Fred is now just “Mr. Fred.”  Databases must be updated.

Fred Goodwin joined the ranks of Robert Mugabe and Nicolae Ceausescu last night when his knighthood was removed by order of the Queen.

The loss of his title is immediate and he will now have to return the official medal and ribbon that went with his knighthood to Buckingham Palace. His wife Joyce, formerly Lady Goodwin, becomes plain Mrs Goodwin.

Of course, this happens all the time—people are born, die, change their names when they marry, etc.  Poor data quality is not necessarily a sign of bad software, or sloppy data entry, or any other kink in the process.  Poor data quality can come from the gradual divergence between stored data and the reality it purports to describe. 

That’s why data quality is not a project. It is a permanent program, a process, a matter of eternal vigilance.

Fred Goodwin knighthood shredded 4 years after biggest banking disaster in British history | Mail Online

The Mainstream Press: Covering What You Thought You Didn’t Care About

Why we need a mainstream press, unwittingly illustrated by three comments in—of all places—the blogosphere.

Situation:  New York Times columnist Joe Nocera is scrutinizing the many ways that students are abused at the hands of the NCAA.  In the past few weeks he’s devoted quite a few columns to the topic. He has also blogged about it, including in this post today.

The post has induced some noteworthy comments, some of which contemplate whether Mr. Nocera should relegate his outrage to his blog, where people who are interested in such matters can find it but where people who are not can avoid it.  Here’s one from Adrienne in Scarsdale, NY:

Could Mr. Nocera continue to comment on this issue in his blog, and save the precious column inches in the Times for news of broader and deeper impact? I understand that the NCAA is doing repellent things and harming some innocent people, but at a time when our nation faces so many dreadful problems (not least the current debased state of politics), let's save this particular hobby horse for the bloggers who care most about it.

Adrienne’s comment yields this direct response from Jimmy in Long Island, NY:

I disagree. Joe needs to continue his NCAA critiques in his NYT columns as well as his blog. Not everyone who reads his columns reads his blog.

Scroll down a bit and you find this earlier comment from LT in Boston:

Mr. Nocera, I have little interest in sports of any kind but have been reading your series on the NCAA religiously with a growing sense of outrage. The bullying mistreatment of the poor and less fortunate at the hands of the NCAA is abhorrent. For me, as a mother of two young children, the story of the woman and team being penalized because she wanted to nurse her infant is an outrage too far. Please let us know what we can do. Who should I write? Who can I call? Where do I send my donation?

And there you have it.  For my money, LT in Boston takes the day.  An essential purpose of the mainstream press is to show us things that we should be concerned about but didn’t realize we should be. Information responsibility dictates that we not fall into the trap of the internet echo chamber, that we not spend all our time reading about and contemplating an increasingly circumscribed set of issues.  Read the mainstream press to appreciate what you didn’t even know you weren’t appreciating.

Submit Your Stupid N.C.A.A. Rule and Win - NYTimes.com