Sunday, July 31, 2011

#MITIQ: Report from MIT IQIS: The Chief Data Officer

Earlier this month I spent a week at the 2011 MIT Information Quality Industry Symposium. 

IQ is a young discipline, and conventional wisdom has not yet congealed into a widely accepted set of best practices.  For example, although most of the gathered experts agreed that an IQ program requires someone in the Chief Data Officer role, the urgency—or the perceived urgency—of the need can vary:

  • If data is part of your service (e.g., your service is a thoroughbred racetrack), you absolutely need a CDO.
  • If data supports your service (e.g., a brokerage), the need for a CDO is real, but some myopic folks might not realize this.
  • If data is not a part of your product, but describes your operations, the need for a CDO is slightly less urgent.  (Even here, the merits of data quality cannot be overstated, especially for data that supports regulatory reporting.)

Furthermore, there were different—widely different—organizational approaches: 

  • The CDO should report to the CIO.
  • The CDO should not report to the CIO, but to someone less focused on technology, such as the COO.

Other topics yielded equally varied opinions:

  • A good way to launch a data quality / information quality program is to concentrate on saving money.
  • Cost should not be the primary motivator; understanding the business should be.  The goal is not reducing cost, but establishing discipline (e.g., understanding risks and efficiencies in operation…)

And while we’re talking about money, here’s another topic whose discussion showed widely varying opinions:

  • One function of the CDO is to establish a reliable funding stream for the DQ / IQ program.
  • No.  Don’t rely on a funding stream and don’t fall into the funding-stream mentality, because funding streams dry up when personnel change.  It is too difficult to defend a budget line item called “Data Quality.”  You must embed data quality into your core business, so that DQ is barely distinguishable from your core business operations.

The opinions referred to here are not mine.  I’m just reporting on the vibrancy of the discussions at this year’s MIT IQIS conference.  Vibrant discussion is a good sign, because DQ / IQ programs will be a part of our future and we need to figure out what that means.

Wednesday, July 20, 2011

Lies, Perjury, Irresponsibility, Gamesmanship

James B. Stewart suspects that America is suffering from an epidemic of perjury.  He says so in his new book, Tangled Webs: How False Statements Are Undermining America: From Martha Stewart to Bernie Madoff.  Here’s an excerpt (taken from here):

We know how many murders are committed each year — 1,318,398 in 2009. We know the precise numbers for reported instances of rape, robbery, aggravated assault, burglary, larceny, and vehicle theft. No one keeps statistics for perjury and false statements — lies told under oath or to investigative and other agencies of the U.S. government — even though they are felonies punishable by up to five years in prison. There is simply too much of it, and too little is prosecuted to generate any meaningful statistics.

Although lying seems to be an inherent part of human nature, the narrow but serious class of lies that undermines the judicial process on which government depends has been a crime as old as civilization itself. Originally prosecuted in England by ecclesiastical courts, by the sixteenth century perjury was firmly embedded as a crime in the English common law.

Mounting evidence suggests that the broad public commitment to telling the truth under oath has been breaking down, eroding over recent decades, a trend that has been accelerating in recent years. Because there are no statistics, it’s impossible to know for certain how much lying afflicts the judicial process, and whether it’s worse now than in previous decades. Street criminals have always lied when confronted by law enforcement. But prosecutors have told me repeatedly that a surge of concerted, deliberate lying by a different class of criminal — sophisticated, educated, affluent, and represented in many cases by the best lawyers — threatens to swamp the legal system and undermine the prosecution of white-collar crime.

Stewart’s take on the phenomenon might be a bit simplistic.  Here’s an excerpt from one book review of Tangled Webs:

“We know how many murders are committed each year — 1,318,398 in 2009,” he writes in the first sentence of “Tangled Webs.”

At this point, if I were caught up in Stewart’s prosecutorial spirit, I might object that the first sentence of his book is a lie. In fact, according to the F.B.I.’s statistics, an estimated 1,318,398 violent crimes, not murders, were committed in the United States in 2009. And a vast majority of these violent crimes didn’t involve murder; they involved robbery and aggravated assault. But of course, it would be hyperbolic and unfair of me to accuse Stewart of lying without knowing more about the motive behind his false statement. Perhaps it was an inadvertent error, in which case calling it a lie seems much too strong. On the other hand, perhaps it was a deliberate misrepresentation devised to create a more dramatic opening — perhaps, in other words, he felt that comparing lying to robbery would be less vivid than comparing lying to murder. Deliberate misrepresentation seems highly unlikely for a Pulitzer Prize-winning journalist of his caliber, but without knowing more about his motives, I can’t make a fair-minded judgment about how seriously to treat his false statement.

Unlike Stewart, the Anglo-American legal system has long been sensitive to these fine distinctions. It has treated some lies more seriously than others, depending on the intent of the speaker and the effect on other people.

Although Stewart, now a business columnist for The New York Times, claims that lying has been on the rise, a more plausible thesis is that prosecutions for false statements have been rising — not because of growing contempt for the truth but because defendants are increasingly prosecuted for doing nothing more than denying their guilt to investigators.

Complicating matters is the widely disseminated meme of “three felonies a day,” the idea that American federal law is so onerous that a typical American citizen commits three felonies per day simply by going about his or her business.

Another complication: Prosecutors themselves—those who report their outrage about perjury to Mr. Stewart—are not above taking the occasional liberty with the truth.  One doesn’t need to look hard: Here’s an example from—delving deep into the archives here—this week:

Assertions by the prosecution that Casey Anthony conducted extensive computer searches on the word “chloroform” were based on inaccurate data, a software designer who testified at the trial said Monday.

The designer, John Bradley, said Ms. Anthony had visited what the prosecution said was a crucial Web site only once, not 84 times, as prosecutors had asserted. He came to that conclusion after redesigning his software, and immediately alerted prosecutors and the police about the mistake, he said.

The finding of 84 visits was used repeatedly during the trial to suggest that Ms. Anthony had planned to murder her 2-year-old daughter, Caylee, who was found dead in 2008. Ms. Anthony, who could have faced the death penalty, was acquitted of the killing on July 5.

Mr. Bradley’s findings were not presented to the jury and the record was never corrected, he said. Prosecutors are required to reveal all information that is exculpatory to the defense.

This is a good opportunity to describe my personal approach to information responsibility.  I do not know whether Casey Anthony committed the crimes with which she was charged and of which she was recently acquitted.  It is my stated policy, motivated by information responsibility, not to have an opinion about criminal charges unless I am a juror responsible for contemplating those charges.

Revolutions: Growth in data-related jobs

Makes sense to me.

At job-search site indeed.com, you can take a look at trends in the use of keywords used in job postings. As you might expect, job postings containing terms related to making sense from data are on the rise.


Monday, July 18, 2011

Language Log » Presupposition and boasting instructions for politicians

From the department of “Information Responsibility Includes Knowing How Your Brain Can Let You Down:”

In fact, psychological studies as far back as the seventies have shown that people can be so eager to accommodate presupposed information that they might even tweak their own memories accordingly. In a study led by memory scientist Elizabeth Loftus, people who'd witnessed simulated car crashes were more likely to mistakenly remember a stop sign when asked "Do you remember seeing the stop sign?" as opposed to "Do you remember seeing a stop sign?"


Wednesday, July 13, 2011

MIT IQIS 2011: Data, Information, Knowledge

Lots of great stuff presented at MIT IQIS 2011 this week.  Stuff?  It seems that I’m no longer entitled to use the words data, information, and knowledge the ways civilians—the sorts of folks who wouldn’t attend MIT IQIS—might use those words. 

I admit, I’m repeating a theme from an earlier post.

Many conference attendees distinguish between raw data, minimally processed data, and thoroughly analyzed data. I’m down with that; those distinctions are legitimate and deserve our attention.  However, it seems that many of the attendees of MIT IQIS have appropriated some English words to express these distinctions.  I’ve heard the following: Raw data is called “data.”  Minimally processed data is called “information.”  Thoroughly analyzed data is called “knowledge.”

Reminder:  Civilians—even those who recognize the merits of the distinctions among data that is raw, minimally processed, and thoroughly analyzed—would not use the words data, information, and knowledge in this way.  An irony: A recurring theme of information quality is the need to secure business buy-in and support for IQ initiatives.  Hard to get the business to buy in to what you are proposing when you keep distorting the meanings of their perfectly good words.

MIT IQIS 2011

Just beginning my 90-minute tutorial on best practices in data modeling for attendees of the MIT IQIS 2011 Conference.  Good room, good crowd, sunny day in Cambridge.  Throughout the rest of this week, watch this spot for more from MIT IQIS.

Monday, July 11, 2011

An Underwhelming Bachmann "Gaffe" : CJR

Likely reality:  A member of Michele Bachmann’s staff confused Waterloo, Iowa, with Winterset, Iowa.

Gleefully reported story:  Michele Bachmann confused notorious serial killer John Wayne Gacy with American film icon and airport name-giver John Wayne.

Vigorous competition for eyeballs notwithstanding, this is information irresponsibility, plain and simple. 

There is no reason to assume that Bachmann, apparently unaware of the relatively well-known John Wayne Birthplace Museum in Winterset, Iowa, some 150 miles from Waterloo, was better informed about the birthplace of the serial killer (who certainly demonstrated few of the “what America’s all about” qualities the candidate praises in the cowboy). More likely is that Bachmann’s aides were careless in their research before her Waterloo appearance, giving her bad information.

Making the leap from John Wayne’s misnamed hometown to a mix-up of him and Gacy seems like a vaguely dishonest means to a spicy headline (not to mention innumerable videos of a Pogo-faced Michele Bachmann on YouTube). Granted, the headline grabs the eye more than might “BACHMANN MIXES UP JOHN WAYNE’S HOMETOWN WITH ANOTHER IOWA HAMLET STARTING WITH W,” and thus draws attention to the kind of careless mistake on the part of the candidate that could have an impact on swing voters.
