Tuesday, June 25, 2013

Opinion: What Jim Carrey got wrong - CNN.com

An atypical topic for me, but there’s some wisdom in here about a common subconscious motivation for distorting reality.  I have already blogged about such motivations for ignoring data that contradicts what we want to believe, or for distorting data to support what we want others to believe.  This example is about imagining data that supports what we want to believe even when such data, inconveniently, fails to exist.

Such narratives give us a sense that the uncontrollable might be controlled. And to maintain them, we simply ignore cases that don't fit them, such as a rash of violence in the months after Sandy Hook committed by elderly men with no discernible connection to violent media. This allows us to maintain an illusion of correlation where none exists.

Opinion: What Jim Carrey got wrong - CNN.com

Monday, June 24, 2013

Big Data's Human Error Problem | Big Data

Some publicity about a panel I am moderating at the MIT CDO/IQ Conference next month.

Has the problem of bad data grown worse in the era of big data? No, not really, says author and industry analyst Joe Maguire, one of the organizers of the MIT Chief Data Officer and Information Quality (CDOIQ) Symposium, to be held July 17-19 in Cambridge, Mass.

When it comes to information, digital or otherwise, one fact never changes: humans and data quality errors are inseparable, Maguire told InformationWeek in a phone and email interview. Furthermore, data that's too clean -- devoid of any signs of human blunders -- is immediately suspect.

"Sure, bad data touches human lives -- and vice versa. Humans are known to make a certain number of typos. In certain contexts, immaculate data could be a sign of fraud. If humans are involved in the production of data, you should expect it to be imperfect," Maguire wrote via email.

Big Data's Human Error Problem | Big Data

Webinar By Me on Tuesday 25 June

Topic: How Data Modelers Can Save Their Jobs in NoSQL Environments.  Sponsored by DataStax.

Data modeling emerged in the 1970s in response to the needs of database designers. This accident of history has influenced perceptions and practices of data modeling in harmful ways. Most notably, business-focused requirements analysis has been wrongly commingled with relational modeling. Compounding the problem, vendors have produced data-modeling tools that blur the important distinction between the client’s problem and the technologist’s solution.

Enter NoSQL, with its promise of liberating practitioners from the tiresome burden of designing relational databases. The chance to dispense with relational modeling was embraced enthusiastically, but for many organizations, it has meant discarding the only rigorous activity that had any hope of formally expressing the client’s data needs. This is a textbook case of throwing out the baby with the bathwater. This presentation shows you how to save the baby, and your career as a data modeler.
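To picture the distinction between the client’s problem and the technologist’s solution, here is a small hypothetical sketch: the conceptual description says nothing about tables, keys, or any particular DBMS, while the relational DDL beneath it is merely one possible solution derived from those requirements. All entity, attribute, and table names are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date

# The client's problem: a technology-neutral statement of the data the
# business cares about. Entity and attribute names are hypothetical.
@dataclass
class Customer:
    customer_id: str
    name: str

@dataclass
class Order:
    order_id: str
    placed_on: date
    placed_by: Customer   # each Order is placed by exactly one Customer

# The technologist's solution: one possible relational rendering of the
# same requirements. Nothing above forces this particular design.
RELATIONAL_DDL = """
CREATE TABLE customer (
    customer_id VARCHAR PRIMARY KEY,
    name        VARCHAR NOT NULL
);
CREATE TABLE orders (
    order_id    VARCHAR PRIMARY KEY,
    placed_on   DATE    NOT NULL,
    customer_id VARCHAR NOT NULL REFERENCES customer (customer_id)
);
"""
```

Abandoning the second artifact, as NoSQL invites, is no reason to abandon the first.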

Data Modelers Still Have Jobs: Adjusting For the NoSQL Environment | DataStax

Sunday, June 9, 2013

Speaking at #Cassandra13 Summit This Week: How Data Modelers Can Save Their Jobs

Here’s the blurb from the agenda:

Data Modelers Still Have Jobs: Adjusting For the NoSQL Environment

Speaker: Joe Maguire, Founder at Data Quality Strategies, LLC
Using concrete, real-world examples, the presenter will show the following: how abandoning modeling altogether is a recipe for disaster, even in—or especially in—NoSQL environments; how experienced relational modelers can leverage their skills for NoSQL projects; how the NoSQL context both simplifies and complicates the modeling endeavor; and how lessons learned modeling for NoSQL projects can make you a more effective modeler for any kind of project.
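For a taste of how relational skills carry over, here is a hedged sketch of the query-first style commonly used with Cassandra-like stores: the same conceptual data is deliberately duplicated across tables, each shaped around one access path. The table and column names are invented, and the CQL shown is one plausible design of my own, not material from the talk.

```python
# Hypothetical query-first designs for a Cassandra-style store: each table
# serves a single query, since joins are not available the way they are in
# an RDBMS, so the same conceptual data is deliberately duplicated.
ORDERS_BY_CUSTOMER = """
CREATE TABLE orders_by_customer (
    customer_id text,
    placed_on   date,
    order_id    text,
    order_total decimal,
    PRIMARY KEY ((customer_id), placed_on, order_id)
) WITH CLUSTERING ORDER BY (placed_on DESC, order_id ASC);
"""

ORDERS_BY_DAY = """
CREATE TABLE orders_by_day (
    placed_on   date,
    order_id    text,
    customer_id text,
    order_total decimal,
    PRIMARY KEY ((placed_on), order_id)
);
"""
```

Choosing the partition and clustering columns here is exactly the kind of decision an experienced relational modeler already knows how to reason about: it falls out of knowing which questions the client needs answered.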

Thursday, June 6, 2013

In Bulger Trial, It’s the Defense vs. the Media | WGBH News

Gamesmanship in Information Control

The defense’s witness list includes Boston Globe reporters Kevin Cullen and Shelley Murphy, former Globe staffers Gerry O’Neill and Dick Lehr, and Boston Herald columnist Howie Carr, all of whom have written books about Bulger. Bulger’s attorney, J.W. Carney, says the journalists have interviewed other potential witnesses likely to be called by the prosecution – and could challenge their accounts if they distort or exaggerate in an attempt to aid the government’s case.

If they stay on the witness list, though, the aforementioned journalists will be barred from sitting in the courtroom, and won’t be able to report on the trial. According to Murphy – who recently co-authored a book on the Bulger saga with Cullen – that’s what Bulger really wants.

In Bulger Trial, It’s the Defense vs. the Media | WGBH News

Wednesday, May 15, 2013

New Yorker reveals Aaron Swartz-inspired system to protect sources - FT.com

Security by obscurity and encryption.

Four decades after Deep Throat met Bob Woodward in a Washington parking garage, news organisations are scrambling to find ways to protect their confidential sources in the digital age as they push back against government attempts to identify whistleblowers.

On Wednesday, the New Yorker unveiled a nine-step process for sources to send documents and messages to the Condé Nast-owned magazine, saying the system could offer them “a reasonable degree of anonymity”. Called Strongbox, it involves the use of multiple computers, thumb drives, encryption codes and secure networks.

New Yorker reveals Aaron Swartz-inspired system to protect sources - FT.com

Tuesday, May 14, 2013

Redacted Text Message Memo - NYTimes.com

No Comment

The American Civil Liberties Union wanted insight into the Obama administration’s policy on intercepting text messages. So it submitted a Freedom of Information Act request. The Justice Department complied with the law by releasing 15 pages—but these were entirely censored. Every single word except the subject of the memo was shaded over in black.

Redacted Text Message Memo - NYTimes.com

Monday, May 13, 2013

Backsliding on the 'death panels' myth : Columbia Journalism Review

Journalists who want to be appreciated for their fair-mindedness will bend over backwards to cover both sides of a story, even if one side is delusional. This post from CJR calls it “He said, She said” reporting. For previous posts on this topic, see here and here.

Unfortunately, the board is best known as the current vehicle for the false claim that Obama’s health care plan would create “death panels,” which spread widely after Sarah Palin’s August 2009 Facebook post coining the term. As a result, journalists face a conundrum. The pervasiveness of the myth is part of the reason the partisan dispute over IPAB appointments is now newsworthy—but as I warned back in January, credulous coverage has the potential to reinforce the misperception.

It’s important for journalists to adopt best practices in reporting on myths like “death panels” rather than backsliding into the “he said,” “she said” style of reporting that was frequently observed during the initial “death panels” controversy. Though IPAB’s cost-cutting process has been delayed for at least a year, the demagoguery surrounding health care cost reduction strategies isn’t likely to go away any time soon.

Backsliding on the 'death panels' myth : Columbia Journalism Review

Wednesday, May 8, 2013

Language Log » What use electrolytic pickling?

The source includes some real doozies, including “Repulsive Tory” for “Repository.”

My point here is not at all to make fun of Google's speech recognition capabilities. I've long been a staunch defender of current ASR technology in general, and Google's implementation of it in particular. And in fact, the overall quality of the Kagan transcripts is very good — there are stretches where nearly all the words are correct.

Still, errors of the kind illustrated above indicate some of the… shall we say, areas for potential improvement.

There are some cases where the transcript is a plausible rendering of the pronunciation, but is not very plausible as English-language content, e.g. "the searched ford general …" in place of "the search for general …", or "like adams molecule selves and tissues" for "like atoms molecules cells and tissues". I'm surprised that the recognizer's n-gram language model, which is contemporary ASR's approximation to what makes sense, made these choices. And there are a few things that are just bizarre, where I can only imagine that some obscure bug has short-circuited Google's language model entirely:  "became evermore phadke" for "became ever more vague", or "tutsi you can write" in place of "to each new generation".
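For readers unfamiliar with n-gram language models, here is a toy sketch of the idea: competing transcriptions are scored by how probable their word sequences are under counts gathered from text. The three-sentence "corpus" and the add-one smoothing are my own illustration, standing in for the web-scale data and far more sophisticated smoothing a production recognizer would use.

```python
import math
from collections import Counter

# Toy corpus standing in for the web-scale text a real recognizer's
# language model is trained on; these sentences are invented for illustration.
CORPUS = [
    "the search for general relativity",
    "the search for general solutions",
    "they searched for the general",
]

def count_ngrams(sentences):
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(tokens[:-1])              # contexts for bigram estimates
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams, vocab_size):
    """Add-one-smoothed bigram log-probability of a word sequence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
        for prev, word in zip(tokens, tokens[1:])
    )

unigrams, bigrams = count_ngrams(CORPUS)
vocab = len(unigrams) + 1                         # +1 for unseen words
for hyp in ["the search for general", "the searched ford general"]:
    print(f"{hyp!r}: {log_prob(hyp, unigrams, bigrams, vocab):.2f}")
```

Even under these toy counts, "the search for general" comfortably outscores "the searched ford general", which is why errors like the ones above look less like the language model losing a close call and more like it being bypassed altogether.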

Language Log » What use electrolytic pickling?

Monday, April 29, 2013

About | Undercover Reporting

A case study in information responsibility: The site collects some of the best undercover reporting alongside some of the most ethically bankrupt. It also cites some of the ensuing discussions and outcomes, which run the gamut from Pulitzer Prizes to story retractions and reporter terminations. Also see this previous post on a related topic.

This collaboration with NYU Libraries collects many decades of high-impact, sometimes controversial, mostly U.S.-generated journalism that used undercover techniques. It grows out of the research for Undercover Reporting: The Truth About Deception (Northwestern University Press, 2012), which argues that much of the valuable journalism since before the U.S. Civil War has emerged from investigations that employed subterfuge to expose wrong. It asserts that undercover work, though sometimes criticized as deceptive or unethical, embodies a central tenet of good reporting--to extract significant information or expose hard-to-penetrate institutions or social situations that deserve the public's attention. The site, designed as a resource for scholars, student researchers and journalists, collects some of the best investigative work going back almost two centuries.

The material has been gathered into clusters, highlighting award-winning series, exemplary proponents of the practice or recurring themes (such as prison infiltrations, shadowing migrants, work, and gender, class or ethnic impersonation and dozens more.) Included are not only examples of the most outstanding work but also the most serious lapses. There are examples of controversies over the practice, such as those generated by hidden camera investigations, and of the scholarly, legal and journalistic debates that followed. Many excellent digital collections still cover only recent decades so retrieval of much of this material has been difficult, much of it still accessible solely on microfilm.

About | Undercover Reporting

Friday, April 12, 2013

Documents at Anti-Aging Clinic Up for Sale in Doping Case - NYTimes.com

A lurid case study in the information arms race.

Major League Baseball’s investigation of an anti-aging clinic linked to performance-enhancing drugs has taken a new turn, with the commissioner’s office paying a former employee of the facility for documents related to the case. At the same time, two people briefed on the matter said, at least one player linked to the clinic has purchased documents from a former clinic employee in order to destroy them.

The unusual battle, according to the two people, also appears to involve efforts by other players tied to the clinic to buy potentially incriminating documents and keep them out of the hands of baseball’s investigators.

One of the two people said that, in part, baseball, which has no subpoena power, felt compelled to pay money for documents because its officials had been concerned that more than one player was trying to do the same.

Documents at Anti-Aging Clinic Up for Sale in Doping Case - NYTimes.com

Thursday, April 11, 2013

Happiness, Beyond the Data - NYTimes.com

This one might not make the cut when I moderate the panel “Human Factors in Data Quality” at the 2013 MIT Chief Data Officer & Information Quality Symposium this July, but it’s worth noting the limits of, and the overall skepticism about, quantitative studies on human phenomena.

Happiness studies are booming in the social sciences, and governments are moving toward quantitative measures of a nation’s overall happiness, meant to supplement traditional measures of wealth and productivity. The resulting studies have a high noise-to-signal ratio, but we can expect that work with an aura of scientific rigor on something as important as happiness is going to be taken seriously. Still, our first-person experience and reflection can catch crucial truths about happiness that escape the quantitative net.

Happiness, Beyond the Data - NYTimes.com

Monday, April 8, 2013

For Scientists, an Exploding World of Pseudo-Academia - NYTimes.com

I’ll surely be mentioning this when I moderate the panel “Human Factors in Data Quality” at the 2013 MIT Chief Data Officer & Information Quality Symposium this July.

But some researchers are now raising the alarm about what they see as the proliferation of online journals that will print seemingly anything for a fee. They warn that nonexperts doing online research will have trouble distinguishing credible research from junk. “Most people don’t know the journal universe,” Dr. Goodman said. “They will not know from a journal’s title if it is for real or not.”

For Scientists, an Exploding World of Pseudo-Academia - NYTimes.com

Tuesday, February 26, 2013

Big Data – Relational Opens Its Mouth – Is It Going To Consume Hadoop? | Mike Ferguson's Blog

It’s worth following the link below to read the entire post, about how Hadoop, a linchpin of many NoSQL products and attitudes, now must coexist with SQL, and how some of the most Hadoop-centric vendors are moving aggressively in that direction. Here, I’ll just call attention to what sounds like an actual repository. The pendulum swings back.

The point about this is that if you want to connect to a mix of NoSQL DBMSs, Hadoop and Analytical RDBMSs as well as Data Warehouses, On-Line Transaction Processing Systems and other data, then you very quickly start to need the ability to know where the data is in underlying systems. A global catalog is needed so that software knows that it needs to invoke underlying MapReduce jobs to get at data in Hadoop HDFS, or that it accesses it directly by bypassing MapReduce via Impala, for example.
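As a rough illustration of what such a global catalog might look like from the application side, here is a minimal sketch. Every dataset name, system, and access path below is hypothetical; this is not modeled on any particular vendor's catalog.

```python
# A minimal sketch of the "global catalog" idea: a lookup that tells the
# data-access layer where a dataset lives and how to reach it. All names,
# engines, and datasets here are hypothetical illustrations, not a real API.
from dataclasses import dataclass

@dataclass
class CatalogEntry:
    system: str        # e.g. "hdfs", "warehouse"
    access_path: str   # e.g. "mapreduce", "impala_sql", "odbc_sql"
    location: str      # table name, HDFS path, etc.

GLOBAL_CATALOG = {
    "clickstream_raw": CatalogEntry("hdfs", "mapreduce", "/data/clickstream/"),
    "clickstream_agg": CatalogEntry("hdfs", "impala_sql", "clicks_by_day"),
    "orders":          CatalogEntry("warehouse", "odbc_sql", "dw.fact_orders"),
}

def plan_access(dataset: str) -> str:
    """Decide how a federated query layer would fetch one dataset."""
    entry = GLOBAL_CATALOG[dataset]
    if entry.access_path == "mapreduce":
        return f"submit MapReduce job over {entry.location}"
    return f"run SQL against {entry.location} via {entry.access_path}"

for name in GLOBAL_CATALOG:
    print(name, "->", plan_access(name))
```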

Big Data – Relational Opens Its Mouth – Is It Going To Consume Hadoop? | Mike Ferguson's Blog

Sunday, February 24, 2013

Obamacare: round two : Columbia Journalism Review

An appeal for information responsibility in journalism.

The Affordable Care Act, a.k.a Obamacare, is the law of the land, and the re-election of the president ensures that its far-reaching provisions will take effect as scheduled in 2014. What does that mean for journalists? It presents an opportunity—and an obligation—to deliver the clear and thorough reporting that was missing during the debate on the Act itself.

The politicians were not helpful then. Republicans demonized Obamacare, essentially a plan hatched in conservative think tanks and road-tested by Mitt Romney in Massachusetts. The Democrats hid the football, de-emphasizing the mechanics of the law in an effort to minimize the risk that the requirement to have health insurance or face penalties might not win votes.

In the end, the media covered the law’s long, slow passage like a sporting event, without much explanation of how it would shape people’s lives. Here comes another chance.

Obamacare: round two : Columbia Journalism Review