Wednesday, August 31, 2011

The Data Are Always Messy

A case of information responsibility, public-policy division…

“There’s a certain kind of academic that comes to Washington and can’t survive,” [former White House economic adviser Austan] Goolsbee said. “They’re the ones starting each sentence with ‘The economic model says …’ They are prone to silver-bullet-style answers, which demonstrate very sophisticated thinking about the model but very unsophisticated thinking about the real world.” The model may be missing a few things that are found in the real world—not least, the institutional and political obstacles that make some problems silver-bullet-proof. “If you’re going to be an academic who’s involved in the world of policy, you have to be involved in the world that exists,” Goolsbee told me. “I was always a data guy, not a theorist. Theorists can maintain total purity. The data are always messy.”

Devil’s Advocate - Magazine - The Atlantic

Nine Lives,… and Three Deaths (Statistically Speaking)

Sigh… From the New York Times Book Review. A case of information irresponsibility, statistics-and-probability division.

Heller flew 60 bombing missions between May and October 1944, a feat that should have killed him three times over, statistically speaking, since the average personnel loss was 5 percent per mission.

Reality: An airman’s chances of surviving 60 such bombing missions are about 4.6 percent (0.95 multiplied by itself 60 times). Was there no one in the editorial process who perceived the silliness of being killed “three times over, statistically speaking”?
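A quick check of the arithmetic, as a minimal Python sketch. It assumes, as the quoted 5 percent figure invites, that every mission is an independent trial with the same loss rate; real losses varied by period and target, so treat this as an illustration rather than a historical estimate. The function names are illustrative. The naive “three times over” comes from multiplying 60 × 0.05 = 3, which is the expected number of losses only if a man could be lost again and again. The question that matters is the chance of coming home from all 60, which is 0.95 to the 60th power.

```python
def survival_probability(loss_rate: float, missions: int) -> float:
    """Chance of surviving every mission, assuming (simplification)
    independent missions that all share the same loss rate."""
    return (1.0 - loss_rate) ** missions


def expected_losses(loss_rate: float, missions: int) -> float:
    """Expected number of 'losses' if one could be lost repeatedly:
    the quantity behind the 'killed three times over' claim."""
    return loss_rate * missions


p = survival_probability(0.05, 60)
print(f"{p:.3f}")                             # 0.046
print(f"{expected_losses(0.05, 60):.1f}")     # 3.0
```

The two numbers answer different questions: 3.0 is an average count of events, while 0.046 is the probability that none of those events happens to a particular person.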

Being realistic:  I don’t expect everyone to be facile with simple statistics and probability.  But I expect writers and editors to know their limitations. If you’re going to write (or publish) a statistical claim, ask someone competent to check it for you.

The Enigma of Joseph Heller - NYTimes.com

Falser Words Were Never Spoken - NYTimes.com

Information irresponsibility, bumper-sticker division:

In a coffee shop not long ago, I [that is, NY Times Op-Ed contributor Brian Morton] saw a mug with an inscription from Henry David Thoreau: “Go confidently in the direction of your dreams! Live the life you’ve imagined.”

At least it said the words were Thoreau’s. But the attribution seemed a bit suspect. Thoreau, after all, was not known for his liberal use of exclamation points. When I got home, I looked up the passage (it’s from “Walden”): “I learned this, at least, by my experiment: that if one advances confidently in the direction of his dreams, and endeavors to live the life which he has imagined, he will meet with a success unexpected in common hours.”

...

Thoreau, Gandhi, Mandela — it’s easy to see why their words and ideas have been massaged into gauzy slogans. They were inspirational figures, dreamers of beautiful dreams. But what goes missing in the slogans is that they were also sober, steely men. Each of them knew that thoroughgoing change, whether personal or social, involves humility and sacrifice, and that the effort to change oneself or the world always exacts a price.

Falser Words Were Never Spoken - NYTimes.com

Thursday, August 11, 2011

The n-th Circle of Hell

Many of the complaints about Google+ Circles can be paraphrased as “Nice idea, but too much of a hassle.”

Peter Pachal described the problem in an article at PCMag.com:

The main problem with Google Circles is that it's tedious. While I agree that most people separate their contacts into various groups in real life, doing so in a social network is a chore. It's one of the reasons we have different social networks (LinkedIn for work, Facebook for friends, etc.). Asking people to do this kind of organizing proactively, on a single network, vastly overestimates the patience of Web users. Sure, some people are very organized and left-brained (like the engineers who created Google+), with spotless inboxes and well-maintained lists of contacts, but my feeling is that the vast majority aren't. And of all the things that have turned people off of Facebook over the years, the lack of focus on friend-organizing tools isn't one of them.

And here’s Andrew Gent (author of the misnamed Incredibly Dull blog) on the same topic:

My second issue is around circles. I understand they sound like a good idea. My personal (and professional) relationships are more complex than Facebook's simplistic friends / non-friends model. So being able to define your relationships in more detail sounds like a positive step.

The problem is, it's far more difficult than it sounds. I have friend friends and I have professional friends. I have professional friends and professional acquaintances. Some work for my old employer; some used to; some never did. Some know I am interested in poetry and video games (among other things); some don't. A few have met my wife; some may not even know I am married.

When I start to break it down, it is not only not binary, it is more complex than even I can describe. Which is what makes Google+'s circles so frustrating. They require too much thinking. This is not a technical issue, per se, but a failure to be able to turn an implicit organic process into an explicit concrete categorization.

The lesson here for data modelers and requirements analysts:  Modeling a phenomenon can be the (comparatively) easy part.  What’s hard is collecting, cleansing, maintaining, and archiving the data that populates the model.

This lesson cannot be taught to modeling novices; it is one of the lessons that modelers-in-training cannot grasp until they have learned the basics, that is, until they have become intermediate modelers. Upon becoming competent, a freshly minted modeler can be swept away by the power of the technique. Armed with a new facility with content-neutral data-model shapes, the modeler thinks “Wow, this is powerful. I can model anything! No matter what the users ask for, I can express it on a model!”

Such enthusiasm about newly acquired knowledge or skill is always dangerous. 

In this case, the well-meaning modeler overlooks an ugly truth: every data model will, when implemented, impose a burden on the user community to populate that model with data.  If the model oversimplifies the phenomenon, the users will experience a burden much like what the users of Google+ Circles are reporting.  Being able to express something on a data model is a good thing.  But it is only the beginning.  And often, there is little correlation between how easy it is to develop the model and how easy it will be to populate it with instances.
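To make the asymmetry concrete, here is a hypothetical Python sketch, not anything taken from Google+. Defining the model takes a few lines; populating it takes one in-or-out judgment per contact per circle. The circle names loosely echo Gent’s categories, and the 300-contact address book is an invented example.

```python
from dataclasses import dataclass, field


# The model: cheap to write down.
@dataclass
class Circle:
    name: str
    members: set = field(default_factory=set)


def membership_decisions(num_contacts: int, circles: list) -> int:
    """Yes/no judgments a user must make to populate the model:
    one per contact per circle."""
    return num_contacts * len(circles)


# Hypothetical circles, loosely following Gent's description.
circles = [Circle(n) for n in ("friends", "professional friends",
                               "acquaintances", "former colleagues",
                               "poetry", "video games", "family")]

print(membership_decisions(300, circles))  # 2100
```

Adding one more circle costs the modeler a single line, but costs each user another 300 judgments, which is the gap between modeling effort and population effort described above.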

Thursday, August 4, 2011

Language Log » Xtreme nerdview

Why conceptual modeling will always be needed…

I don't do surveys, so don't ask. I cannot afford a quarter of an hour answering an ill-designed list of questions for you so that your manager can use the scientifically worthless results to make out a case that your service unit is doing a good job. And don't call me on the phone and tell me you're doing some social science research, because I just know there will be a follow-up call trying to sell me carpets or enrol me in a political action committee. However, my colleague Bob Ladd encouraged me to do a survey about the new building in which the School of Philosophy, Psychology and Language Sciences lives its generally happy life at the University of Edinburgh. He told me there would be a treat at the end in terms of what I have dubbed nerdview. And boy, was there a treat.

The defining feature of nerdview is the confusing of the viewpoint of the technical specialist on the inside with that of the general public outside, so that language suited only to the internal/technical perspective gets delivered to an external/layperson audience, resulting in unintelligibility. And here you really see it writ large, to a degree that seems almost moronic. This would scarcely seem plausible in a Dilbert cartoon strip. This is xtreme nerdview.

Language Log » Xtreme nerdview