August 07, 2005

Clay Shirky's Viewpoints are Overrated

So, I finally got around to reading Clay Shirky's essay "Ontology is Overrated." I'd been avoiding it for months, knowing I was going to want to take some time with it, and that I was going to want to respond.

Clay has assumed the role of an ideologue. He says enough that is obviously true to keep you nodding, and then slips in bold statements predicated on no actual facts. He tells people what they want to hear, setting up a false dichotomy between some mythical group of elite ontologists and the rag-tag uprising of mass categorization.

Long ago, Gene did an admirable job of poking at Clay's ideological bent. He commented that he was not concerned with the technical errors and omissions, and thought he might get to them in later posts. He hasn't yet, so I'm going to take a stab. Because I think it's important to show that the emperor has no hair... er, I mean, clothes.

Tags as Identity, Tags as Attribute

Clay has a tendency to use examples of tags-as-identity. So, he dismisses the value of the thesaurus, saying that you don't want to connect terms like "cinema," "film," and "movie," because "The movie people don't want to hang out with the cinema people."

OK. But let's say I'm a scientist. Doing research on avian flu. And I go to Connotea, a "free online reference management service for scientists." If I look under "Avian Flu," I will actually miss a vast number of articles of potential interest, because, as this list shows, people are using a variety of terms for what they undoubtedly would consider the same thing:


Tags are rarely a matter of identity. Of the "cinema" people against the "movie" people. Of "queer" versus "gay" versus "homosexual." Yes, that does happen occasionally, and yes, in those few instances, you shouldn't assume synonymy. But if I'm trying to understand the breadth of issues around avian flu, you'd *better* point me to all the pertinent resources.

Classification Comes In More Than One Flavor

One of Clay's greatest fallacies is his conflation of hierarchy, general classification, library classification, and the-book-on-the-shelf. In poking fun at the Library of Congress' outdated categorization schemes, he uses the following example:
D: History (general)
DA: Great Britain
DB: Austria
DC: France
DD: Germany
DE: Mediterranean
DF: Greece
DG: Italy
DH: Low Countries
DJ: Netherlands
DK: Former Soviet Union
DL: Scandinavia
DP: Iberian Peninsula
DQ: Switzerland
DR: Balkan Peninsula
DS: Asia
DT: Africa
DU: Oceania
DX: Gypsies

Isn't it funny that "Greece" is considered to be at the same level as all of "Asia" and "Africa"?! Ha ha!

The problem is, the top-level categorization scheme means very little in the actual use of the Library of Congress' classifications. What does matter is something that Clay gives only a throwaway comment much later on. When he discusses symbolic links on Yahoo (where they can place "Books and Literature" in Entertainment though it primarily "belongs" in Humanities), he gives this aside: "The Library of Congress has something similar in its second-order categorization -- 'This book is mainly about the Balkans, but it's also about art, or it's mainly about art, but it's also about the Balkans.' Most hierarchical attempts to subdivide the world use some system like this."

Actually, the "second-order categorization" he's referring to is the LOC's Subject Headings, which, in our digital world, are what people actually *use* when trying to find books. So, if I'm doing research on the history of environmental degradation caused by the development of the city of San Francisco, I don't need to figure out some single primary concept ("history," "environment," "San Francisco") and hope for the best. As this listing of Gray Brechin's "Imperial San Francisco" demonstrates, I could find this book through any number of subjects...


So, yes, while each book has One True Call Number that determines where it is placed on the shelf, books are also rife with metadata (author, title, subject) that allows us to uncover them through a variety of means.

And Clay does classifiers a big disservice by suggesting they all assume The Shelf, which in turn suggests they all assume hierarchy. In doing so, he neglects faceted classification, which recognized long ago that there is no shelf. ("There is no spoon.")

Okay, I *will* talk about ideology

Clay's whole argument is predicated on a black-and-white distinction between evil hierarchy on one side and good tags on the other... And while Clay is right to question hierarchy, and, particularly, Yahoo's less-than-optimal use of it, he neglects to distinguish truly useful forms of professionally created classification and categorization, which undermines his argument. (He continues to set tags against folders-and-hierarchies, as if there were no other ways of classifying information. Sigh.)

Here is where Clay demonstrates that his is a cause of ideology, not reason:

"The problem is, because the cataloguers assume their classification should have force on the world, they underestimate the difficulty of understanding what users are thinking, and they overestimate the amount to which users will agree, either with one another or with the catalogers, about the best way to categorize. They also underestimate the loss from erasing difference of expression, and they overestimate loss from the lack of a thesaurus."
Has he ever talked to a cataloguer? This statement suggests not. He sets up cataloguers as some faceless elite trying to enforce their will on the world. And he then makes a series of claims ("underestimate" this, "overestimate" that) for which he offers no evidence whatsoever. They are convenient hypotheses, but nothing more.

And this ideology leads to this utterly nonsensical claim:

"With a multiplicity of points of view the question isn't 'Is everyone tagging any given link "correctly"?', but rather 'Is anyone tagging it the way I do?' As long as at least one other person tags something the way you would, you'll find it -- using a thesaurus to force everyone's tags into tighter synchrony would actually worsen the noise you'll get with your signal. If there is no shelf, then even imagining that there is one right way to organize things is an error."

If all I'm doing is trying to find people who tag things the way I do, my exposure to the world of information is going to be awfully, awfully constrained. If I'm a scientist, and I tag an article "bird flu," well, yes, I might find all the other articles labelled "bird flu," but I won't find any labelled "avian flu." In this case, a thesaurus (well, a synonym ring, but never mind) will increase the quality of the signal. And, contrary to Clay's coda in that claim, you can utilize thesauri and not believe there is one right way to organize things. In fact, a strong, robust thesaurus works PRECISELY BECAUSE there is not one right way to organize things.
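To make the "bird flu"/"avian flu" point concrete, here is a minimal Python sketch of query expansion with a synonym ring. Everything in it is hypothetical: the ring, the tags, the article ids, and the helpers `expand` and `find` are made up for illustration and are not Connotea's (or anyone's) actual API. Note that nobody's tags get rewritten; the ring only widens what a search for one term will match.

```python
# Hypothetical synonym ring: all terms in a ring are treated as equivalent
# at search time. The ring and the tagged articles are invented examples.
SYNONYM_RINGS = [
    {"avian flu", "bird flu", "avian influenza"},
]

# Hypothetical tagged articles: article id -> set of tags each tagger used.
ARTICLES = {
    "a1": {"bird flu", "epidemiology"},
    "a2": {"avian flu"},
    "a3": {"avian influenza", "vaccines"},
    "a4": {"seasonal flu"},
}


def expand(tag: str) -> set[str]:
    """Return the tag plus every term that shares a synonym ring with it."""
    tag = tag.lower()
    expanded = {tag}
    for ring in SYNONYM_RINGS:
        if tag in ring:
            expanded |= ring
    return expanded


def find(tag: str, expand_synonyms: bool = True) -> list[str]:
    """Return sorted ids of articles tagged with the term (or any synonym)."""
    terms = expand(tag) if expand_synonyms else {tag.lower()}
    return sorted(aid for aid, tags in ARTICLES.items() if tags & terms)


if __name__ == "__main__":
    print(find("bird flu", expand_synonyms=False))  # ['a1']
    print(find("bird flu"))                         # ['a1', 'a2', 'a3']
```

Without the ring, a search for "bird flu" finds only the article tagged that way; with it, the "avian flu" and "avian influenza" articles turn up too, while an unrelated tag like "seasonal flu" stays out.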

Where I compare Clay to Jakob Nielsen, and yes, irony intended

Clay has pretty much decided to be to tagging what Jakob Nielsen is to usability. Vocal, bombastic, attention-getting, and frequently specious. Read his words carefully, because while his rhetoric might induce a lot of head-nodding, his arguments have a tendency to fall apart.

Look. I love tags. I love classifications. (I pretty much loathe hierarchy.) All of these things will be made better when they work in concert, not when they're set apart.

But Wait, There's More!

And hey, just for reading this far, here are two other places where Clay is demonstrably, well, if not wrong, misguided. In his discussion of Dresden and East Germany, he states, "It is much easier for a country to disappear than for a city to disappear, so when you're saying that the small thing is contained by the large thing, you're actually mixing radically different kinds of entities." Um. The former cities of Venice, Hollywood, Brooklyn and others that have been swallowed up by neighboring growing cities might beg to differ. Countries and cities are similarly fictions (or not). Frankly, I don't know why he brings up this "example" in the first place.

The other is in this passage: "Let's say I need every Web page with the word 'obstreperous' and 'Minnesota' in it. You can't ask a cataloguer in advance to say 'Well, that's going to be a useful category, we should encode that in advance.' Instead, what the cataloguer is going to say is, 'Obstreperous plus Minnesota! Forget it, we're not going to optimize for one-offs like that.'" First, we have to set aside the fact that Clay is now talking about free-text search, and not tagging. But let's say he is talking about tagging. The system he's discussing already exists. It's called "postcoordinate indexing," and I mentioned it in a prior folksonomy post of mine.

I guess that's another thing that's really bugging me: Clay acting as if he's discovered uncharted territory, when, really, it's been well trodden for a while.
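For anyone who hasn't run into postcoordinate indexing: items are indexed under independent single terms, and the searcher combines those terms at query time, so nobody has to anticipate "obstreperous plus Minnesota" as a category. Here is a minimal Python sketch, assuming a plain inverted index and Boolean AND; the documents and the `build_index`/`search` helpers are hypothetical, not a description of any particular catalog system.

```python
from collections import defaultdict

# Hypothetical documents, each indexed under independent single terms.
# No combination of terms is encoded in advance.
DOCS = {
    "d1": {"obstreperous", "minnesota", "politics"},
    "d2": {"minnesota", "lakes"},
    "d3": {"obstreperous", "children"},
    "d4": {"obstreperous", "minnesota"},
}


def build_index(docs: dict[str, set[str]]) -> dict[str, set[str]]:
    """Build an inverted index: term -> set of ids of documents carrying it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], *terms: str) -> list[str]:
    """Coordinate terms at query time by intersecting (AND-ing) their postings."""
    if not terms:
        return []
    postings = [index.get(t, set()) for t in terms]
    return sorted(set.intersection(*postings))


if __name__ == "__main__":
    idx = build_index(DOCS)
    print(search(idx, "obstreperous", "minnesota"))  # ['d1', 'd4']
```

The combination was never stored anywhere; `['d1', 'd4']` simply falls out of intersecting two term lists when someone asks for it, which is the whole point of coordinating terms after indexing rather than before.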

I leave you with this. When considering purchasing an alarm system for my house, I Googled "home security." The amount of noise in those results is startling, because "home" and "security" can mean so many different things. However, using Yahoo!'s Directory, I can find all manner of highly relevant items.

Thank you for writing this. I wish I had written it months ago, instead of having it out with Clay on delicious-discuss[1]. Half-truths and groundless sloganeering are irritating coming from anybody, but when they come from someone as widely read and respected as Clay, it is really frustrating. I can't tell you how sick I am of hearing lines from "Ontology is Overrated" quoted to me in meetings...


Posted by: Ryan Shaw at August 7, 2005 09:49 PM

For me, an important point that Clay makes is about the cost and scalability of professional classification compared with amateur tagging.

He makes a prediction:

"if you can find any way to create value from combining myriad amateur classifications over time, they will come to be more valuable than professional categorization schemes, particularly with regards to robustness and cost of creation."

Time will tell if his prediction will come true. He offers compelling analogies to illustrate why he thinks it will come true.

Tagging hasn't been around very long. Lots of people seem to find it useful. I can imagine statistical techniques being used to address some of the issues you raised and I expect to see them in the near future.

In the meantime, we're fortunate that this is a both/and issue. We can add amateur tagging to collections that already have professionally created catalogs. And I'd guess that professional catalogers will be able to hone their craft using tag systems.

I predict that the market for professional catalogers will not go away, but will evolve as a result of tagging.

- Some collections will justify the cost of professional classification.

- Some collections will not attract sufficient amateur tagging.

- There are ever-increasing quantities of information to keep both groups busy.

Posted by: Michael Shook at August 8, 2005 10:15 AM

"I pretty much loathe hierarchy"

Rule of thumb: If you have used hierarchy to solve a problem, there is a very good chance you have misunderstood the problem.

Posted by: Jim at August 8, 2005 03:43 PM

Nice critique. I've found Shirky's writings on this topic to be utterly messed up.

There are many conversations that could be had around information organization, hierarchy, tags, etc. But framing all that as a kind of struggle between the evil ontologists on one side and Cory, apple pie, and free tags on the other has been utterly convoluted.

Posted by: Jay Fienberg at August 8, 2005 06:28 PM

What Clay does is the 'storytelling' that is so complimented in many places on the internet. I think it is hard to be scientifically strict and, at the same time, engaging not only for classification experts but also for a broader audience.

The problem is that Clay's writing would never have become so popular if it were scientifically strict in every respect.

Posted by: zby at August 9, 2005 01:19 AM

What a wonderful combination of "ad hominem" and "straw man" fallacies. I must say that I stopped reading when I arrived at the phrase about a "black-and-white distinction between evil hierarchy on one side and good tags on the other". I don't know where in Clay's article you can read anything remotely similar to that. Actually, he gives a list of situations where ontology is the best choice.

Posted by: xamevou at August 9, 2005 07:30 AM

Thank you.
Having actually taken a class in library school on subject analysis, I'm struck by the sheer ignorance of what's actually involved in indexing/cataloging, which both belittles the effort and ignores what's been learned in decades (centuries) of practice.

Posted by: Lis Riba at August 9, 2005 08:09 AM

This is an interesting critique. Actually, here's another, more thorough critique by Mimi Yin you might find interesting.

It is less concerned with taxonomy and more with interface, but it still makes a lot of interesting distinctions.

Posted by: rjnagle at August 9, 2005 09:42 AM

In the long run, one failing of having amateur taggers classify content could be cruft, because content is dynamic, language is fluid, and value rises and falls. This sort of thing had better be automated, and for that, at minimum, one needs clustering algorithms. Clay says Ontology fails because ontologies are brittle, fragile things (he holds steadfastly to the definition of Ontology as a Universal Essence); but the researchers I know aren't such sticklers for absolutes, and if an ontology has to serve some machine-learning goal, even continuous reclustering or a dynamic choice of ontology would be tolerable. On the other hand, there is a marginal area of human activity where new memes are generated faster than anyone, or perhaps any algorithm, can understand, and if the game keeps evolving, I guess you could say that manual tagging plus automated indexing is the best one could hope for. In such cases, the question will be how much of the long tail this covers.

Posted by: Ernle at August 9, 2005 11:31 AM

"you can utilize thesauri and not believe there is one right way to organize things. In fact, a strong, robust thesaurus works PRECISELY BECAUSE there is not one right way to organize things."

Thanks for this, in particular. I'd never thought of this line of argument, but it's radiantly obvious once it's been pointed out.

Tangentially, another vote for 'ethnoclassifiation' here.

Posted by: Phil at August 10, 2005 03:07 AM

Damn, butterfingers. When I wrote 'ethnoclassifiation' I should of course have written 'ethnoticlassificationism', or as it's sometimes known 'ethnoclassifieromation'. (The older form 'ethnoclassifyingology' is now depremacated.) HTH.

Posted by: Phil at August 10, 2005 03:10 AM

"you can utilize thesauri and not believe there is one right way to organize things. In fact, a strong, robust thesaurus works PRECISELY BECAUSE there is not one right way to organize things."

Perhaps, but you've still not offered a solution to the problem of who gets to specify what the thesaurus contains. It may be that a thesaurus enforces multiple choices, but that set of choices is enforced nonetheless.

Posted by: pathall at August 11, 2005 02:53 AM

RJ: In the long run, one failing of having amateur taggers classify content could be cruft because content is dynamic, language is fluid, value rises and falls.

Alas, those who classify professionally are not exempt from influences of background & culture, personal viewpoints & perceptions, the ravages of time and related changes in taste, preferences & language.

IMHO the TRUE VALUE of tagging is that it captures these variations in NON-professional people, i.e. people like me (and perhaps even you). With a sufficiently large pool of taggers for any particular "object", I can now locate it EVEN if my frame of reference does not match that of the professional. Specificity may suffer, but recall won't.

Posted by: sndrr at August 14, 2005 11:42 AM

What's the big deal? Clay Shirky's article is a manifesto. It exaggerates for effect. So it's less convincing about the utility of tags than about the disutility of strict hierarchies; it was still an interesting article.

Posted by: David Hopwood at December 28, 2005 04:21 PM

