Sunday, October 05, 2003

Taxonomies: So what?

How do we orient ourselves in a large set of documents? There seem to be a number of ways: subject hierarchies, classification schemas, and metadata. I’m intrigued by how the user is supposed to interact with these systems and whether the systems are ultimately effective at what they are supposed to do. One of the main principles of the Dublin Core initiative, for example, is the “dumb-down principle” (Hillmann, 2003). At what point, however, do our systems become too dumb?

Reading the documentation for the Dublin Core Metadata Initiative, we learn that the initiative has several goals: simplicity of creation and maintenance, commonly understood semantics, international scope, and extensibility. Furthermore, the Dublin Core is meant to act as a “pidgin” to enable communication (Hillmann, 2003). As the historian of science Peter Galison (1999) points out, pidgins and creoles have their limitations: they can’t communicate advanced concepts, and they are constructed for the express purpose of brokering communication between two specific epistemic cultures. The Dublin Core, however, has been constructed to act as a global pidgin, a digital Esperanto. I wonder how effective it can possibly be.

As metadata, the Dublin Core expressly applies to specific documents. But what exactly is a document? After reviewing the work of early documentalists such as Otlet and Briet, Michael Buckland maintains that a document could be an antelope, provided it was in a zoo (Buckland, 1997). Does the Dublin Core include a Resource Type of “Antelope”? Implicit in this document-centric view of the Dublin Core is the notion of literary warrant, i.e., that the resulting classification scheme is based on the collection it represents. Both the Dewey and the Library of Congress classification schemes are based on a particular collection (OCLC’s WorldCat and the Library of Congress, respectively) (Svenonius, 2000). Literary warrant, however, has been expressly indicted as nonsense in the construction of taxonomies. On its list of taxonomy myths, the Montague Institute claims:

“Myth #6: A corporate taxonomy should be derived solely from the content in a repository.” (Montague Institute, 2002)

So what should we do? I find it telling that, despite the Montague Institute’s recommendations for the construction of taxonomies, its own index provides a post-coordinated fallback: “Inktomi” [sic] (Montague Institute, 200?). In addition, descriptions of taxonomy construction make the whole process seem contrived and artificial (see Johnston, 2003, for an example). They make me wonder whether the resulting classification is really any more useful than an alphabetical listing!


Svenonius, E. (2000). Subject Languages: Introduction, Vocabulary Selection, and Classification. In The Intellectual Foundation of Information Organization. Cambridge: MIT Press.


