Thursday, January 08, 2004


Len.Kirby@pwNOSPA&& ("Len Kirby") wrote in message

> I need some advice on taxonomy development for a website knowledgebase.
> My Google-research taught me that a taxonomy is a classification scheme
> containing vocabulary and navigation to make content easier to find. It
> will also help guide content development because 'holes' in the content
> will have to be filled.


Taxonomy construction is a challenge. The best people to talk to are often librarians. Library Science seems to be the only field that really takes a hard look at the practical construction and application of taxonomies.

If you're interested in the topic, I can recommend some background reading...

Elaine Svenonius has written one of the best books describing the construction of organization systems. Although her work focusses largely on library classification, the underlying rigour is very valuable:

Svenonius, E. (2000). The Intellectual Foundation of Information Organization. MIT Press: Cambridge MA.

We often think of taxonomies as rigid hierarchies like the genus-species relationships of biology. Unfortunately, for a number of reasons (e.g., no evolutionary process), these types of classification structures are often inappropriate for intellectual assets. Instead of a tree, we sometimes need a field of bushes. These bushes have to represent the underlying syntax and semantics of the related concepts. Thesauri are very valuable in this regard.

There is an ISO standard for the construction of thesauri (ISO 2788, covering monolingual thesauri) which is worth a read. It's not a quick read... but this isn't an easy topic!
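To make the "field of bushes" idea a little more concrete, here is a minimal Python sketch of the broader/narrower/related term relationships that thesaurus standards such as ISO 2788 describe. The class name, method names, and sample terms are all made up for illustration; this is not taken from the standard or from any real thesaurus.

```python
from collections import defaultdict


class Thesaurus:
    """Toy thesaurus: terms linked by broader, narrower, and related relations."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> its broader terms
        self.narrower = defaultdict(set)  # term -> its narrower terms
        self.related = defaultdict(set)   # term -> associated (non-hierarchical) terms

    def add_broader(self, term, broader_term):
        # Record the hierarchical link in both directions.
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def add_related(self, a, b):
        # Related-term links are symmetric.
        self.related[a].add(b)
        self.related[b].add(a)

    def ancestors(self, term):
        """Every broader term reachable from `term`; a term may have several parents."""
        seen = set()
        stack = [term]
        while stack:
            for parent in self.broader[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Hypothetical sample terms: "solar power" sits under two different parents,
# which a strict single-parent tree could not express.
t = Thesaurus()
t.add_broader("solar power", "renewable energy")
t.add_broader("solar power", "electricity generation")
t.add_broader("renewable energy", "energy")
t.add_related("solar power", "photovoltaic cells")
print(sorted(t.ancestors("solar power")))
```

The point of the sketch is only that a term can belong to more than one "bush" and can also point sideways to related terms, which is what makes a thesaurus more flexible than a single hierarchy.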

There are also a number of preassembled thesauri, e.g., the WAND Environmental Thesaurus or Gale's Energy Thesaurus.

You may also be interested in the "faceted-classification" group over at Yahoo. The archives contain links to a number of interesting papers:

Finally, you may want to look at a rough paper that I wrote about business taxonomies. It's more of a "what" than a "how to" but you may find it useful:


Wednesday, January 07, 2004

Random Thoughts

I have some notes in my agenda. Before I can turn the page in good conscience I have to record them somewhere where I can find them...

Railroads vs. Cathedrals

Having recently written a paper about cathedrals and documentary practices, I still have big projects on my mind. Cathedrals were perhaps the largest construction projects of the mediaeval era. They required intense planning, commitment, and labour. The modern era, however, was established by a different technology: railroads. It would be interesting to compare these two types of projects. Certain similarities are self-evident in that they're both huge projects involving lots of labour and administration. The differences are just as evident. Cathedrals were constructed in one location while railroads were constructed across a massive geographic territory. The two types of projects also generated different kinds of innovations. Cathedrals generated organized labour and guilds. Railroads developed accounting systems, venture capital, and distributed bureaucracy.

Social Construction of DSL

I've been reading up on the Social Construction of Technology (SCOT; Bijker, Hughes, & Pinch 1987). An important tenet of SCOT is that technology is shaped by a number of factors, including the social context of innovations. This context is determined by various actors--not just the inventor or engineer--and each actor contributes to the technology. While I was reading up on this topic in a friend's office, someone came in and exclaimed: "The network's down again! I get more done at home on my DSL line!" So what exactly does DSL (and broadband) represent, and how are we using it? Working from home because of the reliability of DSL is surely a more significant interpretation of the technology than just streaming music or downloading videos.

Open Source and Cathedral Building

The difference between commercial and open source software development has been famously compared to the difference between a cathedral and a bazaar (CatB). In some ways, however, I find the cathedral to be a very good description of the open source movement--not the institution represented by the cathedral but rather the process of building it. Cathedrals were built largely without central plans and much of the construction process was grounded in social practice rather than in formalized documentation.

Enough thoughts. Back to work.


Bijker, W. E., Hughes, T. P., & Pinch, T. J. (1987). The Social Construction of Technological Systems: New Directions in the Sociology and History of Technology. MIT Press: Cambridge MA.

Spam Experiment

I've been exploring various spam control devices. In order to test, I need a honeypot. Here goes...

Spam away!

I'll post results. BTW, I'm placing a few other honeypots over on the discussion groups at brint and on Usenet. We'll see what happens.

Tuesday, January 06, 2004

Reason recently published an interview with Bruce Sterling. He said a number of interesting things, although one exchange particularly struck me:

reason: It never ceases to amaze me how much material is sort of spontaneously thrown up on the Web.

Sterling: I think that’s an early response. You get this database toxicity. You go into a system like Lexis-Nexis and you put in a search word and get 60,000 hits, and you think, this is all the knowledge there is in the universe. But it’s actually 10,000 references to six different things, and the actual story is something very few people know.

reason: I think there are some positive social changes happening as a result of this spontaneous database building and Web page building. There are more and more of us who reflexively look things up.

Sterling: There is a Google blindness. It’s a kind of common wisdom generator, but it’s not necessarily going to get you to the real story of what’s actually going on.

Do librarians get to the "real story"? Or are we just as Google blind?

Monday, January 05, 2004

Why SF is Over

Spider Robinson recently published an article in the Globe and Mail after attending the SF conference Torcon 3. His remarks have drawn a lot of commentary among the various SF blogs and lists. I can see why:

"Incredibly, young people no longer find the real future exciting. They no longer find science admirable. They no longer instinctively lust to go to space."

While I'm unsure why youth have lost their yearning for space, I suspect it may have something to do with the instantaneous nature of the modern cool market. There's no room for arbitrage. Why dream when everything in the world is only a keystroke away?

As Timothy Taylor notes, "cool is over." Just like SF.

Google and Glass-Steagall

A number of my colleagues are travelling to San Diego for a conference. One of their presentations concerns Google as a democratic power. Because PageRank treats links as votes, Google seems to be a completely democratic IR technology. My colleagues, however, have some concerns.

I'm not necessarily agreeing or disagreeing with them. I just wonder how we could possibly intervene in IR technologies. Indeed, most discussion of IR technologies (e.g., TREC) concerns what the technology could do rather than what it should do.

Given my technology bent, I'm keen to argue that IR engines themselves have no motives and are just reflective of the underlying corpus. My librarian side, however, is sensitive to any suggestion of possibly editing or censoring the collection! Perhaps a larger intervention is in order.

Many have argued that information is now the life-blood of our economy--perhaps explaining my colleagues' suspicions of Google, since it is the primary conveyor of information. Information has perhaps replaced money as the key motivating economic factor. While the federal government hasn't intervened in information institutions, it has certainly restrained the activities of banks. The famous Glass-Steagall Act of 1933 separated commercial banking from investment banking activities such as issuing bonds or debentures.

As information replaces money, perhaps Google will face similar legislation.

Sunday, January 04, 2004

The Economics of Farm Subsidies and IP Laws

Like many others, I read Wired. I have to applaud two recent additions: the short columns by SF author Bruce Sterling and Professor Lawrence Lessig.

In Lessig's most recent piece he discussed the nature of United States farm subsidies and the harm that they cause for food producers in the developing world. Lessig went on to suggest that these nations adopt loose copyright law and become the sort of IP robber nation that the United States became with the passing of the US Copyright Act in 1790.

It's a great idea, but are the economics there?

As indicated on his blog, Lessig incorrectly identified the level of subsidy as $300-billion annually. A more accurate figure is provided by the Environmental Working Group. The top 20 subsidy programs in 2002 totalled an impressive US$11.7-billion. Does the potential IP theft of the developing world compare to such a number?

For a possible comparison, we can explore the IP policies of several countries involved in a recent shrimp fishery issue. It seems that US fishermen are asking Washington to place tariffs on products from Thailand, China, Vietnam, India, Ecuador, and Brazil. Could these countries protest with lax IP law? Maybe they could introduce robber nation licensing rules for software?

Here are some numbers from the Economist Intelligence Unit's 2002 Country-by-Country reports:

Thailand
-level of software piracy: 77%
-Retail losses from software piracy: US$41.1bn (!!! Seems high. Total foreign direct investment for 2000 was US$3.36bn. Assume million rather than billion)

China
-level of software piracy: 92%
-Retail losses from software piracy: US$1.66bn

Vietnam
-level of software piracy: 94%
-Retail losses from software piracy: US$32.24m

India
-level of software piracy: 70%
-Retail losses from software piracy: US$365.31m

Ecuador
-level of software piracy: 62%
-Retail losses from software piracy: US$8.48m

Brazil
-level of software piracy: 55%
-Retail losses from software piracy: US$396bn (!!! Seems high. Total foreign direct investment for 2000 was US$32.7bn. Assume million rather than billion)

For comparison, here are Canada's numbers:

-level of software piracy: 39%
-Retail losses from software piracy: US$306.5m

Based on these numbers, it seems that these developing countries are already IP robber nations. Furthermore, if these countries somehow managed to reduce their levels of software piracy to Canada's 39% and then used their new clout to bargain with the US, the total reduction in retail losses would still be only about US$1.3-billion (assuming losses fall in proportion to the piracy rate, and reading the two suspect figures as millions). And US$1.3-billion is a lot smaller than the total farm subsidy of US$11.7-billion!
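For anyone who wants to check the US$1.3-billion figure, here is a rough sketch of the arithmetic in Python. It rests on several assumptions: the six pairs of figures are taken to follow the order in which the countries are named above (Thailand, China, Vietnam, India, Ecuador, Brazil), the two suspect "bn" figures are read as millions, and retail losses are assumed to fall in direct proportion to the piracy rate. The variable and function names are illustrative only.

```python
# (label, piracy rate, retail losses in US$ millions), copied from the list above.
# Assumption: entries follow the order in which the countries are named in the text.
# Assumption: the two implausible "bn" figures (first and last) are read as millions.
FIGURES = [
    ("Thailand", 0.77, 41.1),
    ("China", 0.92, 1660.0),   # US$1.66bn
    ("Vietnam", 0.94, 32.24),
    ("India", 0.70, 365.31),
    ("Ecuador", 0.62, 8.48),
    ("Brazil", 0.55, 396.0),
]

CANADA_RATE = 0.39       # Canada's piracy rate, used as the target
SUBSIDY_MUSD = 11_700.0  # top 20 US farm subsidy programs in 2002, US$ millions


def loss_reduction(rate, loss_musd, target_rate=CANADA_RATE):
    """Losses avoided if piracy fell from `rate` to `target_rate`.

    Assumes losses scale linearly with the piracy rate, which is a
    simplification made for this back-of-envelope estimate.
    """
    if rate <= target_rate:
        return 0.0
    return loss_musd * (rate - target_rate) / rate


total_reduction = sum(loss_reduction(rate, loss) for _, rate, loss in FIGURES)

print(f"Total reduction in retail losses: US${total_reduction:,.0f}m")
print(f"Share of the farm subsidy total: {total_reduction / SUBSIDY_MUSD:.1%}")
```

Under those assumptions the total comes to roughly US$1.3-billion, most of it from China, which is why the bargaining chip looks small next to the subsidy total.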