Friday, October 08, 2004

Thoughts on body memory and cognitive authority

Experience is important for the development of cognitive authority. Karin Knorr-Cetina talks about the need for scientists to see particular events with their own eyes or to actually place their hands on the apparatus. Robert Boyle apparently set up the vacuum pump as a roving laboratory to give others this same kind of experience and practical authority (see Frohmann's recent work in Library Trends).

There are a few recent events that demonstrate this idea. The first involves Pierre Pettigrew's recent trip to Haiti. Pettigrew is the Canadian Foreign Affairs Minister and he traveled to Haiti to "see for himself" the effects of Hurricane Jeanne.

The second was recently documented in the NYT. Barstow, Broad, and Gerth ("How the White House Embraced Disputed Arms Intelligence") describe how the US Government made the decision to invade Iraq based on an opinion regarding the use of some--well, lots of--aluminum tubes. A CIA analyst ("Joe") defended the position that the tubes were being used in centrifuges to refine uranium for WMD. Most experts disagreed. The analyst, however, had been a mechanical engineer who had worked at Oak Ridge, Tenn. and had actually operated centrifuges. Even though he was a junior analyst, his body authority--combined, of course, with some fervent political will--led the government to disregard the opinions of world-recognized experts who had not actually operated centrifuges.

Thursday, October 07, 2004

I Feel Great. And informed.

"Informed," said Derek.

"Informed?"

"Informed and content."

As part of the batterer intervention program, we all have to check out and tell the group how we're feeling. The oldtimers--more than six weeks--seem to quickly catch on that "informed" and "content" are two completely acceptable yet semantically empty words. The words aren't even listed on page twenty-seven of their manual; page twenty-seven is a cheat sheet of emotion words. To some, this treatise of a single page is far greater than the great books of faith or knowledge. To others, it's just page twenty-seven of the "fucking manual" for the course that they're court-mandated to attend.

Informed.

So what, exactly, does informed mean? The guys in group ably demonstrate how the word conveys very little. I suppose that "informed" means having received information. Do the guys mean that they have received information as the faithful receive communion? "Take this and eat. This is the flesh and blood of literary genre consecrated in the sanctity of Saint Gutenberg." I doubt it. The OED, however, suggests that an early--and awkward--meaning of informed was: "of faith." More recent years have seen the meaning shift to "Instructed; having knowledge of or acquaintance with facts; educated, enlightened, intelligent."

Informed: acquaintance with facts.

"Hello. My name is George and this is my third week. I'm not yet familiar with group norms but I have met this charming fellow beside me: Mr. Facts. I am now informed... and content."

So informed is a throwaway term. Perhaps it's the state of having "knowledge" or "information". Are these terms similarly bankrupt? It would be a shame to demonstrate the futility of defining information. It seems that the favourite sport of certain colleagues is tilting against this inestimable (and perhaps wind-driven) foe by asking the question: "What is information?"

It's nothing.

"This was my fourth week and I'll be back next week."

"How are you feeling."

"Informed. And loved."

Loved? So what does love mean? The OED coughs up a fairly lucid description: "In senses of the vb.; in attributive use now chiefly poet., exc. with prefixed adv. as dearly-loved, much-loved; ordinarily superseded by BELOVED." Huh? How about this one: "In royal and feudal documents, prefixed to personal names or designations; equivalent to the ‘trusty and well-beloved’ of English charters. Often with plural ending. " I'm quite certain that if asked, the guys in group would be unable to produce either of these definitions. Yet, we are all quite clear on having some idea about what love is. And we're not poets.

Perhaps we can argue that our understanding of information is fundamentally different from our understanding of love. They're both the stuff of emotion but whereas we eschew defining love we strive to break information into quanta and render it to our devices. A critic may say that this quantified perspective is a product of our post-modern times, a result of our obsession with cybernetics and information theory. In the olden days (when?!), we focussed on continuity rather than bits. Shakespeare, however, seemed to use a fairly quantitative focus as he struggled to define love:

"Love goes toward love, as schoolboys from their books; / But love from love, toward school with heavy looks. / " [Romeo and Juliet (1595) act 2, sc. 2, l. 156]

[Ed. Upon reflection, this doesn't seem nearly as quantitative as I first read it to be. Too late to rewrite!]

I've never heard this description in group. Like I said, we're not poets.

"I feel great."

Great. So what does great mean? Absolutely nothing and we have no problem identifying that it means nothing. It's just a filler description shaped by the language games of conversational interchange. Instead we look to the activities behind the person's identification of a great feeling or we examine their thoughts or--if sufficiently suspicious--we wonder about what chemicals they are on. We ponder what has resulted in their state of "great". We do not, however, attempt to define the quanta of great that infuse their being!

Instead of trying to define information we should understand the state of being informed: its processes and its results.

I'm not sure that this piece of writing has taken me where I wanted it to. Close enough.

Tuesday, October 05, 2004

Genre

In this discussion of artefacts and communication, a particular image has come to my mind. I imagine a group of actors sitting around a poker table and dealing out various material-based boundary objects enriched with various markings. Each of the actors has their own set of biases, beliefs, and desires, and each wants to enlist the other players as allies to support their own particular set of inscriptions. This image isn’t quite right. There are other forces acting on the communication process than just the whims and passions of our poker players. The actors, the table, and even the smoky and dimly lit room housing our card game are contained within a particular social environment. The card players aren’t just rational and maximizing examples of homo economicus; they are actors situated within a web of social relations and power structures.

Whether one agrees with Marx or not, it’s hard to argue that we don’t live in a socially structured world and that the ways in which we communicate aren’t shaped by these same social forces. A personal phone call from the CEO, for example, has considerably more meaning than a memo addressed to “all employees” even if the two messages contain exactly the same information.

Applying social relations and power structures at this stage in our journey is a bit tricky. We have a way of doing it—genres—but we’re still lacking a reason. Perhaps Foucault can give us some guidance. In his treatise on discourse (in which he identifies any number of the means of communication that we’ve already touched upon), he remarks:

“It is supposed therefore that everything that is formulated in the discourse was already articulated in the semi-silence that precedes it, which continues to run obstinately beneath it, but which it covers and silences.” (Foucault, 2002 [1972]: 25)

Our materials, inscriptions, and boundary objects are therefore a product of this semi-silence. Determining a model for knowledge artefacts that accounts for this aether requires a level of analysis that goes beyond the products of our poker game: boundary objects, standardized packages, and inscriptions. JoAnne Yates and Wanda Orlikowski formulated one possible approach. Through their historiographic work they articulated “genres” of communication.

"Genre is a literary and rhetorical concept that describes widely recognized types of discourse (e.g., the novel, the sermon). In the context of organizational communication, it may be applied to recognized types of communication (e.g., letters, memoranda, or meetings) characterized by structural, linguistic, and substantive conventions." (Yates & Orlikowski, 1992: 300)

They refine this concept for application within organizational settings:

"A genre of organizational communication (e.g., a recommendation letter or a proposal) is a typified communicative action invoked in response to a recurrent situation. The recurrent situation or socially defined need includes the history and nature of established practices, social relations, and communication media within organizations (e.g., a request for a recommendation letter assumes the existence of employment procedures that include the evaluation and documentation of prior performance; a request for a proposal is premised on a system for conducting and supporting research). The resulting genre is characterized by similar substance and form. Substance refers to the social motives, themes, and topics being expressed in the communication… Form refers to the observable and linguistic features of the communication [structural features, communication medium, and language or symbol system]." (Yates & Orlikowski, 1992: 301)

In this context, genres pick up some interesting characteristics beyond their application as standardized boundary objects. Genres are tools that are used for structuration, and as such both create and are created by the social environment. Genres are enacted through particular social rules that govern this arena of structuration. Therefore, the use—or disuse—of particular genres is both an act of communication and a political statement:

"In structurational terms, genres are social institutions that are produced, reproduced, or modified when human agents draw on genre rules to engage in organizational communication." (Yates & Orlikowski, 1992: 305)

An interesting characteristic of genres is evident in Yates and Orlikowski’s biography of the office memo. The transition from private letter to email was created through various technological changes. These changes produced new or variant genres. A memo, for example, isn’t completely different from a letter nor is an email completely distinct from a memo. Each of these genres, however, has slightly different affordances and is used in a slightly different social context. This social context both creates and was created by the genres. This whole process sounds quite complicated and it is quite easy to visualize it as a type of Ouroboros decaying to nothingness. By invoking a temporal dimension this image becomes rather one of a looping spirographic shape.

Inherent in genres is the notion of granularity or abstractness. Some genres may contain fine-grained detail (such as production reports) while others may have very broad messages (such as sermons). The notion of granularity is also inherent in boundary objects but Star and Griesemer failed to fully develop a framework for articulating the concept. With genres, we can see that the structuration of various social environments provides the “semi-silence” that gives meaning to the actual objects. A sermon, for example, may be meaningless to a factory worker alien to the faith and production reports for industrial widgets could be meaningless to the preacher; they are each representatives of different epistemic communities.

One interesting application of genres is the concept of genre taxonomy, which serves to codify and store the various “communicative actions” of organizational members (Yoshioka, Herman, Yates, & Orlikowski, 2001).

"The key difference between a genre and a genre system is that although each has attributes, a genre system additionally has relational attributes that indicate relationships among constituent genres, such as sequence." (Yoshioka et al., 2001: 434)

By articulating the relational attributes between various genres, researchers can both gain a sense of the institutional forces of organizations and create conceptual maps of the “knowledge artefacts” within the organization. This articulation is typically executed by addressing the “5W1H” questions (who, what, when, where, why, how) related to the genesis and use of the various genres. Of course, worked examples of systems based on genre taxonomies seem quite rare.

Geoffrey Bowker and Susan Leigh Star provide additional insight on genres. In their analysis of the International Classification of Diseases (ICD) they note how the ICD works as a genre. They explain that there is a constant tension within the users of the ICD between those who want to standardize and those who want the document to address their local concerns. Given the differing social environments and the different “semi-silence” pervading these two communities, their attempts to communicate using this particular genre are necessarily frustrated since the cycle of structuration is constantly fragmented.

References

Foucault, M. (2002 [1972]). Archaeology of knowledge. New York: Routledge.

Yates, J., & Orlikowski, W. J. (1992). Genres of organizational communication: A structurational approach to studying communication and media. Academy of Management Review, 17(2), 299-326.

Yoshioka, T., Herman, G., Yates, J., & Orlikowski, W. (2001). Genre taxonomy: A knowledge repository of communicative actions. ACM Transactions on Information Systems, 19(4), 431-456.

Moving towards my dissertation...

I've had a number of thoughts regarding my upcoming dissertation. I'll compress these thoughts down to two categories: 1- What the hell are we doing?, and 2- How the hell are we doing it?

What the hell are we doing?

In my own particular discipline there have been a number of recent innovations. It seems that those around me have really taken on a post-modern bent and have been aggressive in pursuing the implications of social structure and language on information seeking. Fine. I feel that we have, however, lost something: the problem. I'm not too sure why we're doing this type of research. To me--perhaps it's my training as an engineer--this process seems like the worst sort of hubris.

I recently came across something that could act as a guide for my own research. Alan Kazdin, a psychology professor at Yale and editor extraordinaire, has a very interesting bio that he appends to the work that he edits. It sums up his research interests and focus: understanding child mental disorders and eliminating them. Sounds good. The problem is short and to the point and it avoids the problems inherent in conflating the research question with a research method. So now I just have to find a problem (I think I've got one) and find a way to address it.

How the hell are we doing it?

With the turn to post-modern and language/discourse based approaches to research we seem to have reified these concepts into a sort of totality that both guides and limits our research [Turk says something about this in an odd research paper on architectural and engineering forms where he cites Heidegger...]. It's a compelling argument: since thought is based on language, and thought is a prerequisite for seeking and processing information, therefore language should be the basis for our modus operandi--indeed modus vivendi--of library practice!

There is, however, a whole different way of thinking that is beyond language. In one of Petroski's books he recounts an anecdote of two individuals discussing the shape of a particular object. The shape has no name but one of the parties says something like: "You know the shape of a crankshaft... that shape!" Okay, I'll admit that the word "crankshaft" invokes the image of the object in question but the ensuing conversation was not necessarily due to the word crankshaft but rather the individuals' ability to mentally see and manipulate the object. This whole visual aspect of thought seems to be largely ignored in our conception of information seeking. Perhaps we've become blinded by the very text-ness of our primary charges: books.

For engineers, this visual mode of thinking is crucial. Petroski recounts the experiences of a number of Victorian engineers who struggled for weeks with images cascading through their brains. Indeed, the documentation of these images seems to have been an important element in the formation of engineering practice and design. Some important works include the sketchbooks of the early cathedral architects (e.g., Villard de Honnecourt [c.1230-1235]); the Theatrum Machinarum of Besson [1578], Böckler [1661], and Leupold [1724]; and the collections of "mechanical movements" such as Henry T. Brown's famous work of 1871. Even the Encyclopédie of Diderot and d'Alembert--or at least the plates--was an important contribution to the visual records of engineers. [On two separate occasions I've seen a remarkable early plate depicting the process of pin-making described as crucial for Adam Smith's conception of capitalism. Unfortunately, the two plates don't match and are from different works!]

I personally suspect that these classical works found their physical realization in the Gallery of Machines at the Great Exposition of 1900 in Paris [Engineers are always keen on--and depend upon--their prototypes as revealed in the work of Bechky and Henderson]. The most famous description of the Gallery is from Henry Adams who remarked (in the 'board's choice' as best nonfiction work in the Modern Library):

"Satisfied that the sequence of men led to nothing and that the sequence of their society could lead no further, while the mere sequence of time was artificial, and the sequence of thought was chaos, he turned at last to the sequence of force; and thus it happened that, after ten years' pursuit, he found himself lying in the Gallery of Machines at the Great Exposition of 1900, his historical neck broken by the sudden irruption of forces totally new."

So where is the visual record of information seeking practices? Where are those design patterns that we can just mix and match? How do we address that visual part of the brain that seems to be so overlooked in our current research? Why do we instantly run to prototypes and physical manifestations of our work without the tools to depict the underlying processes?

Sunday, October 03, 2004

Some Notes for a Friend

Hey Paul:

I've been meaning to write you for quite a while. I've been thinking about how you guys have handled all the hurricanes. My parents' place down in Port Charlotte lost its roof and they've been down there trying to find contractors to fix it!

Your problem regarding sales material is an interesting one. It's basically the same one that we faced at i2 and I've had a bit of time to think about it over the past few years.

Basically, your client is looking for a library. Amazon works because it uses a lot of the infrastructure developed for libraries. Early on Bezos recognized just how well structured the descriptive information for books is (e.g., authors, titles, publishers, subject headings, ISBN numbers, etc.) and how easily this information could be exploited online for both e-commerce and recommender systems.

Unfortunately, sales materials just aren't well structured. When we're making PowerPoint slides or writing whitepapers we don't worry about standardized titles or author names. For the KM system at i2, we struggled with standardizing some pretty basic issues such as geographies (EMEA, Americas, etc.), sales territories, products, verticals, etc. Trying to append this sort of information to various documents was really tough! Amazon works because all of this standardization has been done by a government institution: the Library of Congress!

When cataloging materials (books or sales ephemera) to put into an "Information Retrieval" (IR) system (that's librarian talk for accessible database!) you basically have three options. The first option is to use the words in the document for indexing. Most of the major web search engines (e.g., Google) use this type of approach. Amazon's new search engine--A9--applies this approach to the text of books. The second approach is the one traditionally used by librarians and involves the creation of "bibliographic surrogates". Basically, think of cards in an old card catalogue. Each card is a bibliographic surrogate for a particular book. We now do this sort of cataloguing by creating a database with fields for things like author, title, etc. and a pointer to the location of the actual document. We tried to do this type of thing at i2 and it didn't work primarily because we didn't have a librarian devoted to the creation of the records! The third approach involves the use of meta-data embedded in documents. Although we use key words and other meta-information in web pages, structured meta-data is typically a bit more involved. The current standard (supported by a number of office productivity applications) is called the Dublin Core. It was designed to act as a kind of hybrid between the really rigorous cataloging codes used for the creation of formal bibliographic surrogates--they all have very arcane names like the Anglo-American Cataloguing Rules R2 (AACR2) and Machine Readable Cataloguing (MARC)--and the completely unstructured approach used by commercial IR systems.
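To make the second option a bit more concrete, here is a minimal Python sketch of a bibliographic surrogate: a small record with a few descriptive fields and a pointer to the real document. The field names, file paths, and the second author are made up for illustration; a real catalogue record (MARC, for instance) carries far more than this.

```python
from dataclasses import dataclass, field


@dataclass
class Surrogate:
    """A catalogue record that stands in for a document (hypothetical fields)."""
    title: str
    author: str
    subjects: list[str] = field(default_factory=list)
    location: str = ""  # pointer to the actual document, e.g. a file path


def find_by_author(catalogue: list[Surrogate], author: str) -> list[Surrogate]:
    """Return every record whose author field matches, ignoring case."""
    wanted = author.strip().lower()
    return [rec for rec in catalogue if rec.author.strip().lower() == wanted]


# Illustrative records only; the titles and paths are invented.
catalogue = [
    Surrogate("Supply Chain Planner overview", "George Goodall",
              ["supply chain planner"], "/sales/decks/scp_overview.ppt"),
    Surrogate("EMEA pricing whitepaper", "Paul Example",
              ["pricing", "EMEA"], "/sales/papers/emea_pricing.doc"),
]

for rec in find_by_author(catalogue, "george goodall"):
    print(rec.title, "->", rec.location)
```

The search only works because somebody typed "George Goodall" into the author field the same way every time, which is exactly the job a cataloguer does.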

Here's the rub. None of these approaches work well. The unstructured approach can work well depending on the technology (think Google) and the redundancy of the collection (think the Web) but can really suck for assisting users in exploring collections with some sort of inherent order (like a collection of sales material). Do you remember the search function on the homepage of the Intranet at i2? It would always cough up incredibly odd documents in response to queries. The problem with the bibliographic surrogate approach is that few organizations have the budget or demeanor to employ a cataloguer... especially a knowledgeable one. The problem with the meta-data approach is that few authors are motivated to include meta-data in their documents. A sales guy creates a PowerPoint presentation to sell to a particular prospect and get comped for it. Including meta-data isn't part of this formula!

While your problem is a challenging one, it's not an impossible one. Here's my suggestion: forget about the meta-data approach and forget about the bibliographic surrogate approach. Use a conventional IR tool to index the sales documents. Some people say that they need Google for their Intranet but Google probably won't work well because there's not enough redundancy in the collection. Instead you have to inject a bit of structure into the documents. The full-blown meta-data approach won't work but a hybrid one might. Your client could, for example, include a standard page at the back of their PowerPoint and whitepaper templates where authors can fill in limited information such as their name (actually their email address is better since every person has a unique email address), the name of their client, and the name of the products they're selling. If there are standardized product codes even better. Now, at the back of a PowerPoint deck you could have the following lines:

"gg345g#author:george_goodall@yahoo.com"
"gg345g#client:taylor made"
"gg345g#product:supply chain planner"

When the IR tool indexes these documents it will create entries for each of these expressions. The "gg345g#" part is meaningless but serves the purpose of making each entry identifiable as a controlled field.
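As a rough illustration of the template page, the Python sketch below writes out the controlled-field lines and reads them back from a document's extracted text. The function names are hypothetical, and lowercasing the values is my own assumption to keep the entries consistent.

```python
import re

PREFIX = "gg345g#"  # arbitrary marker from the example lines; any unlikely string works

# Matches one controlled-field line, e.g. gg345g#client:taylor made
FIELD_LINE = re.compile(r"^" + re.escape(PREFIX) + r"(\w+):(.+)$")


def make_field_lines(fields: dict[str, str]) -> list[str]:
    """Render field/value pairs as the lines an author pastes into the template page."""
    return [f"{PREFIX}{name}:{value.strip().lower()}" for name, value in fields.items()]


def extract_fields(document_text: str) -> dict[str, str]:
    """Pull the controlled fields back out of a document's extracted text."""
    found = {}
    for line in document_text.splitlines():
        # Tolerate surrounding whitespace and quotation marks around the line.
        match = FIELD_LINE.match(line.strip().strip('"'))
        if match:
            found[match.group(1)] = match.group(2).strip()
    return found


# A made-up deck: one line of ordinary content followed by the template page.
deck_text = "Q3 pitch for Taylor Made\n" + "\n".join(make_field_lines({
    "author": "george_goodall@yahoo.com",
    "client": "Taylor Made",
    "product": "Supply Chain Planner",
}))
print(extract_fields(deck_text))
```

In practice the IR tool does the reading for you; the extraction step is only here to show that the lines are easy to pick out from ordinary text.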

So now you've got this whole pile of mixed documents indexed by a standard IR tool (htdig is a good open source option). You can then create a user interface that creates the search string people are looking for. If somebody was looking for the documents written by George Goodall on Supply Chain Planner, the interface could build the search string: "'gg345g#author:george_goodall@yahoo.com' 'gg345g#product:supply chain planner'".
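Here is a minimal sketch of that front end in Python. The `matches` function is only a toy stand-in for the IR tool (it is not htdig's query syntax or behaviour), and the file names and the "demand planner" product are invented for the example.

```python
PREFIX = "gg345g#"  # same arbitrary marker as in the example lines


def build_query(**fields: str) -> str:
    """Build a search string of quoted controlled-field expressions."""
    # Assumes values contain no apostrophes, since those delimit each expression.
    return " ".join(f"'{PREFIX}{name}:{value.lower()}'" for name, value in fields.items())


def matches(query: str, document_text: str) -> bool:
    """Toy stand-in for the IR tool: every quoted expression must occur in the text."""
    expressions = [part for part in query.split("'") if part.strip()]
    text = document_text.lower()
    return all(expr in text for expr in expressions)


# Invented documents, each ending with its template page of controlled fields.
documents = {
    "scp_pitch.ppt": "Intro slides\n"
                     "gg345g#author:george_goodall@yahoo.com\n"
                     "gg345g#product:supply chain planner",
    "pricing.doc": "Pricing notes\n"
                   "gg345g#author:george_goodall@yahoo.com\n"
                   "gg345g#product:demand planner",
}

query = build_query(author="george_goodall@yahoo.com", product="Supply Chain Planner")
hits = [name for name, text in documents.items() if matches(query, text)]
print(query)
print(hits)
```

The point of the front end is that users pick an author and a product from lists and never have to see or type the "gg345g#" strings themselves.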

Well, that was a very long answer that may or may not have made sense. With the type of structure provided by either controlled data or bibliographic surrogates, you can then do the fancy stuff that Amazon is famous for like clustering items based on similarity.

The one feature you mentioned that I haven't touched on is the whole recommendation feature, i.e., "Customers that bought x, also bought..." This type of feature is commonly referred to as "collaborative filtering" and it's a hot topic. The process of collaborative filtering is fairly straightforward if a bit algorithmically complicated. By allowing users to assign ratings to particular documents it exploits the vector space model commonly used by IR systems. This discussion can get boring very quickly so I'm going to give you a few links.

A dated but good collection of links: http://www.cis.upenn.edu/~ungar/CF/
A public domain recommender tool: http://mappa.mundi.net/signals/memes/alkindi.shtml
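For a rough feel of the idea before you follow the links, here is a tiny Python sketch. It treats each document as a vector of the ratings users gave it and ranks other documents by cosine similarity. This is one simple, common variant (item-to-item similarity) and not necessarily what Amazon or the tools above actually do; the users, documents, and ratings are all made up.

```python
from math import sqrt

# Hypothetical ratings: user -> {document: rating from 1 to 5}
ratings = {
    "ann": {"deck_a": 5, "deck_b": 4, "paper_c": 1},
    "bob": {"deck_a": 4, "deck_b": 5},
    "cat": {"deck_a": 1, "paper_c": 5, "paper_d": 4},
}


def item_vector(item: str) -> dict[str, float]:
    """Represent a document as a sparse vector of the ratings users gave it."""
    return {user: r[item] for user, r in ratings.items() if item in r}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors keyed by user."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similar_items(item: str) -> list[tuple[str, float]]:
    """Rank the other documents by similarity to `item`, most similar first."""
    items = {i for r in ratings.values() for i in r}
    target = item_vector(item)
    scored = [(other, cosine(target, item_vector(other))) for other in items if other != item]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(similar_items("deck_a"))
```

With these ratings, deck_b comes out as the closest match to deck_a because the same people rated both highly, which is the "people who liked x also liked..." effect in miniature.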

In short, my suggestion is to use a standard search tool with modified document templates and a fancy front end for constructing search strings. This approach will get you the most bang for your buck. Of course, it all depends on what your client is looking for. Do they want something off the shelf? Do they want to roll their own? What sort of enterprise environment are they running? etc. As for whitepapers and analyst reports... ummm... I guess start with the standard vendors and go from there: Siebel, Salesforce, Net Perceptions maybe.

If you have any questions, please give me a shout. I'm home most days since I'm studying for a week long set of exams that start Oct. 24.

I hope this helps.

Cheers,
George