Thursday, November 06, 2003

Communicating Across the Disadvantage Margin

What does it really mean to be disadvantaged? According to Childers and Post, “It means to be lacking in something that society considers important” (Childers & Post, 1975). I wonder if there are opportunities within this lack.

In The Impoverished Life-World of Outsiders, Chatman provides a cogent overview of the issues facing disadvantaged communities: elders, the poor, and the unemployed (Chatman, 1996). In her discussion, Chatman provides a taxonomy of key considerations when studying the disadvantaged: Secrecy, Deception, Situational Relevance, and Risk Taking. Her key finding is that the poor are not a community of insiders who share information among themselves but rather a constellation of outsiders who are often afraid and unwilling to share information. The consideration that I find particularly interesting is Chatman’s focus on “Situational Relevance” and the notion that the value of information and communication is rooted in the practices of a particular community.

This concept of communally established relevance is inherent in many other works. Wenger, for example, talks at length about the processes of community establishment and communication (Wenger, 1998). Even sociological (e.g., Berger & Luckmann, 1989 [1966]) and philosophical (e.g., Wittgenstein, 1958) works have discussed the importance of communal practice in establishing meaning and importance. Menou refers to situational relevance as Paradigmatic Knowledge (Menou, 1995b). Although Menou’s writing can read as Polonius-like double-speak to those studying information exchange in disadvantaged communities (e.g., Menou, 1995a), his taxonomy is useful for understanding and exploring paradigmatic knowledge, or situational relevance: Informal-Formal, Endogenous-Exogenous, Resident-Circulating, Unconscious-Conscious, Ancient-Recent, Stable-Changing, and Multiple Purposes-Single Purposes.

In her other work, Chatman’s exploration of situational relevance has led her to conclude that particular people may actively block information. Prisoners, for example, may purposely avoid contact with “the outside” (Chatman, 1999). If individuals are driven by their cultural practices, how can they possibly get new information? Do the disadvantaged have any “weak ties” from whom to get valuable or innovative information? If everyone is on the margin of a community, can there possibly be an early adopter (Holland, 1997) to introduce innovations?


References

Berger, P. L., & Luckmann, T. (1989 [1966]). The social construction of reality : a treatise in the sociology of knowledge (Anchor Books ed.). Garden City, NY: Anchor Books.
Chatman, E. A. (1996). The impoverished life-world of outsiders. Journal of the American Society for Information Science, 47(3), 193-206.
Chatman, E. A. (1999). A theory of life in the round. Journal of the American Society for Information Science, 50(3), 207-217.
Childers, T., & Post, J. A. (1975). Introduction. In The information-poor in America (pp. 182 p.). Metuchen, N.J.: Scarecrow Press.
Holland, M. (1997). Diffusion of innovation theories and their relevance to understanding the role of librarians when introducing users to networked information. Electronic Library, 15(5), 389-394.
Menou, M. J. (1995a). The impact of information. 1. Toward a research agenda for its definition and measurement. Information Processing & Management, 31(4), 455-477.
Menou, M. J. (1995b). The impact of information. 2. Concepts of information and its value. Information Processing & Management, 31(4), 479-490.
Wenger, E. (1998). Communities of practice : learning, meaning, and identity. Cambridge, U.K. ; New York, N.Y.: Cambridge University Press.
Wittgenstein, L. (1958). Philosophical investigations (G. E. M. Anscombe, Trans.). New York: Macmillan.

Tuesday, November 04, 2003

Psyche, Simulations, and Citations

As an undergraduate, I built a great number of models. I especially liked geotechnical models. We would construct miniature dams out of clay and sand and then load them into a massive centrifuge to simulate increased time and gravity. The centrifuge made the most fantastic noise: WHOOMP, WHOOMP, WHOOMP. If I close my eyes I can still imagine that noise and feel the vibrations in my feet as all of Ellis Hall rattled so we could determine seepage rates. If only simulation in library science held such promise.

I imagine that the models we could construct for library science would all live in a computer somewhere and we would run endless Monte Carlo simulations to establish something or other. It’s not a bad idea, really. I can already think of an application…

Bibliometrics and citation indexing are quite well established in the LIS discourse (Borgman, 1990). Using tools like the Social Sciences Citation Index (SSCI), we can zoom back and forth through time and trace the evolution of ideas. SSCI, however, is far from perfect. Since SSCI was originally a print tool, citations are often listed as mysterious acronyms, and word variations play havoc with identifying real citations. It seems odd to me that SSCI depends on a binary interpretation of the word “citation”: either an article has been cited or it hasn’t. Why doesn’t ISI incorporate a new citation coefficient? Instead of attempting to parse out and compile entire citations from full-text sources, an n-gram matching algorithm could be used to determine co-citation values between 0 and 1. This approach could provide another input for vector space modelling and determining best documents.
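
The n-gram matching idea above can be made a little more concrete. What follows is only a minimal sketch, assuming character trigrams and the Dice coefficient as the measure; neither choice comes from SSCI or ISI, and the names ngram_similarity, normalize, and NGRAM_LEN are made up for the illustration.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define NGRAM_LEN 3  /* assumed n-gram length (trigrams) */

/* Copy src into dst keeping only letters and digits, lowercased.
   Returns the length of the normalized string. */
static size_t normalize(const char *src, char *dst)
{
    size_t k = 0;
    for (; *src != '\0'; src++) {
        if (isalnum((unsigned char)*src))
            dst[k++] = (char)tolower((unsigned char)*src);
    }
    dst[k] = '\0';
    return k;
}

/* Dice coefficient over character n-grams:
   2 * shared / (ngrams in a + ngrams in b).
   Returns a value from 0.0 (nothing shared) to 1.0 (same n-grams).
   Strings shorter than NGRAM_LEN after normalization score 0.0. */
double ngram_similarity(const char *a, const char *b)
{
    char *na = malloc(strlen(a) + 1);
    char *nb = malloc(strlen(b) + 1);
    double score = 0.0;

    if (na == NULL || nb == NULL) {
        free(na);
        free(nb);
        return 0.0;
    }

    size_t la = normalize(a, na);
    size_t lb = normalize(b, nb);

    if (la >= NGRAM_LEN && lb >= NGRAM_LEN) {
        size_t ga = la - NGRAM_LEN + 1;  /* number of n-grams in a */
        size_t gb = lb - NGRAM_LEN + 1;  /* number of n-grams in b */
        size_t shared = 0;
        char *used = calloc(gb, 1);      /* marks n-grams of b already matched */

        if (used != NULL) {
            for (size_t i = 0; i < ga; i++) {
                for (size_t j = 0; j < gb; j++) {
                    if (!used[j] && memcmp(na + i, nb + j, NGRAM_LEN) == 0) {
                        used[j] = 1;
                        shared++;
                        break;
                    }
                }
            }
            free(used);
            score = 2.0 * (double)shared / (double)(ga + gb);
        }
    }

    free(na);
    free(nb);
    return score;
}
```

Under these assumptions, "Chatman 1996" and "chatman, 1996" normalize to the same string and score 1.0, while "abcd" and "abce" share one of their two trigrams each and score 0.5. The pairwise comparison is quadratic in string length, which is tolerable for short citation strings but would need an index for anything the size of a real collection.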

Would the n-gram approach I’ve described be an improvement over existing systems? User studies would be valuable but a simulation approach may be considerably easier—assuming we followed a rigorous design methodology (Shannon, 1975). Perhaps we could run some simulations based on the TREC document collection and compare our citation matching results to other Information Retrieval approaches.

Similarly, we could adjust our citation coefficient using some sort of algorithm that compares the similarity of documents, thereby limiting the influence of the “mercy-cite”.

Although my simulation model would be quite entertaining to build (but possibly too challenging for an LIS839 project), it’s still susceptible to some limitations. The most pressing are the observations of Zhao and Logan that there are now essentially two worlds of citation: print journal and electronic (Zhao & Logan, 2001). Their observations seem to resonate with Sandstrom’s comments on the localized information foraging patterns of scholars (Sandstrom, 1994).

Fortunately, our simulation could simply ignore the concerns of Zhao and Logan. If we were using the TREC sample, these concerns would be nonexistent. With simulation models, you can always simplify and abstract in order to make the model work… even if it doesn’t necessarily reflect real life.


References

Borgman, C. L. (1990). Scholarly communication and bibliometrics. Newbury Park: Sage Publications.
Sandstrom, P. E. (1994). An optimal foraging approach to information seeking and use. Library Quarterly, 64(4), 414-449.
Shannon, R. E. (1975). Systems simulation : the art and science. Englewood Cliffs, N.J.: Prentice-Hall.
Zhao, D., & Logan, E. (2001). Citation analysis of scientific publications on the Web: a case study on the research area of XML. Paper presented at the 8th International Conference on Scientometrics and Informetrics (ISSI-2001), Sydney, NSW, Australia.

Endnotes

i. Sample methodology: System Definition, Model Formulation, Data Preparation, Model Translation, Validation, Strategic Planning, Tactical Planning, Experimentation, Interpretation, Implementation, Documentation

Monday, November 03, 2003

Reading the Brand Bullies

I like to consider myself a reader. I’ve read all sorts of stuff: fantasy, mystery, erudite biography, hardcover bestsellers, classics, and even a romance (only one). I once even worked my way through a pile of mildewed westerns and hardboiled dime novels I found in a disused footlocker at our cottage.

Even as a reader, I’m not sure why I read. Reading itself is hardly a static activity. We know that people engage in different modes of reading and that the reading of any text is a dynamic process. In Television Culture, Fiske provides a detailed review of the ways in which individuals engage with texts, be they books or television programs (Fiske, 1987). According to Fiske (and many other reader-response theorists like Derrida, Stanley Fish, etc.), the reader creates the meaning of the text.

The reader may construct the meaning of the text, but the actual creation of books is within the realm of authors, publishers, and distributors. Certain commentators have noted the increasing commercial motivation of the book industry (Radway, 1991), but it should be noted that this process is similarly dynamic. As demonstrated in studies by Kaestle and Darnton, the creation of books is subject to the vagaries of a cycle of communication between various parties, and modalities are injected into the text at each stage. Even texts seemingly immune from discursive interpretations are subject to these production modalities. Pang (1998), for example, provides a fascinating account of how artistic license rather than scientific accuracy governed the creation of illustrations and plates in scientific journals!

So what are these highly unstable things, books, supposed to do? Kaestle (1991) provides a moving argument that books and literacy can be used both to control a population and to free it. Kaestle, for example, describes the rise of literacy in Puritan New England. His argument is that indoctrination in scripture provided a means of controlling a population. I imagine a bunch of pilgrims deferring to the Bible, which they can all read, in order to justify the burning of a witch. This scenario seems to resonate with Chelton’s (1997) description of the “overdue kid” in which a librarian defers to a computer terminal in order to exercise authority.

Despite the rise of literacy in particular eras, I have to wonder about functional literacy. Just because a pilgrim could read a Bible does not necessarily mean that he or she was functionally literate or able to make informed decisions about the community. The threat of functional illiteracy to democracy has been noted by a number of left-wing commentators. It’s interesting to note that the ultra-right-wing economist Friedrich Hayek voiced similar concerns. If I didn’t know better, I would rekindle the “fiction problem” (Ross, 1991) and state that Joe Consumer has to be educated in order to overcome functional illiteracy so he can become a functional member of society.

Several authors have noted that Joe Consumer does have his own set of texts and his own genres of literature that meet his informational needs (Radway, 1991; Ross, 1991). Commentators consider these works “pap” or “trash” because they are unversed in the genre. Personally, I find it hard to consider texts such as the television programs “Joe Millionaire” or “The Mullets” anything but trash, my own genre illiteracy be damned!

Perhaps the greatest lesson from Kaestle’s review is his discussion of the problems of measuring literacy. From signature studies, for example, it seems that literacy rose steadily across society except for a period when industrialization reversed the trend through breakdowns in family structure and increases in child labour. It seems ominous to me that our media, books and television included, are becoming increasingly banal in an era when most North Americans are employed in service sector jobs with few benefits and no opportunity for advancement (Klein, 2000). What sort of meanings does the Wal-Mart greeter create from primetime television?


References

Chelton, M. K. (1997). The "overdue kid": A face-to-face library service encounter as ritual interaction. Library & Information Science Research, 19(4), 387-399.
Fiske, J. (1987). Television culture. London ; New York: Methuen.
Kaestle, C. F. (1991). Literacy in the United States : readers and reading since 1880. New Haven: Yale University Press.
Klein, N. (2000). No space, no choice, no jobs, no logo : taking aim at the brand bullies. New York: Picador.
Pang, A. (1998). Technology, aesthetics, and the development of astrophotography at the Lick Observatory. In T. Lenoir (Ed.), Inscribing science: Scientific texts and the materiality of communication (pp. 223-248). Stanford CA: Stanford University Press.
Radway, J. A. (1991). Reading the romance : women, patriarchy, and popular literature. Chapel Hill: University of North Carolina Press.
Ross, C. S. (1991). Readers' Advisory Service: New Directions. RQ, 30(4), 503-518.

Basic IR Definitions

After attempting to tackle a rather formidable treatise on information retrieval (Baeza-Yates & Ribeiro-Neto, 1999), I’ve run into a few roadblocks. I need some definitions:



Bit Mask

A pattern of binary values which is combined with some value using bitwise AND with the result that bits in the value in positions where the mask is zero are also set to zero. For example, if, in C, we want to test if bits 0 or 2 of x are set, we can write

int mask = 5; /* binary 101 */

if (x & mask) ...

A bit mask might also be used to set certain bits using bitwise OR, or to invert them using bitwise exclusive OR.

Source: http://dict.die.net/bit%20mask/ (attributed to: The Free On-line Dictionary of Computing (09 FEB 02))
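
To keep the three operations in the definition above straight, here is a small sketch of each one in C. The helper names (test_any, keep_bits, set_bits, flip_bits) are my own and are not part of the quoted entry.

```c
/* Hypothetical helper names; each wraps one bitwise operation. */

/* Nonzero if any bit selected by mask is set in x (bitwise AND test). */
unsigned test_any(unsigned x, unsigned mask)
{
    return (x & mask) != 0;
}

/* Keep only the bits selected by mask; every other bit becomes zero. */
unsigned keep_bits(unsigned x, unsigned mask)
{
    return x & mask;
}

/* Force the bits selected by mask to one (bitwise OR). */
unsigned set_bits(unsigned x, unsigned mask)
{
    return x | mask;
}

/* Invert the bits selected by mask (bitwise exclusive OR). */
unsigned flip_bits(unsigned x, unsigned mask)
{
    return x ^ mask;
}
```

With mask = 5 (binary 101), keep_bits(6, 5) gives 4, set_bits(2, 5) gives 7, and flip_bits(6, 5) gives 3.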



Hashing

Producing hash values for accessing data or for security. A hash value (or simply hash), also called a message digest, is a number generated from a string of text. The hash is substantially smaller than the text itself, and is generated by a formula in such a way that it is extremely unlikely that some other text will produce the same hash value.

Hashes play a role in security systems where they're used to ensure that transmitted messages have not been tampered with. The sender generates a hash of the message, encrypts it, and sends it with the message itself. The recipient then decrypts both the message and the hash, produces another hash from the received message, and compares the two hashes. If they're the same, there is a very high probability that the message was transmitted intact.

Hashing is also a common method of accessing data records. Consider, for example, a list of names:

· John Smith
· Sarah Jones
· Roger Adams

To create an index, called a hash table, for these records, you would apply a formula to each name to produce a unique numeric value. So you might get something like:

· 1345873 John Smith
· 3097905 Sarah Jones
· 4060964 Roger Adams

Then to search for the record containing Sarah Jones, you just need to reapply the formula, which directly yields the index key to the record. This is much more efficient than searching through all the records till the matching record is found.

Source: http://www.webopedia.com/TERM/h/hashing.html
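
To see the lookup step in the hashing entry above in action, here is a minimal sketch of a hash table in C. It assumes the well-known djb2 string hash and linear probing for collisions; neither is specified by the quoted entry, and the names hash_table, hash_name, table_insert, and table_find are invented for the illustration. In practice two names can land on the same slot, so the sketch probes forward rather than relying on every value being unique.

```c
#include <string.h>

#define TABLE_SIZE 16  /* assumed small capacity for the illustration */

struct hash_table {
    const char *slots[TABLE_SIZE];  /* NULL marks an empty slot */
};

/* djb2 string hash: h = h * 33 + c, starting from 5381. */
unsigned long hash_name(const char *s)
{
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Store name; returns the slot used, or -1 if the table is full. */
int table_insert(struct hash_table *t, const char *name)
{
    unsigned long start = hash_name(name) % TABLE_SIZE;
    for (unsigned long i = 0; i < TABLE_SIZE; i++) {
        unsigned long slot = (start + i) % TABLE_SIZE;  /* linear probing */
        if (t->slots[slot] == NULL || strcmp(t->slots[slot], name) == 0) {
            t->slots[slot] = name;
            return (int)slot;
        }
    }
    return -1;
}

/* Reapply the hash to locate name; returns its slot, or -1 if absent. */
int table_find(const struct hash_table *t, const char *name)
{
    unsigned long start = hash_name(name) % TABLE_SIZE;
    for (unsigned long i = 0; i < TABLE_SIZE; i++) {
        unsigned long slot = (start + i) % TABLE_SIZE;
        if (t->slots[slot] == NULL)
            return -1;  /* hit an empty slot: name was never stored */
        if (strcmp(t->slots[slot], name) == 0)
            return (int)slot;
    }
    return -1;
}
```

After inserting the three names from the list into a zero-initialized table, table_find(&t, "Sarah Jones") returns the same slot that table_insert reported, without scanning the other records. The table stores the caller's pointers rather than copies, so the strings must outlive the table.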



Heaps’ Law

For text files, the second important thing is the number of distinct words or vocabulary of each document. We use the Heaps’ Law. This is a very precise law ruling the growth of the vocabulary in natural language texts... Hence, the vocabulary of a text grows sublinearly with the text size, in a proportion close to its square root.

Source: http://www.pnclink.org/annual/annual1998/1998pdf/yates.pdf
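
For my own reference, the law quoted above is usually written as a power law. The symbols here are my assumptions rather than part of the quoted passage: V(n) is the number of distinct words in a text of n words, and K and β are constants fitted to a particular collection. The passage's "close to its square root" corresponds to β near 0.5.

```latex
V(n) = K\,n^{\beta}, \qquad 0 < \beta < 1

% Worked example with \beta = 0.5: quadrupling the text doubles the vocabulary.
\frac{V(4n)}{V(n)} = \frac{K\,(4n)^{0.5}}{K\,n^{0.5}} = 4^{0.5} = 2
```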



Patricia Tree

Definition: A compact representation of a trie where all nodes with one child are merged with their parents.

See also suffix tree, compact DAWG.

Note: A compact directed acyclic word graph (DAWG) merges common suffix trees to save additional space.

Source: http://www.nist.gov/dads/HTML/patriciatree.html



Suffix

Definition: The end characters of a string. More formally a string v is a suffix of a string u if u=u'v for some string u'.

Source: http://www.nist.gov/dads/HTML/suffix.html
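
The formal definition above translates almost directly into code. This is a small sketch in C; the function name is_suffix is my own.

```c
#include <string.h>

/* Returns 1 if v is a suffix of u (that is, u = u'v for some string u'),
   and 0 otherwise. The empty string is a suffix of every string. */
int is_suffix(const char *u, const char *v)
{
    size_t lu = strlen(u);
    size_t lv = strlen(v);

    if (lv > lu)
        return 0;  /* v is longer than u, so it cannot be a suffix */

    /* Compare v against the last lv characters of u. */
    return memcmp(u + (lu - lv), v, lv) == 0;
}
```

For example, is_suffix("retrieval", "val") is 1, while is_suffix("retrieval", "ret") is 0 because "ret" is a prefix, not a suffix.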



Suffix Tree

Definition: A compact representation of a trie corresponding to the suffixes of a given string where all nodes with one child are merged with their parents.

See also multi suffix tree, Patricia tree, suffix array, directed acyclic word graph.

Note: A suffix tree is a Patricia tree corresponding to the suffixes of a given string. A directed acyclic word graph (DAWG) is a more compact form.

Source: http://www.nist.gov/dads/HTML/suffixtree.html



Trie

Definition: A tree for storing strings in which there is one node for every common prefix. The strings are stored in extra leaf nodes.

See also digital tree, digital search tree, directed acyclic word graph, compact DAWG, Patricia tree, suffix tree.

Note: The name comes from retrieval and is pronounced, "tree."

Source: http://www.nist.gov/dads/HTML/trie.html
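
To convince myself I understand the structure, here is a minimal trie sketch in C. It departs from the definition above in one respect: instead of storing strings in extra leaf nodes, it marks the node where a string ends with a flag, which is a common simplification. It also assumes keys contain only lowercase a-z in an ASCII-like character set. All names (trie_node, trie_new, trie_insert, trie_contains, trie_free) are invented for the illustration.

```c
#include <stdlib.h>

#define ALPHABET 26  /* assumes keys use only lowercase a-z */

struct trie_node {
    struct trie_node *child[ALPHABET];  /* one branch per letter */
    int is_word;                        /* 1 if a stored string ends here */
};

/* Allocate an empty node; returns NULL if allocation fails. */
struct trie_node *trie_new(void)
{
    return calloc(1, sizeof(struct trie_node));
}

/* Insert key; returns 0 on success, -1 on an unsupported character
   or an allocation failure. */
int trie_insert(struct trie_node *root, const char *key)
{
    struct trie_node *node = root;
    for (; *key != '\0'; key++) {
        if (*key < 'a' || *key > 'z')
            return -1;
        int i = *key - 'a';
        if (node->child[i] == NULL) {
            node->child[i] = trie_new();
            if (node->child[i] == NULL)
                return -1;
        }
        node = node->child[i];
    }
    node->is_word = 1;
    return 0;
}

/* Returns 1 if key was inserted, 0 otherwise; a prefix of a stored
   string does not count unless it was inserted itself. */
int trie_contains(const struct trie_node *root, const char *key)
{
    const struct trie_node *node = root;
    for (; *key != '\0'; key++) {
        if (*key < 'a' || *key > 'z')
            return 0;
        node = node->child[*key - 'a'];
        if (node == NULL)
            return 0;
    }
    return node->is_word;
}

/* Release a node and everything below it. */
void trie_free(struct trie_node *node)
{
    if (node == NULL)
        return;
    for (int i = 0; i < ALPHABET; i++)
        trie_free(node->child[i]);
    free(node);
}
```

After inserting "tree" and "trie", the two words share the nodes for "t" and "tr" and branch only at the third letter. trie_contains(root, "trie") returns 1, but trie_contains(root, "tr") returns 0 because that common prefix was never inserted as a word.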



References

Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval. New York: ACM Press.