Cheating Grounded Theory

I have an inherent distrust of qualitative research methods: where are the stats? where’s the calculus? To pursue my research, however, it seems that I have to make amends with the qualitative paradigm and struggle through.

In discussing qualitative research involving participants, one topic always seems to emerge: Grounded Theory (GT). I’m not really sure what GT is but I’m a bit suspicious. The notion of assigning categories to observations in order to develop theory is subject to several criticisms, such as Traweek’s discussion of scientific method and Occam’s Razor (Traweek, 1996) and Bowker and Star’s comments on the ubiquity and necessity of the “other” category (Bowker & Star, 1999). In some ways, GT sounds a bit like bibliographic classification, a topic with its own cadre of critics and sceptics (see discussion in Svenonius, 2000). In her discussion of GT and faceted classifications, Susan Leigh Star describes my own unease:

“Both struggle with a core problem--i.e., the representation of vernacular words and processes, empirically discovered, which will, although ethnographically faithful, be powerful beyond the single instance or case study.” (Star, 1998 pg. 218)

One thing is certain, however; I must attempt to know and understand my enemy (It seems ironic that both Jesus and Sun Tzu offered similar advice). Towards this end, I am embracing the vocational model of discourse that I’m familiar with from my days in engineering. Instead of concocting arguments about the validity of Grounded Theory, I’m just going to create a cheat sheet summarizing how to do Grounded Theory (see details in Lofland & Lofland, 1995; Strauss & Corbin, 1990). While ugly to look at, this sheet is the hammer I need to conduct my research.

Analyzing Data (from Lofland & Lofland, 1995)

1. Strategy 1: Social Science Framing
a. Formulate generic propositions to summarize and describe data
i. Formal Propositions: Type, Frequencies, Magnitudes, Structures, Processes, Causes, Consequences, Agency
ii. Put aside typical notions of research or paper
iii. Can develop many possible propositions; one proposition per book chapter
2. Strategy 2: Socializing Anxiety
a. Emergent Induction
i. Not mechanical or easy
ii. Work persistently
3. Strategy 3: Coding
a. “The word (or short set of words) you apply to the item of data in answering such questions is a code.”
b. Physical Models: filing, PC Databasing
c. Types of Coding: housekeeping, analytic (emergent and analytical, items can have multiple codes)
d. Stages of Coding: initial coding, focused coding (which codes are used more and how?)
4. Strategy 4: Memoing
a. Emerges as main activity of research
b. Three kinds of memos: elemental, sorting (connections between memos), integrating (modes of integration of memos).
5. Strategy 5: Diagramming
a. Typologizing: what are the topic’s types?
b. Matrix Making
c. Concept Charting
d. Flow Charts
6. Strategy 6: Thinking Flexibly
a. Rephrasing
b. Changing Diagrams
c. Constantly Comparing
d. Thinking in extremes and opposites
e. Talking with fellows
f. Listening to fellows
g. Drawing back
h. Withholding judgement

Grounded Theory (from Strauss & Corbin, 1990)

1. Coding
a. Open
i. Propositions
ii. Create Labels
iii. Making comparisons
iv. Categorizing and Categories
1. Define Properties
2. Dimensionalize properties
3. Establish subproperties
v. Line by line; sentence or paragraph; entire document
vi. Sensitivity
1. Who, what, when, where, how much, why, temporal
2. Comparisons: Flip Flop, systematic with other phenomena, far outs, red flags
b. Axial
i. Make connections with categories and subs
ii. Conditions, Context (specific properties), Interactional Strategies, Consequences of Strategies
iii. Categories related to subcategories in “paradigm model”: Causal Conditions → phenomenon → context → intervening conditions → interactional strategies → consequences
iv. Linking categories: a) hypothetically relate by statements of relationship, b) verification against data, c) continued search for category properties and dimensions, d) explore variation
v. Final theory limited to those statements rooted in the data
c. Selective
i. Similar to axial but at a higher level
ii. Procedure
1. Explicate story line i.e., elevator pitch
2. Relate subsidiary categories to core categories via paradigm
3. Relate categories at a dimensional level
4. Validate relationships against the data
5. Fill in categories for refinement
2. Some cases don’t fit. Are they transitional? Are there intervening conditions?


Tuesday, October 21, 2003

Consuming the Consommé of Consumer Health

I recently prepared a report on business taxonomies. Taxonomies—like all classification schemas—suffer from semantic problems: one person’s Freedom Fighter is another’s Guerilla (Svenonius, 2000). It becomes obvious in the literature on business taxonomies that our classifications are filled with semantic interpretations. After reviewing some literature on consumer health issues, it seems that even information seeking behaviour is variable and subjective.

Williamson (2002), for example, studied the information needs and behaviours of women with breast cancer. The participants were largely dissatisfied with both the content of the material and the delivery method. Many suffered from information overload or felt that the provided information wasn’t appropriate to their needs. Their needs and behaviour didn’t necessarily match the way the information was provided.

In considering consumer health information, an important concept is authority. The Internet offers a lot of information but users must evaluate the quality of the provided material. PubMed’s consumer health portal offers vetted consumer health information from sources such as public agencies, reputable agencies, and professional associations. The user’s interpretation of authority, however, doesn’t depend solely on the reputation of an agency. People construct their own concepts of authority based on their background and experience. McKenzie (In Press), for example, demonstrates that expectant mothers of twins engage in discursive rhetorical practices to assign authority to information sources, professional opinions, and lay knowledge.

Folk knowledge may be particularly significant for those searching for consumer health information. Pettigrew (1999), for example, introduced the concept of the “information ground” to describe the dynamic process of people coming together to share information. As evidenced by the dissatisfaction of the women with breast cancer and Pettigrew’s information ground, there may be a different way of conceptualizing information seeking difficulties rather than simply system inadequacies. Barriers to information may exist because the users need to be acculturated into a specific community of practice (Wenger, 1998) with established conceptions and vocabularies. Unfortunately, much of the consumer health information has been created for the professional medical community of practice rather than end users (see discussion of literature reading level in Baker, Wilson, & Kars, 1997).

Medical terminology has become so developed that information access problems exist even for members of the community. Doctors, for example, depend more on their university textbooks and standard handbooks than published journals (Marshall, 1993). As Ludwig Fleck noted in his discussion of scientific communities (1979), handbooks represent just the entry point to a particular literature; the “real” information lies in the esoteric journals. In the world of clinical information, the esoteric is out of reach of both the professionals and John Q. Public.

To understand some of the difficulties in searching for consumer health information, I’ve conducted my own search. While reading McKenzie’s article, I realized that I know nothing about twins. So, assuming my natural role of somebody interested in the natal development of twins, I did what so many others do: I hit PubMed.

After calling up the PubMed homepage in my browser of choice, I typed “twins” into the search field: 24,435 hits! Precision and Recall be damned, 24,000 is too many hits!

I’m not sure what a “normal” searcher would do, but I used my librarian skills and turned immediately to the subject headings. I searched through MeSH for “twins” and was presented with an entire hierarchy of facets. Since I’m not a medical doctor, I had no idea what facet to use so I selected “education” and found three articles. One looked promising—“Twins: not just in science, but in society” (Segal, 2002).

Unfortunately, the full text link was down but I was able to track down the journal in Ingenta despite mismatched article titles and page numbers. The article turned out to be an opinion piece about the development of twins. It was quite readable and provided an overview of some recent literature. In retrospect, however, I’m not sure what question the article was attempting to answer. For a “real” consumer health user, the article may have been interesting but of little practical use.

Upon refocusing my thoughts, I realized that I needed some non-academic information. While contemplating where to look (Health Canada? CDC? Beilstein? [nb. Beilstein is probably a useless database for consumer health but I like the name!]), I noticed that PubMed offered a link to “Consumer Health”. Upon following the link I found myself in NIH’s consumer health portal: MEDLINEplus. I repeated the procedure and typed “twins” into the search box. I was presented with several articles broken out into various categories: Health Topics, Drug Information, Medical Encyclopaedia, News, and Other. Under Health Topics, there was a specific topic entitled “Twins, Triplets, and Multiple Births” containing 15 articles. The titles of the most popular articles were provided to give the user greater context. One particular article related to Folic Acid and having twins caught my eye (Centres for Disease Control, 2003).

It seemed like MEDLINEplus provided better consumer health information than PubMed. Upon compiling Flesch-Kincaid scores for the two articles, however, I learned that they were practically identical with respect to readability (12.1 vs 12)!
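For anyone curious about what sits behind those numbers: the Flesch-Kincaid grade level is conventionally computed as 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below is only an illustration of that formula. The function names and the vowel-group syllable counter are my own assumptions (a crude heuristic), not the tool that produced the scores above, so its results on real articles will differ somewhat.

```python
import re


def count_syllables(word):
    """Crude heuristic (an assumption): one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fk_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def fk_grade_for_text(text):
    """Estimate the grade level of a non-empty text using naive splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade(len(words), len(sentences), syllables)
```

Longer sentences and longer words both push the grade up, which is why a consumer portal and a scholarly opinion piece can land on nearly the same score if they are written in similar prose.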

…It seems like I’ve written myself into a corner and I’m not sure how to bring readability back to the “information grounds” and communities of practice. Maybe next week.


Monday, October 20, 2003

The Information Ground of The Ministry of Truth

Weber gave us a remarkable view into the inside of the bureaucracy. His views are admittedly rather Orwellian… in a pre-Orwell sort of way:

“Every bureaucracy seeks to increase the superiority of the professionally informed by keeping their knowledge and intentions secret. Bureaucratic administration always tends to be an administration of 'secret sessions': in so far as it can, it hides its knowledge and action from criticism.” (Weber, 1958 p.233)

Are we still confronted by Weber’s “Secret Sessions”? Do we still struggle to pry information from the bureaucratic vault? In some cases… yes.

In a discussion of Information and Referral centres, Risha Levinson claims “bureaucratic complexities, restricted admissions, extended waiting lists, and discriminatory practices often pose overwhelming barriers to those in need of services, particularly the poor, the ill, and the elderly.” (Levinson, 1988 pg.3). Furthermore, Marcella and Baxter report that 26% of their survey respondents maintain that they have suffered a disadvantage through not finding information (2000).

It seems that people are unable to get the information they need. I’m unsure, however, if the blame for this lies entirely with our modern bureaucracies. I could go on at length about other communication difficulties such as discursive structures, semantic and epistemic boundaries, emic and etic languages, or peripherality in communities of practice. Weber’s notion, however, seems quite plausible.

Perhaps a way of addressing the problem is to understand who confronts the Weberian bureaucracy. We all do—but only in very specific roles. As professionals, we are rarely threatened by bureaucracy to the extent that our role as a “citizen participant” is reduced to that of a “consumer citizen” (Marcella & Baxter, 2000). Instead, it is as ordinary people that we are threatened. And, as Harris and Dewdney note: “everyone, regardless of occupation or social status, is an 'ordinary person' in some aspect of his or her life.” (Harris & Dewdney, 1994 p. 9).

We seem to maintain this rather Soviet image of the “ordinary person” against the machine—an everyman Atlas struggling with the world on his shoulders. There is, however, another image of the ordinary person. I’m rather partial to Bakhtin’s notion of the carnival and all of the hegemonic resistance that it implies. In terms of information seeking, Pettigrew’s notion of the “information ground” seems to accord with this carnival: “an environment temporarily created by the behaviour of people who have come together to perform a given task, but from which emerges a social atmosphere that fosters the spontaneous and serendipitous sharing of information.” (Pettigrew, 1999 p. 811). In the exchanges of the information ground, we exchange stories and give each other information that we need to structure our daily lives. Maybe it’s in these stories of the carnivalesque information ground that we find our resistance to the bureaucracy. One corporate folklorist even claims that “The days of studying organizations as grey Weberian bureaucracies, made up of rules and hierarchies, are now gone, as are the days of looking at people as anonymous functionaries or well-oiled cogs.” (Gabriel, 2000 p. 87).

That said, I don’t think that the Ministry of Health (Truth?) will be holding carnivals any time soon.


Sunday, October 19, 2003

Googling the Inverted Index

Our readings on Information Retrieval have been quite valuable. I now feel like I have some grounding in the systems side of our discipline. The readings are loaded into my citation manager and I know where to find the formula for constructing vector space models (although I’ll need my undergraduate textbooks to decipher what a “dot product” is). Despite my interest in IR, I have to wonder about our reliance on controlled vocabularies. Where would inverted indexes be without a controlled vocabulary? How does post-coordinated searching work? What about natural language queries?
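As a reminder to myself of what the “dot product” buys us, here is a minimal sketch of vector space matching. It assumes the simplest possible setup: raw term counts as vector components and whitespace tokenizing. The function names and the sample documents are made up for illustration; real systems weight terms (for example with tf-idf) rather than using raw counts.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as term counts (naive whitespace tokenizer)."""
    return Counter(text.lower().split())


def dot(u, v):
    """Dot product: sum of products of counts for terms the vectors share."""
    return sum(u[t] * v[t] for t in u if t in v)


def cosine(u, v):
    """Cosine similarity: dot product normalized by the two vector lengths."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


# Hypothetical mini-collection, ranked against a free-text query.
docs = {
    "a": "twins in science and society",
    "b": "folic acid and twins",
    "c": "business taxonomies",
}
query = term_vector("twins society")
ranked = sorted(docs, key=lambda d: cosine(query, term_vector(docs[d])), reverse=True)
```

Note that nothing here requires a controlled vocabulary: the query is just another vector built from whatever words the searcher types.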

To understand these issues, I’m taking a quick look at Google, specifically Brin and Page’s graduate school work (Brin & Page, 1998).

It seems that the original goal of Google was to improve the quality of the early search engines and to overcome the increasingly commercial nature of the web. The key to Google’s efficiency is their trademarked PageRank technology. Brin and Page, however, attribute PageRank to concepts of citation matching (without crediting ISI!).

In trying to understand Google, I thought that Brin and Page had created a completely new means of working with large document spaces. In their paper, however, it seems that they have just polished the methods described in our readings. Indeed, their research is rooted in Information Science: they note that the “very large corpus” typically used to test IR systems is completely inapplicable to the web.

From their paper, I learned:

As Google crawls pages, copies of the HTML are compressed and written to a drive. Each page is given a unique ID. An indexing engine then notes word occurrences. These occurrences are written to a forward index. Indexing terms are broken into various buckets to improve efficiency. These buckets are then analyzed to establish a descriptive lexicon. The lexicon is then used to create an inverted index. It seems that we can’t get away from the inverted index!
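The forward-to-inverted step can be sketched in a few lines. This is a toy illustration under loud assumptions: the page texts, IDs, and function names are invented, tokenizing is a plain whitespace split, and compression, buckets, and ranking are left out entirely. It is not Brin and Page’s implementation, only the general shape of the pipeline described above.

```python
from collections import defaultdict


def build_forward_index(pages):
    """Forward index: doc ID -> list of (word, position) occurrences."""
    return {
        doc_id: [(word, pos) for pos, word in enumerate(text.lower().split())]
        for doc_id, text in pages.items()
    }


def build_inverted_index(forward):
    """Inverted index: word -> {doc ID -> positions where the word occurs}."""
    inverted = defaultdict(lambda: defaultdict(list))
    for doc_id, hits in forward.items():
        for word, pos in hits:
            inverted[word][doc_id].append(pos)
    return inverted


def search(inverted, *words):
    """Return sorted IDs of the documents that contain every query word."""
    postings = [set(inverted[w]) if w in inverted else set() for w in words]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical crawled pages keyed by their unique IDs.
pages = {1: "twins and folic acid", 2: "twins in society", 3: "folic acid facts"}
forward = build_forward_index(pages)
inverted = build_inverted_index(forward)
lexicon = sorted(inverted)  # every distinct word seen while indexing
```

Here the lexicon is simply every word that turned up in the crawl, and a multi-word query is answered by intersecting posting lists at search time, which is post-coordination with no controlled vocabulary in sight.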

One of the strengths of Google lies in the size of its lexicon: 14 million words. LCSH 26th edition, in comparison, contains only 270,000 controlled terms. The Google lexicon, however, can easily be contained in the working memory of a single computer (256 MB).

Despite their age, our readings are still very salient. Even Google is subject to those seemingly antique terms of “precision”, “recall”, and “inverted index”. Given Google’s impending IPO, perhaps we should spend more time in the antiquity of Information Science!


Chasms, Brokers, and Related Terms: The role of the researcher

Participant Observation has ordered some of my recent musings and ramblings. Perhaps it’s time to set the stage…

Vignette 1: It’s Friday morning and a group of aspiring researchers are gathered in front of me. I’m a bit irritable because I had to buy my coffee from the Donut Café; the Althouse cafeteria is closed on Fridays. There are three books in front of me: LCSH volume 1, APA’s Thesaurus of psychological index terms, and the Thesaurus of ERIC descriptors. For the next two hours I will be explaining the difficulties of cross-disciplinary research. I place the blame for these difficulties on the differences in how the various disciplines structure their realities as described by controlled vocabularies (see Machin’s description of Foucault and language structure (Machin, 2002)). The etic language of the researchers, in essence, becomes an emic language to a researcher trained in a different field (Taylor & Bogdan, 1975, p. 200).

NOTE: I decided against addressing either MeSH or ISI’s Permuterm indexing; common sense prevailed.

Vignette 2: I’m talking to one of my colleagues and notice that all of the characters on his computer screen are in Chinese. I ask him how he can possibly remember so many different characters. He explains that there are only about 2,000 characters that he needs for communicating on most topics. When I express surprise at this large number, he laughs. English words, he explains, are much like individual characters to him. For the TOEFL, he had to learn 10,000 of these “characters”. To communicate in his scientific discipline, he was required to learn perhaps another 10,000. So my colleague has 20,000 communication terms. The LCSH 26th Ed., meanwhile, has 270,000. How can any of us move across boundaries if such a vast number of terms are involved?

These two vignettes illustrate the difficulties of moving across disciplines and languages. Social researchers, especially those engaged in participant observation, must overcome a similar barrier. To understand an environment or “cultural scene” they must become versed in the local language and “folk knowledge” (Morgan, 2002). Spanning these boundaries is no easy task but it can be accomplished. Brown and Duguid, for example, identify three specific processes: boundary objects, translators, and knowledge brokers (1998).

In social science, the researcher is encouraged to “triangulate” their findings through the use of existing documentary sources (Denzin, 1989; Taylor & Bogdan, 1975). These documentary sources represent boundary objects. As Star and Griesemer (1999) indicate, however, boundary objects are loosely structured in common usage; much of the semantic meaning of the document is unavailable to the uninitiated researcher. Although they may be able to achieve some sense of an environment, a researcher will not gain fluency just through the use of boundary objects. Some sort of “apprenticeship” is required. Bockarie (2002), for example, invokes Vygotsky’s concept of the Zone of Proximal Development in analyzing an apprentice’s interactions with a journeyman. The first stage in the learning process for an apprentice is to learn the stories and communication techniques of the vocation. Taylor and Bogdan (1984) similarly recommend that researchers spend the first stage of their research becoming acculturated to a particular environment.

Translators may act as the researcher’s teacher and provide the researcher with some insight into the environment. The translator has in-depth knowledge of the participants’ emic language and makes this language available to the researcher. Any interchange between the researcher and the translator, however, is subject to the “translation competence” of the translator (Spradley, 1979). In using the language of the researcher, the specifics of the environment may be lost. Basically, to communicate between their communities the researcher and translator agree to use a less specific and well-defined thesaurus. This “pidgin” thesaurus (Galison, 1999) is necessarily incomplete.

The third way to navigate across the boundary is to use a “knowledge broker”. In social science research, this broker is generally referred to as a key informant (Denzin, 1989; Taylor & Bogdan, 1975). The informant, however, is subject to the same linguistic constraints as the translator. In addition, the informant may be only a marginal member of the particular community and not privy to complete details of particular interactions (Taylor & Bogdan, 1975). A more significant concern, however, is that new meanings may be created through the interaction of the informant and the researcher. Wenger (1998), for example, states:

“Brokers are able to make new connections across communities of practice, enable coordination, and--if they are good brokers--open new possibilities of meaning.” (p. 109)

As the informant and the researcher negotiate their positions in the environment, their roles change. They are no longer merely observer and observed or researcher and researched. Some authors describe this changing position with terms like “going native”. Wenger (1998), however, provides a taxonomy of various interaction roles based on the “trajectory” of individuals as they move through disparate communities: peripheral, inbound, insider, boundary, and outbound. In their relationship with knowledge brokers or key informants, the researcher must be aware of both their own interaction role and the role of the informant.

Cross-disciplinary research is no easy task but at least we have tools such as subject guides, bibliographies, and thesauri. Unfortunately, when dealing with real subjects and participant observation there is no “RT- Related Term”. Instead, we have to be cognizant of how we negotiate the gaps and crevasses of our disparate communities.


