Tuesday, December 09, 2014

Categorization and Classification, revisited

An incredibly ad hoc search led me to a 2004 Library Trends article by Elin Jacob, entitled "Classification and Categorization: A Difference that Makes a Difference."

Some insights:

"Categorization divides the world of experience into groups or categories whose members share some perceptible similarity within a given context. That this context may vary and with it the composition of the of the category is the very basis for both the flexibility and the power of cognitive categorization." (p. 518)

"As experimentally-based categories evolve into well-defined, domain-specific classes that facilitate sharing of knowledge without lose of information, they lose their original flexibility and plasticity as well as the ability to response to new patterns of similarity." (p. 519)

The classical theory of categorization assumes rigid hierarchies of shared features. Empirical research, however, indicated that people are very good at assigning a graded structure for category membership. For example, consider the set {robin, pigeon, ostrich, butterfly, chair}. People can assign each of these a score for how well it belongs to the category of "bird". This observation challenges the idea "that there is an explicit inclusion/exclusion relationship between an entity and a category."
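
To make the contrast concrete for myself, here is a small Python sketch of graded versus all-or-nothing membership. The feature sets and the overlap score are invented for illustration; they are not from Jacob's article or from any typicality study.

```python
# Hypothetical feature sets, invented purely for illustration.
BIRD_PROTOTYPE = {"feathers", "wings", "flies", "beak", "sings"}
BIRD_DEFINING = {"feathers", "wings", "beak"}  # assumed "necessary" features

ENTITIES = {
    "robin": {"feathers", "wings", "flies", "beak", "sings"},
    "pigeon": {"feathers", "wings", "flies", "beak"},
    "ostrich": {"feathers", "wings", "beak"},
    "butterfly": {"wings", "flies"},
    "chair": set(),
}


def graded_membership(features, prototype=BIRD_PROTOTYPE):
    """Share of prototype features the entity has, from 0.0 to 1.0."""
    return len(features & prototype) / len(prototype)


def classical_membership(features, defining=BIRD_DEFINING):
    """Strict inclusion/exclusion: the entity needs every defining feature."""
    return defining <= features


for name, features in ENTITIES.items():
    score = graded_membership(features)
    member = classical_membership(features)
    print(f"{name:10} graded={score:.1f} classical={member}")
```

The classical test only says in or out, while the graded score ranks the ostrich as a weaker "bird" than the robin, which is the kind of judgment the empirical research describes.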

"Classification" refers to the use of a "representational tool used to organize a collection of information resources." Jacob explains:

"Classification as process involves the orderly and systematic assignment of each entity to one and one class within a system of mutually exclusive and non-overlapping classes. This process is lawful and systematic: lawful because it is carried out in accordance with an established set of principles that governs the structure of classes and class relationships; and systematic because it mandates consistent application of these principles within the framework of a prescribed order of reality. The scheme itself is artificial and arbitrary." (p.522)

Jacob explores different types of classification, starting with the most rigid: taxonomic classification, as exemplified by the Linnaean system. These hierarchies are incredibly valuable for stabilizing nomenclature and facilitating knowledge transmission. They are also limited in that they constrain "the information context by limiting the identification of knowledge-bearing associations to hierarchical relationships between classes."

Classification Schemes represent another approach. Jacob cites Shera, who noted that all classification schemes rely on four assumptions: universal order, unity of knowledge, similarity of class members, and intrinsic essence. Bibliographic Classification Schemes have traditionally taken a deductive approach. Faceted schemes are inductive, requiring an analysis of the "universe of knowledge" to identify appropriate properties and features. These terms can then be grouped into hierarchies. Jacob notes:

"The result is not a classification scheme but a controlled vocabulary of concepts and their associated labels that can be used, in association with a notation and prescribed citation order, to synthesize the classes that will populate the classification scheme."

Jacob goes back to Shera to describe the seven properties of a bibliographic classification scheme:

  • linearity
  • inclusivity of all knowledge within the classification's universe
  • well-defined, specific, and meaningful class labels
  • an arrangement of classes that establishes relationships between them
  • distinctions between classes that are meaningful
  • a mutually exclusive and nonoverlapping class structure
  • an infinite hospitality that can accommodate every entity in the bibliographic universe

Classification and categorization are related concepts but they are not the same thing:

"While traditional classification is rigorous in that it mandates that an entity either is or is not a member of a particular class, the process of categorization is flexible and creative and draws nonbinding associations between entities." (p. 527)

Basically, classification is the process of forcing entities into an arbitrary system based on specific rules while categorization drives definition based on context. Borges's taxonomy, for example, seems like a terrible classification because the underlying rules are impossible to divine but it might be an effective categorization based on the context in which it was created and for the relevant epistemic community.

Jacob's table describing the differences between classification and categorization is sufficiently interesting that I will include it completely:

Interestingly, Jacob notes that categories don't necessarily provide organization. The categories might be relevant to a particular group member but they might not actually demonstrate any hierarchical structure, thereby introducing challenges of access and navigation.

Free-text search, for example, represents "a very elementary mechanism for grouping." This limitation means that "a free-text retrieval system cannot contribute to an information environment that will support or enhance the value of system output through the establishment of meaningful context." More nuanced controls are post-coordinated systems, pre-coordinated systems, and classification systems. Subject headings, for example, enable multiple access points while a classification system enables only one (e.g., shelving location).
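
The difference between multiple access points and a single location can be sketched in Python. The records, headings, and shelf marks below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical catalogue records; titles, shelf marks, and headings are invented.
RECORDS = [
    {"title": "Prairie Birds", "shelf": "A1", "headings": ["Birds", "Prairies"]},
    {"title": "Chairs Through History", "shelf": "B2", "headings": ["Furniture", "History"]},
    {"title": "Birds in Art", "shelf": "C3", "headings": ["Birds", "Art"]},
]


def build_heading_index(records):
    """Subject headings: one item can be reached through several headings."""
    index = defaultdict(list)
    for record in records:
        for heading in record["headings"]:
            index[heading].append(record["title"])
    return dict(index)


def build_shelf_list(records):
    """Classification: each item gets exactly one location."""
    return {record["title"]: record["shelf"] for record in records}


headings = build_heading_index(RECORDS)
shelves = build_shelf_list(RECORDS)
print(headings["Birds"])
print(shelves["Prairie Birds"])
```

"Prairie Birds" turns up under both "Birds" and "Prairies" in the heading index, but it can only sit in one place on the shelf.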

It's with subject headings that we get some challenges:

"Unlike the systematic and principled structure of a classification system, the structure of a subject heading system is frequently unprincipled, unsystematic, and poly-hierarchical. And, unlike the relationships established between well-defined and mutually exclusive classes in a classification, any relationships created between the categories of a subject heading system cannot be assumed to be either meaningful for information-bearing." (p.536)

Post-coordinated systems also introduce some challenges. According to Jacob, they are "simply mechanisms for grouping, not systems of organization."

Whew... I'm not sure if I'm any further ahead after reading that. Basically, categorization is dependent on context and may -- or may not -- actually lead to any sort of organization. Classification, however, is a "lawful and arbitrary" system of forcing things into an organizational system. This system enforces the vocabulary and conventions of a discipline. Pre-coordinated and post-coordinated subject headings provide some middle ground.

But do users actually care?


