As an undergraduate, I built a great number of models. I especially liked the geotechnical ones. We would construct miniature dams out of clay and sand and then load them into a massive centrifuge to simulate increased gravity and compressed time. The centrifuge made the most fantastic noise: WHOOMP, WHOOMP, WHOOMP. If I close my eyes I can still hear it and feel the vibrations in my feet as all of Ellis Hall rattled so we could determine seepage rates. If only simulation in library science held such promise.
I imagine that the models we could construct for library science would all live in a computer somewhere and we would run endless Monte Carlo simulations to establish something or other. It’s not a bad idea, really. I can already think of an application…
Bibliometrics and citation indexing are quite well established in the LIS discourse (Borgman, 1990). Using tools like the Social Science Citation Index (SSCI) we can zoom back and forth through time and trace the evolution of ideas. SSCI, however, is far from perfect. Since SSCI was originally a print tool, citations are often listed as cryptic abbreviations, and word variations wreak havoc with identifying real citations. It also seems odd to me that SSCI depends on a binary interpretation of the word "citation": either an article has been cited or it hasn't. Why doesn't ISI incorporate a new citation coefficient? Instead of attempting to parse out and compile entire citations from full-text sources, an n-gram matching algorithm could be used to assign citation-match values between 0 and 1. This approach could provide another input for vector space modelling and for ranking best-match documents.
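As a rough sketch of what I have in mind, here is one plausible (and entirely hypothetical) formulation of such a coefficient: compare two citation strings by the Jaccard overlap of their character trigrams, so that an abbreviated print-era form and a full reference to the same work score somewhere between 0 and 1 rather than being forced into a cited/not-cited dichotomy. The trigram size and the Jaccard measure are my own assumptions, not anything ISI actually does.

```python
def ngrams(text, n=3):
    """Character n-grams of a whitespace-normalized, lowercased citation string."""
    text = " ".join(text.lower().split())
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def citation_coefficient(cite_a, cite_b, n=3):
    """Jaccard overlap of n-gram sets: 1.0 = identical strings, 0.0 = disjoint."""
    a, b = ngrams(cite_a, n), ngrams(cite_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# An abbreviated form and the full reference share many trigrams,
# so they receive a nonzero coefficient instead of a miss:
score = citation_coefficient(
    "Borgman C L 1990 Scholarly Commun Bibliometrics",
    "Borgman, C. L. (1990). Scholarly communication and bibliometrics")
```

Tuning the n-gram length trades precision against recall: longer n-grams make matches stricter, shorter ones more forgiving of the word variations that plague SSCI.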
Would the n-gram approach I've described be an improvement over existing systems? User studies would be valuable, but a simulation approach may be considerably easier, assuming we followed a rigorous design methodology (Shannon, 1975). Perhaps we could run some simulations based on the TREC document collection and compare our citation-matching results to other Information Retrieval approaches.
Similarly, we could weight our citation coefficient by an algorithm that compares the similarity of the citing and cited documents, thereby limiting the influence of the "mercy-cite".
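One simple way to weight the coefficient, again purely as a sketch under my own assumptions, is to multiply it by the cosine similarity of term-frequency vectors for the two documents: a citation between documents that share almost no vocabulary (the classic mercy-cite) gets pulled toward zero.

```python
from collections import Counter
from math import sqrt

def cosine_similarity(doc_a, doc_b):
    """Cosine of raw term-frequency vectors built from two texts."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def weighted_coefficient(citation_coeff, citing_text, cited_text):
    """Down-weight a citation match when the two documents share little
    vocabulary -- a crude brake on the mercy-cite."""
    return citation_coeff * cosine_similarity(citing_text, cited_text)
```

A real system would want stopword removal and tf-idf weighting rather than raw counts, but even this crude version captures the intuition: topical distance erodes the credit a citation confers.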
Although my simulation model would be quite entertaining to build (though possibly too challenging for an LIS839 project), it is still subject to some limitations. The most pressing is Zhao and Logan's observation that there are now essentially two worlds of citation: print journals and electronic sources (Zhao & Logan, 2001). Their observations seem to resonate with Sandstrom's comments on the localized information-foraging patterns of scholars (Sandstrom, 1994).
Fortunately, our simulation could simply ignore Zhao and Logan's concerns. If we were using the TREC sample, those concerns would be moot. With simulation models you can always simplify and abstract to make the model work… even if the result doesn't necessarily reflect real life.
Borgman, C. L. (1990). Scholarly communication and bibliometrics. Newbury Park, CA: Sage Publications.
Sandstrom, P. E. (1994). An optimal foraging approach to information seeking and use. Library Quarterly, 64(4), 414-449.
Shannon, R. E. (1975). Systems simulation: The art and science. Englewood Cliffs, NJ: Prentice-Hall.
Zhao, D., & Logan, E. (2001). Citation analysis of scientific publications on the Web: A case study on the research area of XML. Paper presented at the 8th International Conference on Scientometrics and Informetrics (ISSI 2001), Sydney, NSW, Australia.
i. Sample methodology (Shannon, 1975): System Definition, Model Formulation, Data Preparation, Model Translation, Validation, Strategic Planning, Tactical Planning, Experimentation, Interpretation, Implementation, Documentation.