Library Blog Comparison: AADL and BCL

Posted by Paul Roberts on March 6th, 2010

Note to my readers: this post is the result of an assignment for a class on information technology at the University of Kentucky (LIS637) as part of my MSLS studies. The assignment was to evaluate two library blogs in light of principles set forth in a reading list.

Two library blogs are here evaluated: the Ann Arbor District Library in Ann Arbor, Michigan, and the James P. Boyce Centennial Library in Louisville, Kentucky. Many of the principles set forth in the reading were most applicable to unaffiliated blogs written by individuals rather than to blogs affiliated with institutions such as libraries, but many of the principles were universally helpful for evaluating blogs of every type. When referring to specific principles mentioned on specific sites in the reading list, the following abbreviations are used.

The Ann Arbor District Library (AADL) is a public library well-known for its innovation. Indeed, the primary reason I have frequented the blog is the unique and creative ideas they have implemented in recent years, even if some of them are difficult to find now (what happened to the mock catalog-card on which online users can scrawl?). Evaluating the blog-related aspects of their site has furthered my impression of them as an online library presence that “gets it.” The AADL site contains a great deal of information, but the various pages and views do not feel overly complicated by miscellaneous information (BW). The blog is used effectively for library news (WJ), particularly in the events blog. The links are plentiful, relevant (BS) and always explain where they go (DM). The blog is updated frequently and appears to contain up-to-date information. Commenting is welcomed, though it understandably requires user log-in before comments can be left. The comments are publicly readable. Questions are not typically posed in order to generate discussion.

The blog of the James P. Boyce Centennial Library, however, serves a different purpose. Rather than attempting to be a place for online discussion or conversation (SB), this library is using WordPress primarily as a content management system of sorts. An RSS feed is indeed published for the library news, but comments are disabled because it is not intended to be a conversational medium. An XML icon is not displayed (WD), but a feed icon is prominently placed at the top right. Topics in the blog are indeed mixed (DM) in that it includes news of a wide-ranging nature. The CSS of the site is quite well designed such that the blog’s pages are consistent and easy to navigate.


Abstract

This essay seeks to provide a via media in the discourse of social indexing versus professional manual indexing by arguing that the use of social indexing can indeed be useful, and perhaps even the most practical option for imminent use, but should be employed in the service of the eventual application of a formalized methodology. The perspective here set forth argues that professional manual indexing is always the final and theoretical best, but recognizes that temporal circumstances can often render social indexing as the temporary and practical best – yet only insofar as it enables a subsequent and properly thorough treatment. The argument is supported by brief examples from three categories of research fields: established fields, emergent fields, and entry fields.

Concept and Contention

Daniel Webster is credited with having said that a politician’s blind conviction that “something must be done” is “the parent of many bad measures.”1 This tyranny of the urgent to which Webster was referring has a way of tempting even the most judicious of politicians to very injudicious decisions when circumstances are such that they feel “something must be done.” Before too many stones are cast at such politicians, let the reader admit that this tendency is universal enough to include even those in the most judicious of all professions – library and information science.

The question must be asked, albeit somewhat tongue-in-cheek, if this tyranny of the urgent has affected the judgment of the discipline in regard to the continuing debate on the utility and viability of social indexing over against the established and proven manual indexing of information professionals. The lines have been drawn, sides have been chosen, and judgements have been passed. The debate, however, seems all too often to be between the one and the other, with little nuance or admission of mutual utility. The purpose of this essay is to propose a via media, a middle way, through the polarizing debate by arguing that there is a very real, useful, and helpful place for both approaches when applied purposefully.

It should be recognized that temporal circumstances do often construct a legitimate sense of urgency. When this urgency occurs in disciplines which are slow to react, the most sensible course of action is not to quickly and impatiently implement final decisions but rather to implement intervening measures which can provide some helpful, but temporary, structure and organization. Intervening measures need not be as robust, as comprehensive, or even as intuitive as the eventual final solution, but they are extremely important for many reasons. Intervening measures bring at least some structure and organization to the situation. They have incredible potential for shaping the debate as the situation unfolds. Most importantly, however, they enable more thorough subsequent treatments by “buying time” until a patient implementation of a more proper long-term solution is applied.

As Laura Kane McElfresh points out, changes to controlled vocabularies such as the Library of Congress Subject Headings (LCSH) are “constrained by the machineries of the bureaucracies that create the classification system.”2 In other words, the controlled approaches which have created such a careful, intricate, and thorough approach to the organization of information in the discipline of library science are not the most efficiently responsive of approaches. But they worked, and they worked well. Indeed, they continue to do so. They are, however, ill equipped for fast, relevant, and agile responses to emerging topics. The contemporary information landscape includes many emerging topics that are glaringly absent from the controlled vocabularies most frequently employed – or at least absent in terminology relevant to the information searcher. What is to be done? Are users to expect that until the bureaucracy addresses their needs, there is to be no helpful approach to the identification and retrieval of relevant information? It is here that the utility of social indexing can be helpfully employed: relatively short-term and purposeful use of social tagging can help to bridge the gap between the immediate need for access to information in emerging fields and the eventual reflection of those fields in formalized and controlled vocabularies.

Contextualized Examples

The most responsive social approaches to indexing are most useful in emerging fields. Established fields of research, on the other hand, are best approached through existing controlled vocabularies that have been formalized through years of application and use. Emerging fields of research do not yet have this luxury and social indexing can be a helpful intervening measure. From a third perspective, neither social nor manual indexing has a clear advantage when it comes to use by novices in what shall be termed entry fields. Each of these shall now be addressed in turn.

1. Established Fields of Research

Controlled vocabularies employed by professional indexers here have a clear advantage for all the reasons explicated by the many basic texts in library science and even in those of research methodology.3 The study of historical movements is example enough. Research in Reformation Studies is, to many people, a rather specific and narrow topic. For those engaged in research in this sixteenth-century movement (movements, plural, some would argue) the field of “Reformation Studies” is unhelpfully broad given all its inherent internal, geographic and topical differentiations. For a researcher desiring to find information on Bohemian correspondence related to the Reformation in the Czech Republic, for example, the most effective and efficient option is clearly to utilize a system such as LCSH where there is already a relevant subject heading.4

Think of the plethora of ways in which users would tag resources relevant to this particular topic. Or, perhaps more to the point, is this topic not too esoteric for a significant enough number of users to contribute enough tags to produce a helpful outcome? In Tom Steele’s defense of the superiority of folksonomies, he argues that “experiments have shown recall is fastest at the basic level. When shown pictures of dogs and birds, people were more likely to use the term ‘dog’ or ‘bird’ instead of ‘beagle’ or ‘robin.’” He concludes, “With tagging, the users can relate to their own basic level, whether it is ‘beagle’ or ‘dog.’ A controlled vocabulary uses a hierarchy instead, which may or may not match the users’ basic level.”5 Unfortunately for Steele, he just proved the wrong point by admitting that users will tend to tag resources with broad topical tags – an approach clearly unhelpful when searching for detailed, specialized, esoteric information. To alleviate this problem, Steele later asserts that “a thesaurus like the LCSH can assist users creating tags in many ways.”6 Doubtful they will, but his point is taken. If they do, it only serves to further the viability of tagging as an intervening measure until controlled vocabularies can more adequately reflect the field.

Established fields of research are therefore best served through professional manual indexing. Indeed, all fields of research are theoretically best served by this means, but the exponential growth of information now available to searchers is obviously much greater than the capacity of any corpus of professional indexers. It is at this point that this theoretical best must yield to the practical best as a “tide-me-over” in order to temporarily alleviate the urgency.

2. Emergent Fields of Research

A sagacious use of social indexing can help relieve the tyranny of the urgent, and here the very topic of social indexing is its own example. Social indexing is a process identified by multiple monikers – folksonomies, social bookmarking, social tagging, collaborative tagging, distributed tagging, and more. A single term has yet to emerge as the preferred term in the field. In other words, a multiplicity of terminologies is used for the study of social indexing. The LCSH, on the other hand, does not yet have any readily identifiable relevant authority terms. A researcher seeking information on the emerging field of social indexing should not be expected to wait until the LCSH bureaucracy responds. Social indexing can be the very tool that is needed to solve the problem.

There are those who have argued that user tagging would enhance libraries’ websites and catalogs.7 It could be argued that most of the literature on emerging fields is published serially, but there is much discussion of how social indexing would also help in identifying monographic material already in publication but which is deemed in retrospect to be relevant to an emerging topic. Librarians are unlikely to index exhaustively enough to identify secondary or tertiary issues within a resource when indexing it, and they are even less likely to retrospectively edit a record to add further index terms. Social indexing is a great advantage at this particular point.

3. Entry Fields of Research

It must be recognized that social indexing offers fewer barriers to involvement in that users do not need a previous knowledge of complicated thesauri or controlled vocabularies in order to participate in the process.8 Novice information seekers are drawn to the practice because of this lack of necessary instruction. It is somewhat intuitive for personal use. This also explains Rolla’s conclusion that tagging is more commonly employed for popular works, with the consequence that its usefulness for special and academic libraries remains in question.9 More specifically, Montana State University Libraries had concerns over whether social indexing would be adequate for the electronic dissertations and theses (ETDs) and discovered that though the average ETD in their collection had four LCSH headings, only 2.4 percent had tags assigned by users.10 Clearly social indexing is best reserved for more popular settings.

Conclusion and Summary

Controlled vocabularies easily vanquish the problems of polysemy, synonymy, and basic-level variation, all of which are significant problems with the social approach to indexing. However, Peter Rolla’s comparative study of tagging on LibraryThing with LCSH found that in every LibraryThing record the tagging community assigned at least one concept not covered by the subject headings in the catalog record.11 Both approaches, then, have clear strengths. But determining when to employ each requires a purposeful approach. It is here argued that the use of controlled vocabularies – especially in academic settings – is to be preferred, but due to the slow nature of these systems in responding to new topics social indexing can serve as a helpful and viable intervening measure. This approach ensures the continued careful treatment of topics by professional manual indexers, while taking advantage of the adaptability of social indexing. Indeed, the two can learn from each other without the resultant “bad measures” that flow from acting on the impetus that “something must be done.”
_____

1 Clifton Fadiman, The American Treasury, 1455-1955 (New York: Harper, 1955), 338.

2 Laura Kane McElfresh, “Folksonomies and the Future of Subject Cataloging.” Technicalities 28, no. 2 (March/April 2008): 3-6. Library Lit & Inf Full Text, WilsonWeb (accessed October 11, 2009).

3 Thomas Mann provides an entire chapter on subject headings in his Oxford Guide to Library Research [How to Find Reliable Information Online and Offline] (Oxford: Oxford Univ. Press, 2005).

4 Reformation–Czech Republic–Bohemia–Correspondence

5 Tom Steele, “The New Cooperative Cataloging,” Library Hi Tech 27, no. 1 (2009): 70. Library Lit & Inf Full Text, WilsonWeb (accessed October 11, 2009).

6 Steele, 72.

7 See, for example, Louise F. Spiteri, “The Use of Folksonomies in Public Library Catalogues,” Serials Librarian 51, no. 2 (2006): 75-89.

8 See Darlene Fichter, “Intranet Applications for Tagging and Folksonomies,” Online 30, no. 3 (2006): 43-46; see also Ellyssa Kroski’s assertion that metadata is now in the realm of Everyman in L. Gordon-Murnane, “Social Bookmarking, Folksonomies, and Web 2.0 Tools,” Searcher 14: 26-38.

9 Peter J. Rolla, “User Tags versus Subject Headings: Can User-Supplied Data Improve Subject Access to Library Collections?” Library Resources and Technical Services 53, no. 3 (2009): 178.

10 Elaine Peterson, “Patron Preferences for Folksonomy Tags: Research Findings When Both Hierarchical Subject Headings and Folksonomy Tags Are Used,” Evidence Based Library and Information Practice 4, no. 1 (2009): 55.

11 Rolla, 183.


It has been said, “There is no substitute for experience, but letting your wife do it is the next best thing.”1 This colloquialism expresses an idea that is more profound than an initial reading might suggest. The idea is that a personal, first-hand, internalized knowledge of information is ideal since it is entirely available to the individual at the point of need – assuming, of course, that it can be remembered. Otherwise, the presence of a substitute that points an individual to the needed information is the next best thing. In the real world, however, such substitutes become the practical ideal since not everyone has the same knowledge or vocabulary. The illustration here is clear: the use of surrogate records to point to information resources is, for a multiplicity of reasons, the most practical and therefore the best, indeed the only real, solution to the problems inherent in information representation and access.

Full-Text Indexing

The popularity of many full-text databases is likely attributable to their seeming ease of use, though, ironically, the simpler user interfaces usually require more non-intuitive and advanced knowledge to search effectively. Anyone can enter “jaguar” into Google’s single search box, but not many know how to limit the results to either the car, the old Mac operating system, or the animal. Yet, convincing a searcher that there are better, more efficient, ways to arrive at a desired set of results is not an easy task.

One of the impediments to persuading searchers to learn what they consider to be needlessly complicated and irrelevant search syntax when using full-text databases is convincing them that using an intermediary layer between them and the text (or other information resource) is often more efficient. Understandably, most searchers balk at the thought of distancing themselves from the information in order to find it. It seems counter-intuitive. Who are we, anyway, to dictate the terms under which they can access information? Herein lies the rub, however. Without a system that quite literally does exactly that, most information resources will be less likely to be identified by the majority of searches. There are too many difficulties inherent in present-day full-text indexing methods for searches to yield accurate and comprehensive results, and someone must indeed dictate the terms under which a resource can be found.

Full-text indexing is accomplished automatically; that is, it is a computerized process that extracts terms according to a defined algorithm. The process can be rather complex in practice but is simple in its conception: lexical analysis and term selection. Lexical analysis is the process by which formatted, punctuated, inflected text is dismantled into unformatted, uninflected words. These tokens, as they are frequently called, then undergo the term selection process in which certain stop-words are removed. Some words are “stemmed,” or truncated, to remove any inflection from their verbal roots and to group lexically related words under their simplest form. Others, such as hyphenated words, are broken into their constituent parts. The terms are then “weighted” to determine their relative importance based, usually, on their frequency of occurrence.
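The steps just described can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the algorithm of any particular database vendor: the stop-word list, the crude suffix-stripping rule, and the function names are all hypothetical, and raw term frequency stands in for the weighting step.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real systems use longer ones.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Lexical analysis: lowercase the text and split it into bare word tokens.

    Punctuation is discarded, so a hyphenated word such as "full-text"
    falls apart into its constituent parts ("full", "text").
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Crude suffix stripping (an assumption for illustration only).

    Production systems typically use a proper stemmer; this rule merely
    groups a few inflected forms under a shared root.
    """
    for suffix in ("ing", "es", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def index_terms(text):
    """Term selection and weighting: drop stop-words, stem, count frequency."""
    kept = [t for t in tokenize(text) if t not in STOP_WORDS]
    return Counter(stem(t) for t in kept)


weights = index_terms("Indexing the indexes: full-text indexed records.")
for term, weight in weights.most_common():
    print(term, weight)
```

In this sample sentence the three inflected forms of "index" collapse into one term with the highest weight, while "the" never reaches the index at all.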

The benefits of this type of indexing are, in my judgement, few but important. Full-text indexing is inexpensive and is becoming increasingly so. This is no small benefit. Libraries are chronically under-funded, and the bottom-line is always a concern. Database vendors, the primary producers of such databases, are for-profit businesses. Taken together, under-funded libraries and profit-driven vendors are constantly engaged in a tug-of-war as each pleads their case. Full-text indexing, though often a high-cost initial entry endeavor, appeals to both for the same reason: it is affordable.

The second important benefit to full-text indexing is that it removes the inconsistencies that result from the use of manual indexers. Spelling variants between indexers (color or colour? indexes or indices?) as well as the inevitable inconsistencies that a single indexer may apply are avoided with an indexing algorithm’s prescribed procedures. They will be followed correctly every time. Consistency is no small benefit either. Without it, the architectonic purpose of indexing is nullified.

These benefits are important. Taken together with the increasing expectation by searchers for full-text search capabilities, a strong argument is made for the implementation of full-text indexing of information resources – especially of textually-based resources. Lest we rob Peter to pay Paul, however, there are further considerations to be had.

Surrogate Records

A surrogate record is “a presentation of the characteristics . . . of an information resource.”2 When referring to surrogate records in a catalog of bibliographic resources, this metadata typically includes three primary types of information: descriptive data, subject data, and classification data. These records are used to help render the resources for which they stand as intermediaries more identifiable to searchers. They do not provide the resource per se, but point to the resource. These records are no longer singular in their directionality, however. Rather, properly created surrogate records provide multiple points of access to the resource through fields such as subjects and classifications, as well as the author’s name and the resource’s title. Indeed, the access points in contemporary surrogate records render the record multidirectional, and allow the resource to be identified via several avenues.

The crux of this argument lies in the appropriation of controlled vocabulary – a process which heretofore has proven elusive to automatic methods. Controlled vocabulary in a surrogate record includes the normalization of spelling and the assignment of preferred terminologies in order to address homographic and synonymic issues, and it thereby reduces ambiguity. For example, without some terms being dictated one would not know whether to look under “C. S. Lewis” or “Clive Staples Lewis” as an author. The task of pursuing both in full-text searches becomes cumbersome without complicated syntax. The application of an authoritative term is really quite valuable.
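A small sketch may make the value of an authoritative term concrete. Everything here is an assumption for illustration: the authority file is a hand-built dictionary, the preferred heading is written in a simplified form rather than copied from any actual authority record, and the function names are hypothetical.

```python
# Hypothetical authority file: variant forms of a name, each mapped to the
# single preferred heading under which records are filed.
PREFERRED = "Lewis, C. S. (Clive Staples)"  # simplified, illustrative heading
AUTHORITY_FILE = {
    "c. s. lewis": PREFERRED,
    "clive staples lewis": PREFERRED,
    "lewis, c. s.": PREFERRED,
}

# Hypothetical catalog: each surrogate record carries the preferred heading,
# whatever form of the name appears on the title page.
CATALOG = [
    ("The Screwtape Letters", PREFERRED),
    ("Mere Christianity", PREFERRED),
]


def normalize(name):
    """Normalize case and spacing so "C.S. Lewis" and "C. S. Lewis" match."""
    return " ".join(name.lower().replace(".", ". ").split())


def authorized_heading(name):
    """Return the preferred heading for a variant, or None if it is unknown."""
    return AUTHORITY_FILE.get(normalize(name))


def search_by_author(name):
    """Find titles filed under the preferred heading for any known variant."""
    heading = authorized_heading(name)
    if heading is None:
        return []
    return [title for title, filed_under in CATALOG if filed_under == heading]


print(search_by_author("C.S. Lewis"))
print(search_by_author("Clive Staples Lewis"))
```

Both searches retrieve the same records, because the searcher's variant is resolved to the dictated term before the catalog is consulted; the searcher needs no special syntax to cover both forms.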

Homographic problems are also illustrative of the usefulness of surrogate records. Does “Mercury” refer to the planet, the metal, the automobile, or the mythological god? Full-text indexing has no way to differentiate them. Controlled vocabularies have devised a multiplicity of solutions, and in the case of subject classification and its manifestation in a catalog’s surrogate record for a bibliographic item, render resources on each of these possibilities uniquely identifiable.
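The contrast can be illustrated with a toy catalog. This is a sketch under stated assumptions, not a description of any real system: the titles are invented, the qualified subject headings are written in an illustrative style rather than taken from LCSH, and the two search functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurrogateRecord:
    """A pared-down surrogate record: descriptive data plus subject data."""
    title: str
    subjects: tuple


# Invented titles and illustrative (unverified) qualified headings.
CATALOG = [
    SurrogateRecord("Mercury: The Innermost Planet", ("Mercury (Planet)",)),
    SurrogateRecord("Mercury in the Environment", ("Mercury (Metal)",)),
    SurrogateRecord("Mercury and the Messengers of the Gods", ("Mercury (Roman deity)",)),
    SurrogateRecord("Restoring a Classic Mercury", ("Mercury automobile",)),
]


def full_text_search(records, word):
    """Naive keyword match on the title text: homographs are conflated."""
    return [r for r in records if word.lower() in r.title.lower()]


def subject_search(records, heading):
    """Match on the controlled heading: each sense is uniquely identifiable."""
    return [r for r in records if heading in r.subjects]


print(len(full_text_search(CATALOG, "mercury")))
print([r.title for r in subject_search(CATALOG, "Mercury (Planet)")])
```

The keyword search returns every record containing the string, whatever its sense, while the search on the controlled heading returns only the record about the planet.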

Such precision is perhaps the strongest benefit of this approach. Indeed, it is important enough to outweigh the approach's potential weaknesses. Admittedly, indexing to produce surrogate records with controlled access points introduces a number of lesser problems. Foremost among these is cost. At present, no automated process is sufficient for the task. This lack of automation requires that controlled vocabularies be appropriated manually – a rather costly endeavor. This cost is offset somewhat by collaborative cataloging, a fact on which I rely when indicating that cost is a lesser problem in comparison to the benefit of precision. Inconsistency (both intra- and inter-indexer) will always be a possibility when human indexers are involved. Additionally, and commonly, searchers choose terms not included by indexers.

These potential problems have prompted many to attempt to bridge the divide between full-text indexing and manual indexing with the use of computer programs. More specifically, projects are underway which endeavor to link the primary terms gleaned automatically through the aforementioned application of stemming programs, etc., with particular controlled subject vocabularies such as the Library of Congress classification scheme. These ongoing projects are exciting developments in the field, and hold promise for future use, but are not yet viable for widespread use.

Conclusion

Surrogacy is a term that brings instantly to mind the idea of a substitute. It may seem counter-intuitive to render a resource more findable by inserting an artificial layer between the resource and the searcher, but such is the case in the modern indexing world. Full-text indexing is gaining in popularity, but it is my judgment that until automated indexing can solve the various problems of inaccuracy by providing clear, accurate, and specific results, someone must do it themselves. The only practical way for this to happen is through the creation of records containing information about the resource that provides the user with multiple points of access to the identification of the resource. As long as physical collections of resources are the locus of consideration, only some system of surrogacy will allow for a collocated organization of the collection. In other words, surrogacy is the way to go – it removes much of the labor!

_____

1 Evan Esar, 20,000 Quips & Quotes (New York: Barnes & Noble Books, 1995) p. 284
2 Arlene G. Taylor and Daniel N. Joudrey, The Organization of Information, 3rd Edition (Westport, CT: Libraries Unlimited, 2009) p. 473.
