It has been said, “There is no substitute for experience, but letting your wife do it is the next best thing.”1 This colloquialism expresses an idea that is more profound than an initial reading might suggest. The idea is that a personal, first-hand, internalized knowledge of information is ideal since it is entirely available to the individual at the point of need – assuming, of course, that it can be remembered. Otherwise, the presence of a substitute that points an individual to the needed information is the next best thing. In the real world, however, such substitutes become the practical ideal since not everyone has the same knowledge or vocabulary. The illustration here is clear: the use of surrogate records to point to information resources is, for a multiplicity of reasons, the most practical and therefore the only real solution to the problems inherent in information representation and access.

Full-Text Indexing

The popularity of many full-text databases is likely attributable to their seeming ease of use, though, ironically, the simpler user interfaces usually require more non-intuitive and advanced knowledge to search effectively. Anyone can enter “jaguar” into Google’s single search box, but not many know how to limit the results to the car, the old Mac operating system, or the animal. Yet convincing a searcher that there are better, more efficient ways to arrive at a desired set of results is not an easy task.

One of the impediments is that searchers who already regard search syntax as needlessly complicated and irrelevant must also be convinced that placing an intermediary layer between themselves and the text (or other information resource) is often more efficient. Understandably, most searchers balk at the thought of distancing themselves from the information in order to find it. It seems counter-intuitive. Who are we, anyway, to dictate the terms under which they can access information? Herein lies the rub, however. Without a system that quite literally does exactly that, most information resources will be less likely to be identified by the majority of searches. There are too many difficulties inherent in present-day full-text indexing methods for searches to yield accurate and comprehensive results, and someone must indeed dictate the terms under which a resource can be found.

Full-text indexing is accomplished automatically; that is, it is a computerized process that extracts terms according to a defined algorithm. The process can be complex in execution but is simple in conception: lexical analysis followed by term selection. Lexical analysis is the process by which formatted, punctuated text is dismantled into unformatted words. These tokens, as they are frequently called, then undergo the term selection process, in which certain stop-words are removed. Some words are “stemmed,” or truncated, to remove any inflection from their roots and to group lexically related words under their simplest form. Others, such as hyphenated words, are broken into their constituent parts. The terms are then “weighted” to determine their relative importance based, usually, on their frequency of occurrence.
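The steps just described can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the algorithm of any particular database vendor: the stop-word list, the suffix list, and the function names are all hypothetical, and the stemmer is deliberately naive compared with real stemming programs.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption); real systems use longer lists.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Hypothetical suffix list for a deliberately naive stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lexical analysis: lowercase the text and keep only runs of letters.

    Punctuation is dropped, and hyphenated words fall apart into their
    constituent parts because the hyphen is not a letter.
    """
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Strip the first matching suffix, leaving at least three letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def index_terms(text):
    """Term selection and weighting: remove stop-words, stem, count frequency."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    return Counter(stem(t) for t in tokens)


weights = index_terms("The indexer indexes full-text indexes; indexing is automated.")
print(weights.most_common())
```

In this sketch “indexes” and “indexing” collapse to the single term “index,” while “full-text” is split into “full” and “text.” The weight here is a raw count; production systems generally use more elaborate weighting schemes.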

The benefits of this type of indexing are, in my judgment, few but important. Full-text indexing is inexpensive and is becoming increasingly so. This is no small benefit. Libraries are chronically under-funded, and the bottom line is always a concern. Database vendors, the primary producers of such databases, are for-profit businesses. Taken together, under-funded libraries and profit-driven vendors are constantly engaged in a tug-of-war as each pleads its case. Full-text indexing, though often costly to set up initially, appeals to both for the same reason: it is affordable.

The second important benefit of full-text indexing is that it removes the inconsistencies that result from the use of manual indexers. Spelling variants between indexers (color or colour? indexes or indices?) as well as the inevitable inconsistencies that a single indexer may introduce are avoided by an indexing algorithm’s prescribed procedures, which are followed identically every time. Consistency is no small benefit either. Without it, the architectonic purpose of indexing is nullified.

These benefits are important. Taken together with searchers’ increasing expectation of full-text search capabilities, they make a strong argument for the full-text indexing of information resources – especially textually-based resources. Lest we rob Peter to pay Paul, however, there are further considerations to be had.

Surrogate Records

A surrogate record is “a presentation of the characteristics . . . of an information resource.”2 When referring to surrogate records in a catalog of bibliographic resources, this metadata typically includes three primary types of information: descriptive data, subject data, and classification data. These records are used to help render the resources for which they stand as intermediaries more identifiable to searchers. They do not provide the resource per se, but point to the resource. These records are no longer singular in their directionality, however. Rather, properly created surrogate records provide multiple points of access to the resource through fields such as subjects and classifications, as well as the author’s name and the resource’s title. Indeed, the access points in contemporary surrogate records render the record multidirectional, and allow the resource to be identified via several avenues.

The crux of this argument lies in the appropriation of controlled vocabulary – a process which heretofore has proven elusive to automatic methods. Controlled vocabulary in a surrogate record includes the normalization of spelling and the assignment of preferred terminologies to address homographic and synonymic issues, thereby reducing ambiguity. For example, without some terms being dictated, one would not know whether to look under “C. S. Lewis” or “Clive Staples Lewis” as an author. The task of pursuing both in full-text searches becomes cumbersome without complicated syntax. The application of an authoritative term is really quite valuable.
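The value of a dictated term can be illustrated with a small sketch. It assumes a hypothetical authority file in which every variant form of a name points to one preferred heading; the heading text, the tiny catalog, and the function names are illustrative assumptions, not the contents or interface of any real catalog.

```python
# Hypothetical preferred heading, modeled on common library practice.
LEWIS = "Lewis, C. S. (Clive Staples), 1898-1963"

# Hypothetical authority file: variant forms all point to one heading.
AUTHORITY_FILE = {
    "c. s. lewis": LEWIS,
    "clive staples lewis": LEWIS,
    "lewis, c. s.": LEWIS,
}

# Hypothetical catalog keyed by preferred heading only.
CATALOG = {
    LEWIS: ["Mere Christianity", "The Screwtape Letters"],
}


def normalize(name):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(name.lower().split())


def preferred_heading(name):
    """Return the authorized heading, or None if the name is not controlled."""
    return AUTHORITY_FILE.get(normalize(name))


def search_by_author(name):
    """Resolve the searcher's form of the name, then look up the heading."""
    return CATALOG.get(preferred_heading(name), [])


print(search_by_author("C. S. Lewis"))
print(search_by_author("Clive Staples Lewis"))
```

Both searches resolve to the same heading and so retrieve the same titles, without the searcher needing any special syntax. The cost, of course, is that someone must build and maintain the authority file.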

Homographic problems are also illustrative of the usefulness of surrogate records. Does “Mercury” refer to the planet, the metal, the automobile, or the mythological god? Full-text indexing has no way to differentiate them. Controlled vocabularies offer a multiplicity of solutions and, in the case of subject classification as manifested in a catalog’s surrogate record for a bibliographic item, render resources on each of these possibilities uniquely identifiable.
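The “Mercury” problem can be sketched as follows. The records, titles, and qualified subject headings below are hypothetical, invented only to contrast a search of the full text with a search of a controlled subject field; they are not drawn from any real catalog.

```python
from dataclasses import dataclass, field


@dataclass
class SurrogateRecord:
    """A hypothetical surrogate record with a controlled subject field."""
    title: str
    subjects: list = field(default_factory=list)
    full_text: str = ""


# Invented records; the parenthetical qualifiers are illustrative headings.
RECORDS = [
    SurrogateRecord(
        title="Hypothetical Guide to the Innermost Planet",
        subjects=["Mercury (Planet)"],
        full_text="Mercury orbits the Sun in 88 days.",
    ),
    SurrogateRecord(
        title="Hypothetical Handbook of Liquid Metals",
        subjects=["Mercury (Element)"],
        full_text="Mercury is liquid at room temperature.",
    ),
    SurrogateRecord(
        title="Hypothetical Myths of Rome",
        subjects=["Mercury (Roman deity)"],
        full_text="Mercury was the messenger of the gods.",
    ),
]


def full_text_search(records, word):
    """Match the bare word anywhere in the text; homographs are not separated."""
    return [r.title for r in records if word.lower() in r.full_text.lower()]


def subject_search(records, heading):
    """Match only records assigned exactly this controlled heading."""
    return [r.title for r in records if heading in r.subjects]


print(full_text_search(RECORDS, "mercury"))
print(subject_search(RECORDS, "Mercury (Planet)"))
```

The full-text search returns all three records because the word is the same in each, while the subject search returns only the record an indexer assigned to the planet.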

Such precision is perhaps the strongest benefit of this approach, and it is important enough to outweigh the approach’s potential weaknesses. Admittedly, indexing to produce surrogate records with controlled access points introduces a number of lesser problems. Foremost among these is cost. At present, no automated process is sufficient for the task, so controlled vocabularies must be appropriated manually – a rather costly endeavor. This cost is offset somewhat by collaborative cataloging, a fact on which I rely when indicating that cost is a lesser problem in comparison to the benefit of precision. Inconsistency (both intra- and inter-indexer) will always be a possibility when human indexers are involved. Additionally, and commonly, searchers choose terms not included by indexers.

These potential problems have prompted many to attempt to bridge the divide between full-text indexing and manual indexing with the use of computer programs. More specifically, projects are underway which endeavor to link the primary terms gleaned automatically through the aforementioned application of stemming programs, etc., with particular controlled subject vocabularies such as the Library of Congress classification scheme. These ongoing projects are exciting developments in the field, and hold promise for future use, but are not yet viable for widespread use.

Conclusion

Surrogacy is a term that brings instantly to mind the idea of a substitute. It may seem counter-intuitive to render a resource more findable by inserting an artificial layer between the resource and the searcher, but such is the case in the modern indexing world. Full-text indexing is gaining in popularity, but it is my judgment that until automated indexing can solve the various problems of inaccuracy by providing clear, accurate, and specific results, someone must do it themselves. The only practical way for this to happen is through the creation of records containing information about the resource that provides the user with multiple points of access to the identification of the resource. As long as physical collections of resources are the locus of consideration, only some system of surrogacy will allow for a collocated organization of the collection. In other words, surrogacy is the way to go – it removes much of the labor!

_____

1 Evan Esar, 20,000 Quips & Quotes (New York: Barnes & Noble Books, 1995), p. 284.
2 Arlene G. Taylor and Daniel N. Joudrey, The Organization of Information, 3rd ed. (Westport, CT: Libraries Unlimited, 2009), p. 473.

No Comments | Category: Libraries, Research

A few people have inquired about my research on the role of printing during the Reformation, so I include it here for your reading pleasure. You may need to zoom in a bit to read it.

‘The Last Flicker of Flame’: The Influence of Printing on the Spread of the European Reformations

No Comments | Category: Uncategorized

I have been working on revising my basic presentation for library research (bibliographic instruction, as we call it). Take a look and let me know what you think. It is mainly in outline form so I can adjust it for different student populations.

No Comments | Category: Libraries, Research, Web

Agree or disagree?

Although young people demonstrate an apparent ease and familiarity with computers, they rely heavily on search engines, view rather than read and do not possess the critical and analytical skills to assess the information that they find on the Web. These behavioural traits are also increasingly becoming the norm for all age-groups, from younger pupils and undergraduates through to professors. The ability to concentrate deeply appears to be a dying skill.

From “Challenges for Great Libraries in the Age of the Digital Natives” by Dame Lynne J. Brindley, CEO, British Library, as the Miles Conrad lecturer at the 2009 annual meeting of the National Federation of Advanced Information Services. Lecture PDF.

I happen to agree — for the most part.

1 Comment | Category: Libraries, Research, Web

Many, many thanks go to my colleague, Jason Fowler, and my new friends at The Association of Librarians & Archivists at Baptist Institutions (ALABI) for inviting me to give a presentation on Patron-Centered Spaces in Nashville last week. I enjoyed my brief time with them, and look forward to attending as a member in the future. They have graciously posted my manuscript for all who are interested.

I argued for Augustine’s definition of community from City of God, and then discussed the implications of this definition for a library’s physical and virtual spaces. Take a look and let me know what you think.

No Comments | Category: Conferences, Libraries, Tech, Theology

I am reviving the weekly webliography in which I summarize in a single post the “dogeared items from the web” listed in the sidebar. Dogeared from the web, March 1-6, 2009:

No Comments | Category: Commonplaces

I don’t have a Kindle. I have never used a Kindle. But I love the concept.

I do have books — lots of them. I love the concept and the craft of books. But I’m not a librarian for the sake of books.

I say these things because I agree to some extent with each of two opposing viewpoints on the Kindle’s impact on the culture of words and the future of books, both published at theAtlantic.com.

Sven Birkerts’ article of March 2, 2009, “Resisting the Kindle,” laments the potential world created by the Kindle revolution in which “libraries survive as information centers rather than as repositories of printed books.” Professionally, I am actually fine with that. I am a librarian not primarily to preserve information but to make it available in ways that our students find helpful and accessible. Personally, however, his recognition that our literature is deeply contextual and historicized resonates with me. Consider:

Why, then, am I so uneasy about the page-to-screen transfer—a skeptic if not a downright resister? Perhaps it is because I see in the turning of literal pages—pages bound in literal books—a compelling larger value, and perceive in the move away from the book a move away from a certain kind of cultural understanding, one that I’m not confident that we are replacing, never mind improving upon. I’m not blind to the unwieldiness of the book, or to the cumbersome systems we must maintain to accommodate it—the vast libraries and complicated filing systems. But these structures evolved over centuries in ways that map our collective endeavor to understand and express our world. The book is part of a system. And that system stands for the labor and taxonomy of human understanding, and to touch a book is to touch that system, however lightly.

I think, though, that Matthew Battles’ article of March 5, 2009, “In Defense of the Kindle,” along with his 2003 book on the “unquiet history” of libraries, has helped to soothe my personal bibliophilic concerns:

Yet the culture of letters has always been subject to disruption and transformation. Indeed, since the advent of print, technologies of the book have changed dramatically, and with them the book’s place in society. The world of letters not only transcends these technological changes—it thrives because of them. Were that not the case, the cultural continuity that Birkerts holds so dear would have been lost long ago.

In other words, We didn’t start the fire. It was always burnin’ since the world’s been turnin’.

3 Comments | Category: Books, Libraries, Literature, Tech

Last April I highlighted JTOC, an online service written by Jason Fowler for viewing scans of the tables of contents for the most frequently used journals at our library.

The Mar/Apr issue of ONLINE: Exploring Technology & Resources for Information Professionals turned me on to ticTOCs, another service for reviewing the latest table of contents (TOC) for any of 12,000+ scholarly journals. From ONLINEmag:


…users can find journals of interest by title, subject, or publisher; view the latest TOCs… ticTocs also allows users to export selected TOC RSS feeds to feedreaders and to import article citations into RefWorks.

ticTOCs even links to the full text of the articles, but this aspect of the service is subscription-based. The rest, however, appears to be completely free.

No Comments | Category: Research, Web

“When you come, bring the cloak that I left with Carpus at Troas, also the books, and above all the parchments.” (II Timothy 4:13, ESV)

– Apostle Paul, an old man imprisoned in a hole in the ground for spreading Christianity, writing to Timothy.

The Mamertine Prison is now a tourist destination included on many guided tours of Rome. Back then it was a literal hell-hole for the prisoners who were lowered into this cave-like underground dungeon through a hole in the ceiling. It was here that the Apostle Paul was likely imprisoned near the end of his life.

Dark. Cold. Lonely.

And yet, he wanted his books. Why? What possible purpose could books serve for someone who knew what it was to have divine truth pour out through his own quill? Paul planted churches, invested in people, taught and discipled those who would teach and disciple. He is traditionally credited with thirteen of the New Testament’s twenty-seven books. And when the end seemed near, he wanted people … and books?

Lord willing, I will spend the next few posts attempting to answer this question and the implications for ministry in general and the ministry of theological librarianship in particular.

  • Post I. The purpose of books/parchments for Paul
  • Post II. The look of that purpose/need today
  • Post III. How we as theological librarians can meet that need.

2 Comments | Category: Books, Libraries, Ministry

I realize the title of this post sounds more like a Poirot novel, but I am actually referring to another example of mysterious book provenance I found in our library today. In 1851, the London publisher Thomas Bosworth published a second edition of Cases of Conscience; or, Lessons in Morals: for the Use of the Laity by Pascal the Younger (a.k.a. Pierce Connelly). The book is more of a pamphlet, and so was easily published together with a letter to W. E. Gladstone, Member of Parliament for the University of Oxford, who apparently held to some rather appeasing positions regarding the validity of the Church of Rome. The author attempts in this letter to convince Gladstone of the inconsistency of Romanism with true piety.

The letter itself is interesting reading, but the mysterious part is the handwritten, 4-page note I found tucked within the book. The handwriting is rather hard to read (for me, at least), but it appears to be commending this publication along with the Church of Rome’s reply (which is not included in our binding). My best effort at interpreting the note, with links to images of the pages:

[page 1]
Blendworth _____
Hon. dean -
Feb. 6, 1859

My dear _______ /
I am very anxious / to put before you two / pamphlets written by / a friend of mine of / distinguished ability. / Their titles are “Cases / of Conscience or Lessons / in Morals for the use / of the Laity” by Pascal / [page 2] The Younger and / these men _____ Pascal / the Younger. / The Church of Rome’s / Defense against Cases / of Conscience with a Reply. / I consider these Pam- / phlets as one of the / severest blows, which the / Church of Rome has / received in modern / [page 3] times – a blow from which / she cannot recover – / Pray tell me the name / of your London bookseller / that _____ send you a / copy of each (of which I / expect _____ exceptance) / in kind to forward – / Should you like the Pam / phlets, those ____ will / kindly recommend them / to others; as it is a great / [page 4] object with my friend / (whose name I _____ _____ / mention) to sell his _____ / In this once well off, he / is now alas! in needy / circumstances -/
My archdeacon (_____) / says “this reply” is one of / the cleverest things he has / ever seen -/
_____ are my dear _____ / In _____ /

Edw. L. Ward

My best guess at the identity of the author is Edward Langton Ward, rector of Blendworth until his death in 1881.

Any help you can give me in deciphering the script of this note would be appreciated, for curiosity’s sake if nothing else. The next to last unreadable word appears to be the same as the second unreadable word.

Don’t you just love books?

3 Comments | Category: Book Provenance, Catholicism, Libraries