Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant
Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.
The Consequences of Cataloging
As our usage statistics decline (on average), owing to the perceived promise of the Internet, we must think imaginatively. More and more, I believe that means establishing ever-wider cooperative relationships with other libraries. After all, there is no cheaper way to expand your collections than to make it possible (and easy) to request and receive materials from other institutions. The Westchester Library System, Ardsley, NY (see 'Technology and Teamwork,' Link List) is a great example. The 38 cooperating libraries made it easy for patrons to receive any book within the system by using the integrated interlibrary loan (ILL) software of their Dynix system and backing it up with a robust delivery service. In a challenge to conventional wisdom, older, long-ignored books began flying off the shelves.

However, underlying this increased cooperation and its benefits are some niggling details that may prove to be significant stumbling blocks. Look no further than the icon and foundation of libraries--the catalog. Though it may be painstakingly constructed using respected standards such as MARC and AACR2, the catalog may be less standard and therefore less interoperable than we think.

Too many records

This became dramatically apparent as I prepared a talk for some ILL librarians. I decided to search the region's union catalog system for one of my books. The numerous records returned in part reflected multiple editions and printings. I decided to winnow down the results to only those records that seemed to describe the exact same book. I reduced the number to seven--seven independent records for the same book, in one medium-sized state.

It appeared that two or three base records had been embellished or altered in various, mostly trivial ways. One misspelled the place of publication and added 'maps' to the physical description.
Another quibbled with the copyright date (1993, but it was published at the end of 1992) and measured the book one centimeter smaller than the other records. One record said 'leaves' instead of 'p.' for the pagination notation. For subject headings, the records grouped around two main clusters. The differences seemed to revolve around plain mistakes of various kinds (misspellings mostly), added information, and disagreements. Except for the differences in subject headings, all the differences were completely and utterly inconsequential to the user.

Since these variations were so trivial, why hadn't these records been merged? Because the system being searched is a 'virtual' union catalog. The records don't come from the same system but are merged on the fly after searching separate catalog systems. Karen Coyle, in 'The Virtual Union Catalog,' cautions that, with systems that retrieve large sets of results, merging records on the fly will be extremely difficult. With 'real' union catalogs, where records are contributed to one central database, there is more opportunity to merge duplicate records successfully, as well as to iron out trivial differences over time.

There is no question that merging such records is vital to effective user services in a cooperative environment. It's not clear how we should handle records that vary, however slightly. For example, do most users care whether they get a hardback or a paperback? Some may, some may not. We must make it easy for them to select the correct title, and then the appropriate copy, without inundating them.

How to merge records

We consider it more important to know that we have a specific item in our collections rather than that several printings of a work hold the same content. Jeremy Hylton, in 'Identifying and Merging Related Bibliographic Records,' advocates Michael Buckland's idea of an 'information dossier' approach to merging related records.
This goes beyond the standard library practice of duplicate record detection and merging (see 'Record Matching: An Expert Algorithm') to merge records that describe different physical items but are algorithmically perceived to be the same intellectual object. Although Hylton's algorithm may be inadequate when faced with some of the ambiguous records that can be found in large library catalog systems, it nonetheless highlights an important issue: Do users initially want to see that there are many different physical copies of a book, or do they want that initially hidden until they select a specific book to retrieve? How should our catalog systems mask information in displays where it isn't important but still make it displayable when users want to see it?

As we move toward providing access to ever-larger pools of library content, these are questions we will need to answer, and answer well. It is clear that our cataloging practices can have unintended--and detrimental--consequences.

__________________________________________________________________

LINK LIST

Identifying and Merging Related Bibliographic Records
ltt-www.lcs.mit.edu/ltt-www/People/jeremy/thesis/main.html

Record Matching: An Expert Algorithm
ASIS Proceedings, 22 (1985), 77-80

'Technology and Teamwork'
Library Journal 9/1/00, p. 160-163

The Virtual Union Catalog
www.dlib.org/dlib/march00/coyle/03coyle.html
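
As a rough illustration of the duplicate-record problem this column describes, the Python sketch below groups records that differ only in trivial ways (a misspelled place of publication, 'leaves' versus 'p.', a one-centimeter difference in size) and pools their subject headings and holdings. It is not Hylton's algorithm or the expert algorithm cited in the Link List. The field names, the choice of match key, and the sample records are all invented for the example, and a key this simple would merge too much or too little in a real catalog.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def page_count(pagination):
    """Pull the first number out of strings like '134 p.' or '134 leaves'."""
    match = re.search(r"\d+", pagination)
    return int(match.group()) if match else None


def match_key(record):
    """Hypothetical match key: place, date, and size are deliberately ignored."""
    return (
        normalize(record["title"]),
        normalize(record["author"]),
        normalize(record["publisher"]),
        page_count(record["pagination"]),
    )


def merge_records(records):
    """Group records by match key and pool their subjects and holdings."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    merged = []
    for members in groups.values():
        merged.append({
            "title": members[0]["title"],
            "author": members[0]["author"],
            "subjects": sorted({s for m in members for s in m["subjects"]}),
            "holdings": [m["library"] for m in members],
        })
    return merged


# Invented sample records; they do not describe any real book or library.
SAMPLE = [
    {"title": "An Example Title", "author": "Doe, Jane",
     "publisher": "Example Press", "place": "Berkeley", "date": "1993",
     "pagination": "134 p.", "size_cm": 23,
     "subjects": ["Subject A"], "library": "Library 1"},
    {"title": "An example title.", "author": "Doe, Jane.",
     "publisher": "Example Press", "place": "Berkley", "date": "1992",
     "pagination": "134 leaves", "size_cm": 22,
     "subjects": ["Subject B"], "library": "Library 2"},
    {"title": "An Example Title", "author": "Doe, Jane",
     "publisher": "Example Press", "place": "Berkeley", "date": "1996",
     "pagination": "160 p.", "size_cm": 23,
     "subjects": ["Subject A"], "library": "Library 3"},
]

if __name__ == "__main__":
    for entry in merge_records(SAMPLE):
        print(entry["title"], entry["holdings"], entry["subjects"])
```

Leaving place, date, and size out of the key is what lets the first two sample records collapse into one entry, while the differing page count keeps the third apart as a likely separate edition. The merged entry keeps every holding library, so a display could show one title first and the individual copies only on request.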