Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant
Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.
The New Cataloger
I've often said librarians should like any metadata they see. This is because we are entering an age where MARC no longer rules, since the 21st-century library will be handling increasing amounts of born-digital material. Even now, librarians are using formats such as Dublin Core (DC), Metadata Object Description Schema (MODS), and Metadata Encoding and Transmission Standard (METS), among others, to capture and manipulate important data about various information resources. No single metadata standard is adequate for the job.

Our job requires much more than facility with new formats. We will need new kinds of tools, only now beginning to be imagined and created, for a growing amount of born-digital material as well as books. Publishers are increasingly supplying machine-readable metadata about the publications they put out, largely to enable their books to be sold by Amazon and other online booksellers. These records could greatly enrich our existing MARC data if the infrastructure were in place to normalize them. Publishers often provide cover art, pull quotes from reviews, descriptive text, author biographies, and other useful material that MARC records typically lack, which vendors like Syndetic Solutions supply to libraries for on-the-fly display.

The inside scoop

How do I know this? I walk around with over 10,000 ONIX metadata records on my laptop that I downloaded from willing publishers. If we had a service to collect these records from publishers and make them available to catalogers, we could have access to many valuable facts about library materials.

The real news is the completely new kinds of tasks catalogers will be expected to perform. In an online world, where there are many amazing free resources, librarians must get better at selecting and providing access to the right slice of this material. Part of this will entail harvesting (automated gathering) of metadata that describes freely available resources.
OAIster.org, the mega-harvester site at the University of Michigan, has gathered records for over seven million freely available resources.

Gathering is just a start

As work at the California Digital Library, Cornell University, University of Illinois at Urbana-Champaign, and other places demonstrates, the 21st-century librarian must be good at normalizing and enriching selected piles of metadata. Metadata created for one purpose or system may not be optimized for another purpose or system. Also, when you aggregate a wide variety of metadata, you find a surprising number of variances in encoding practices as well as simple errors (see "Bitter Harvest").

In response, we are investigating ways to normalize and enrich metadata for greater versatility. Our first success is a utility for normalizing and enriching dates. For example, when given a date as "1880s," the function will create four new date fields, from a normalized "1880-1889" to a set of date tokens for enabling searching (e.g., 1880, 1881). This type of operation can be executed as a record is captured and placed into a database, but other types of metadata transformation, such as assigning subjects, cannot be performed by software alone. Experiments with topical clustering software have been encouraging but not flawless. The optimum solution may be to enable a cataloger to view the subjects assigned automatically by the software and remove or add topic assignments.

A new toolbox

We also see a need for tools that enable a group of records to be selected based on virtually any criteria and then transformed in a particular way (e.g., change all occurrences of X to Y). As such, the modern cataloger will one day be a software-enabled specialist who can gather, subset, normalize, and enrich piles of records for a specific audience or purpose.

The real challenge is the retooling and reeducation of those already in the field. A number of LIS programs have adjusted their curricula.
A good place to start is Karen Coyle's "Metadata: Data with a Purpose." The need for catalogers will not go away soon, but what they will be asked to do will be very, very different.

For more on the wired library, see the netConnect supplement mailed with this issue and with the January, July, and October 15 issues of LJ.

__________________________________________________________________

Link List

Bitter Harvest www.cdlib.org/inside/projects/harvesting/bitter_harvest.html
Date Normalization Utility www.cdlib.org/inside/diglib/datenorm
Metadata: Data with a Purpose www.kcoyle.net/meta_purpose.html
METS & MODS www.loc.gov/standards
OAIster oaister.org
ONIX for Libraries roytennant.com/proto/onix
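
To make the date example from earlier in this column concrete, here is a minimal Python sketch of that kind of normalization. It is not the CDL utility itself. The function name, the field names, and the choice of exactly these four fields are assumptions made for illustration, since the column names only the normalized range and the search tokens.

```python
import re


def normalize_date(raw):
    """Derive date fields from a decade ("1880s"), a single year ("1885"),
    or a year range ("1880-1889"). Return None for anything else.

    Field names below are hypothetical, not those of any real utility.
    """
    text = raw.strip()
    decade = re.fullmatch(r"(\d{3})0s", text)
    year = re.fullmatch(r"(\d{4})", text)
    span = re.fullmatch(r"(\d{4})\s*-\s*(\d{4})", text)

    if decade:
        start = int(decade.group(1)) * 10
        end = start + 9
    elif year:
        start = end = int(year.group(1))
    elif span:
        start, end = int(span.group(1)), int(span.group(2))
        if start > end:
            return None  # reversed range: likely an error in the record
    else:
        return None  # unrecognized form: leave it for a person to review

    return {
        "normalized": str(start) if start == end else f"{start}-{end}",
        "start_year": start,
        "end_year": end,
        # One token per year, so a search for any year in the span matches.
        "tokens": [str(y) for y in range(start, end + 1)],
    }


if __name__ == "__main__":
    print(normalize_date("1880s"))
```

Returning None for unrecognized input, rather than guessing, keeps the software within what it can do reliably and leaves the ambiguous dates for a cataloger to review.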