Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant
Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.
The $64,000 Question
"What digital library software application should I buy?" asks the typical harried library staff member. It usually means that his (or her) boss decided that, to be modern, they must go digital. The librarian or library assistant then bravely surfs the Internet to find the appropriate software. It's not that simple.
What's a digital library after all? Is it a pile of licensed content or databases? Is it having your archival finding aids digitized and online? Is it having the content those finding aids describe digitized and online? Is it accepting and providing online access to preprints? Is it publishing books online? It is all that, and more. But there's no single software application to deal with the variety and complexity of digital library collections and services.
Finding the tools
Unfortunately, we're still not even close to finding the right tools. Except for online databases, we are mostly dealing with objects, at various levels of granularity. An archival finding aid describes a collection of objects. The objects described by a finding aid, such as individual historical photographs or a book, are discrete objects that may also be part of a logical collection. They can be described individually and placed within the context of a collection when appropriate. They are the "atom" of librarianship--an irreducible component.
However, these "atoms" vary greatly. The description and structure of objects as diverse as photographs, journal articles, books, manuscripts (such as diaries), and datasets can vary significantly. How can we create one software application to manage and provide access to such a diverse group?
One solution is to "encapsulate" all objects in a standardized descriptive structure so that software written to that specification will understand each type of object.
Consider trying to make a handwritten diary available online in a complete and usable manner.
This requires individual page images (to see the actual handwriting), a transcription of each page (so the diary is readable and perhaps searchable), and a way to navigate the diary. Any given item might comprise hundreds of individual files, and each image file would have to be matched up with its associated transcription. These files would also have to be slotted into the correct sequence so a reader can page through the manuscript.
Metadata (information that describes an object) is needed--and lots of it! More specifically, structural metadata is needed to specify which image file corresponds with "page one" and which text represents the transcription of that page. If we can agree on a standard method for encoding this structural metadata, and make this specification open and public, then anyone who wishes can write software to interact with these objects. This is the goal of the University of California at Berkeley's Making of America II Project (MOA II).
Back to the future
To see this in action, go to the MOA II web site (see Link List) and select the link for "The HTML-based MOA II Document Viewer." This leads to an application (a Java servlet) that lists a number of objects that have been encoded using the MOA II XML document type definition (DTD), which defines a method for encoding structural metadata. To see the system capabilities, select the "Patrick Breen Diary" (Breen was a member of the ill-fated Donner Party). If you click on a day in the left frame, the associated diary page appears in the right window. Drop-down menus above the right window let you choose among different resolutions of the page image or switch to the transcribed text. Buttons allow browsing forward or back. The lower left corner shows the XML source document. The MOA II web site offers more information about the MOA II DTD and how the project is producing objects encoded to that specification.
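The structural metadata idea described above can be made concrete with a small sketch. It is only an illustration: the element and attribute names (`object`, `page`, `image`, `transcription`, `order`, `resolution`, `href`) and the file names are invented for this example and are not taken from the MOA II DTD or from METS. The point is simply that once each page's image files and transcription are tied together in an agreed-upon structure, a few lines of software can page through the object.

```python
import xml.etree.ElementTree as ET

# Hypothetical structural metadata for a two-page diary. These element and
# attribute names are invented for illustration; they are NOT the MOA II DTD
# or the METS schema.
SAMPLE = """
<object title="Sample diary">
  <page order="2" label="Page two">
    <image resolution="low" href="diary_002_low.gif"/>
    <image resolution="high" href="diary_002_high.jpg"/>
    <transcription href="diary_002.txt"/>
  </page>
  <page order="1" label="Page one">
    <image resolution="low" href="diary_001_low.gif"/>
    <image resolution="high" href="diary_001_high.jpg"/>
    <transcription href="diary_001.txt"/>
  </page>
</object>
"""


def load_pages(xml_text):
    """Return the pages in reading order, each as a dictionary."""
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.findall("page"):
        # Map each available resolution to its image file.
        images = {img.get("resolution"): img.get("href")
                  for img in page.findall("image")}
        transcription = page.find("transcription")
        pages.append({
            "order": int(page.get("order")),
            "label": page.get("label"),
            "images": images,
            "transcription": (transcription.get("href")
                              if transcription is not None else None),
        })
    # The "order" attribute, not file position, decides the page sequence.
    pages.sort(key=lambda p: p["order"])
    return pages


def neighbors(pages, index):
    """Return (previous, next) list indexes, using None at either end."""
    previous = index - 1 if index > 0 else None
    following = index + 1 if index < len(pages) - 1 else None
    return previous, following


if __name__ == "__main__":
    diary = load_pages(SAMPLE)
    for position, page in enumerate(diary):
        print(page["label"], page["images"]["high"], page["transcription"],
              neighbors(diary, position))
```

In this sketch the viewer's "forward" and "back" buttons would correspond to `neighbors`, and the resolution menu to a lookup in each page's `images` dictionary.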
The Digital Library Federation publication "The Making of America II Testbed Project: A Digital Library Service Model" provides some essential background. The latest work is moving the specification from a DTD model to an XML schema, called the "Metadata Encoding and Transmission Standard" (METS). This not only brings the MOA II work into an XML framework that allows more software flexibility, it also adds the ability to either embed metadata (e.g., author and title) into the object description itself, or to point to it in an outside source such as a separate database.
Other questions remain, such as how the objects are discovered by users in the first place, since some of these objects are not represented in library catalog records. But by encapsulating digital objects in standardized ways, we can create new modes of discovery. For example, a library might write an application to crawl remote collections of MOA II objects, extract descriptive metadata from them, and index them for searching. When an item of interest is discovered, the record link would enable a user to retrieve it from the library holding it.
We likely won't have a single "digital library application" for every library. But with projects like MOA II, we can avoid the problems caused when each library makes up its own rules for encapsulating and describing digital objects. This may not be the answer to the $64,000 question, but it is close.
LINK LIST
The Making of America II Project sunsite.berkeley.edu/moa2
The Making of America II Testbed Project: A Digital Library Service Model www.clir.org/pubs/reports/pub87/pub87.pdf
Metadata Encoding and Transmission Standard (METS) www.clir.org/diglib/standards/mets.xsd
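The discovery scenario mentioned in the column (crawl remote collections, extract descriptive metadata, index it for searching) might look roughly like the following sketch. Everything specific here is an assumption made for illustration: the `example` URLs, the `object`, `title`, and `author` element names, and the sample records are invented, and the "crawl" is simulated with an in-memory dictionary rather than real network requests.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Stand-in for documents fetched from remote collections. The URLs, element
# names, and records are all invented for illustration.
REMOTE_OBJECTS = {
    "http://library-a.example/objects/1.xml":
        "<object><title>Overland Trail Diary</title>"
        "<author>A. Traveler</author></object>",
    "http://library-b.example/objects/7.xml":
        "<object><title>Mining Camp Photographs</title>"
        "<author>B. Photographer</author></object>",
}


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_metadata(xml_text):
    """Pull the descriptive fields out of one encoded object."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title", default=""),
        "author": root.findtext("author", default=""),
    }


def build_index(objects):
    """Build an inverted index: word -> set of URLs whose metadata has it."""
    index = defaultdict(set)
    for url, xml_text in objects.items():
        record = extract_metadata(xml_text)
        for word in tokenize(" ".join(record.values())):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs whose metadata contains every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


if __name__ == "__main__":
    index = build_index(REMOTE_OBJECTS)
    for url in sorted(search(index, "trail diary")):
        print(url)
```

Because each hit is simply the URL of the object at its home institution, the search result doubles as the record link that sends the user back to the library holding the item.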