Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant
Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.
Accessing Electronic Theses: Progress
Somehow the art of Salvador Dali seems to be an appropriate accompaniment to an eclectic group of people interested in electronic theses and dissertations (ETDs). Overlooking Tampa Bay, an international group of librarians, computer scientists, university administrators, and graduate students are sipping wine and munching hors d'oeuvres at the Salvador Dali Museum in St. Petersburg, FL, at the reception of the Third International Symposium on Electronic Theses and Dissertations (TISETD). The art is surreal, but the participants are in earnest.

Some attendees -- such as those in the Networked Digital Library of Theses and Dissertations, or NDLTD -- have been working in this field for a decade or more, while others are here only because a university dean or provost told them to come. But they all share an interest in accepting and providing access to theses and dissertations submitted electronically.

There are several ways by which universities provide access to their ETDs, as well as various permutations and combinations of these methods. How this will shake out, unfortunately, remains an open question.

UMI, XML, and other formats

The easiest way for a library to make its theses and dissertations available online is to let a commercial company do it. UMI Dissertation Services (part of Bell and Howell Information and Learning) can provide copies of over one million theses and dissertations, with more than 100,000 available online as Adobe Acrobat files. After the ETD has been accepted, it is simply forwarded to UMI, which then creates an Adobe Acrobat version of the file and provides access free to the originating university. Other users must pay a fee -- from $21.50 to download a version to $46 for a hardcover paper copy. The benefit of this strategy is that it requires little or no infrastructure development or support on your part.
However, the responsibility for providing access and long-term preservation for the material resides with a commercial company that may not last forever.

While UMI may epitomize the easy way out, conference speakers seemed to be unanimous in their depiction of XML as the "right way" to do ETDs. The problem is that implementing an XML workflow for ETDs is unlikely to be either easy or clear at this time. Right now various universities are using nearly half a dozen different structural descriptions (document type definitions, or DTDs) for ETDs. Thus, it is unlikely that various ETD projects will be able to interoperate very well without first converting their ETDs to a common DTD.

Yet another strategy is to accept ETDs in other formats, such as Microsoft Word and/or Adobe Acrobat. This will be much easier than requiring XML markup for the material, but it also means a less assured migration path as MS Word and/or Adobe Acrobat change or are replaced by other, competing applications.

Key players home and away

The home of the ETD effort is Virginia Tech University, and the person at the center is Edward Fox, a professor in the Department of Computer Science. He began the NDLTD, and Virginia Tech now has over 2000 of its students' ETDs online.

Another leader is the Massachusetts Institute of Technology (MIT), which has put more than 4000 dissertations online. MIT has taken a very pragmatic approach. Since it was already duplicating dissertations to sell to those requesting them, MIT decided simply to digitize them, based on demand, for online access as part of the process. Now about 50 percent of the dissertations requested have already been digitized.

Outside the United States, the Australian Digital Theses Project is very ambitious. Seven Australian universities are involved in a pilot project that is meant eventually to include all 40 Australian universities once the infrastructure has been fully developed and tested.
They are using a modified version of the submission software developed at Virginia Tech.

Several people from the Humboldt University of Berlin were present and discussed their project to mark up ETDs in XML. As they noted, one key issue is the diversity of DTDs being used by various projects to mark up what are essentially the same kinds of documents. To address this problem, they are sponsoring a meeting in May to try to bring together the competing DTDs into one standardized DTD. By having ETDs marked up with the same structural tags, the user will be able to search for and use ETDs from around the world more easily. Since the underlying structure of the documents will be the same, search and display systems will be able to provide access to ETDs more easily from a variety of sources.

Finding ETDs

There are now two main methods for locating ETDs. The primary method is to use UMI's ProQuest Digital Dissertations site, which allows searches by a number of fields (keyword, author, title, school, subject). You can also browse by broad subjects, but in most cases there are too many results to make browsing very productive (for example, there are no narrower subject categories to select within "language and literature"). Once a subject category has been selected, however, additional search terms can be added to further narrow the set. ProQuest provides a 24-page "preview" of the ETD for free as individual page images.

To find ETDs that are part of the NDLTD, you either must go to each institution's web site individually or try out the experimental "federated" search, which was not working as of this writing. Clearly the NDLTD has some distance to go to compete with UMI in either ease of searching or sheer numbers of ETDs. But at least you don't have to pay for them if you access them through an NDLTD site.

Speeding ahead?

Standing on the pier in St. Petersburg, I can't help thinking about Dali's dripping clock.
To my untrained eye, he seems to be saying that time is elastic, which somehow seems all the more appropriate regarding ETDs. People like Fox have been laboring for years, even a decade, and yet the number of ETDs available online through NDLTD institutions remains somewhat unimpressive. There are clearly issues that need to be addressed if ETDs are to become a major source of free content for digital libraries. But it's also clear that there will be many more people working to solve those issues, since this conference drew several times as many attendees as either of the previous two meetings. Time may have been moving slowly in the ETD universe before, but now it appears that things are speeding up. Keep watching, and not just the clock.

LINK LIST

Australian Digital Theses Project http://www.library.unsw.edu.au/thesis/thesis.html
Humboldt University Digital Dissertations Project http://dochost.rz.hu-berlin.de/epdiss/index_en.html
NDLTD http://www.ndltd.org/
NDLTD Federated Search http://jin.dis.vt.edu/fedsearch/ndltd/support/search-catalog.html
ProQuest Digital Dissertations http://wwwlib.umi.com/dissertations
Salvador Dali Museum http://www.daliweb.com
Third Intl. Symposium on Electronic Theses and Dissertations http://etd.eng.usf.edu/conference
UMI Dissertation Services http://www.umi.com/hp/Support/DServices