Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant
Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.
A Database for Every Need
Many digital library projects require database software -- whether for a catalog of holdings, subject pathfinders, or other kinds of structured information. Database solutions are actually fairly generic; the same software can support a large variety of uses. Only one of the solutions noted below (SiteSearch) is tailored to any degree at all for library uses, and yet it can also be used for other types of data. Libraries are increasingly finding database software to be an essential infrastructure component, whether they use it to serve their library catalog or to build web pages on the fly. So, it's likely that your library -- no matter what size or type -- could make effective use of database software.
The good news is that there are many solutions from which to choose. The bad news is that there are many solutions from which to choose. Depending on the computer you have to run the software, the size of the database, the level of usage you expect, and your budget, one of the following options may be right for you. For more detailed information about options and choices, see Eric Morgan's paper "DBMs and Web Delivery."
Workstation solutions
Workstation solutions are database products that run on a desktop machine -- typically an MS Windows NT or Macintosh computer. The advantages include relative ease in creating and maintaining the database, low cost, a large installed user base, and good self-support options (e.g., many books that explain how to use the most popular systems). However, users may face infrequent data backup, "downtime" (time when the database is unavailable) and a lack of scalability (the capacity to grow larger or support more users). In general, these solutions are best for small, simple databases for which you do not expect many simultaneous users.
The premier workstation solution for computers running MS Windows NT is, not surprisingly, Microsoft Access.
Access has been around for years, and there are a large number of users you can consult for ad hoc assistance. For Macintosh users, FileMaker Pro clearly leads in terms of numbers of installations and integration with Macintosh web servers. However, FileMaker is now also available for MS Windows and thus should be considered by Windows users as well.
Server solutions
If a workstation solution does not offer enough power or reliability, you may need to step up to a larger, faster computer that is maintained by trained staff as a network server. Using a server, you gain a faster CPU (processor), more RAM (volatile memory), larger hard-disk capacity (persistent storage), and round-the-clock support. This allows you to serve more simultaneous users quickly. Server solutions vary widely in complexity, including very simple solutions (like Sprite), those with mid-range complexity (like MySQL), and industrial-strength applications that I discuss below as enterprise solutions.
Sprite is a free Perl module, which means it can be installed and used on any computer that runs the Perl programming language (usually, but not exclusively, Unix computers). This solution is so simple it isn't even a database -- it stores data as a flat file. However, to exploit this solution fully you must be at least passingly familiar with Perl. For those who can write simple Perl scripts, this is a fine solution for small databases of a few thousand records that do not require frequent updates.
MySQL is compliant with the Structured Query Language (SQL) syntax and thus can be queried with the same syntax used to query large commercial databases like Oracle and Sybase (see below). However, unlike those solutions, MySQL is free. It nonetheless can handle thousands of records with aplomb and, with decent hardware support, can easily handle many simultaneous users. This is the industrial solution for those who don't have industrial dollars.
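To make the contrast between a flat file and an SQL database concrete, here is a minimal sketch in Python. It is not drawn from Sprite or MySQL themselves: it uses the sqlite3 module from Python's standard library as a stand-in for an SQL server, and the pathfinders table, its columns, and the sample records are all hypothetical. It loads a small comma-separated flat file into a table and then asks a question of it in SQL.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file data: one record per line, comma-separated.
FLAT_FILE = """title,subject,year
Guide to Local History,history,1995
Genealogy Basics,genealogy,1997
Census Records Pathfinder,genealogy,1998
"""


def load_pathfinders(conn, flat_text):
    """Create the (hypothetical) pathfinders table and load the flat-file rows."""
    conn.execute(
        "CREATE TABLE pathfinders (title TEXT, subject TEXT, year INTEGER)"
    )
    rows = [
        (rec["title"], rec["subject"], int(rec["year"]))
        for rec in csv.DictReader(io.StringIO(flat_text))
    ]
    conn.executemany("INSERT INTO pathfinders VALUES (?, ?, ?)", rows)
    return len(rows)


def titles_for_subject(conn, subject):
    """Run an ordinary SQL SELECT and return matching titles, oldest first."""
    cur = conn.execute(
        "SELECT title FROM pathfinders WHERE subject = ? ORDER BY year",
        (subject,),
    )
    return [row[0] for row in cur]


# sqlite3 stands in for an SQL server here; the database lives in memory only.
conn = sqlite3.connect(":memory:")
load_pathfinders(conn, FLAT_FILE)
print(titles_for_subject(conn, "genealogy"))
```

The SELECT statement is ordinary SQL and should carry over to MySQL or a commercial database largely unchanged, although the placeholder style (? here) depends on the database driver you use.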
Free server solutions like these tend to be best when you have more time on your hands than money. Since they're free, you don't need much cash, but you will need time to program web front-ends to them so users can easily interact with these database "shells."
Enterprise solutions
If your needs surpass the solutions listed above, or if you wish to create a robust and scalable infrastructure, consider an enterprise solution. Enterprise solutions are not inexpensive -- in terms of both money and staff (time and level of expertise). These are big and complex solutions for big and complex databases. However, depending on the hardware you have, they may run fine on the same machine that runs what I call server solutions. The difference is not in the hardware or operating system environment, but in the complexity, robustness, and commercial nature (and therefore the availability of support) of the enterprise solutions.
An enterprise database created specifically with library needs in mind is SiteSearch from OCLC. SiteSearch is both MARC and Z39.50 compliant, but it also can manage many other kinds of data. General-purpose enterprise database solutions like Informix, Oracle, and Sybase are primarily aimed at the business market; they may work for libraries, though they're less consistent in supporting such library standards as MARC and Z39.50. On the plus side, they (particularly Oracle and Sybase) are widely implemented in the commercial sector, which means experienced people and good support books should be easy to find.
Making your choice
To decide on the appropriate database software for a specific need, you must carefully consider a number of variables: how easily you can create a usable and effective database for your users; how easy or difficult it is to set up and maintain; the hardware that you have available or can purchase to run it; the number of simultaneous users you expect to serve; the amount and quality of technical support and service upon which you can rely (either in-house or commercial); and the overall cost (including any server purchase or upgrade, technical support, the cost of the software, etc.).
Although generalizations do not always apply, small libraries of any type will usually do just fine with a workstation solution like MS Access, while larger libraries may require the power and scalability of server and enterprise solutions. When it comes time to decide, you can take some solace in the fact that nearly any solution will allow you to export your data should you make the wrong decision or should the variables cited above change.
LINK LIST
"DBMs and Web Delivery" http://www.lib.ncsu.edu/staff/morgan/dbms-and-web-delivery/
FileMaker Pro http://www.filemaker.com/
Informix http://www.informix.com/
Microsoft Access http://www.microsoft.com/office/access/
MySQL http://www.mysql.org/
Oracle http://www.oracle.com/
SiteSearch http://purl.oclc.org/SiteSearch/
Sprite http://www.perl.com/CPAN-local/modules/by-module/Sprite/
Sybase http://www.sybase.com/