:: Digital Libraries Columns


Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant

Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.

Peer-to-Peer Networks: Promise & Peril


Recently the press has been full of news about how the music industry is trying to shut Napster down to prevent users from swapping pirated music files. But the genie's out of the bottle, and no one -- no person, industry, or government -- can put it back. Napster is just the tip of the iceberg.

To understand this, you should know how Napster works, how something called Gnutella works, and what this may mean for libraries.

Napster under the hood

Napster was invented in January 1999 by Shawn Fanning, then a freshman at Northeastern University. He combined the ability to find and download music files stored in the MPEG Audio Layer 3 (MP3) format with a chat capability. He soon quit school to pursue Napster's development full time. The booming company has 45 employees at its Redwood City, CA, location and claims 20 million users. Meanwhile, it is being sued by the Recording Industry Association of America (RIAA) for copyright infringement.

Napster consists of a client application and a set of servers. The client application is easily downloaded and installed on any Windows computer. (Macintosh users must use Macster, since Napster hasn't yet released a Mac client.) Users may then download files from other users of the Napster software and, if they choose, share their own files. Napster is optimized for sharing music files in MP3 format.

When you start up your Napster (or Macster) software, you are logged into one of several servers run by Napster, Inc., and any MP3 files you share become available for others to search and download. (Since MP3 files are decoded, not executed, there is little virus danger.) Conversely, if you want to find a particular song or sound file, you can search the server's entries for files available from currently logged-on users. If you find something you want, the server connects you to the other person's PC for the download. No central server stores the files; they come from the PCs of individual network users.
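The arrangement just described, a central index that knows who is sharing what while the files themselves stay on users' PCs, can be sketched roughly as follows. This is only an illustration in Python, not Napster's actual protocol or software; the class name `CentralIndex`, its methods, the user names, and the addresses are all invented for the example.

```python
class CentralIndex:
    """Toy stand-in for a Napster-style index server (hypothetical design)."""

    def __init__(self):
        # user name -> (network address, list of shared file names)
        self.sessions = {}

    def login(self, user, address, shared_files):
        """Register a user and the files that user is currently sharing."""
        self.sessions[user] = (address, list(shared_files))

    def logout(self, user):
        """Drop the user; their files vanish from later search results."""
        self.sessions.pop(user, None)

    def search(self, term):
        """Return (user, address, file name) for every matching shared file.

        The index holds only names and addresses. The requester would then
        contact the listed address directly to fetch the file.
        """
        term = term.lower()
        hits = []
        for user, (address, files) in sorted(self.sessions.items()):
            for name in files:
                if term in name.lower():
                    hits.append((user, address, name))
        return hits


# Made-up users and addresses, for illustration only.
index = CentralIndex()
index.login("alice", "10.0.0.5:6699", ["Blue Train.mp3", "So What.mp3"])
index.login("bob", "10.0.0.9:6699", ["blue in green.mp3"])

results = index.search("blue")
index.logout("alice")
results_after_logout = index.search("blue")
```

Note that everything depends on the one `index` object: take it away and nobody can find anything, even though every file is still sitting on somebody's PC.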

In my experience, Napster can be an effective way to share and download sound files. Still, a download may take a frustratingly long time to finish, or fail entirely. Also, the availability and quality of the descriptive information about the files are spotty. That said, for those who have no qualms about violating copyright, the music world is basically laid at their feet.

Although Napster is described as peer-to-peer networking (in which one node on the network can directly communicate with another without using a central server of some kind), in reality it is only partly peer-to-peer, since it relies on central servers for indexing and searching. This makes it vulnerable to being shut down completely if adjudged illegal. It is as yet unclear if the pending new initiative -- AppleSoup, apparently aimed at video instead of music -- is as vulnerable.

A true peer-to-peer network, however, is almost completely invulnerable, due to the far and wide distribution of both capability and responsibility. Napster, then, is but a pale shadow of what is to come. Welcome to Gnutella.

A publisher's worst nightmare?

Gnutella was originally developed by America Online's Nullsoft as a competing application for downloading MP3 files. It has since been adopted by a loose organization of developers brought together virtually by the GnutellaDev web site.

Gnutella is both a protocol and a software program that implements the protocol. It provides true peer-to-peer networking for those who use it. When you start up your Gnutella client (Macintosh users can use Furi), it announces your presence to all other Gnutella users on your "horizon." As much as you might like access to all the millions of computers on the Internet, that clearly won't scale, so your horizon is limited to about 10,000 other users. However, that group changes over time, so if you are logged on overnight you may "see" as many as 40,000 other users.
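Some back-of-the-envelope arithmetic shows why a horizon has to be bounded. The Python sketch below assumes, purely for illustration, that every peer keeps the same number of connections and that a message is relayed for a fixed number of hops; neither figure comes from this column, and the function name `max_horizon` is made up. Real networks have overlapping links, so the actual reach would be lower than this upper bound.

```python
def max_horizon(connections, hops):
    """Upper bound on peers reached when each peer relays to its other neighbors.

    The first hop reaches `connections` peers. Each later hop multiplies the
    newly reached peers by (connections - 1), because a peer does not relay
    a message back to the neighbor it came from.
    """
    total = 0
    newly_reached = connections
    for _ in range(hops):
        total += newly_reached
        newly_reached *= connections - 1
    return total


# Illustrative figures only: 4 or 5 connections per peer, 7 hops.
reach_with_four = max_horizon(4, 7)
reach_with_five = max_horizon(5, 7)
```

Under these assumptions, one extra connection per peer moves the bound from the low thousands to the tens of thousands, which is why some cutoff is unavoidable.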

When you want to find something, whether it is an MP3 file, a recipe, an article, or just about anything someone wants to share, you send out a query. That query is passed to a few of the closest computers. They in turn each pass it to more computers, and so on, until thousands of computers have received your query. If a computer has a match, it sends back the answer via the same route the query traveled to begin with. During the query stage neither the querying computer nor the responding computer knows the identity of the other -- important for those wishing to retain their privacy.
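The query relay described above can be sketched in a few lines of Python. This is a simplified illustration, not the Gnutella protocol itself: the `Peer` class, the hop limit `ttl`, the query ids, and the sample file names are assumptions made for the example. The point to notice is that each peer remembers only which neighbor handed it a query, and answers retrace those steps.

```python
import itertools

_query_ids = itertools.count(1)  # hypothetical source of unique query ids


class Peer:
    def __init__(self, name, files):
        self.name = name
        self.files = list(files)
        self.neighbors = []
        self.seen = set()      # ids of queries already handled
        self.came_from = {}    # query id -> neighbor that handed us the query
        self.hits = []         # file names returned for queries we originated

    def connect(self, other):
        self.neighbors.append(other)
        other.neighbors.append(self)

    def search(self, term, ttl=3):
        """Originate a query and pass it to our immediate neighbors."""
        qid = next(_query_ids)
        self.seen.add(qid)
        for n in self.neighbors:
            n.on_query(qid, term, ttl, sender=self)
        return qid

    def on_query(self, qid, term, ttl, sender):
        if qid in self.seen:
            return  # duplicate: drop it so loops in the network die out
        self.seen.add(qid)
        self.came_from[qid] = sender  # we know the previous hop, not the origin
        matches = [f for f in self.files if term.lower() in f.lower()]
        if matches:
            sender.on_hit(qid, matches)  # answer goes back the way it came
        if ttl > 1:
            for n in self.neighbors:
                if n is not sender:
                    n.on_query(qid, term, ttl - 1, sender=self)

    def on_hit(self, qid, matches):
        back = self.came_from.get(qid)
        if back is None:
            self.hits.extend(matches)  # we originated this query
        else:
            back.on_hit(qid, matches)  # relay one step closer to the origin


# Hypothetical four-peer network: a-b, b-c, c-d, plus a loop a-c.
a = Peer("a", [])
b = Peer("b", ["recipe.txt"])
c = Peer("c", ["jazz_set.mp3"])
d = Peer("d", ["Jazz Notes.txt"])
a.connect(b)
b.connect(c)
c.connect(d)
a.connect(c)

first_query = a.search("jazz", ttl=3)
```

In this sketch peer `d` never learns that `a` asked the question; it sees only that the query arrived from `c`. What contact details a real reply carries so that a download can follow is outside the scope of the example.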

When you go to download the file, however, a direct connection is established between the requesting and providing computers, so each necessarily learns the other's network address. Since no logs or profiles are being kept, privacy is still largely maintained.

Gnutella rising?

My use of the Gnutella client for the Macintosh (Furi) demonstrates both usefulness and room for improvement. For downloading MP3 files, Napster clients are more responsive and easier to use. But since Napster is limited to MP3 files and may be shut down by the courts, Gnutella may soon be the only option. The Gnutella response time is slow enough to encourage you to run a search in the background while doing something else, and searching itself is primitive at best.

Nonetheless, given the underlying power of the model, and because this technology is at a very early stage of development (literally only months old), the potential is substantial. Yes, it's less organized than the web, but look how the web exploded.

Are you starting to get the picture? Anyone on the Internet can share anything they want anonymously. There is no company or set of central servers to shut down, and no way to track those who are breaking copyright law. To quote the "What is Gnutella?" document, "It's reliable, it's sharing terabytes of data, and it is absolutely unstoppable." We may, in fact, be seeing the death of enforceable digital copyright.

Implications for libraries

So what might this mean for libraries? First of all, as individuals begin using Gnutella to serve copies of articles, papers, and even books, users may increasingly find it easier to bypass the library entirely to locate information on their own. As we know, they will likely be missing much that we could provide, even within the Gnutella universe, given its nearly brain-dead method of searching (by file name only; no metadata is associated with the files).

"The folks who developed Gnutella are very sophisticated in their knowledge of networking, but they don't know squat about information retrieval," says Karen Coyle of the California Digital Library. "They need us, even if they don't know it."
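To see what searching by file name alone gives up, consider this small Python sketch. The file names and the `title` and `creator` fields are invented for illustration; Gnutella as described here carries no such metadata, which is exactly the point.

```python
# Hypothetical shared files. A file-name search can see only "name".
shared = [
    {"name": "track07.mp3", "title": "So What", "creator": "Miles Davis"},
    {"name": "miles_davis-so_what.mp3", "title": "So What", "creator": "Miles Davis"},
    {"name": "paper_final.pdf", "title": "docster: instant document delivery",
     "creator": "Daniel Chudnov"},
]


def by_file_name(term):
    """Match the search term against the file name only."""
    term = term.lower()
    return [f["name"] for f in shared if term in f["name"].lower()]


def by_metadata(term):
    """Match the term against the name plus the descriptive fields."""
    term = term.lower()
    return [
        f["name"]
        for f in shared
        if any(term in f[field].lower() for field in ("name", "title", "creator"))
    ]


name_hits = by_file_name("miles")
metadata_hits = by_metadata("miles")
```

Here the name-only search finds one of the two recordings and would miss the paper entirely under a search for "docster", while even this crude metadata match finds all of them.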

Also, there is likely to be another growing universe of information that may not (at least initially) be available through web search engines. To add another wrinkle, individual users join and leave the Gnutella network at will, which suggests a randomly pulsing (growing and shrinking) universe of information. What is there now may not be there in a few minutes, and vice versa.

One way in which libraries may be able to use Gnutella is to write our own client, perhaps to perform digital interlibrary loan tasks for us. Daniel Chudnov, an advocate of open source software (see "Open Source Software: The Future of Library Systems?" LJ 8/99, pp. 40-43), has posited just such a use with his paper "docster: instant document delivery." It isn't too difficult to imagine other kinds of "virtual private networks" of trusted library servers to share resources legally -- such as files out of copyright, or those for which copyright payment is managed -- among libraries. Sure, the added traffic might slow the network, but that is one reason libraries should consider networking among themselves before installing Gnutella clients for patrons.

Whatever happens, Gnutella represents a powerful new paradigm for network use that bypasses servers and connects individual users in new ways. If we ignore it, we do so at our own peril. If we embrace it, and bend it to our collective will, we have the potential to develop new services, or improve existing ones, in ways we may not have imagined until now.


docster: instant document delivery
Gnutella FAQ
Napster FAQ