[XML4Lib] mods: the new marc?

Deridder, Jody L rde2 at utk.edu
Mon Dec 17 10:16:32 EST 2007

Hi Eric --
  Have you looked at the new ORE docs?

  I know that the move is toward FRBRization of metadata, and revamping
LCSH (also used in MODS, often) for faceted subject headings -- and that
we need to consider usability across arenas (learning objects vs.
libraries, for example).

  As you know, MODS was developed from MARC but still has many of the
same drawbacks.  I'm wondering if the best solution at present isn't
some combination scheme in RDF format.


-----Original Message-----
From: xml4lib-bounces at webjunction.org
[mailto:xml4lib-bounces at webjunction.org] On Behalf Of Eric Lease Morgan
Sent: Sunday, December 16, 2007 4:34 PM
To: xml4lib
Subject: [XML4Lib] mods: the new marc?

Is MODS the new MARC?

As you may or may not know, I advocate "catalogs" include content  
beyond the things a library owns or licenses. Moreover, I advocate  
libraries take a more active role in collecting and providing  
services against information resources no matter where they reside on  
a network. Don't get me wrong, I don't advocate "cataloging" the
entire Internet, but I do advocate actively collecting materials  
apropos to the needs of a particular library's patrons.

In an effort to demonstrate such an idea I would like to collect and  
provide services against a number of different types of data/ 
information freely available on the 'Net. Some of these things  
include, but are not limited to, the following, listed in no priority order:
electronic books/texts (Project Gutenberg, University of Michigan  
MBooks, Open Content Alliance, etc.), electronic journals from DOAJ,  
electronic journal articles from DOAJ Articles, pre-prints and post- 
prints from various OAI repositories, mailing list messages, selected  
blog postings, theses & dissertations from NDLTD, etc.

Each of the things above can be systematically harvested through the  
use of OAI, simple Web crawling, or the retrieval of data sets. Once  
harvested the data could be stored in a database and/or indexed  
providing the means for discovery and services. The storage of this  
content in a database begs questions regarding tables, records, and  
fields. What might they be? Similarly, unless the index is going to  
be 100% free text, the harvested content/metadata will need to be mapped
to fields. Again, what fields?
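
To make the mapping question concrete, here is a minimal sketch in Python. It assumes the harvested metadata arrives as unqualified Dublin Core in the oai_dc format that OAI repositories return. The sample record and the four field names are made up for illustration; they are not a proposed answer to "what fields?".

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core elements carried inside an oai_dc record.
DC = "{http://purl.org/dc/elements/1.1/}"

# A made-up record, shaped like the oai_dc metadata an OAI harvest returns.
SAMPLE = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example Pre-print</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:subject>Metadata</dc:subject>
  <dc:subject>Cataloging</dc:subject>
  <dc:identifier>http://example.org/eprint/1</dc:identifier>
</oai_dc:dc>"""

# Hypothetical index fields; choosing the real list is the open question.
FIELDS = ("title", "creator", "subject", "identifier")


def map_record(xml_text):
    """Map one harvested record to a dict of field name -> list of values."""
    root = ET.fromstring(xml_text)
    record = {}
    for field in FIELDS:
        # Every field is repeatable, so each one maps to a list.
        record[field] = [
            el.text.strip() for el in root.iter(DC + field) if el.text
        ]
    return record


if __name__ == "__main__":
    print(map_record(SAMPLE))
```

Each field maps to a list because Dublin Core elements repeat, which is one reason a single flat table is unlikely to be enough.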

I'm not so naive as to believe there is such a thing as the perfect
database structure for this "catalog", nor do I believe free text  
indexing is the answer either. So, what sort of data structure should  
I use? Not MARC. MODS? Some incarnation of RDF?

If I go this route I see the following plan:

   0. Articulate a collection policy.
   1. Acquire/harvest content in its raw form.
   2. Convert the raw content into MODS, RDF, or
      something else.
   3. Save/archive the raw data because things get lost
      in translation.
   4. Save the MODS or RDF to a (XML) database.
   5. Parse the MODS or RDF and save it to a
      (relational) database.
   6. Run scripts against the database to create things
      like browsable lists, create new relationships
      between items, or simply enhance the data.
   7. Index the MODS or RDF, or write a report against
      the database intended for indexing.
   8. Provide access to the index (via SRU, OpenSearch,
      or Z39.50).
   9. Provide services against the search results such
      as Get It, Review It, Buy It, Bookmark It, Compare It
      To Other things, etc.
  10. Go to Step #1.
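
Steps 2 through 6 of the plan above can be sketched in a few lines of Python. This is only an illustration under several assumptions: MODS is picked as the canonical format, the harvested records are plain dictionaries with made-up values, and one in-memory SQLite database stands in for the raw archive, the XML store, and the relational database. The table and function names are hypothetical.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

# Made-up harvested records standing in for step 1.
HARVESTED = [
    {"title": "A Sample E-text", "subjects": ["Fiction"]},
    {"title": "A Sample Pre-print", "subjects": ["Fiction", "Metadata"]},
]


def to_mods(raw):
    """Step 2: convert a raw record into a minimal MODS document."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = raw["title"]
    for topic in raw.get("subjects", []):
        subject = ET.SubElement(mods, f"{{{MODS_NS}}}subject")
        ET.SubElement(subject, f"{{{MODS_NS}}}topic").text = topic
    return ET.tostring(mods, encoding="unicode")


def parse_mods(mods_xml):
    """Step 5: pull the title and topics back out of the MODS."""
    ns = {"m": MODS_NS}
    root = ET.fromstring(mods_xml)
    title = root.findtext("m:titleInfo/m:title", namespaces=ns)
    topics = [t.text for t in root.findall("m:subject/m:topic", ns)]
    return title, topics


def build_catalog(harvested):
    """Steps 2-5: archive the raw data, keep the MODS, fill the tables."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE items "
        "(id INTEGER PRIMARY KEY, raw TEXT, mods TEXT, title TEXT)"
    )
    db.execute("CREATE TABLE subjects (item_id INTEGER, topic TEXT)")
    for raw in harvested:
        mods_xml = to_mods(raw)
        title, topics = parse_mods(mods_xml)
        cur = db.execute(
            "INSERT INTO items (raw, mods, title) VALUES (?, ?, ?)",
            (json.dumps(raw), mods_xml, title),
        )
        db.executemany(
            "INSERT INTO subjects VALUES (?, ?)",
            [(cur.lastrowid, topic) for topic in topics],
        )
    db.commit()
    return db


def browse_by_subject(db):
    """Step 6: a browsable list of topics with item counts."""
    return db.execute(
        "SELECT topic, COUNT(*) FROM subjects GROUP BY topic ORDER BY topic"
    ).fetchall()


if __name__ == "__main__":
    catalog = build_catalog(HARVESTED)
    for topic, count in browse_by_subject(catalog):
        print(topic, count)
```

Swapping RDF in for MODS would change only to_mods and parse_mods; the raw archive and the relational tables would stay the same, which may be an argument for keeping the canonical format and the database structure loosely coupled.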

Assuming there is no single database structure for such an idea, what
flavor of XML would you use as your canonical data format? MODS? RDF?  
Something else?

Eric Lease Morgan
University Libraries of Notre Dame
