[XML4Lib] OAIster reaches 10 million records

Kent Fitch kfitch at nla.gov.au
Fri Jan 26 00:30:27 EST 2007


Hi Roy,

I think the problem with Google Scholar in this instance is the number
of hits for "roma" with a different meaning (place name, person name).
With about 9650 hits, the ranking is likely to be less "good" for any
specific meaning of "roma" than the ranking in OAIster, which for this
example has two orders of magnitude fewer hits.

The "correct" record: 

Blank Pages of the Holocaust: Gypsies in Yugoslavia During World War II;
Blank Pages of the Holocaust: Gypsies in Yugoslavia During World War II

seems to be ranked highest in this instance through luck or error: the
title appears to be erroneously repeated in the metadata, giving 2 hits
on "world war" (on top of the 3 references to "world war" in the note).
I say this because the next records "relevant to this meaning of roma"
appear only at hit 50, then 69, and finally at the last record, 89
("Annihilation through labor": the killing of state prisoners in the
Third Reich).
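
A toy illustration of the duplication effect described above, assuming a
ranker that simply counts phrase occurrences. That is an assumption made
for illustration only: the actual weighting used by OAIster is not
described in this thread, and the helper name phrase_hits is hypothetical.
The count of 3 for the note comes from the message above; the note text
itself is not reproduced here.

```python
import re


def phrase_hits(text: str, phrase: str) -> int:
    """Count case-insensitive, whole-word occurrences of phrase in text."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


title = ("Blank Pages of the Holocaust: "
         "Gypsies in Yugoslavia During World War II")

# The title as it appears in the record: repeated, separated by "; ".
duplicated_title = title + "; " + title

# Number of "world war" references in the note, as stated in the message.
NOTE_HITS = 3

single = phrase_hits(title, "world war")
doubled = phrase_hits(duplicated_title, "world war")

print("title once:   ", single + NOTE_HITS, "hits in total")
print("title doubled:", doubled + NOTE_HITS, "hits in total")
```

Under this simplistic counting, the accidental repetition alone adds one
more hit to the record, which could be enough to lift it above otherwise
similar records.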

That is, this example seems to be a bit of a "straw man", especially as
scholars would probably want to see the _dozens_ of relevant hits in the
first 89 Google Scholar results.

Whilst I am not a Google fanboy, I do believe in the benefits of
aggregation of supply, and I think Google and Google Scholar just
aggregate more!

Regards,

Kent Fitch
> -----Original Message-----
> From: xml4lib-bounces at webjunction.org 
> [mailto:xml4lib-bounces at webjunction.org] On Behalf Of Roy Tennant
> Sent: Friday, 26 January 2007 11:40 AM
> To: xml4lib
> Subject: Re: [XML4Lib] OAIster reaches 10 million records
> 
> Not to belabor the point on this list, but interestingly enough,
> using your search strategy below, Google "proper" is indeed
> much better, but Google Scholar is still questionable at
> best, with articles that appear to mention the subject in passing.
> Roy
> 
> 
> On 1/25/07 3:46 PM, "Kent Fitch" <kfitch at nla.gov.au> wrote:
> 
> > If you try searching Google using its syntax for designating an exact
> > match:
> > 
> > +roma "world war"
> > 
> > it no longer "fails miserably".
> > 
> > Bernie Sloan on the NGC4LIB mailing list recently quoted from "The 
> > library of Google"
> > http://www.prospect-magazine.co.uk/article_details.php?id=8215 in 
> > Prospect Magazine:
> > 
> > "Researchers always need to be reminded not to put too much trust in
> > the materials that happen to lie within easy reach, but the risk of
> > distortion will be much greater if they confine their investigations
> > to a shelf of pre-selected books in a library rather than exposing
> > themselves to the awe-inspiring quantities of treasure mixed with
> > dross that Google spreads before them...Google may be creating new
> > problems for scholars, but it offers new solutions too, and no one can
> > play around with Book Search for more than a few minutes without
> > stumbling into intellectual conflict zones that will wake them from
> > the dogmatic doze that might have overwhelmed them in a
> > well-regulated library."
> > 
> > Regards,
> > 
> > Kent Fitch
> > 
> >> -----Original Message-----
> >> From: xml4lib-bounces at webjunction.org 
> >> [mailto:xml4lib-bounces at webjunction.org] On Behalf Of Perry Willett
> >> Sent: Friday, 26 January 2007 4:56 AM
> >> To: 'xml4lib'
> >> Subject: [XML4Lib] OAIster reaches 10 million records
> >> 
> >> ANN ARBOR, Mich. - OAIster Reaches 10 Million Records.
> >> <http://www.oaister.org/>
> >>  
> >> We live in an information-driven world-- one in which access to good
> >> information defines success. OAIster's growth to 10 million records
> >> takes us one step closer to that goal.
> >> 
> >> Developed at the University of Michigan's Library, OAIster is a
> >> collection of digital scholarly resources. OAIster is also a service
> >> that continually gathers these digital resources to remain complete
> >> and fresh. As global digital repositories grow, so do OAIster's
> >> holdings.
> >> 
> >> Popular search engines don't have the holdings OAIster does.
> >> They crawl web pages and index the words on those pages. It's an 
> >> outstanding technique for fast, broad information from public 
> >> websites. But scholarly information, the kind researchers use to 
> >> enrich their work, is generally hidden from these search engines.
> >> 
> >> OAIster retrieves these otherwise elusive resources by tapping 
> >> directly into the collections of a variety of institutions using 
> >> harvesting technology based on the Open Archives Initiative (OAI) 
> >> Protocol for Metadata Harvesting.
> >> These can be images, academic papers, movies and audio files, 
> >> technical reports, books, as well as preprints (unpublished works 
> >> that have not yet been peer reviewed).
> >> By aggregating these resources, OAIster makes it possible to search
> >> across all of them and return the results of a thorough investigation
> >> of complete, up-to-date resources.
> >> 
> >> Ann Devenish, Publication Services Project Manager at Woods Hole 
> >> Oceanographic Institute, notes that "Harvesting by OAIster is a 
> >> primary 'selling point' when we talk to scientists and researchers 
> >> about the visibility, accessibility, and impact of their 
> >> contributions in an institutional repository. From their own 
> >> experiences they know that a search using one of the popular search
> >> engines can bring back thousands (if not, millions) of results which
> >> will require careful and time-consuming screening, with no guarantee
> >> that they will ever get to the content they seek. A search of
> >> OAIster, across hundreds of open and scholarly archives and millions
> >> of records, brings back results with the key metadata elements that
> >> allow for quick identification of, and easy navigation to, the 
> >> content they seek."
> >> 
> >> OAIster is good news for the digital archives that contribute 
> >> material to open-access repositories. "[OAIster has demonstrated 
> >> that]...OAI interoperability can scale. This is good news for the 
> >> technology, since the proliferation is bound to continue and even 
> >> accelerate," says Peter Suber, author of the SPARC Open Access 
> >> Newsletter. As open-access repositories proliferate, they will be 
> >> supported by a single, well-managed, comprehensive, and useful tool.
> >> 
> >> Scholars will find that searching in OAIster can provide better 
> >> results than searching in web search engines. Roy Tennant, User 
> >> Services Architect at the California Digital Library, offers an
> >> example: "In OAIster I searched 'roma' and 'world war,' then sorted
> >> by weighted relevance. The first hit nailed my topic-- the 
> >> persecution of the Roma in World War II. Trying 'roma world war'
> >> in Google fails miserably because Google apparently searches 'Rome'
> >> as well as 'Roma.' The ranking then makes anything about the Roma 
> >> people drop significantly, and there is nothing in the first few 
> >> screens of results that includes the word in the title, unlike the 
> >> OAIster hit."
> >> 
> >> OAIster currently harvests 730 repositories from 49 countries on 6 
> >> continents. In three years, it has more than quadrupled in size and
> >> increased from 6.2 million to 10 million in the past year. OAIster is
> >> a project of the University of Michigan Digital Library Production
> >> Service.
> >> 
> >> For more information about University of Michigan's OAIster Project,
> >> visit http://www.oaister.org/, or contact Kat Hagedorn at
> >> khage at umich.edu.
> >> ------------------
> >> 
> >> Perry Willett
> >> Head, Digital Library Production Service
> >> 300 Hatcher North
> >> University of Michigan
> >> Ann Arbor MI 48109-1205
> >> Ph: 734-764-8074
> >> Fax: 734-647-6897
> >> Email: pwillett at umich.edu
> >> 
> >> 
> >> 
> >> _______________________________________________
> >> XML4Lib mailing list
> >> XML4Lib at webjunction.org
> >> http://lists.webjunction.org/mailman/listinfo/xml4lib
> >> 

