[XML4LIB] Re: Extracting data from an XML file

Morbus Iff morbus at disobey.com
Mon Jan 5 23:09:13 EST 2004


>Fourth, I tried both of these approaches plus my own, and timed them. I had
>to process 1.5 MB of data in nineteen files. Tiny. Ironically, my original
>code was the fastest at 96 seconds. The XSLT implementation came in second
>at 101 seconds, and the XML::Twig implementation, while straight-forward
>came in last as 141 seconds. (See the attached code snippets.)

That doesn't seem right to me at all - I parse a 2 to 3 meg file in less
than 10 seconds (easily) using a straight XML::Parser (though it's
considered deprecated). Here's a snippet of that code:

  use XML::Parser;

  # get a new instance of the parser,
  # register handlers, then parse.
  my $p = XML::Parser->new;
  $p->setHandlers( Start => \&_start );
  $p->parsefile( get_setting("files_services_channels") );

  # creates the
  # data struct.
  sub _start {
     my ($p, $tag_name, %attrs) = @_;

     while ( my ($key,$value) = each(%attrs) ) {
        # (loop body omitted from this snippet)
     }

     return 1;
  }

I don't have your original message on me, but I suspect the thing that's
taking most of your time is loading the whole file into memory - do you
absolutely need to do that? My impression (based on my fading memory of
what you originally posted) was that you didn't need to. In which case, you
want a streaming parser (the above is an example) or something like XPath.
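
To make the streaming idea concrete, here's a minimal, self-contained
sketch. The element and attribute names (channels, channel, id, name) and
the %name_for hash are made up for illustration, not taken from your data,
and the inline string only stands in for a real file. The point is that the
Start handler keeps just the bits it cares about, so no tree of the whole
document gets built:

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample data; with real files you would call
# $p->parsefile($path) instead of parsing a string.
my $xml = <<'XML';
<channels>
  <channel id="1" name="news"/>
  <channel id="2" name="sports"/>
</channels>
XML

# Only the values we actually need are kept; everything else
# streams past the handler and is discarded.
my %name_for;

# Called once per start tag, with the tag's attributes as a hash.
sub _start {
    my ($p, $tag_name, %attrs) = @_;
    return unless $tag_name eq 'channel';
    $name_for{ $attrs{id} } = $attrs{name};
    return;
}

my $p = XML::Parser->new;
$p->setHandlers( Start => \&_start );
$p->parse($xml);

print "$_ => $name_for{$_}\n" for sort keys %name_for;
```

Swapping the parse() call for parsefile() in a loop over your nineteen files
should be all it takes to try this against the real data.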

-- 
Morbus Iff ( rootle-dee-tootle-dee-toot! )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus

