Topic: Parsing XML

Below is a short program I wrote to parse XML into a tree structure in Ruby.

ATTRIBUTE = /\s\w+=('\w*'|"\w*")/
ELEMENT = /<\/?\w+#{ATTRIBUTE}*\s*\/?>/

OPEN = 1
CLOSE = 2
SOLE = 3

def recurse(list, xml, depth=0)
  tag, xml      = next_tag(xml)
  if tag == nil
    return list
  else
    tagname, type = parse_tag(tag)
    if type == CLOSE
      recurse( list, xml, depth-1 )
    else
      list = push_at_depth(list, depth, tagname)
      if type == OPEN
        recurse( list, xml, depth+1)
      else #ie. type == SOLE
        recurse( list, xml, depth)
      end
    end
  end
end

def push_at_depth(list, depth, tagname)
  if depth == 0
    list.push(
        { :name => tagname, :elements => [] } )
  elsif depth == 1
    list[-1][:elements].push(
        { :name => tagname, :elements => [] } )
  elsif depth == 2
    list[-1][:elements][-1][:elements].push(
        { :name => tagname, :elements => [] } )
  elsif depth == 3
    list[-1][:elements][-1][:elements][-1][:elements].push(
        { :name => tagname, :elements => [] } )
  elsif depth == 4
    list[-1][:elements][-1][:elements][-1][:elements][-1][:elements].push(
        { :name => tagname, :elements => [] } )
  end
  list
end

def next_tag(xml)
  if xml =~ ELEMENT
    return [$&, $']
  else
    return nil
  end
end

def parse_tag(tag)
  type       = tag_type(tag)
  match_word = /\w+/   # \w already matches digits, so \d is not needed
  if tag =~ match_word
    return [$&, type]  # first word in the tag is the element name
  else
    return [nil, type]
  end
end

def tag_type(tag)
  slash_pos = tag.index("/")
  if slash_pos.nil?
    OPEN      # no slash: <name>
  elsif slash_pos == 1
    CLOSE     # slash right after "<": </name>
  else
    SOLE      # slash elsewhere: <name/>
  end
end


I've tested it with the following code, and it works as intended:

require "test/unit"

class TestXmlTree < Test::Unit::TestCase
 
  def setup
    @xml = "<A1/><A2><B1></B1><B2></B2><B3></B3></A2><A3><B4><C1></C1><C2></C2><C3></C3></B4><B5><C4></C4><C5></C5></B5></A3>"
    @tree = [
      { :elements => [], :name => "A1" },
      { :elements => [
          { :elements => [], :name => "B1" },
          { :elements => [], :name => "B2" },
          { :elements => [], :name => "B3" }
        ],  :name => "A2" },
      { :elements => [
          { :elements => [
              { :elements => [], :name => "C1" },
              { :elements => [], :name => "C2" },
              { :elements => [], :name => "C3" }
            ], :name => "B4"},
          { :elements => [
              { :elements => [], :name => "C4" },
              { :elements => [], :name => "C5" }
            ], :name => "B5" }
        ],  :name => "A3"}
    ]
   
  end
 
  def test_recurse
    xml = @xml
    tree = recurse([], xml)
    assert_equal(@tree, tree)
  end
end


First off, I suppose there are plenty of ready-rolled solutions out there that do a better job of this. The reason I went to the effort was that I was off-line over the weekend, and wanted to get something up and working. In any case, I figured, it would be good practice.

The one part of the program that I'm not happy with is the method called push_at_depth:

def push_at_depth(list, depth, tagname)
  if depth == 0
    list.push(
        { :name => tagname, :elements => [] } )
  elsif depth == 1
    list[-1][:elements].push(
        { :name => tagname, :elements => [] } )
  elsif depth == 2
    list[-1][:elements][-1][:elements].push(
        { :name => tagname, :elements => [] } )
  elsif depth == 3
    list[-1][:elements][-1][:elements][-1][:elements].push(
        { :name => tagname, :elements => [] } )
  elsif depth == 4
    list[-1][:elements][-1][:elements][-1][:elements][-1][:elements].push(
        { :name => tagname, :elements => [] } )
  end
  list
end

There must be a better way of doing this, but I can't figure it out.
In my pidgin-Ruby pseudo-code, this is how I imagine it would work:

def push_at_depth(list, depth, tagname)
    level = "[-1][:elements]" * depth
    list#{level}.push( { :name => tagname, :elements => [] } )
end

I'd be really grateful if someone could point me in the right direction with this.

Thanks.

Re: Parsing XML

http://www.xml.com/pub/a/2006/01/04/cre … ilder.html

Re: Parsing XML

Thanks for the link. I'm sure it will come in handy some day. The tutorial shows how to build an XML file from Ruby objects such as hashes. The problem I was tackling was to go the other way: reading an XML file and turning it into Ruby objects.

My question was whether there is a neater way of doing something like:

list[-1][:elements][-1][:elements][-1][:elements][-1][:elements].push( { :name => tagname, :elements => [] } )

Rather than repeating [-1][:elements] again and again, I'd like to know if I can just say something like:

level = "[-1][:elements]" * depth
list#{level}.push( { :name => tagname, :elements => [] } )

I suspect this is some kind of 'reflection' or 'introspection', but can't figure it out.

Anyone?

Thanks.
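
One way to get the effect of repeating [-1][:elements] a variable number of times, without building code from a string, is to walk down the tree in a loop. The following is only a minimal sketch, assuming the same { :name, :elements } hash layout as in the original push_at_depth; the local variable target is a name made up for illustration.

```ruby
# Sketch: follow the last child `depth` times, then push the new node there.
# Assumes every level down to `depth` already has at least one element.
def push_at_depth(list, depth, tagname)
  target = list
  depth.times { target = target[-1][:elements] }
  target.push({ :name => tagname, :elements => [] })
  list
end

tree = []
push_at_depth(tree, 0, "A")   # top level
push_at_depth(tree, 1, "B")   # child of A
push_at_depth(tree, 2, "C")   # child of B
p tree
```

Because target always refers to an array inside the same tree, pushing onto it changes list in place, and no reflection or eval is needed. Unlike the if/elsif version, this is not limited to a depth of 4.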

Re: Parsing XML

Use REXML to parse XML.
http://www.xml.com/pub/a/2007/01/17/mak … ilder.html
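
For reference, a rough sketch of what the REXML route might look like when producing the same { :name, :elements } structure as the hand-written parser. The helper names element_to_hash and xml_to_tree are made up for this example, and the input is wrapped in a <root> element on the assumption that the parser wants a single root, which the test string above does not have.

```ruby
require "rexml/document"

# Convert one REXML element into the { :name, :elements } hash layout.
def element_to_hash(element)
  { :name     => element.name,
    :elements => element.elements.to_a.map { |child| element_to_hash(child) } }
end

# Hypothetical wrapper: wrap the fragment in <root> so it has a single root,
# then convert each top-level child element.
def xml_to_tree(xml)
  doc = REXML::Document.new("<root>#{xml}</root>")
  doc.root.elements.to_a.map { |child| element_to_hash(child) }
end

p xml_to_tree("<A1/><A2><B1></B1><B2></B2></A2>")
```

This ignores attributes and text content, just as the original program does.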