Topic: Parser question

My users type strings like:

'car:3, house:1, cat:9, gonzo:2'

I need to extract the following information...

[['car',3], ['house',1], ['cat',9], ['gonzo',2]]

What's the shortest way to achieve this, given that the input strings must be stripped, too?

Re: Parser question

This is what I came up with:

input = 'car:3, house:1, cat:9, gonzo:2'

# gsub (not gsub!) because gsub! returns nil when the string contains no whitespace
p input.gsub(/\s/, '').split(',').inject([]) { |arr, val| arr << val.split(':') }

Tell me if there's anything you don't understand.

Re: Parser question

Here's a shorter way:

input = 'car:3, house:1, cat:9, gonzo:2'
input.split(/,\s*/).map { |s| s.split(':') }
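
Note that both snippets so far return the levels as strings ('3', not 3), while the question asks for numbers. A minimal sketch that also converts them, assuming well-formed input like the example (the variable name `pairs` is only for illustration):

```ruby
input = 'car:3, house:1, cat:9, gonzo:2'

# Strip the whole string, split on commas (swallowing the spaces around them),
# then split each entry at the colon and convert the level to an Integer.
pairs = input.strip.split(/\s*,\s*/).map do |entry|
  name, level = entry.split(':')
  [name, level.to_i]
end

p pairs
```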

Railscasts - Free Ruby on Rails Screencasts

Re: Parser question

Thx for the answers. I appreciate them as I'm learning Ruby and regular expressions.

Both solutions work, but the parser must be a little bit more robust, so the following string must be parseable as well:

"   car  :  2, :3, apple::1, banana"
=> [['car',2],['',3],['apple:',1],['banana',1]]

The level is the number between the rightmost colon and the separation token (or the line end).
If there's no level (the number right after the colon) given then the level must default to 1.
If there's no item then the item must default to ''.

Re: Parser question

I think this is a good exercise for you. Try to do this on your own and post the code.

To give you a hint, you can keep the first part of my code example the same, just expand the block and try to parse the result the way you want:

input.split(/,\s*/).map do |s|
  # handle exceptions here and return the result...
end

I recommend taking each exception one at a time and trying to make it "pass". If you are familiar with test driven development, this is a good exercise for that too. Here's a list of your exceptions from easiest to hardest:

1. spaces around colon
2. entry without a colon and defaulting to 1
3. always the rightmost colon

Post your code here if you get stuck.


Re: Parser question

I've done it in a procedural way. Not very elegant, but it works.

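parsed_skills = []
skill_str.split(',').each do |skilldescr|
    # position of the rightmost colon, or nil if there is none
    pos = skilldescr.rindex(':')
    if pos.nil?
        skill = skilldescr.strip
        level = 1
    else
        # exclusive range, so a colon at position 0 gives an empty skill
        skill = skilldescr[0...pos].strip
        level = skilldescr[pos+1..-1].strip.to_i
    end
    # levels outside 1..5 (including a missing level, which to_i turns into 0) default to 1
    level = (level <= 0 || level > 5) ? 1 : level
    parsed_skills << [skill, level]
end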

Re: Parser question

Good job. Now if you want to try to improve it (it's a good exercise) here's a few hints:

Try to fit the "split" method in there. You are basically repeating the logic of split with just a few differences.

You can use a regular expression to remove the surrounding white space (like I did with the comma separation above).

split allows you to pass a second parameter specifying a limit on the number of splits, but this groups together the last chunk of splits. There is a way to reverse this behavior, though.
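
To illustrate the last hint, here is a small sketch of how the limit behaves and how it can be turned around. The variable names are only for illustration, and `String#rpartition` is mentioned as an alternative that is not part of the hints above:

```ruby
s = 'apple::1'

# With a limit of 2, split cuts at the FIRST colon and keeps the rest together.
forward = s.split(':', 2)
p forward   # => ["apple", ":1"]

# Reversing the string first makes the cut land on the LAST colon;
# the pieces then have to be reversed back and put in the original order.
backward = s.reverse.split(':', 2).map(&:reverse).reverse
p backward  # => ["apple:", "1"]

# String#rpartition splits at the last occurrence directly.
parts = s.rpartition(':')
p parts     # => ["apple:", ":", "1"]
```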


Re: Parser question

I found the following solution, which makes more use of regular expressions, but IMHO is less efficient due to the (unnecessary) reverse calls.

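skill_str.strip.split(/\s*,\s*/).collect { |skilldescr|
    # reverse first so that the limit of 2 splits at the rightmost colon
    skilltuple = skilldescr.reverse.split(/\s*:\s*/, 2)
    if skilltuple.length > 1
        level = skilltuple[0].reverse.to_i
        # the range needs parentheses, otherwise include? is called on 5
        level = 1 unless (1..5).include?(level)
        skill = skilltuple[1].reverse
    else
        level = 1
        skill = skilltuple[0].reverse
    end
    [skill, level]
}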

Re: Parser question

Oy, and it's not any cleaner either. Well for some reason that looked a lot better in my mind than in the code. Oh well, sorry to lead you astray.
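
For comparison, a sketch that avoids the reverse calls by using `String#rpartition`, which splits at the last occurrence of the separator. The method name `parse_skills` is hypothetical, and the 1..5 range check from the posts above is left out here as a simplification; only missing or non-positive levels default to 1:

```ruby
# Hypothetical helper, not from the thread: parses "item:level" entries
# separated by commas, using the rightmost colon of each entry.
def parse_skills(skill_str)
  skill_str.split(',').map do |entry|
    # rpartition returns [before, ':', after] for the LAST colon,
    # or ['', '', entry] when the entry contains no colon at all.
    skill, colon, level = entry.rpartition(':')
    if colon.empty?
      # No colon: the whole entry is the item, the level defaults to 1.
      [level.strip, 1]
    else
      # A missing or non-numeric level becomes 0 via to_i and defaults to 1.
      number = level.strip.to_i
      [skill.strip, number > 0 ? number : 1]
    end
  end
end

p parse_skills('   car  :  2, :3, apple::1, banana')
# => [["car", 2], ["", 3], ["apple:", 1], ["banana", 1]]
```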
