Topic: Parsing URLs

def parse_urls(text)
  URI::extract(text).uniq.each { |uri| text.gsub!(uri, "<a href='#{uri}'>#{uri}</a>") }
  text
end

This is what I'm currently using and it works well. I just want to extend the parsing so that the following works too:

parse_urls("www.link.com")
>> <a href="http://www.link.com">www.link.com</a>
parse_urls("link.com")
>> <a href="http://www.link.com">link.com</a> # same for .net, .org, etc

Any ideas?

Re: Parsing URLs

Well, it wasn't as easy as I thought it would be. But thanks to some of the good fellas on #merb, I got the desired result.

If you want this parser in your app, just paste this at the bottom of your environment.rb:

ActionView::Helpers::TextHelper::AUTO_LINK_RE = %r{
  (
    <a\s.*?>.*??|               # Opening <a> tag.. and any other text including html tags which might be before a url
    [^\w]|                      # or, first char before url
    ^                           # or, start of line
  )
  (
    (?:https?://)?              # optional protocol
    (?:[-\w]+\.)+               # subdomain/domain parts
    (?:com|net|org|[a-z][a-z]|edu|gov|biz|int|mil|info|name|museum|coop|aero) # TLD
    (?::\d+)?                   # Optional port
    (?:/(?:(?:[~\w\+@%=-]|(?:[,.;:][^\s$]))+)?)*     # Path
    (?:\?[\w\+@%&=.;-]+)?       # Query String ?foo=bar
    (?:\#[\w\-]*)?              # Anchor
  )
  (
    (?:[^\w]|$)                 # Trailing Character
  )
}xi

ActionView::Helpers::TextHelper.module_eval do
  def auto_link_urls(text, href_options = {})
    extra_options = tag_options(href_options.stringify_keys) || ""
    text.gsub(ActionView::Helpers::TextHelper::AUTO_LINK_RE) do
      all, leading, url, trailing = $&, $1, $2, $3
      if leading =~ /<a\s/i # don't replace URLs that are already linked
        all
      else
        text = block_given? ? yield(url) : url
        url = 'http://' + url unless url =~ /^https?:\/\//
        %(#{leading}<a href="#{url}"#{extra_options}>#{text}</a>#{trailing})
      end
    end
  end
end


Gotta love those merb dudes!

Re: Parsing URLs

Nice work! This is working great.

Re: Parsing URLs

Glad you found it useful :)

Post here if you find any bugs.

Re: Parsing URLs

The only thing I've noticed so far is a warning:

/config/environment.rb:151: warning: already initialized constant AUTO_LINK_RE

Haven't done much investigating, but I'm curious whether anyone else sees it.

Also, I'm curious why this goes in environment.rb. Couldn't it also live in the application helper?

Re: Parsing URLs

Yeah, that's because this overwrites a constant (and a method) that ActionView's TextHelper already defines. If you change the name of the constant and the method, it should work in a helper without the warning.
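
For anyone who wants to try the rename, here is a minimal sketch in plain Ruby. The names LinkHelper, BARE_URL_RE and link_bare_urls are made up for illustration, and the pattern is a trimmed-down stand-in rather than the full regex posted above (only three TLDs, no port, no query string, no check for already-linked URLs). Because nothing in ActionView is redefined, the "already initialized constant" warning should not appear.

```ruby
# Hypothetical module, constant and method names: none of these come from Rails.
module LinkHelper
  # Trimmed-down pattern for this sketch only; the full pattern from the
  # earlier post would go here under the new constant name.
  BARE_URL_RE = %r{
    (^|[^\w])                 # start of line, or one non-word char before the URL
    (
      (?:https?://)?          # optional protocol
      (?:[-\w]+\.)+           # subdomain and domain parts
      (?:com|net|org)\b       # a few TLDs only, to keep the sketch short
      (?:/[\w/~%=.-]*)?       # optional path
    )
  }xi

  def link_bare_urls(text)
    text.gsub(BARE_URL_RE) do
      leading, url = $1, $2
      # Add a protocol to the href only when the text did not include one.
      href = (url =~ %r{\Ahttps?://}i) ? url : "http://#{url}"
      %(#{leading}<a href="#{href}">#{url}</a>)
    end
  end
end

helper = Object.new.extend(LinkHelper)
puts helper.link_bare_urls("see www.link.com or link.com")
```

In a Rails app the same module body could live in the application helper, which is what the question above was getting at.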

Re: Parsing URLs

I'm having another small issue with this.. if I use it with a URL like:

then everything gets linked except the /profile&id=1135440

Is there any way to modify it to take URLs like that into account? FWIW, the built-in Rails auto_link has the same issue.

Re: Parsing URLs

Good question. I've been poking at the regex and so far no luck.

May need to hit IRC on this one.

Re: Parsing URLs

(?:\?[\w\+@%&=.;-\/]+)?       # Query String ?foo=bar

maybe? Sorry, can't get the [color] tag working - it's the extra '\/' inside the character class...

Re: Parsing URLs

specious wrote:

(?:\?[\w\+@%&=.;-\/]+)?       # Query String ?foo=bar

maybe? Sorry, can't get the [color] tag working - it's the extra '\/' inside the character class...

You were close. This is what it was:

(?:\?[\w\+@%&=.;/-]+)?       # Query String ?foo=bar

Last edited by viniosity (2008-06-30 23:16:30)
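
A note on why the position of the '-' matters, as far as I understand it: between two characters inside a character class, '-' is read as a range, so ';-\/' asks for the range from ';' to '/', which runs backwards. With '-' as the last character of the class it is just a literal hyphen. The sketch below checks only the query-string piece in isolation. The URL is made up for illustration (it is not the one from the post above), and QUERY_RE is a hypothetical name.

```ruby
# Hypothetical URL, for illustration only.
url = "http://example.com/redirect?to=/profile&id=1135440"

# '-' sits at the end of the class, so it is a literal hyphen rather than a
# range operator, and '/' is now an allowed query-string character.
QUERY_RE = %r{\?[\w\+@%&=.;/-]+}

# Prints the query string, including the part after the slash.
puts url[QUERY_RE]
```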

Re: Parsing URLs

This post saved me hours!

Re: Parsing URLs

Hi guys, I'm Mannie, a developer on the BioCatalogue project.  This post is awesome, it fixed a lot of the issues we were having with auto_link.  The fix did however introduce some new bugs.

1: auto_link does not properly create mailto links.

support@genouest.org

for example, should generate

mailto:support@genouest.org

; instead, I get

http://genouest.org/

2:

http://xml.ddbj.nig.ac.jp/wabi/Method?serviceName=Blast&mode=methodList&lang=en

parses perfectly, but does not render correctly in the UI. 

http://xml.ddbj.nig.ac.jp/wabi/Method?serviceName=Blast&mode=methodList〈=en

is what I get on screen.

Any ideas?

Last edited by mannie (2010-07-06 10:24:09)
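
A possible direction for both issues, offered as a guess rather than a confirmed fix. For 1, the URL pattern seems to pick up the domain after the '@', so one option is to test for an email address first. For 2, the '&lang' in the query string looks like it is being read as the HTML entity for a left angle bracket, so escaping '&' as '&amp;' before the text goes into the page may help. The sketch below is plain Ruby using the standard cgi library; EMAIL_RE and link_token are hypothetical names, and the email pattern is deliberately loose.

```ruby
require 'cgi'

# Hypothetical, deliberately loose email check for illustration only.
EMAIL_RE = /\A[\w.+-]+@[\w-]+(?:\.[\w-]+)+\z/

# Hypothetical helper, not part of Rails: turns one token into a link.
def link_token(token)
  # Escape &, <, > and quotes so "&lang" cannot be read as an entity.
  escaped = CGI.escapeHTML(token)
  if token =~ EMAIL_RE
    %(<a href="mailto:#{escaped}">#{escaped}</a>)
  else
    # Add a protocol to the href only when the token did not include one.
    href = (token =~ %r{\Ahttps?://}i) ? token : "http://#{token}"
    %(<a href="#{CGI.escapeHTML(href)}">#{escaped}</a>)
  end
end

puts link_token("support@genouest.org")
puts link_token("http://xml.ddbj.nig.ac.jp/wabi/Method?serviceName=Blast&mode=methodList&lang=en")
```

If the view already escapes the string somewhere else, escaping again here would double-encode it, so this depends on where in the app the link text is produced.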