Topic: Regex question

Hello everyone,

I have a conundrum that needs a regex to solve. Whenever I am in this situation I hit the tutorial and instruction sites because my regex-fu is weak, but this one I cannot seem to solve.

I have some conditional code on my site. I need all top-level pages to display something, while pages below the top level should not. Since this code is in a template, I need to determine whether the page is top-level using the URL.

Here is the URL pattern:

http://example.com/top-level
http://example.com/top-level/second-level
http://example.com/top-level/second-level/third-level

So what I need is a regex that matches only the first of those URLs.

Any help would be much appreciated since I have spent an hour reading up on regexes and trying to solve this and my brain cannot cope. I am sure the answer will be insultingly easy. (Something like ^http:\/\/.*\/{1}SOMETHINGHERE$ I guess)

Regards

RJ

Re: Regex question

Here is another approach: just split the string and check the length?

if request.url.split('/').length == 4 then ... # top-level: "http:", "", "example.com", "top-level"
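
For anyone who is free to use plain Ruby, here is a minimal sketch of the same idea. The helper name top_level? is made up for illustration, and it counts path segments with the standard URI library rather than splitting the raw URL, so the "http:" and host parts do not have to be counted by hand.

```ruby
require 'uri'

# Hypothetical helper (not from the thread): true when the URL's path
# has exactly one non-empty segment, e.g. "/top-level".
def top_level?(url)
  segments = URI.parse(url).path.split('/').reject(&:empty?)
  segments.length == 1
end

puts top_level?('http://example.com/top-level')               # => true
puts top_level?('http://example.com/top-level/second-level')  # => false
```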

Re: Regex question

I cannot do that as I am constrained to using pre-made code. It is Regex or nothing.

Re: Regex question

Here, try using Rubular to find the regexp you need. It is a regular expression editor for Ruby: type in a string and a regexp and it will show you whether it matches, and what matches.

- Ben

Re: Regex question

RJ

How generic does this have to be? Do your URLs have parameters? Or do you just want something that works for the above and similar URLs? How about

^https?:\/(?:\/[\w.-]+){2}$

Re: Regex question

specious wrote:

RJ

How generic does this have to be? Do your URLs have parameters? Or do you just want something that works for the above and similar URLs? How about

^https?:\/(?:\/[\w.-]+){2}$

Specious I love you, that is all I needed! No parameters or anything.

Regexes are obviously not my strong suit, so I am still trying to get my head around that one. I would also like to modify it so that it works with an IP address and port instead of a domain name.

http://10.0.0.0:85/top-level
http://10.0.0.0:85/top-level/second-level
http://10.0.0.0:85/top-level/second-level/third-level

I don't suppose your regex-fu could solve this one? In the meantime I am going to try and decipher the original one.

Re: Regex question

Regexes are all about patterns. You've got to look for the patterns first, then decide how you are going to match them. In this instance, we aren't in the business of validating the URL; by the time it's got to your view, the URL must be valid or it wouldn't have got that far. So it can be made a lot simpler. In this case the pattern is protocol:/ (/stuff) x 2

In the original, we start by telling it to match the whole string, so we put the anchors ^ and $ at the beginning and the end respectively.

^          # start of string
http       # literal 'http'
s?         # literal 's', zero or one times
:          # literal :
\/         # literal /, escaped
(?:        # start non-capturing bracket
\/         # literal /, escaped
[\w.-]     # character class: \w ('word character': letters, digits, underscore), '.' or '-'
+          # anything in the character class, one or more times
)          # close non-capturing bracket
{2}        # everything in the bracket, twice
$          # end of string
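
If you want to keep that breakdown next to the pattern itself, Ruby's extended (x) mode lets you write the comments inside the regex. This is only a sketch: the constant name TOP_LEVEL is an assumption, and it uses %r{} so the slashes do not need escaping.

```ruby
# Same pattern as above, written in extended mode so whitespace and
# comments inside the regex are ignored. TOP_LEVEL is a made-up name.
TOP_LEVEL = %r{
  ^             # start of line
  https?        # literal http, optionally followed by s
  :/            # literal colon and the first slash
  (?:           # start non-capturing group
    /[\w.-]+    # a slash, then one or more word chars, dots or hyphens
  ){2}          # the group exactly twice: the host, then one path segment
  $             # end of line
}x

[
  'http://example.com/top-level',
  'http://example.com/top-level/second-level',
  'http://example.com/top-level/second-level/third-level'
].each do |url|
  puts "#{url} -> #{TOP_LEVEL.match?(url)}"
end
```

One Ruby-specific caveat: ^ and $ anchor at line boundaries, not at the ends of the whole string. For a single-line URL that makes no difference, but \A and \z are the stricter choice if the input could ever contain a newline.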

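Because we don't have to validate the URL, just match it, we only need to add : to the character class. (\w already covers the digits 0-9, so a separate \d is not needed.) Keep the $ anchor at the end, otherwise the second- and third-level URLs will match as well.

^https?:\/(?:\/[\w.:-]+){2}$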

As previously noted, this will match all kinds of crap like https://:x33:.124:abc/123:stuff, but by the time it gets to your view it will already be a valid URL, so we don't care.

BTW, this will also match http://my.domain.com:8080/stuff which is probably what you want...
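
A quick way to convince yourself is to run the URLs from this thread through the pattern. The sketch below assumes the version with : in the character class and the $ anchor at the end; the constant name TOP_LEVEL_WITH_PORT is made up.

```ruby
# Pattern with ':' allowed in each segment and the end anchor in place.
# TOP_LEVEL_WITH_PORT is a hypothetical name for illustration.
TOP_LEVEL_WITH_PORT = %r{^https?:/(?:/[\w.:-]+){2}$}

urls = [
  'http://10.0.0.0:85/top-level',
  'http://10.0.0.0:85/top-level/second-level',
  'http://10.0.0.0:85/top-level/second-level/third-level',
  'http://my.domain.com:8080/stuff'
]

# Print each URL with whether the pattern treats it as top-level.
urls.each do |url|
  puts "#{url} -> #{TOP_LEVEL_WITH_PORT.match?(url)}"
end
```

Only the first and last URLs should report true. Dropping the $ makes all four match, because the pattern is then satisfied by the first two segments alone.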

HTH

Re: Regex question

Thank you for that excellent explanation and some good insight. When I initially tried to do this myself I was falling into the trap you mentioned in that I was trying to model the URL too specifically.