Topic: Mechanize not recognizing my anchor tags when using CSS selector
Has anyone else had problems with Mechanize not recognizing anchor tags via CSS selectors?
The HTML in question looks like this mess (snippet with white space removed for clarity):
<td class='calendarCell' align='left'>
  <a href="http://www.mysite.org/index.php/site/ActivitiesCalendar/2010/02/10/">10</a>
  <p style="margin-bottom:15px; line-height:14px; text-align:left;">
    <span class="sidenavHeadType"> Current Events</span><br />
    <b><a href="http://www.mysite.org/index.php/site/Clubs/banks_and_the_fed" class="a2">Banks and the Fed</a></b>
    <br /> 10:30am- 11:45am
  </p>
I'm trying to collect the data from these events. Everything is working except getting the anchor within the <p>. There's clearly an <a> tag inside the <b>, and I'm going to need to follow that link to get further details on this event.
In my rake task, I have:
agent.page.search(".calendarCell,.calendarToday").each do |item|
  day = item.at("a").text
  item.search("p").each do |e|
    anchor = e.at("a")
    puts anchor
    puts e.inner_html
  end
end
Interestingly, item.at("a") always returns the day anchor, but e.at("a") returns nil, and calling inner_html on the p element omits the anchor entirely. Example output:
nil
<span class="sidenavHeadType"> Photo Club</span><br><b>Indexing Slide Collections</b> <br> 2:00pm- 3:00pm
However, when I run the same scrape directly with Nokogiri:
doc.css(".calendarCell,.calendarToday").each do |item|
  day = item.at_css("a").text
  item.css("p").each do |e|
    link = e.at_css("a")[:href]
    puts e.inner_html
  end
end
It recognizes the <a> inside the <p>, and it will return the href, etc.
<span class="sidenavHeadType"> Bridge Party</span><br><b><a href="http://www.mysite.org/index.php/site/Clubs/party_bridge_51209" class="a2">Party Bridge</a></b> <br> 7:00pm- 9:00pm
Mechanize is supposed to use Nokogiri, so I'm wondering if I have a bad version or if this affects others as well.
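One check that might help narrow this down is whether the anchors are present in the raw HTML Mechanize receives at all, before any parsing happens. The sketch below is only a rough sanity check using a regex on a string (not a real HTML parser). The helper names count_anchor_tags and paragraphs_without_anchor are made up for illustration, and the sample markup is a hypothetical reconstruction that assumes the "Photo Club" event has no link in the source.

```ruby
# Rough diagnostic: count opening <a> tags in a raw HTML string.
# A regex is not an HTML parser; this is a sanity check only.
def count_anchor_tags(html)
  html.scan(/<a\b[^>]*>/i).size
end

# Return the raw markup of every <p>...</p> that contains no <a> tag.
def paragraphs_without_anchor(html)
  html.scan(/<p\b[^>]*>.*?<\/p>/mi).reject { |par| par =~ /<a\b/i }
end

# Hypothetical sample: one event with a link, one without (assumption).
sample = <<~HTML
  <td class='calendarCell' align='left'>
  <a href="http://www.mysite.org/index.php/site/ActivitiesCalendar/2010/02/10/">10</a>
  <p><span class="sidenavHeadType"> Current Events</span><br />
  <b><a href="http://www.mysite.org/index.php/site/Clubs/banks_and_the_fed" class="a2">Banks and the Fed</a></b>
  <br /> 10:30am- 11:45am </p>
  <p><span class="sidenavHeadType"> Photo Club</span><br />
  <b>Indexing Slide Collections</b> <br /> 2:00pm- 3:00pm </p>
  </td>
HTML

puts count_anchor_tags(sample)               # prints 2
puts paragraphs_without_anchor(sample).size  # prints 1
```

In the rake task, the same helpers could be pointed at agent.page.body and the count compared with agent.page.search("a").size. If the raw body already lacks the event anchors, the difference would be in what the server sends to Mechanize (or some events simply have no link) rather than in how Nokogiri parses it.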
Thanks for any leads.