
This queue is for tickets about the XML-XPathEngine CPAN distribution.

Report information
The Basics
Id: 66371
Status: open
Priority: 0/
Queue: XML-XPathEngine

People
Owner: Nobody in particular
Requestors: JEB [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Unimportant
Broken in: 0.12
Fixed in: (no value)



Hello,

First, love the XPath modules; thank you.

I have a script running in a thread that is randomly dying with the following error message:

thread failed to start: axis axis_attribute not implemented [Can't locate object method "getAttributes" via package "HTML::TreeBuilder::XPath::Attribute" at /usr/lib/perl5/vendor_perl/5.8.8/XML/XPathEngine/Step.pm line 226.

This is with:
XPathEngine.pm: $VERSION = '0.12';
Step.pm: # $Id: Step.pm,v 1.35 2001/04/01 16:56:40 matt Exp $

I'm not sure what to make of this message; is this my bug, or yours? :)

Hope this helps, many thanks,
JEB
On Fri Mar 04 02:55:32 2011, JEB wrote:
> thread failed to start: axis axis_attribute not implemented [Can't
> locate object method "getAttributes" via package
> "HTML::TreeBuilder::XPath::Attribute" at
> /usr/lib/perl5/vendor_perl/5.8.8/XML/XPathEngine/Step.pm line 226.
Trying to call getAttributes on an attribute (HTML::TreeBuilder::XPath::Attribute) is definitely weird. And fails.

What is the script, or at least the XPath expression that causes that error?

__
mirod
CC: JEB [...] cpan.org
Subject: Re: [rt.cpan.org #66371]
Date: Fri, 04 Mar 2011 16:38:18 +0800
To: bug-XML-XPathEngine [...] rt.cpan.org
From: James Bromberger <james [...] rcpt.to>
On 4/03/2011 4:17 PM, MIROD via RT wrote:
> Trying to call getAttributes on an attribute
> (HTML::TreeBuilder::XPath::Attribute) is definitely weird. And fails.
> What is the script, or at least the XPath expression that causes that error?
It's a multi-threaded parallel web scraper robot monster. :)

The expressions are all pretty straightforward, but there are around 400 I have stacked up to run. I haven't yet determined WHICH XPath on which URL is triggering this, as I am processing around 400 pages/min. I'll keep pulling it apart until I can offer a URL and an XPath.

I have just put the entire XPath extraction inside a nice safe eval(), so the thread doesn't die() any more!

Thanks ever so much for the response.

JEB

--
Mobile: +61 422 166 708, Email: james_AT_rcpt.to
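A minimal sketch of the kind of eval() guard James describes, in plain Perl and without threads for brevity. The names extract_page and safe_extract are hypothetical: extract_page stands in for the real XPath extraction and simply dies with the same message as the report. In the scraper, the same guard would wrap the work done in each thread so a failing page is logged and skipped instead of killing the thread. The eval block returns 1 on success and that value is tested, rather than $@ alone, a common idiom because $@ can be clobbered before it is inspected.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the per-page XPath extraction; it dies the way
# XML::XPathEngine 0.12 did in this report when given a "bad" URL.
sub extract_page {
    my ($url) = @_;
    die "axis axis_attribute not implemented\n" if $url =~ /bad/;
    return "ok:$url";
}

# Run the extraction inside eval so an exception is caught and logged
# instead of propagating and ending the caller (e.g. a worker thread).
sub safe_extract {
    my ($url) = @_;
    my $result;
    my $ok = eval { $result = extract_page($url); 1 };
    unless ($ok) {
        my $err = $@ || "unknown error\n";
        warn "extraction failed for $url: $err";
        return;    # undef in scalar context: caller skips this page
    }
    return $result;
}

for my $url (qw(http://example.com/good http://example.com/bad)) {
    my $r = safe_extract($url);
    print defined $r ? "$r\n" : "skipped $url\n";
}
```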
Subject: Re: [rt.cpan.org #66371]
Date: Mon, 07 Mar 2011 20:35:36 +0800
To: bug-XML-XPathEngine [...] rt.cpan.org
From: James Bromberger <james [...] rcpt.to>
On 4/03/2011 4:38 PM, James Bromberger via RT wrote:
> I'll keep pulling it apart until I can offer a URL and an XPath.
I have seen this fail multiple times on the www.neowin.com web site's content. One such failure was on:

http://www.neowin.net/forum/topic/828782-apple-genius-bar-iphones-30-call-drop-is-normal-in-nyc/

And possibly with this XPath (designed to get the image from the body which, if it has height and width attributes, is at least 30x30, with an aspect ratio between 1:2 and 2:1):

(//div[@class="KonaBody"]//img[(not(@width) or @width>30) and (not(@height) or @height>30) and ((not(@width) and not(@height)) or (@width/@height > 0.5 and @height/@width < 2))]/@src)[1]

Does that fail for you?

Many thanks,
James

--
Mobile: +61 422 166 708, Email: james_AT_rcpt.to
Subject: Re: [rt.cpan.org #66371]
Date: Mon, 07 Mar 2011 14:53:49 +0100
To: bug-XML-XPathEngine [...] rt.cpan.org
From: mirod <xmltwig [...] gmail.com>
On 03/07/2011 01:36 PM, James Bromberger via RT wrote:
> (//div[@class="KonaBody"]//img[(not(@width) or @width>30) and
> (not(@height) or @height>30) and ((not(@width) and not(@height)) or
> (@width/@height> 0.5 and @height/@width< 2))]/@src)[1]
Oh, I see. It does fail for me too.

The problem is the @width/@height > 0.5, which is indeed resolved using a call to getAttributes on an attribute.

I have a patched version; I'll test it later and upload it to CPAN.

In the meantime, you can avoid the problem by separating the attributes:

    my $img = $tree->findnodes( q{(//div[@class="KonaBody"]//img[(not(@width) or @width>30) and (not(@height) or @height>30) and ((not(@width) and not(@height)) or (@width > @height * 0.5 and @height < @width * 2))]/@src)[1]} );

Also, I believe the last two tests are redundant: @width/@height > 0.5 and @height/@width < 2 test exactly the same thing, so:

    (//div[@class="KonaBody"]//img[(not(@width) or @width>30) and (not(@height) or @height>30) and ((not(@width) and not(@height)) or (@width > @height * 0.5 ))]/@src)[1]

I also have to investigate why on Earth the parser doesn't like '@width > @height / 2', which triggers a syntax error on the XPath.
-- michel
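A quick check of mirod's redundancy point, assuming both attributes are present and positive, with w standing for @width and h for @height. Multiplying through by the positive denominators shows the two original tests are one and the same condition, while the 1:2 to 2:1 range James describes needs an upper bound on the same ratio:

$$\frac{w}{h} > \frac{1}{2} \iff 2w > h \iff \frac{h}{w} < 2$$

$$\frac{1}{2} < \frac{w}{h} < 2 \iff \frac{h}{2} < w < 2h$$

In the multiplication form of mirod's workaround, that range would read '@width > @height * 0.5 and @width < @height * 2'.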
Subject: Re: [rt.cpan.org #66371]
Date: Mon, 07 Mar 2011 22:06:06 +0800
To: bug-XML-XPathEngine [...] rt.cpan.org
From: James Bromberger <james [...] rcpt.to>
Hi Michel,

Excellent. Many thanks for looking into this, and thank you for the workaround.

You're right about my path: I was trying to constrain the aspect ratio between 2:1 and 1:2, and obviously inverted both sides of the second comparison; I feel dumb! :)

I'll look out for an update hitting CPAN and give that a whirl.

Many, many thanks,
James
-- Mobile: +61 422 166 708, Email: james_AT_rcpt.to
Subject: xpath library bug
From: Jiri Palecek
On Mon 07 Mar 2011 07:36:00, james@rcpt.to wrote:
> (//div[@class="KonaBody"]//img[(not(@width) or @width>30) and
> (not(@height) or @height>30) and ((not(@width) and not(@height)) or
> (@width/@height > 0.5 and @height/@width < 2))]/@src)[1]
Actually, that expression is wrong. Not syntactically, but @width/@height is in fact a LocationPath, not a division. For division, always use 'div', e.g. @width div @height.

Regards,
Jiri Palecek
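A minimal sketch of the expression rewritten along the lines Jiri suggests, assuming James's intended 1:2 to 2:1 bound. The $xpath string uses 'div' for both ratio tests; it is only stored here, not evaluated, since that needs an XPath engine such as HTML::TreeBuilder::XPath. The two helper subs (hypothetical names) mirror the 'div' form and the multiplication-only form of mirod's workaround in plain Perl, so their agreement on positive dimensions can be checked without any XPath module.

```perl
use strict;
use warnings;

# Corrected XPath: 'div' divides, whereas '/' would start a location step.
# Both bounds apply to width/height (ratio between 1:2 and 2:1).
# Reference only; evaluating it requires an XPath engine, not loaded here.
my $xpath = q{(//div[@class="KonaBody"]//img[(not(@width) or @width>30)}
          . q{ and (not(@height) or @height>30)}
          . q{ and ((not(@width) and not(@height))}
          . q{ or (@width div @height > 0.5 and @width div @height < 2))]/@src)[1]};

# Hypothetical helper mirroring '@width div @height > 0.5 and @width div @height < 2'.
sub ratio_ok_div {
    my ($w, $h) = @_;
    return $w / $h > 0.5 && $w / $h < 2;
}

# Hypothetical helper mirroring '@width > @height * 0.5 and @width < @height * 2'.
sub ratio_ok_mul {
    my ($w, $h) = @_;
    return $w > $h * 0.5 && $w < $h * 2;
}

print "$xpath\n";
for my $dims ([100, 100], [100, 40], [40, 100], [60, 100], [199, 100]) {
    my ($w, $h) = @$dims;
    printf "%dx%d div=%s mul=%s\n", $w, $h,
        ratio_ok_div($w, $h) ? 'yes' : 'no',
        ratio_ok_mul($w, $h) ? 'yes' : 'no';
}
```

The multiplication form avoids division entirely, which also sidesteps a zero height; the 'div' form reads closer to the aspect-ratio intent.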