Bug #32767 for HTML-TreeBuilder-XPath: Implement findvalue for

Tue Jan 29 04:23:39 2008 SREZIC [...] cpan.org - Ticket created

Subject:

Implement findvalue for

It would be nice if HTML::TreeBuilder::XPath::Attribute would support the findvalue method, at least findvalue("."), which is the same as the getValue method.

And while at it, supporting some other DOM-like methods would also be nice. E.g.
  "nodeName" as an alias for HTML::TreeBuilder::XPath::Node::tag,
  textContent as something like join "", grep { is a text node } $node->content_list,
  getAttribute($attr) as an alias for attr($attr)

Regards,
Slaven
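A minimal sketch of what the requested aliases amount to, written as plain helper subs rather than as methods of the module. The helper names (node_name, get_attribute, text_content) and the sample markup are hypothetical; only tag, attr, content_list, findnodes, findvalue and getValue are taken from HTML::Element and HTML::TreeBuilder::XPath. Note that text_content here follows the request above (direct text children only), which is narrower than DOM textContent.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample document, for illustration only.
my $tree = HTML::TreeBuilder::XPath->new_from_content(
    '<html><body><p id="x">Hello <b>world</b></p></body></html>'
);

# Hypothetical DOM-like helpers built on existing HTML::Element methods.
sub node_name     { my ($node) = @_;        return $node->tag }
sub get_attribute { my ($node, $attr) = @_; return $node->attr($attr) }

# Direct text children only: content_list returns plain strings for text
# and objects (references) for child elements.
sub text_content  { my ($node) = @_; return join '', grep { !ref } $node->content_list }

my ($p) = $tree->findnodes('//p');
print node_name($p), "\n";
print get_attribute($p, 'id'), "\n";
print text_content($p), "\n";

# What already works without findvalue on the attribute object itself:
my ($id_attr) = $tree->findnodes('//p/@id');
my $via_get   = $id_attr->getValue;            # value from the attribute node
my $via_tree  = $tree->findvalue('//p/@id');   # same value, queried from the tree

$tree->delete;
```

With helpers like these, calling code can stay the same shape whether the node came from HTML::TreeBuilder::XPath or from a DOM library, at the cost of one thin wrapper layer.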

Tue Jan 29 06:53:51 2008 xmltwig [...] gmail.com - Correspondence added

Subject:	Re: [rt.cpan.org #32767] Implement findvalue for
Date:	Tue, 29 Jan 2008 12:53:26 +0100
To:	bug-HTML-TreeBuilder-XPath [...] rt.cpan.org
From:	mirod <xmltwig [...] gmail.com>

Slaven_Rezic via RT wrote:

> Tue Jan 29 04:23:39 2008: Request 32767 was acted upon.
> Transaction: Ticket created by SREZIC
> Queue: HTML-TreeBuilder-XPath
> Subject: Implement findvalue for
> Broken in: 0.09
> Severity: Wishlist
> Owner: Nobody
> Requestors: SREZIC@cpan.org
> Status: new
> Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=32767 >
>
>
> It would be nice if HTML::TreeBuilder::XPath::Attribute would support
> the findvalue method, at least findvalue(".") which is the same as the
> getValue method.
>
> And while at it, supporting some other DOM-like methods would also be
> nice. E.g. "nodeName" as an alias for
> HTML::TreeBuilder::XPath::Node::tag, textContent as something like
> join "", grep { is a text node } $node->content_list
> getAttribute($attr) as an alias for attr($attr)

I will look into adding findvalue to HTML::TreeBuilder::XPath::Attribute.

I think it also makes sense to add a method that returns a list of strings, each string being the value of a hit.

Beyond that, I don't really see any good reason to add DOM methods to HTML::TreeBuilder::XPath. These would belong in HTML::TreeBuilder itself. And if you want to use the DOM on HTML, then XML::LibXML can do that for you. HTML::TreeBuilder::XPath is meant to add a way to apply XPath queries to code that uses HTML::TreeBuilder, nothing more.

Does it make sense?

--
mirod

Tue Jan 29 06:53:54 2008 The RT System itself - Status changed from 'new' to 'open'

Tue Jan 29 15:06:11 2008 slaven [...] rezic.de - Correspondence added

Subject:	Re: [rt.cpan.org #32767] Implement findvalue for
Date:	29 Jan 2008 21:02:27 +0100
To:	bug-HTML-TreeBuilder-XPath [...] rt.cpan.org
From:	Slaven Rezic <slaven [...] rezic.de>

"xmltwig@gmail.com via RT" <bug-HTML-TreeBuilder-XPath@rt.cpan.org> writes:

> <URL: http://rt.cpan.org/Ticket/Display.html?id=32767 >
>
> Slaven_Rezic via RT wrote:
> > Tue Jan 29 04:23:39 2008: Request 32767 was acted upon.
> > Transaction: Ticket created by SREZIC
> > Queue: HTML-TreeBuilder-XPath
> > Subject: Implement findvalue for
> > Broken in: 0.09
> > Severity: Wishlist
> > Owner: Nobody
> > Requestors: SREZIC@cpan.org
> > Status: new
> > Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=32767 >
> >
> >
> > It would be nice if HTML::TreeBuilder::XPath::Attribute would support
> > the findvalue method, at least findvalue(".") which is the same as the
> > getValue method.
> >
> > And while at it, supporting some other DOM-like methods would also be
> > nice. E.g. "nodeName" as an alias for
> > HTML::TreeBuilder::XPath::Node::tag, textContent as something like
> > join "", grep { is a text node } $node->content_list
> > getAttribute($attr) as an alias for attr($attr)
>
> I will look into adding findvalue to HTML::TreeBuilder::XPath::Attribute.
>
> I think it also makes sense to add a method that returns a list of
> strings, each string being the value of a hit.
>
> Beyond that, I don't really see any good reason to add DOM methods to
> HTML::TreeBuilder::XPath. These would belong in HTML::TreeBuilder
> itself. And if you want to use the DOM on HTML, then XML::LibXML can do
> that for you. HTML::TreeBuilder::XPath is meant to add a way to apply
> XPath queries to code that uses HTML::TreeBuilder, nothing more.
>

In my case there was the need to process both HTML and XHTML files with XPath constructs. I know, XML::LibXML is supposed to parse HTML as well and may be set into a mode for parsing invalid XML, but in real life this does not work. So I need both XML::LibXML and HTML::TreeBuilder::XPath, and I would like to keep the number of conditionals as low as possible.

Regards,
Slaven

--
Slaven Rezic - slaven <at> rezic <dot> de
Tk-AppMaster: a perl/Tk module launcher designed for handhelds
http://tk-appmaster.sf.net

Wed Jan 30 06:31:43 2008 xmltwig [...] gmail.com - Correspondence added

Subject:	Re: [rt.cpan.org #32767] Implement findvalue for
Date:	Wed, 30 Jan 2008 12:31:15 +0100
To:	bug-HTML-TreeBuilder-XPath [...] rt.cpan.org
From:	mirod <xmltwig [...] gmail.com>

slaven@rezic.de via RT wrote:

> In my case there was the need to process both HTML and XHTML files
> XPath constructs. I know, XML::LibXML is supposed to parse also HTML
> and may be set into a mode for parsing invalid XML, but in real life
> this does not work. So I need both, XML::LibXML and
> HTML::TreeBuilder::XPath, and I would like to keep the number of
> conditionals as low as possible.

In that case, why don't you load the HTML using HTML::TreeBuilder, export it as XHTML using as_XML and then process it using XML::LibXML?

The only problem I can see is that there are a few cases where the XML generated by HTML::TreeBuilder is not well-formed, but those are relatively rare.

Wouldn't that solve your problem?

--
mirod
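The suggested pipeline can be sketched roughly as follows. This is an illustration under assumptions, not code from the ticket: the sample markup and the XPath expression are made up, and the serialization step uses HTML::Element's as_XML method. parse_string dies when the generated XML is not well-formed, so the rare bad cases mentioned above would surface as an exception at that point.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use XML::LibXML;

# Hypothetical input; real-world HTML may be far messier.
my $html = '<html><head><title>Demo</title></head>'
         . '<body><p class="intro">Hello <b>world</b></p></body></html>';

# 1. Parse the HTML with HTML::TreeBuilder.
my $tree = HTML::TreeBuilder->new_from_content($html);

# 2. Serialize the tree as XML, then free the tree.
my $xml = $tree->as_XML;
$tree->delete;

# 3. Hand the XML to XML::LibXML; dies if it is not well-formed.
my $doc = XML::LibXML->new->parse_string($xml);

# From here on, the same XML::LibXML code path serves HTML and XHTML input.
my $class = $doc->findvalue('//p/@class');
print "$class\n";
```

Wrapping step 3 in an eval block would let the caller fall back to HTML::TreeBuilder::XPath for documents whose XML export cannot be parsed.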