
This queue is for tickets about the HTML-Tree CPAN distribution.

Report information
The Basics
Id: 29805
Status: rejected
Priority: 0/
Queue: HTML-Tree

People
Owner: Nobody in particular
Requestors: jmason [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 3.23
Fixed in: (no value)



Subject: as_text replaces /<br>/ with "", but a whitespace char would be better
Hi. An as_text() call against 'Sat 06/10/07<br>20:00' should produce something like 'Sat 06/10/07 20:00' (or maybe with a \n). Instead it produces 'Sat 06/10/0720:00'. See http://rt.cpan.org/Ticket/Display.html?id=29799 for a bug report against Web::Scraper that provides a demo. (That module's maintainer indicated that this output was generated by the as_text method of HTML::Element.)
as_text won't and can't do that at the moment; this is a design decision. The topic came up at the 2006 Chicago Hackathon, and the question I put forward then was this: which elements would you do this for, and when would you do it?

Suppose I have a block of HTML 3 that reads: <xmp><br></xmp>. That <br> should not be converted, but a blind regexp engine would convert it. Beyond that, <br> is not the only element that would need this treatment. People expect the same for <hr>, as well as for <p>, <div>, <blockquote> and other block-level elements.

as_text was never intended to be used as a sanitization method or a display method. The man page specifically states that it is the concatenation of text elements as the tree is descended. Changing that is a design decision and won't be considered until the major version is bumped to 4.0, which is quite a way down the road.
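
To make the distinction concrete, here is a minimal sketch in Python (standard library only) contrasting plain concatenation of text nodes with a variant that emits a space at selected tags. It is an illustration of the behavior discussed in this ticket, not HTML::Element's implementation. The names TextCollector, plain_concat, spaced_text and the BREAK_TAGS set are hypothetical, and the choice of tags is exactly the open question raised above.

```python
from html.parser import HTMLParser

# Hypothetical choice of tags that a display-oriented conversion might
# treat as separators; which tags belong here is the open design question.
BREAK_TAGS = {"br", "hr", "p", "div", "blockquote"}


class TextCollector(HTMLParser):
    """Collects text nodes in document order, optionally adding a space
    whenever one of the given tags opens."""

    def __init__(self, break_tags=()):
        super().__init__(convert_charrefs=True)
        self.break_tags = set(break_tags)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Emit a separator only for the tags the caller asked for.
        if tag in self.break_tags:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def plain_concat(html):
    """Concatenate text nodes with nothing in between (the reported behavior)."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.parts)


def spaced_text(html):
    """Same walk, but with a space at BREAK_TAGS and whitespace collapsed."""
    collector = TextCollector(BREAK_TAGS)
    collector.feed(html)
    collector.close()
    return " ".join("".join(collector.parts).split())


if __name__ == "__main__":
    sample = "Sat 06/10/07<br>20:00"
    print(plain_concat(sample))
    print(spaced_text(sample))
```

The sketch makes no attempt to handle literal-text elements such as <xmp>, which is the kind of complication that makes a general rule hard to choose. In Perl, a caller who wants the spaced form can apply a similar policy to their own tree before calling as_text, picking the tags that suit their document.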