This queue is for tickets about the HTML-Tree CPAN distribution.

Report information
The Basics
Id: 26436
Status: resolved
Priority: 0/
Queue: HTML-Tree

People
Owner: Jeff.Fearn [...] gmail.com
Requestors: eharrison [...] realestate.com.au
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 3.23
Fixed in: (no value)



Subject: as_trimmed_text in HTML::Element does not trim &nbsp;
    sub as_trimmed_text {
        my $text = shift->as_text(@_);
        $text =~ s/[\n\r\f\t ]+$//s;
        $text =~ s/^[\n\r\f\t ]+//s;
        $text =~ s/[\n\r\f\t ]+/ /g;
        return $text;
    }

This fails to trim &nbsp; from $text, which is commonly used in HTML.

The following would resolve the problem:

    sub as_trimmed_text {
        my $text = shift->as_text(@_);
        $text =~ s/[\n\r\f\t\xA0 ]+$//s;
        $text =~ s/^[\n\r\f\t\xA0 ]+//s;
        $text =~ s/[\n\r\f\t\xA0 ]+/ /g;
        return $text;
    }
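
The reported behavior can be reproduced without HTML::Tree installed. The following is a minimal sketch that copies the substitutions from the report into two plain-string helpers; the names trim_ascii and trim_with_nbsp and the sample string are made up for illustration and are not part of the module.

```perl
use strict;
use warnings;

# Standalone copy of the three substitutions quoted in the report, applied
# to a plain string instead of an HTML::Element (hypothetical helper name).
sub trim_ascii {
    my ($text) = @_;
    $text =~ s/[\n\r\f\t ]+$//s;    # strip trailing whitespace
    $text =~ s/^[\n\r\f\t ]+//s;    # strip leading whitespace
    $text =~ s/[\n\r\f\t ]+/ /g;    # collapse internal runs to one space
    return $text;
}

# The variant proposed in the report: \xA0 added to each character class.
sub trim_with_nbsp {
    my ($text) = @_;
    $text =~ s/[\n\r\f\t\xA0 ]+$//s;
    $text =~ s/^[\n\r\f\t\xA0 ]+//s;
    $text =~ s/[\n\r\f\t\xA0 ]+/ /g;
    return $text;
}

# Sample text with U+00A0 at both ends, as a decoded &nbsp; would leave it.
my $sample  = "\xA0 Price:\t\n100\xA0";
my $plain   = trim_ascii($sample);       # U+00A0 survives at both ends
my $patched = trim_with_nbsp($sample);   # U+00A0 is trimmed as well

printf "plain:   %d characters\n", length $plain;
printf "patched: %d characters\n", length $patched;
```

With the first helper the non-breaking spaces anchor both ends of the string, so neither the leading nor the trailing substitution matches; only the internal runs are collapsed.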
From: perl [...] cjmweb.net
On Mon Apr 16 22:41:10 2007, gzminiz wrote:
> sub as_trimmed_text {
> This fails to trim &nbsp; from $text which is commonly used in HTML
> The following would resolve the problem:
This behavior is as designed. U+00A0 (&nbsp;) is not considered whitespace in the HTML specification; see http://www.w3.org/TR/html4/struct/text.html#h-9.1

That said, it wouldn't hurt if this was mentioned in the docs for as_trimmed_text.
Updated docs to be clearer on what white space will be cleaned.
From: dma_k [...] mail.ru
On Fri Apr 20 02:31:18 2007, CJM wrote:
> This behavior is as designed. U+00A0 (&nbsp;) is not considered
> whitespace in the HTML specification; see
> http://www.w3.org/TR/html4/struct/text.html#h-9.1
Pity. Trimming it would be useful in many cases, and it is what API consumers expect. Maybe one could introduce another helper that also trims non-breaking spaces? Or accept an additional option as an argument, e.g. as_trimmed_text('trim_nbsp' => 1).
Hi, what I did was add a parameter, extra_chars, that allows the user to supply a string that will be used in the regexes, e.g. to remove the encoded or un-encoded &nbsp;:

    $h->as_trimmed_text(extra_chars => ' \xA0');
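
For readers wondering how a string like that can end up "used in the regexes", here is a rough sketch under stated assumptions. It is a standalone helper operating on a plain string, not the HTML::Tree source; the function name trimmed_text is hypothetical, and the real implementation may build its patterns differently. Only the extra_chars option name comes from the comment above.

```perl
use strict;
use warnings;

# Hypothetical standalone helper (not the HTML::Tree implementation): trims
# the ASCII whitespace set from the report plus caller-supplied characters.
sub trimmed_text {
    my ($text, %opt) = @_;
    my $extra = defined $opt{extra_chars} ? $opt{extra_chars} : '';

    # $extra is interpolated into the bracketed class as regex source, so a
    # single-quoted '\xA0' is read by the regex engine as the escape for
    # U+00A0. An empty $extra leaves the original ASCII-only class.
    my $class = qr/[\n\r\f\t $extra]+/;

    $text =~ s/$class\z//;     # strip trailing run
    $text =~ s/^$class//;      # strip leading run
    $text =~ s/$class/ /g;     # collapse internal runs to one space
    return $text;
}

my $raw       = "\xA0\xA0Hello\xA0 \tworld \xA0";
my $default   = trimmed_text($raw);                          # keeps U+00A0
my $with_nbsp = trimmed_text($raw, extra_chars => '\xA0');   # trims it too

printf "default:   %d characters\n", length $default;
printf "with nbsp: %d characters\n", length $with_nbsp;
```

Because the option value is pasted into a character class in this sketch, characters that are special there (such as `]`, `^`, `-` or a backslash) would need escaping by the caller. The trade-off is that ranges and escapes like `\xA0` work without any extra parsing.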
Subject: 4.0 released
Hi, HTML::Tree version 4.0 has been released, which includes a fix for this issue. Cheers, Jeff.