Hi Breno,
On Sun Apr 25 13:21:51 2010, GARU wrote:
> On Sat Apr 24 20:12:20 2010, jfearn wrote:
> > You are correct, there is an option, no_space_compacting, which
> > defaults off, that controls this.
> >
> > $ perl -MHTML::TreeBuilder -E '$tree =
> > HTML::TreeBuilder->new(no_space_compacting => 1);
> > $tree->parse(q[<div>foo    bar</div>]); print $tree->as_text, "\n"'
> > foo    bar
> >
> > There are 4 spaces in the input & output, as expected.
> >
> > I'm not sure why this was defaulted off, but changing the default now
> > would have an unknown impact on current users :(
> >
> >
>
> Hey guys, thanks a lot for all the help on this issue. For the record,
> as the original requestor, I don't have a problem with ->as_text keeping
> its current behavior (i.e. not fixing it), as long as the documentation
> is updated on as_text to mention this and include reference to the
> 'no_space_compacting' attribute, while also adjusting the '...and any
> internal whitespace is collapsed.' part of the as_trimmed_text() right
> below.
I agree that this isn't clear. One of the reasons for this is that
as_text is part of HTML::Element, but no_space_compacting is part of
HTML::TreeBuilder. HTML::Element, being a lower-level module, has no
idea no_space_compacting exists, so discussing it there seems a bit odd.
I'm looking into how to phrase 'some modules using HTML::Element may
filter HTML when parsing it, check their options' ... OK, I'll work on
it some more ;)
> If I may give a suggestion, I think it would be nice to have something
> like:
>
> ->as_text( no_space_compacting => 1 )
>
> or even using another key, like (for example):
>
> ->as_text( internal_whitespace => 1 )
>
> to return the text with internal whitespace. Please note, however, that
> this should be local to that particular returned value, not a global
> attribute such as the current 'no_space_compacting' - which is part of
> the reason I suggested a different name :)
This would require migrating some of the parse-time behaviour in
HTML::TreeBuilder into the output behaviour of HTML::Element. I'm not
opposed to this, but it would impose a significant testing burden to
avoid breaking existing uses.
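
To make the parse-time versus output-time distinction concrete, here is a
minimal sketch. It assumes HTML::TreeBuilder is installed, and text_of is a
hypothetical helper named only for this illustration. The flag is set on
the tree before parsing, so by the time as_text runs the whitespace
decision has already been made and as_text has nothing left to choose.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: parse $html and return as_text, optionally
# turning off space compacting before the parse starts.
sub text_of {
    my ($html, $keep_spaces) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->no_space_compacting(1) if $keep_spaces;  # parse-time setting
    $tree->parse($html);
    $tree->eof;
    my $text = $tree->as_text;   # output time: whitespace is already fixed
    $tree->delete;               # release the tree
    return $text;
}

my $html      = q[<div>foo    bar</div>];   # four spaces between the words
my $compacted = text_of($html, 0);
my $preserved = text_of($html, 1);

print "compacted: [$compacted]\n";
print "preserved: [$preserved]\n";
```

A per-call option on as_text would need the original whitespace to still be
in the tree, which the default parse has already thrown away.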
> Thanks again for all the help!
Thanks for the positive feedback :)