
This queue is for tickets about the HTML-FormatText-WithLinks CPAN distribution.

Report information
The Basics
Id: 55238
Status: resolved
Priority: 0/
Queue: HTML-FormatText-WithLinks

People
Owner: Nobody in particular
Requestors: user42 [...] zip.com.au
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: <base> inside document
Date: Fri, 05 Mar 2010 07:32:20 +1100
To: bug-HTML-FormatText-WithLinks [...] rt.cpan.org
From: Kevin Ryde <user42 [...] zip.com.au>
It'd be good if HTML::FormatText::WithLinks applied the <base> element in the document to relative links, as well as from the "base" parameter option.

It might be as easy as the few lines below. (Overwriting $self->{base} would be bad if the formatter object is re-used to format a second html document, but I see the HTML::Formatter docs advise against that.)

I think making the document override the class/object parameter is the right priority. Pass in the download location in the option but then the document has precedence, the same as a browser is supposed to behave I think.

I suppose there's also "xml:base" attributes in xhtml, if anyone's foolish enough to expect that to work. But extracting or tracking that value doesn't seem as easy as <base>.
--- WithLinks.pm.orig	2010-03-04 16:23:35.000000000 +1100
+++ WithLinks.pm	2010-03-05 07:24:49.000000000 +1100
@@ -179,6 +179,29 @@
     return $text;
 }
 
+sub head_start {
+  my ($self) = @_;
+  $self->SUPER::head_start();
+  # descend into <head> for possible <base> there, even if superclass not
+  # interested (as of HTML::FormatText 2.04 it's not)
+  return 1;
+}
+# <base> is supposed to be inside <head>, but no need to demand that.
+# "lynx -source" sticks a <base> at the very start of the document, before
+# even <html>, so accepting <base> anywhere lets that work.
+sub base_start {
+  my ($self, $node) = @_;
+  if (my $href = $node->attr('href')) {
+    $self->{base} = $href;
+  }
+  # allow for no superclass base_start() in HTML::FormatText 2.04
+  if (! HTML::FormatText->can('base_start')) {
+    return 0;
+  }
+  # chain up if it exists in the future
+  return $self->SUPER::base_start();
+}
+
 sub parse {
 
     my $self = shift;
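The patch's "chain up only if the superclass has the method" idiom can be tried without HTML::Formatter installed. What follows is a minimal sketch, not the module's code: Toy::Formatter and Toy::WithLinks are hypothetical stand-ins for the real classes, and the href is passed in directly rather than read from an HTML::Element node.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a formatter base class with no base_start().
package Toy::Formatter;
sub new { my ($class, %opt) = @_; return bless {%opt}, $class }

# Hypothetical subclass mirroring the logic of the patch's base_start().
package Toy::WithLinks;
our @ISA = ('Toy::Formatter');

sub base_start {
    my ($self, $href) = @_;
    # a <base href="..."> found in the document replaces the configured base
    $self->{base} = $href if defined $href && length $href;
    # the parent has no base_start() today, so just say "do not descend"
    return 0 unless Toy::Formatter->can('base_start');
    # chain up if the parent gains one in the future
    return $self->SUPER::base_start($href);
}

package main;

my $formatter = Toy::WithLinks->new(base => 'http://example.com/download/');
my $descend   = $formatter->base_start('http://example.org/docs/');
print "base=$formatter->{base} descend=$descend\n";
```

Because the check uses can() on the parent class, the SUPER call is never reached while the parent lacks the method, so the subclass keeps working across both old and future parent versions.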
Subject: Re: [rt.cpan.org #55238] <base> inside document
Date: Mon, 8 Mar 2010 21:50:31 +0000
To: bug-HTML-FormatText-WithLinks [...] rt.cpan.org
From: Struan Donald <struan [...] exo.org.uk>
On 4 Mar 2010, at 20:36, Kevin Ryde via RT wrote:
> Thu Mar 04 15:36:04 2010: Request 55238 was acted upon.
> Transaction: Ticket created by user42@zip.com.au
> Queue: HTML-FormatText-WithLinks
> Subject: <base> inside document
> Broken in: (no value)
> Severity: (no value)
> Owner: Nobody
> Requestors: user42@zip.com.au
> Status: new
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=55238 >
>
> It'd be good if HTML::FormatText::WithLinks applied the <base> element
> in the document to relative links, as well as from the "base" parameter
> option.

Thanks for the patch. I'll hopefully get a bit of time to have a look at this in the next week or two. Certainly looks reasonable.

Thanks again
Struan
Subject: Re: [rt.cpan.org #55238] <base> inside document
Date: Mon, 8 Mar 2010 22:47:16 +0000
To: bug-HTML-FormatText-WithLinks [...] rt.cpan.org
From: Struan Donald <struan [...] exo.org.uk>
On 4 Mar 2010, at 20:36, Kevin Ryde via RT wrote:
> Thu Mar 04 15:36:04 2010: Request 55238 was acted upon.
> Transaction: Ticket created by user42@zip.com.au
> Queue: HTML-FormatText-WithLinks
> Subject: <base> inside document
> Broken in: (no value)
> Severity: (no value)
> Owner: Nobody
> Requestors: user42@zip.com.au
> Status: new
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=55238 >
>
> I think making the document override the class/object parameter is the
> right priority. Pass in the download location in the option but then
> the document has precedence, the same as a browser is supposed to behave
> I think.

I've had another think about this and I think I am going to vote no on this being the default as it might break existing code. Instead I'll add a config option of doc_overrides_base (or something more pithy if I can think of it) for people that want this option.

I'll get this polished up and on the CPAN, hopefully by the end of the week.

Thanks again,
Struan
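The opt-in behaviour described in this reply can be sketched as a small selection function. This is only an illustration under stated assumptions: choose_base() and the doc_base key are hypothetical names, and doc_overrides_base is used here as the tentative option name from the message, not as a confirmed interface.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's code: choose the base URL from the
# caller's "base" option, the href of a <base> element found in the document
# (undef if there was none), and an opt-in flag named after the proposal.
sub choose_base {
    my (%arg) = @_;
    my $doc = $arg{doc_base};
    # the document wins only when the caller explicitly asked for that
    if ($arg{doc_overrides_base} && defined $doc && length $doc) {
        return $doc;
    }
    # otherwise behave as before: the "base" option is used unchanged
    return $arg{base};
}

my $without_flag = choose_base(
    base     => 'http://example.com/',
    doc_base => 'http://example.org/',
);
my $with_flag = choose_base(
    base               => 'http://example.com/',
    doc_base           => 'http://example.org/',
    doc_overrides_base => 1,
);
print "without flag: $without_flag\nwith flag: $with_flag\n";
```

Leaving the flag off reproduces the existing behaviour, which is the backward-compatibility concern raised above.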
Subject: Re: [rt.cpan.org #55238] <base> inside document
Date: Tue, 09 Mar 2010 11:20:09 +1100
To: bug-HTML-FormatText-WithLinks [...] rt.cpan.org
From: Kevin Ryde <user42 [...] zip.com.au>
"struan@exo.org.uk via RT" <bug-HTML-FormatText-WithLinks@rt.cpan.org> writes:
> doc_overrides_base

What about a separate option like "content_location => $url"? You can then pass that in as the downloaded location. It'd be overridden by a base in the document, or the base option.

The advantage would be when combining formatting parameters from different parts of a program. The download part can supply a "content_location", some user options merged in might or might not supply a forcing "base" parameter.

(Not sure about the name "content_location" though. It'd match MIMEish stuff but maybe "default_base" would be clearer in the context of just html.)
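One reading of the precedence proposed here is: a forcing "base" option first, then the document's <base>, then the fallback location. A minimal sketch under that assumption follows; effective_base() and the doc_base key are hypothetical, and default_base is the alternative name floated in the message.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): return the first base that
# is set, trying the forcing "base" option, then the document's <base>
# href, then the fallback "default_base".
sub effective_base {
    my (%arg) = @_;
    for my $key (qw(base doc_base default_base)) {
        my $value = $arg{$key};
        return $value if defined $value && length $value;
    }
    return;    # nothing set: relative links stay relative
}

# Options gathered from different parts of a program, as in the message.
my %from_download = (default_base => 'http://example.com/page.html');
my %from_user     = ();    # this user did not force a base

my $plain    = effective_base(%from_download, %from_user);
my $with_doc = effective_base(%from_download, %from_user,
                              doc_base => 'http://example.org/');
my $forced   = effective_base(%from_download, %from_user,
                              doc_base => 'http://example.org/',
                              base     => 'http://example.net/');
print "$plain\n$with_doc\n$forced\n";
```

The download code and the user options never need to know about each other: merging the two hashes is enough, and the forcing option wins only when someone actually supplies it.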
Subject: Re: [rt.cpan.org #55238] <base> inside document
Date: Wed, 10 Mar 2010 11:20:00 +1100
To: bug-HTML-FormatText-WithLinks [...] rt.cpan.org
From: Kevin Ryde <user42 [...] zip.com.au>
I wrote:
> "default_base"

I think I like that one, or something similar, better than content_location. Whichever way, I'll try to add the same to my HTML::FormatText::Lynx and friends.
Subject: Re: [rt.cpan.org #55238] <base> inside document
Date: Wed, 10 Mar 2010 21:08:03 +0000
To: bug-HTML-FormatText-WithLinks [...] rt.cpan.org
From: Struan Donald <struan [...] exo.org.uk>
On 9 Mar 2010, at 00:22, Kevin Ryde via RT wrote:
> Queue: HTML-FormatText-WithLinks
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=55238 >
>
> "struan@exo.org.uk via RT" <bug-HTML-FormatText-WithLinks@rt.cpan.org> writes:
>> doc_overrides_base
>
> What about a separate option like "content_location => $url". You can
> then pass that in as the downloaded location. It'd be overridden by a
> base in the document, or the base option.
>
> The advantage would be when combining formatting parameters from
> different parts of a program. The download part can supply a
> "content_location", some user options merged in might or might not supply a
> forcing "base" parameter.

I have to confess I'm not sure I understand what you mean by this. I'd prefer to keep the interface as simple as possible and to not break backward compatibility, so I'll probably stick with what I've got.

Thanks all the same for the feedback
Struan
Subject: Re: [rt.cpan.org #55238] <base> inside document
Date: Fri, 12 Mar 2010 11:52:51 +1100
To: bug-HTML-FormatText-WithLinks [...] rt.cpan.org
From: Kevin Ryde <user42 [...] zip.com.au>
"struan@exo.org.uk via RT" <bug-HTML-FormatText-WithLinks@rt.cpan.org> writes:
> I have to confess I'm not sure I understand what you mean by this.

"doc_overrides_base" is likely to be wanted most of the time (when formatting something downloaded). I think a "default_base" would be clearer (and still compatible). But you could start by applying the <base> element when there's no base parameter, and leave a further parameter for further thought.
Just to let you know I've rather shamefully only just got round to releasing this. It should be out as part of 0.12 shortly.

Struan

0.12 released and available on the CPAN.