Subject: <base> inside document
Date: Fri, 05 Mar 2010 07:32:20 +1100
To: bug-HTML-FormatText-WithLinks [...] rt.cpan.org
From: Kevin Ryde <user42 [...] zip.com.au>
It'd be good if HTML::FormatText::WithLinks applied the <base> element
in the document to relative links, in addition to the "base" parameter
option.
It might be as easy as the few lines below. (Overwriting $self->{base}
would be bad if the formatter object is re-used to format a second HTML
document, but I see the HTML::Formatter docs advise against such re-use.)
I think making the document override the class/object parameter is the
right priority: pass in the download location as the option, and a
<base> in the document then takes precedence, which I think is how a
browser is supposed to behave.
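
To make that priority concrete, here is a rough standalone sketch, not
part of the patch. It assumes the URI module is installed; the helper
names effective_base() and resolve_link() and the example URLs are made
up for illustration and are not anything in HTML::FormatText::WithLinks.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: choose the base URL to resolve against.
# A non-empty <base href> from the document wins over the "base" option.
sub effective_base {
    my ($option_base, $document_base) = @_;
    return $document_base
        if defined $document_base && length $document_base;
    return $option_base;
}

# Hypothetical helper: make a link absolute using the chosen base.
# With no base at all, the link is returned unchanged.
sub resolve_link {
    my ($href, $option_base, $document_base) = @_;
    my $base = effective_base($option_base, $document_base);
    return $href unless defined $base;
    return URI->new_abs($href, $base)->as_string;
}

# Document has a <base>: it overrides the download location.
print resolve_link('img/a.png',
                   'http://example.com/dl/page.html',
                   'http://example.org/docs/'), "\n";
# prints http://example.org/docs/img/a.png

# No <base> in the document: fall back to the option.
print resolve_link('img/a.png',
                   'http://example.com/dl/page.html',
                   undef), "\n";
# prints http://example.com/dl/img/a.png
```

The patch below gets the same effect more simply by overwriting
$self->{base} when a <base> is seen, leaving the existing link
resolution to do the rest.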
I suppose there are also "xml:base" attributes in XHTML, if anyone's
foolish enough to expect that to work. But extracting or tracking that
value doesn't seem as easy as <base>.
--- WithLinks.pm.orig 2010-03-04 16:23:35.000000000 +1100
+++ WithLinks.pm 2010-03-05 07:24:49.000000000 +1100
@@ -179,6 +179,29 @@
return $text;
}
+sub head_start {
+ my ($self) = @_;
+ $self->SUPER::head_start();
+ # descend into <head> for possible <base> there, even if superclass not
+ # interested (as of HTML::FormatText 2.04 it's not)
+ return 1;
+}
+# <base> is supposed to be inside <head>, but no need to demand that.
+# "lynx -source" sticks a <base> at the very start of the document, before
+# even <html>, so accepting <base> anywhere lets that work.
+sub base_start {
+ my ($self, $node) = @_;
+ if (my $href = $node->attr('href')) {
+ $self->{base} = $href;
+ }
+ # allow for no superclass base_start() in HTML::FormatText 2.04
+ if (! HTML::FormatText->can('base_start')) {
+ return 0;
+ }
+ # chain up if it exists in the future
+ return $self->SUPER::base_start();
+}
+
sub parse {
my $self = shift;