On Aug 23, 2008, at 01:04, Christian Glahn via RT wrote:
> This appears to be a documentation bug.
>
> The synopsis suggests a hash reference passed to parse_*string()
> functions. However, if you look at the actual documentation you find
> that the function expects a string as the optional second parameter.
>
> In this case the synopsis is wrong and the function description is
> correct. I tested it with your code and it works nicely.
I just did this:
    my $html = '<html><body><p>foo</p></body></html>';
    my $parser = XML::LibXML->new;
    my $doc = $parser->parse_html_string($html, 'http://foo.com/');
    say $doc->baseURI;
And it still printed undef.
> Another remark: if you know that your input is XHTML (rather than HTML
> strict) I suggest that you use the normal parse_string() function
> instead of its html sibling.
This is why I'm passing a hash. I'm parsing arbitrary Web pages that
will have god knows what kind of HTML in them. So my code actually
looks like this:
    my $parser = XML::LibXML->new;
    my $doc = $parser->parse_html_string($html, {
        suppress_errors   => 1, # Suppress errors
        suppress_warnings => 1, # Suppress warnings
        no_network        => 1, # Don't make network requests.
        recover           => 1, # Relaxed parsing for bad HTML.
        URI               => 'http://foo.com/',
    });
    say $doc->baseURI;
Which also, BTW, outputs undef. And so does this:
    my $doc = $parser->parse_html_string($html, 'http://foo.com/', {
        suppress_errors   => 1, # Suppress errors
        suppress_warnings => 1, # Suppress warnings
        no_network        => 1, # Don't make network requests.
        recover           => 1, # Relaxed parsing for bad HTML.
    });
    say $doc->baseURI;
IOW, there is no way I can see to properly set baseURI.
David