This queue is for tickets about the HTML-Parser CPAN distribution.

Report information
The Basics
Id: 27522
Status: resolved
Priority: 0/
Queue: HTML-Parser

People
Owner: Nobody in particular
Requestors: ivacklin [...] cs.helsinki.fi
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: HTML::HeadParser doesn't grok some broken xhtml
Date: Sun, 10 Jun 2007 16:58:34 +0300
To: bug-HTML-Parser [...] rt.cpan.org
From: T Ilmari Vacklin <ivacklin [...] cs.helsinki.fi>
See <http://code-libre.org>. The XHTML has an initial bogus <option> which is probably why headparser fails to extract any headers.
This also occurs with variations on the <title> tag, such as:

<head>
<title>
some title</title>
</head>

"some title" is essentially ignored. I discovered this using WWW::Mechanize:

    use WWW::Mechanize;
    my $mech = new WWW::Mechanize();
    $mech->get('http://www.umm.edu/patiented/articles/what_other_drugs_used_parkinsons_disease_000051_8.htm');
    print $mech->title, "\n";

The expected result is to print "Parkinson's disease", but nothing is printed at all.

Cheers,
Dave
On Wed Nov 05 16:57:07 2008, DIBERRI wrote:
> This also occurs with variations on the <title> tag, such as:
>
> <head>
> <title>
> some title</title>
> </head>
>
> "some title" is essentially ignored.
The problem here was that HTML::HeadParser did not ignore the Unicode BOM in decoded form. I have committed a change that fixes this (in 3.58).
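
For anyone stuck on a release older than 3.58, a possible workaround is to strip the decoded BOM (U+FEFF) before handing the text to HTML::HeadParser. The following is only a minimal sketch, not the change that was committed: the helper name strip_decoded_bom and the sample document are made up for illustration, and it assumes the input has already been decoded to Perl characters.

```perl
use strict;
use warnings;
use HTML::HeadParser;

# Hypothetical helper (not part of HTML::Parser): drop a leading BOM
# from text that has already been decoded to Perl characters.
sub strip_decoded_bom {
    my ($text) = @_;
    $text =~ s/\A\x{FEFF}//;
    return $text;
}

# Sample document modeled on the report: a decoded BOM followed by
# a head section whose title starts on its own line.
my $html = "\x{FEFF}<html><head>\n<title>\nsome title</title>\n</head>"
         . "<body></body></html>";

my $parser = HTML::HeadParser->new;
$parser->parse(strip_decoded_bom($html));

# header('Title') returns undef if no title was extracted.
my $title = $parser->header('Title');
print defined $title ? "Title: $title\n" : "No title found\n";
```

With 3.58 or later the helper should be unnecessary, since the parser itself is meant to skip the BOM.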