Still not quite right for some reason. I re-published, and even though
the content-type is now set, it looks like the Unicode characters are
getting output as a byte stream, not a character stream: something is
converting each multibyte character into a sequence of HTML entities
representing its underlying bytes.
Re-published version at
http://nelhage.com/sd/barnowl/ticket/36934c17-1c54-5678-988b-9b16f177160d/view.html
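The symptom described above can be reproduced in a few lines. This is a minimal Python sketch of one plausible mechanism, not a claim about what SD or HTML::TreeBuilder actually does internally. It assumes the UTF-8 bytes are treated as if each byte were its own character (equivalent to decoding them as Latin-1) before non-ASCII characters are entity-encoded. The functions `entity_encode`, `buggy_render`, and `correct_render` are hypothetical names for illustration.

```python
def entity_encode(text: str) -> str:
    """Replace every non-ASCII character with a numeric HTML entity."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


def buggy_render(text: str) -> str:
    # Assumed bug: the UTF-8 bytes are treated as characters (the same as
    # decoding them as Latin-1), so one multibyte character turns into
    # several one-byte "characters" before entity encoding.
    byte_chars = text.encode("utf-8").decode("latin-1")
    return entity_encode(byte_chars)


def correct_render(text: str) -> str:
    # Entity-encode the decoded characters (code points), not the raw bytes.
    return entity_encode(text)


if __name__ == "__main__":
    sample = "caf\u00e9"  # "café"; U+00E9 is the two bytes C3 A9 in UTF-8
    print(buggy_render(sample))    # caf&#195;&#169;
    print(correct_render(sample))  # caf&#233;
```

A browser renders `&#195;&#169;` as "Ã©", which is the kind of output you get when each underlying byte is encoded as its own entity.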
Re: [rt.cpan.org #49528] publish --html HTML still doesn't handle non-ASCII correctly
Date: Thu, 17 Sep 2009 11:58:48 -0400
To: Nelson Elhage via RT <bug-App-SD [...] rt.cpan.org>
From: Jesse Vincent <jesse [...] fsck.com>
At this point, I suspect it's a bug in our HTML parsing chain. That path
should definitely be rewritten, since it's insanely slow and pulls in a C
dependency for a core feature.
On Thu 10.Sep'09 at 11:02:34 -0400, Nelson Elhage via RT wrote:
> Queue: App-SD
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=49528 >
Thu Jan 06 09:03:05 2011 spang [...] mit.edu - Correspondence added
I also believe that this is a bug in our HTML parsing chain. I can
reproduce it with HTML::TreeBuilder 3.23, but not with the recently released
4.1. I'm going to bump our HTML::TreeBuilder dependency and close this bug.
Spang
Thu Jan 06 09:03:06 2011 spang [...] mit.edu - Status changed from 'open' to 'resolved'