This queue is for tickets about the App-SD CPAN distribution.

Report information
The Basics
Id: 49528
Status: resolved
Priority: 0/
Queue: App-SD

People
Owner: Nobody in particular
Requestors: nelhage [...] mit.edu
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: publish --html HTML still doesn't handle non-ASCII correctly
Date: Tue, 8 Sep 2009 15:53:10 -0400
To: bug-App-SD [...] rt.cpan.org
From: Nelson Elhage <nelhage [...] MIT.EDU>
Subject: Re: [rt.cpan.org #49528] publish --html HTML still doesn't handle non-ASCII correctly
Date: Thu, 10 Sep 2009 04:35:07 -0400
To: Nelson Elhage via RT <bug-App-SD [...] rt.cpan.org>
From: Jesse Vincent <jesse [...] fsck.com>
Just fixed in prophet git.

On Tue 8.Sep'09 at 15:53:39 -0400, Nelson Elhage via RT wrote:
> Tue Sep 08 15:53:38 2009: Request 49528 was acted upon.
> Transaction: Ticket created by nelhage
> Queue: App-SD
> Subject: publish --html HTML still doesn't handle non-ASCII correctly
> Broken in: (no value)
> Severity: (no value)
> Owner: Nobody
> Requestors: nelhage@mit.edu
> Status: new
> Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=49528 >
>
> c.f. http://nelhage.com/sd/barnowl/ticket/36934c17-1c54-5678-988b-9b16f177160d/view.html
>
> It looks fine using 'sd server'.
Still not quite right for some reason. I re-published, and even though the content-type is now set, it looks like the unicode characters are getting output as a byte stream, not a character stream -- something is converting the multibyte character into a sequence of HTML entities representing the underlying bytes.

Re-published version at http://nelhage.com/sd/barnowl/ticket/36934c17-1c54-5678-988b-9b16f177160d/view.html
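For readers unfamiliar with this failure mode, the symptom described above can be reproduced in isolation. The following is only an illustrative sketch in Python, not the Perl code path used by prophet or HTML::TreeBuilder; the function names `escape_chars` and `escape_bytes` and the sample string are hypothetical. It contrasts escaping a string per character with escaping its UTF-8 bytes one at a time, which is what the published page appears to be doing.

```python
import html


def escape_chars(text: str) -> str:
    """Emit one numeric entity per non-ASCII character (the expected behaviour)."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


def escape_bytes(text: str) -> str:
    """Mimic the symptom: emit one numeric entity per non-ASCII UTF-8 byte."""
    return "".join(chr(b) if b < 128 else f"&#{b};" for b in text.encode("utf-8"))


sample = "caf\u00e9"  # "café"; U+00E9 is the two bytes 0xC3 0xA9 in UTF-8

print(escape_chars(sample))                 # caf&#233;
print(escape_bytes(sample))                 # caf&#195;&#169;
print(html.unescape(escape_bytes(sample)))  # cafÃ©  (what a browser would render)
```

A browser decodes each entity as its own character, so the byte-wise output renders as two unrelated characters regardless of the declared content-type. That would be consistent with the page still looking wrong after the content-type fix.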
CC: undisclosed-recipients: ;
Subject: Re: [rt.cpan.org #49528] publish --html HTML still doesn't handle non-ASCII correctly
Date: Thu, 17 Sep 2009 11:58:48 -0400
To: Nelson Elhage via RT <bug-App-SD [...] rt.cpan.org>
From: Jesse Vincent <jesse [...] fsck.com>
At this point, I suspect it's a bug in our HTML parsing chain. That path should definitely be rewritten, as it's insanely slow and a C dep for a core feature.

On Thu 10.Sep'09 at 11:02:34 -0400, Nelson Elhage via RT wrote:
> Queue: App-SD
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=49528 >
>
> Still not quite right for some reason. I re-published, and even though
> the content-type is now set, it looks like the unicode characters are
> getting output as a byte stream, not a character stream -- something is
> converting the multibyte character into a sequence of HTML entities
> representing the underlying bytes.
>
> Re-published version at
> http://nelhage.com/sd/barnowl/ticket/36934c17-1c54-5678-988b-9b16f177160d/view.html
I also believe that this is a bug in our HTML parsing chain. I can reproduce with HTML::TreeBuilder 3.23, but not with the recently-released 4.1. I'm going to bump our HTML::TreeBuilder dep and close this bug.

Spang