
This queue is for tickets about the HTML-Scrubber CPAN distribution.

Report information
The Basics
Id: 72659
Status: rejected
Priority: 0/
Queue: HTML-Scrubber

People
Owner: Nobody in particular
Requestors: JIRA [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: utf8 issues
There seems to be an issue with scrubbing UTF-8 encoded HTML. The returned data are not in Perl's internal encoding, so one has to decode them afterwards.
I just wrote a test for this and am not seeing issues... which quite likely means I am not understanding things correctly, since UTF tends to be subtle and vengeful! Could you send me a failing test for this? It will make it much easier to fix, and will show that it's fixed. Failing that, some sample code would help. Nigel.
Still awaiting some failing examples for this. If the input string is correctly labeled as a UTF-8 character string, there should be no issues. If, however, you pass a byte string with UTF-8 content, you are misrepresenting the character set to the code and nasty things may happen; in that case you should set the encoding on the input filehandle, or explicitly decode the input (and encode the output). I intend to close this ticket unless I get further information, as I cannot reproduce the issue.
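A minimal Perl sketch of the two options described in that reply: decoding a byte string explicitly with the core Encode module, or letting an `:encoding(UTF-8)` layer on the filehandle do it. The sample string and variable names are illustrative assumptions, an in-memory filehandle stands in for a real file, and the HTML::Scrubber call appears only as a comment so the snippet runs without that module installed.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes, as read from a file or socket with no :encoding layer.
# "\xc3\xa9" is the two-byte UTF-8 encoding of U+00E9 (e with acute accent).
my $bytes = "<p>caf\xc3\xa9</p>";

# Option 1: decode explicitly at the input boundary to get a character string.
my $html = decode('UTF-8', $bytes);

# Option 2: let an :encoding layer on the filehandle do the decoding.
# (An in-memory filehandle is used here in place of a real file.)
open my $fh, '<:encoding(UTF-8)', \$bytes or die "open: $!";
my $from_fh = do { local $/; <$fh> };
close $fh;

# The byte string counts octets; the decoded string counts characters.
printf "bytes: %d, chars: %d, same text: %s\n",
    length($bytes), length($html), ($from_fh eq $html ? 'yes' : 'no');

# $html is what would be passed to the scrubber, for example:
#   my $clean = HTML::Scrubber->new->scrub($html);
# and the result encoded back to bytes only when it is written out.
my $out = encode('UTF-8', $html);
print $out eq $bytes ? "round-trip ok\n" : "round-trip mismatch\n";
```

Decoding once on input and encoding once on output keeps everything in between working on characters, which is the situation in which the module is expected to behave correctly.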
Tests I have run show that the module is UTF-8 clean, and there has been no response from the original reporter with any further information about the bug.