Wed Nov 23 06:55:00 2011 JIRA [...] cpan.org - Ticket created
Subject: utf8 issues
There seems to be an issue with scrubbing UTF-8 encoded HTML.
The returned data is not in Perl's internal encoding, so one has to
call decode on it.
Wed Nov 23 08:18:57 2011 nigel.metheringham [...] gmail.com - Correspondence added
I just wrote a test for this and am not seeing issues... which quite likely means I do not
understand things correctly, since UTF tends to be subtle and vengeful!
Could you send me a failing test for this? It will make it much easier to fix, and to show that
it's fixed.
Failing that, some sample code.
Nigel.
Wed Nov 23 08:18:58 2011 The RT System itself - Status changed from 'new' to 'open'
Tue Feb 07 16:01:44 2012 nigel.metheringham [...] gmail.com - Correspondence added
Still awaiting some failure examples for this. If the input string is correctly labeled as UTF-8
then there should be no issues.
If, however, you have a byte string with UTF-8 content, you are lying to the code about the
character set, and nasty things may happen. In that case you should set the input filehandle
encoding or explicitly decode/encode the string.
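To make the distinction concrete, here is a minimal sketch of both options, assuming the input arrives as raw UTF-8 bytes. It uses only the core Encode module and an in-memory filehandle. The sample string and variable names are illustrative, and the scrubbing call itself is left out.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Illustrative input: the UTF-8 byte sequence for "café" (5 bytes, 4 characters).
my $bytes = "caf\xc3\xa9";

# Option 1: decode explicitly, so the code sees a character string.
my $chars = decode('UTF-8', $bytes);
printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
# prints "bytes: 5, characters: 4"

# Option 2: let the input filehandle do the decoding.
# (An in-memory handle stands in for a real file here.)
open my $fh, '<:encoding(UTF-8)', \$bytes or die "open failed: $!";
my $line = <$fh>;
close $fh;

# ... pass $chars (or $line) to the scrubbing code here ...

# Encode again when writing the result out as bytes.
my $out = encode('UTF-8', $chars);
```

Either way the string is decoded exactly once on the way in and encoded once on the way out. Decoding the scrubbed output a second time would only be needed if the input had been passed in as undecoded bytes.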
Intending to close this off unless I get some form of further info as I cannot reproduce an issue.
Sat Dec 22 14:39:56 2012 nigel.metheringham [...] gmail.com - Correspondence added
The tests I have run show that the module is UTF-8 clean, and there has been no response from
the original reporter giving any further information about the bug.
Sat Dec 22 14:39:57 2012 nigel.metheringham [...] gmail.com - Status changed from 'open' to 'rejected'