Subject: UTF8 fixer note
Date: Thu, 22 Jun 2017 20:38:07 +1000
To: bug-Encoding-FixLatin [...] rt.cpan.org
From: Thomas Rutter <tom [...] thomasrutter.com>
Not sure if this is still active, but I thought I'd write a quick note. Converting overlong UTF-8 sequences to their equivalent short encoding introduces a potential security flaw in some software: it lets any character pass through certain filtering/parsing steps by disguising it in its overlong form, knowing it will be converted back into the illegal payload later. It would be better to replace them with the Unicode replacement character (U+FFFD). Simply removing them can introduce a similar security flaw, whereby inserting an invalid UTF-8 sequence in the middle of an illegal payload masks it from filters, with the invalid sequence then removed from the middle later.
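To make the concern concrete, here is a rough Python sketch of both problems. It is an illustration only, not the module's actual code: `filter_allows` and `shorten_overlong` are hypothetical names, the filter is a deliberately simplistic byte-level check, and the sketch only handles two-byte overlong forms.

```python
def filter_allows(raw: bytes) -> bool:
    """Hypothetical upstream filter: reject input containing a literal '<script'."""
    return b"<script" not in raw


def shorten_overlong(raw: bytes) -> str:
    """Unsafe sketch: map 2-byte overlong forms (lead byte 0xC0/0xC1) back to ASCII."""
    out = bytearray()
    i = 0
    while i < len(raw):
        is_overlong = (
            raw[i] in (0xC0, 0xC1)
            and i + 1 < len(raw)
            and 0x80 <= raw[i + 1] <= 0xBF
        )
        if is_overlong:
            # Recombine the payload bits into the single ASCII byte they encode.
            out.append(((raw[i] & 0x1F) << 6) | (raw[i + 1] & 0x3F))
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return out.decode("utf-8", errors="replace")


# Case 1: '<' disguised as the overlong sequence C0 BC.
overlong_payload = b"\xc0\xbcscript>"
# Case 2: an invalid byte (FF) inserted in the middle of the payload.
split_payload = b"<scr\xffipt>"

# Both payloads pass the filter...
passed_overlong = filter_allows(overlong_payload)
passed_split = filter_allows(split_payload)

# ...and both turn back into the forbidden text after "fixing".
unsafe_shortened = shorten_overlong(overlong_payload)
unsafe_stripped = split_payload.decode("utf-8", errors="ignore")

# Substituting U+FFFD instead leaves neither payload intact.
safe_overlong = overlong_payload.decode("utf-8", errors="replace")
safe_split = split_payload.decode("utf-8", errors="replace")
```

In this sketch, shortening and stripping both reconstruct `<script>` after the filter has already run, whereas decoding with `errors="replace"` leaves U+FFFD where the bad bytes were, so the payload never reassembles.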
Cheers,
Thomas