Subject: utf8 problem
Date: Tue, 18 Nov 2008 12:06:21 +0000
To: <bug-BBCode-Parser [...] rt.cpan.org>
From: Raymond Field <raymond [...] mvine.com>
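When there is a multi-byte UTF-8 character inside a set of BBCode tags, the
code strips bytes out of its encoding, rendering the resulting character
unprintable. The substitution quoted further down removes bytes in the range
\x7F - \x9F, which covers many UTF-8 continuation bytes, so the lead byte
(in the range \xc0 - \xf0) is left behind on its own.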
For example:
[b]Everyone deserves a voice. The question is—how loud?[/b]
Find out now by creating your own kluster—a new breed of group
decision-making tool that helps you bubble-up new ideas, identify the best
ones, and make better decisions.
[URL=http://www.kluster.com/]Click here to visit[/URL]
Is formatted as:
<div class="bbcode-body">
<b>Everyone deserves a voice. The question is?how loud?</b><br/>
<br/>
Find out now by creating your own kluster—a new breed of group
decision-making tool that helps you bubble-up new ideas, identify the best
ones, and make better decisions.<br/>
<br/>
<a href="http://www.kluster.com/" rel="nofollow">Click here to visit</a>
</div>
Notice that the first dash, inside the [b][/b] tags, is broken, whereas the
second dash, in "kluster—a", is not.
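
The breakage can be reproduced outside the parser with a few lines of Perl. This is only a sketch under two assumptions of mine: that the text reaches the substitution as raw UTF-8 bytes, and that the dash is U+2014 (bytes E2 80 94). The variable names are made up for the illustration.

```perl
use strict;
use warnings;

# Assumption: raw UTF-8 bytes, with the dash encoded as E2 80 94 (U+2014).
my $text = "is\xE2\x80\x94how loud?";

# The substitution under discussion, copied unchanged. There is no /g flag,
# so only the first run of matching bytes is removed.
(my $stripped = $text) =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]+//;

# Hex dump of what survives.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $stripped;
print "$hex\n";   # prints: 69 73 E2 68 6F 77 20 6C 6F 75 64 3F
```

In this sketch the two continuation bytes (80 and 94) are removed and the lead byte E2 is left orphaned, which would explain the single "?" in the formatted output.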
The simplest fix I could find is at line 703 (in version 0.34), which
could be changed from:
$text =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]+//;
to:
if (!$this->get("is_utf8")) {
    $text =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]+//;
}
where is_utf8 is a settable option. In my test I just commented out the
substitution and the characters printed OK.
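
To show the effect of the guard in isolation, here is a minimal standalone sketch. strip_controls and its $is_utf8 argument are hypothetical stand-ins for the parser method and the proposed option; they are not part of BBCode::Parser. The input again assumes raw UTF-8 bytes with the dash as E2 80 94.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the proposed change: the substitution is
# skipped entirely when the caller says the text is UTF-8.
sub strip_controls {
    my ($text, $is_utf8) = @_;
    $text =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]+// unless $is_utf8;
    return $text;
}

my $in     = "is\xE2\x80\x94how";
my $kept   = strip_controls($in, 1);   # option on: bytes left untouched
my $broken = strip_controls($in, 0);   # option off: continuation bytes removed

print $kept eq $in ? "utf8 option on: intact\n" : "utf8 option on: damaged\n";
print $broken eq $in ? "utf8 option off: intact\n" : "utf8 option off: damaged\n";
```

With the option on, the control-character filtering is lost altogether, so this trades one behaviour for the other rather than handling both.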
Regards,
Raymond Field