
This queue is for tickets about the HTML-WikiConverter CPAN distribution.

Report information
The Basics
Id: 53531
Status: open
Priority: 0/
Queue: HTML-WikiConverter

People
Owner: Nobody in particular
Requestors: mturner [...] cc.umanitoba.ca
Cc: turnermm02 [...] shaw.ca
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.68
Fixed in: (no value)



CC: turnermm02 [...] shaw.ca
Subject: UTF*
A user of DokuWikiFCK from Slovenia reported a problem with wide characters. I am forwarding his email here:

OK, I managed to do some Windows testing, using Apache 2.2.14 + PHP 5.2.12 + ActivePerl 5.8.9, with the latest DokuWiki and DokuWikiFCK. fckgLite works out of the box; fckg does not. Here are the details.

Just to see what happens, I modified the Windows version of saveFCK.pl like this (notice the UTF-8 options for both the input and output streams):

    binmode(STDOUT, ":utf8");
    my $html;
    if (exists $options{'file'}) {
        open FH, "<:encoding(utf-8)", $options{'file'}
            or die "Cannot open $options{'file'}: $!";
        $html = join "", <FH>;
        close FH;
    } else {
        $html = join "", <>;
    }
    print $html;

This way the script outputs properly encoded UTF-8 text. I then passed the same $html variable to WikiConverter:

    my $wc = new HTML::WikiConverter(
        'dialect'  => "DokuWikiFCK",
        'base_uri' => $options{'base_uri'});
    print $wc->html2wiki($html);

This produces an error saying "Cannot decode string with wide characters at C:/Perl/lib/Encode.pm line 170." So I modified WikiConverter.pm and commented out these two lines:

    (line 224) #$html = decode( $self->encoding, $html );
    (line 258) #$output = encode( $self->encoding, $output );

And as a result I get properly encoded text!

Perhaps this is the way to go: DokuWiki uses UTF-8 by default, so modifying saveFCK.pl as shown shouldn't cause any problems. But I need to find a way to eliminate the encode/decode error message, so that we can use the original WikiConverter module (good for upgrades).

Thanks,
Myron Turner
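The failure can be reproduced with Encode alone. The script below is an illustrative sketch, not code from saveFCK.pl or WikiConverter.pm: it builds a string that is already decoded (as the `:encoding(utf-8)` input layer would produce), shows that a second `decode()` dies, and shows that encoding back to octets first makes the call succeed. The exact error text depends on the Encode version (the report quotes "Cannot decode string with wide characters"; newer releases may word it differently), so the script only reports whether the call died.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Already-decoded character data, like what saveFCK.pl holds after reading
# the file through the :encoding(utf-8) layer. \x{10D} is "č", \x{17E} is "ž".
my $chars = "<p>\x{10D}a\x{17E}</p>";

# Decoding a second time fails: decode() expects octets, and this string
# holds characters above 0xFF, so Encode croaks about wide characters.
my $decoded_ok = eval { decode('UTF-8', $chars); 1 } ? 1 : 0;
print $decoded_ok ? "second decode succeeded\n" : "second decode died: $@";

# Encoding back to octets first gives decode() what it expects,
# and the round trip returns exactly the original characters.
my $round_trip = decode('UTF-8', encode('UTF-8', $chars));
print $round_trip eq $chars ? "round trip ok\n" : "round trip changed the text\n";
```

This is the mismatch behind the error: the calling script and the module each assume they are responsible for decoding, and only one of them can be.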
On Fri Jan 08 23:30:03 2010, TURNERMM wrote:
> So I modified WikiConverter.pm and commented out these two lines:
>
> (line 224)
> #$html = decode( $self->encoding, $html );
>
> (line 258)
> #$output = encode( $self->encoding, $output );
>
> And as a result I get properly encoded text!
>
> Perhaps this is the way to go - Dokuwiki uses UTF8 by default, so
> modifying saveFCK.pl as shown shouldn't cause any problems. But I need
> to find a way to eliminate the encode/decode error message, so we can
> use the original WikiConverter module (good for upgrades).
I also just stumbled over this. I believe that this is a bug in HTML::WikiConverter: it blindly decodes its input with Encode::decode, even if your main program has already taken care of decoding, e.g. with PERL_UNICODE=S.

I've worked around this issue by manually re-encoding my Unicode string with Encode::encode, and then decoding the HTML::WikiConverter output with Encode::decode:

    #!/usr/bin/perl -CSDAL
    use warnings;
    use strict;
    use HTML::WikiConverter;
    use Encode;

    my $foo = "<b>input string</b>";
    my $wc = HTML::WikiConverter->new(
        dialect  => 'Markdown',
        md_extra => 1,
        encoding => "utf8");
    # Hand html2wiki() octets, then turn its octet output back into characters.
    my $md = Encode::decode("utf8", $wc->html2wiki(Encode::encode("utf8", $foo)));
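A fix inside the module would have to tell octets from characters before calling `decode()`. One possible guard is sketched below. `decode_if_octets` is a hypothetical helper, not part of HTML::WikiConverter, and it assumes that any character above 0xFF means the caller has already decoded the input. Byte input stays on the existing code path; only the case that currently dies is changed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of HTML::WikiConverter): decode $text only
# when it could still be an octet string.
sub decode_if_octets {
    my ($encoding, $text) = @_;
    # A character above 0xFF cannot be an octet, so the caller has already
    # decoded the string; return it unchanged instead of dying.
    return $text if $text =~ /[^\x00-\xFF]/;
    # Otherwise keep the existing behaviour and treat the input as octets.
    return decode($encoding, $text);
}

my $octets = "<b>\xC4\x8D</b>";    # UTF-8 bytes for "č"
my $chars  = "<b>\x{10D}</b>";     # the same text, already decoded

my $from_octets = decode_if_octets('UTF-8', $octets);
my $from_chars  = decode_if_octets('UTF-8', $chars);
print $from_octets eq $from_chars ? "both inputs agree\n" : "inputs differ\n";
```

The heuristic is incomplete: a decoded string whose characters all fall in the Latin-1 range (for example "é", U+00E9) looks exactly like octets and would still be decoded a second time. A complete fix would also have to decide what to do with the matching `encode()` on the output (line 258 in the report), so the caller-side encode/decode workaround above remains the more predictable option for now.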