This queue is for tickets about the PPI CPAN distribution.

Report information
The Basics
Id: 12722
Status: resolved
Priority: 0
Queue: PPI

People
Owner: Nobody in particular
Requestors: cpan [...] perlmeister.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.903
Fixed in: (no value)



Subject: PPI 0.906 chokes on embedded POD with umlauts
The PPI tokenizer chokes on Perl modules containing umlauts in their embedded POD documentation. Example:

    wget http://search.cpan.org/src/MSCHILLI/Log-Log4perl-0.51/lib/Log/Log4perl.pm

    #!/usr/bin/perl
    use PPI::Document;
    my $d = PPI::Document->load("Log4perl.pm");
    $d or print PPI::Tokenizer::errstr(), "\n";

results in

    "Source code contains unsupported characters (first one encountered was '�')"

because of the line

    Ceki Gülcü, "Short introduction to log4j",

somewhere in the POD part. It would be great if Latin-1 chars were acceptable as well; perl allows them in strings, regexes and POD.

Anyway, thanks for this great module!
Date: Mon, 09 May 2005 13:21:26 +1000
From: Adam Kennedy <adam [...] phase-n.com>
To: bug-PPI [...] rt.cpan.org
Subject: Re: [cpan #12722] PPI 0.906 chokes on embedded POD with umlauts
RT-Send-Cc:
PPI is about 80-90% capable of handling all of latin1. But in a few places it isn't capable.

The errors from those places were drowning out all other legitimate errors, so I've disabled any support for full latin-1 manually at this time.

If you would like to help, I would really appreciate some unit tests specifically testing where latin-1 both _is_ and _isn't_ allowed, so that I can clean up the various corners where there are problems and be sure that they are working sufficiently well.

Regards

Adam K

Michael_Schilli via RT wrote:
> This message about PPI was sent to you by MSCHILLI <MSCHILLI@cpan.org> via rt.cpan.org
>
> Full context and any attached attachments can be found at:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=12722 >
>
> The PPI tokenizer chokes on Perl modules containing umlauts in their embedded POD documentation. Example:
>
> wget
> http://search.cpan.org/src/MSCHILLI/Log-Log4perl-0.51/lib/Log/Log4perl.pm
>
> #!/usr/bin/perl
> use PPI::Document;
> my $d = PPI::Document->load("Log4perl.pm");
> $d or print PPI::Tokenizer::errstr(), "\n";
>
> results "Source code contains unsupported characters (first one encountered was '�')" because of the line
>
> Ceki Gülcü, "Short introduction to log4j",
>
> somewhere in the POD part. Would be great if Latin-1 chars would be acceptable as well, perl allows them in strings, regexes and POD.
>
> Anyway, thanks for this great module!
This is a duplicate of bug 11682