Subject: | Special Characters in Formats Make PPI::Document::new() fail |
Special characters in Perl formats make PPI::Document::new() fail. The
following script defines a simple format:
format STDOUT =
ä@<<<<<<<
'Name'
.
write STDOUT;
When I try to build a PPI document for this script with
use PPI;
use PPI::Dumper;
my $module = PPI::Document->new('format.pl')
    or die PPI::Document->errstr;
my $dumper = PPI::Dumper->new($module);
the constructor fails with
Fatal error... regex failed to match in 'ä@<<<<<<<
' when expected at /.../site_perl/5.10.0/PPI/Token/Word.pm line 178.
The failure is caused by the special character (the German umlaut "ä") in
the picture line: PPI::Document::new() succeeds when this character is
removed.
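A guess at the mechanism, not verified against PPI's source: if format.pl is saved as UTF-8 and PPI reads it as raw bytes, "ä" reaches the tokenizer as the two bytes 0xC3 0xA4. Under Perl's default rules for byte strings (no `use locale`, no `unicode_strings` feature), `\w` does not match bytes above 0x7F, whereas it does match the decoded character U+00E4. The script below only demonstrates that Perl behaviour; `$word_re` is a hypothetical stand-in, not the actual regex in PPI/Token/Word.pm.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "ä" as it is stored in a UTF-8 encoded file: two raw bytes.
my $bytes = "\xC3\xA4";

# The same text decoded into a Perl character string (U+00E4).
my $chars = decode('UTF-8', $bytes);

# Hypothetical word regex; NOT the pattern PPI actually uses.
my $word_re = qr/^\w+/;

# Default (/d) semantics: byte strings use ASCII rules above 0x7F,
# decoded (UTF-8 flagged) strings use Unicode rules.
my $bytes_match = $bytes =~ $word_re ? 1 : 0;
my $chars_match = $chars =~ $word_re ? 1 : 0;

printf "raw bytes match \\w:     %d\n", $bytes_match;
printf "decoded chars match \\w: %d\n", $chars_match;
```

With `use feature 'unicode_strings'` in effect, the byte 0xC3 would instead be treated as Latin-1 "Ã" and match, so the outcome depends on the pragmas active where the regex is compiled.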
According to perlform, picture lines may contain arbitrary literal text:
"Picture lines contain output field definitions, intermingled with
literal text." So it would be good if PPI (and tools based on it) could
handle such characters.
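Perl itself accepts the umlaut as literal text in a picture line. As a minimal check, the core `formline` function (the internal routine used by formats, which writes into the `$^A` accumulator) can be fed the picture line from format.pl directly. Here "ä" is written as `\x{e4}` so the example does not depend on the encoding of the script file.

```perl
use strict;
use warnings;

# Picture line from format.pl: literal "ä" followed by an 8-wide
# left-justified field. "\@" prevents array interpolation.
my $picture = "\x{e4}\@<<<<<<<\n";

$^A = '';                     # clear the format accumulator
formline($picture, 'Name');   # fill the field with 'Name'
my $line = $^A;               # literal text plus the filled field
$^A = '';

binmode STDOUT, ':encoding(UTF-8)';
print $line;
```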
Looking at the PPI dump for a variant of the format without the special
character, it seems to me that PPI is not aware of the special format
definition "context". Instead, it seems to tokenize the picture and
argument lines as ordinary code, interpreting "<<" as an operator, for
example:
PPI::Document
  PPI::Statement
    PPI::Token::Word 'format'
    PPI::Token::Whitespace ' '
    PPI::Token::Word 'STDOUT'
    PPI::Token::Whitespace ' '
    PPI::Token::Operator '='
    PPI::Token::Whitespace '\n'
    PPI::Token::Cast '@'
    PPI::Token::Operator '<<'
    PPI::Token::Operator '<<'
    PPI::Token::Operator '<<'
    PPI::Token::Operator '<'
    PPI::Token::Whitespace '\n'
    PPI::Token::Whitespace ' '
    PPI::Token::Quote::Single ''Name''
    PPI::Token::Whitespace '\n'
    PPI::Token::Operator '.'
    PPI::Token::Whitespace '\n'
    PPI::Token::Whitespace '\n'
    PPI::Token::Word 'write'
    PPI::Token::Whitespace ' '
    PPI::Token::Word 'STDOUT'
    PPI::Token::Structure ';'
  PPI::Token::Whitespace '\n'
This theory, that format definitions are tokenized without any special
treatment, is supported by the fact that PPI handles the special
character without problems when it is enclosed in quotes.
(Unfortunately, this is not a workaround, because from the format's
point of view the quotes are literal characters and end up in the output.)
format STDOUT =
'ä'@<<<<<<<
'Name'
.
write STDOUT;
Here is the PPI dump of this script:
PPI::Document
  PPI::Statement
    PPI::Token::Word 'format'
    PPI::Token::Whitespace ' '
    PPI::Token::Word 'STDOUT'
    PPI::Token::Whitespace ' '
    PPI::Token::Operator '='
    PPI::Token::Whitespace '\n'
    PPI::Token::Quote::Single ''ä''
    PPI::Token::Cast '@'
    PPI::Token::Operator '<<'
    PPI::Token::Operator '<<'
    PPI::Token::Operator '<<'
    PPI::Token::Operator '<'
    PPI::Token::Whitespace '\n'
    PPI::Token::Whitespace ' '
    PPI::Token::Quote::Single ''Name''
    PPI::Token::Whitespace '\n'
    PPI::Token::Operator '.'
    PPI::Token::Whitespace '\n'
    PPI::Token::Whitespace '\n'
    PPI::Token::Word 'write'
    PPI::Token::Whitespace ' '
    PPI::Token::Word 'STDOUT'
    PPI::Token::Structure ';'
  PPI::Token::Whitespace '\n'
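Quoting is not a workaround because Perl copies the quotes into the output verbatim. A minimal check with the core `formline` function, applied to the quoted picture line from format_with_quoted_special_character.pl ("ä" again written as `\x{e4}` to avoid source-encoding issues):

```perl
use strict;
use warnings;

# Quoted variant of the picture line; the quotes are literal text.
my $picture = "'\x{e4}'\@<<<<<<<\n";

$^A = '';                     # clear the format accumulator
formline($picture, 'Name');   # fill the 8-wide field with 'Name'
my $line = $^A;               # the quotes appear in the result
$^A = '';

binmode STDOUT, ':encoding(UTF-8)';
print $line;
```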
I am using PPI 1.203 with a non-threading perl 5.10.0 under Linux.
Thanks in advance!
Subject: | format_with_special_character.pl |
format STDOUT =
ä@<<<<<<<
'Name'
.
write STDOUT;
Subject: | format_without_special_character.pl |
format STDOUT =
@<<<<<<<
'Name'
.
write STDOUT;
Subject: | ppiDumper.pl |
use strict;
use warnings;
use PPI;
use PPI::Dumper;
my $module = PPI::Document->new($ARGV[0])
    or die PPI::Document->errstr;
my $dumper = PPI::Dumper->new($module);
$dumper->print;
Subject: | format_with_quoted_special_character.pl |
format STDOUT =
'ä'@<<<<<<<
'Name'
.
write STDOUT;