Bug #13326 for PPI: index_location invalid for some newline and whitespace related stuff

Mon Jun 20 16:37:30 2005 Guest - Ticket created

Subject:

index_location invalid for some newline and whitespace related stuff

I'm using the location property to figure out what symbol is at the cursor position, and in doing so I found a few things that didn't work correctly. It seems to have mostly to do with newline related things. a) When calculating the length of the last line, this should probably be the case: # Does the token have additional characters # after their last newline. if ( $content =~ /\n([^\n]+?)$/ ) { $location->[1] += length($1); } (the original only matches one char of the last line) b) When writing tests for this I came upon something else. I think the whitespace parsing for comments is a bit off. It seems like the preceeding whitespace is regarded as part of the comment. So this comment: #commment isn't parsed as " " + "#comment", but " #comment". Is this a bug, or should it behave like that? The attached patch contains tests and a fix for a) and b) is pointed out in the test case. WinXP perl -v This is perl, v5.8.6 built for MSWin32-x86-multi-thread /J

*** c:/DOCUME~1/linjoh/LOCALS~1/Temp/Document.pm Mon Jun 20 22:28:12 2005 --- c:/DOCUME~1/linjoh/LOCALS~1/Temp/Document.pm3908F22 Mon Jun 20 22:28:12 2005 *************** *** 379,385 **** # Does the token have additional characters # after their last newline. ! if ( $content =~ /\n([^\n])$/ ) { $location->[1] += length($1); } --- 379,384 ---- # Does the token have additional characters # after their last newline. ! if ( $content =~ /\n([^\n]+?)$/ ) { $location->[1] += length($1); } *** c:/DOCUME~1/linjoh/LOCALS~1/Temp/12_location.t Mon Jun 20 22:29:54 2005 --- c:/DOCUME~1/linjoh/LOCALS~1/Temp/12_location.t3908EKM Mon Jun 20 22:29:54 2005 *************** *** 22,28 **** use PPI; # Execute the tests ! use Test::More tests => 105; my $test_source = <<'END_PERL'; my $foo = 'bar'; --- 22,28 ---- use PPI; # Execute the tests ! use Test::More tests => 269; my $test_source = <<'END_PERL'; my $foo = 'bar'; *************** *** 39,72 **** THAT } 1; END_PERL my @test_locations = ( [ 1, 1 ], ! [ 1, 4 ], ! [ 1, 9 ], ! [ 1, 11 ], ! [ 1, 16 ], ! [ 3, 1 ], ! [ 4, 1 ], ! [ 4, 5 ], ! [ 4, 9 ], ! [ 5, 2 ], ! [ 5, 5 ], ! [ 5, 6 ], ! [ 5, 11 ], ! [ 5, 13 ], ! [ 5, 18 ], ! [ 5, 20 ], ! [ 5, 22 ], ! [ 5, 23 ], ! [ 5, 31 ], ! [ 5, 33 ], ! [ 5, 41 ], ! [ 5, 42 ], ! [ 13, 1 ], ! [ 15, 1 ], ! [ 15, 2 ], ); --- 39,136 ---- THAT } + sub bar { + baz(); + + #Note that there are leading 4 x space, not 1 x tab in the sub bar + + bas(); + } + 1; END_PERL my @test_locations = ( [ 1, 1 ], ! [ 1, 3 ], # ! [ 1, 4 ], #$foo ! [ 1, 8 ], # ! [ 1, 9 ], #= ! [ 1, 10 ], # ! [ 1, 11 ], #'bar' ! [ 1, 16 ], #; ! ! [ 1, 17 ], #\n ! [ 2, 1 ], #\n ! ! [ 3, 1 ], # # comment ! ! [ 4, 1 ], #sub ! [ 4, 4 ], # ! [ 4, 5 ], #foo ! [ 4, 8 ], # ! [ 4, 9 ], #{ ! [ 4, 10 ], #\n ! ! [ 5, 1 ], # tab ! [ 5, 2 ], #my ! [ 5, 4 ], # ! [ 5, 5 ], #( ! [ 5, 6 ], #$this ! [ 5, 11 ], #, ! [ 5, 12 ], # ! [ 5, 13 ], #$that ! [ 5, 18 ], #) ! [ 5, 19 ], # ! [ 5, 20 ], #= ! [ 5, 21 ], # ! [ 5, 22 ], #( ! [ 5, 23 ], #<<'THIS' ! [ 5, 31 ], #, ! [ 5, 32 ], # ! [ 5, 33 ], #<<"THAT" ! [ 5, 41 ], #) ! [ 5, 42 ], #; ! [ 5, 43 ], #\n ! ! [ 13, 1 ], #} ! [ 13, 2 ], #\n ! ! [ 14, 1 ], #\n ! ! [ 15, 1 ], #sub ! [ 15, 4 ], # ! [ 15, 5 ], #bar ! [ 15, 8 ], # ! [ 15, 9 ], #{ ! [ 15, 10 ], #\n ! ! [ 16, 1 ], # whitespace ! [ 16, 5 ], #baz ! [ 16, 8 ], #( ! [ 16, 9 ], #) ! [ 16, 10 ], #; ! [ 16, 11 ], #\n ! ! [ 17, 1 ], #\n ! ! [ 18, 1 ], #comment !BUG! Should be 5, and a whitespace chunk before it (now the preceeding whitespace is part of the comment string) ! ! [ 19, 1 ], #\n whitespace ! ! [ 20, 5 ], #bas ! [ 20, 8 ], #( ! [ 20, 9 ], #) ! [ 20, 10 ], #; ! [ 20, 11 ], #\n ! ! [ 21, 1 ], #} ! [ 21, 2 ], #\n ! ! [ 22, 1 ], #\n ! ! [ 23, 1 ], #1 ! [ 23, 2 ], #; ! [ 23, 3 ], #\n ); *************** *** 82,95 **** ok( $Document->index_locations, '->index_locations returns true' ); # Now check the locations of every token ! my @tokens = grep { ! $_->isa('PPI::Token::Whitespace') } $Document->tokens; ! is( scalar(@tokens), scalar(@test_locations), 'Number of non-whitespace tokens matches expected' ); foreach my $i ( 0 .. $#test_locations ) { my $location = $tokens[$i]->location; is( ref($location), 'ARRAY', "Token $i: ->location returns an ARRAY ref" ); is( scalar(@$location), 2, "Token $i: ->location returns a 2 element ARRAY ref" ); ok( ($location->[0] > 0 and $location->[1] > 0), "Token $i: ->location returns two positive positions" ); ! is_deeply( $test_locations[$i], $tokens[$i]->location, "Token $i: ->location matches expected" ); } ok( $Document->flush_locations, '->flush_locations returns true' ); --- 146,160 ---- ok( $Document->index_locations, '->index_locations returns true' ); # Now check the locations of every token ! my @tokens = $Document->tokens; # grep { ! $_->isa('PPI::Token::Whitespace') } ! is( scalar(@tokens), scalar(@test_locations), 'Number of tokens matches expected' ); foreach my $i ( 0 .. $#test_locations ) { my $location = $tokens[$i]->location; + # my $token = "$tokens[$i]"; $token =~ s|\n|\\n|gs; print "\n$location->[0], $location->[1]: |$token|\n"; is( ref($location), 'ARRAY', "Token $i: ->location returns an ARRAY ref" ); is( scalar(@$location), 2, "Token $i: ->location returns a 2 element ARRAY ref" ); ok( ($location->[0] > 0 and $location->[1] > 0), "Token $i: ->location returns two positive positions" ); ! is_deeply( $tokens[$i]->location, $test_locations[$i], "Token $i: ->location matches expected" ); } ok( $Document->flush_locations, '->flush_locations returns true' );

Mon Jun 20 18:26:26 2005 Guest - Correspondence added

From:

johanl[ÄT]DarSerMan.com

Show quoted text

> a) When calculating the length of the last line, this should probably > be the case: > > # Does the token have additional characters > # after their last newline. > if ( $content =~ /\n([^\n]+?)$/ ) { > $location->[1] += length($1); > }

Oops. That broke the tests of my own modules when there is a token that contains newlines (like POD). The new rex matches end of string explicitly with \z instead of $. I wrote some more tests to demonstrate this. Please consider this patch in place of the other one. And hopefully it will... uh, actually work this time. /J

*** c:/DOCUME~1/linjoh/LOCALS~1/Temp/Document.pm Tue Jun 21 00:19:27 2005 --- c:/DOCUME~1/linjoh/LOCALS~1/Temp/Document.pm1828BeB Tue Jun 21 00:19:27 2005 *************** *** 379,385 **** # Does the token have additional characters # after their last newline. ! if ( $content =~ /\n([^\n])$/ ) { $location->[1] += length($1); } --- 379,385 ---- # Does the token have additional characters # after their last newline. ! if ( $content =~ / \n ( [^\n]+? ) \z/x ) { #Beware of $, it matches end of string or $location->[1] += length($1); } *** c:/DOCUME~1/linjoh/LOCALS~1/Temp/12_location.t Tue Jun 21 00:20:59 2005 --- c:/DOCUME~1/linjoh/LOCALS~1/Temp/12_location.t18281Ga Tue Jun 21 00:20:59 2005 *************** *** 22,28 **** use PPI; # Execute the tests ! use Test::More tests => 105; my $test_source = <<'END_PERL'; my $foo = 'bar'; --- 22,28 ---- use PPI; # Execute the tests ! use Test::More tests => 333; my $test_source = <<'END_PERL'; my $foo = 'bar'; *************** *** 39,72 **** THAT } 1; END_PERL my @test_locations = ( [ 1, 1 ], ! [ 1, 4 ], ! [ 1, 9 ], ! [ 1, 11 ], ! [ 1, 16 ], ! [ 3, 1 ], ! [ 4, 1 ], ! [ 4, 5 ], ! [ 4, 9 ], ! [ 5, 2 ], ! [ 5, 5 ], ! [ 5, 6 ], ! [ 5, 11 ], ! [ 5, 13 ], ! [ 5, 18 ], ! [ 5, 20 ], ! [ 5, 22 ], ! [ 5, 23 ], ! [ 5, 31 ], ! [ 5, 33 ], ! [ 5, 41 ], ! [ 5, 42 ], ! [ 13, 1 ], ! [ 15, 1 ], ! [ 15, 2 ], ); --- 39,166 ---- THAT } + sub bar { + baz(); + + #Note that there are leading 4 x space, not 1 x tab in the sub bar + + bas(); + } + + =head2 fluzz() + + Print "fluzz". Return 1. + + =cut + sub fluzz { + print "fluzz"; + } + 1; END_PERL my @test_locations = ( [ 1, 1 ], ! [ 1, 3 ], # ! [ 1, 4 ], #$foo ! [ 1, 8 ], # ! [ 1, 9 ], #= ! [ 1, 10 ], # ! [ 1, 11 ], #'bar' ! [ 1, 16 ], #; ! ! [ 1, 17 ], #\n ! [ 2, 1 ], #\n ! ! [ 3, 1 ], # # comment ! ! [ 4, 1 ], #sub ! [ 4, 4 ], # ! [ 4, 5 ], #foo ! [ 4, 8 ], # ! [ 4, 9 ], #{ ! [ 4, 10 ], #\n ! ! [ 5, 1 ], # tab ! [ 5, 2 ], #my ! [ 5, 4 ], # ! [ 5, 5 ], #( ! [ 5, 6 ], #$this ! [ 5, 11 ], #, ! [ 5, 12 ], # ! [ 5, 13 ], #$that ! [ 5, 18 ], #) ! [ 5, 19 ], # ! [ 5, 20 ], #= ! [ 5, 21 ], # ! [ 5, 22 ], #( ! [ 5, 23 ], #<<'THIS' ! [ 5, 31 ], #, ! [ 5, 32 ], # ! [ 5, 33 ], #<<"THAT" ! [ 5, 41 ], #) ! [ 5, 42 ], #; ! [ 5, 43 ], #\n ! ! [ 13, 1 ], #} ! [ 13, 2 ], #\n ! ! [ 14, 1 ], #\n ! ! [ 15, 1 ], #sub ! [ 15, 4 ], # ! [ 15, 5 ], #bar ! [ 15, 8 ], # ! [ 15, 9 ], #{ ! [ 15, 10 ], #\n ! ! [ 16, 1 ], # whitespace ! [ 16, 5 ], #baz ! [ 16, 8 ], #( ! [ 16, 9 ], #) ! [ 16, 10 ], #; ! [ 16, 11 ], #\n ! ! [ 17, 1 ], #\n ! ! [ 18, 1 ], #comment !BUG! Should be 5, and a whitespace chunk before it (now the preceeding whitespace is part of the comment string) ! ! [ 19, 1 ], #\n whitespace ! ! [ 20, 5 ], #bas ! [ 20, 8 ], #( ! [ 20, 9 ], #) ! [ 20, 10 ], #; ! [ 20, 11 ], #\n ! ! [ 21, 1 ], #} ! [ 21, 2 ], #\n ! ! [ 22, 1 ], #\n ! ! [ 23, 1 ], #=head2 ! ! [ 28, 1 ], #sub ! [ 28, 4 ], # ! [ 28, 5 ], #fluzz ! [ 28, 10 ], # ! [ 28, 11 ], #{ ! [ 28, 12 ], #\n ! ! [ 29, 1 ], # ! [ 29, 5 ], #print ! [ 29, 10 ], # ! [ 29, 11 ], #"fluzz" ! [ 29, 18 ], #; ! [ 29, 19 ], #\n ! ! [ 30, 1 ], #} ! [ 30, 2 ], #\n ! ! [ 31, 1 ], #\n ! ! [ 32, 1 ], #1 ! [ 32, 2 ], #; ! [ 32, 3 ], #\n ); *************** *** 82,95 **** ok( $Document->index_locations, '->index_locations returns true' ); # Now check the locations of every token ! my @tokens = grep { ! $_->isa('PPI::Token::Whitespace') } $Document->tokens; ! is( scalar(@tokens), scalar(@test_locations), 'Number of non-whitespace tokens matches expected' ); foreach my $i ( 0 .. $#test_locations ) { my $location = $tokens[$i]->location; is( ref($location), 'ARRAY', "Token $i: ->location returns an ARRAY ref" ); is( scalar(@$location), 2, "Token $i: ->location returns a 2 element ARRAY ref" ); ok( ($location->[0] > 0 and $location->[1] > 0), "Token $i: ->location returns two positive positions" ); ! is_deeply( $test_locations[$i], $tokens[$i]->location, "Token $i: ->location matches expected" ); } ok( $Document->flush_locations, '->flush_locations returns true' ); --- 176,190 ---- ok( $Document->index_locations, '->index_locations returns true' ); # Now check the locations of every token ! my @tokens = $Document->tokens; # grep { ! $_->isa('PPI::Token::Whitespace') } ! is( scalar(@tokens), scalar(@test_locations), 'Number of tokens matches expected' ); foreach my $i ( 0 .. $#test_locations ) { my $location = $tokens[$i]->location; + # my $token = "$tokens[$i]"; $token =~ s|\n|\\n|gs; print "\n$location->[0], $location->[1]: |$token|\n"; is( ref($location), 'ARRAY', "Token $i: ->location returns an ARRAY ref" ); is( scalar(@$location), 2, "Token $i: ->location returns a 2 element ARRAY ref" ); ok( ($location->[0] > 0 and $location->[1] > 0), "Token $i: ->location returns two positive positions" ); ! is_deeply( $tokens[$i]->location, $test_locations[$i], "Token $i: ->location matches expected" ); } ok( $Document->flush_locations, '->flush_locations returns true' );

Mon Jul 04 02:24:54 2005 adamk [...] cpan.org - Correspondence added

Show quoted text

> #commment > > isn't parsed as " " + "#comment", but " #comment". Is this a > bug, or should it behave like that?

This is because I originall (and still do to an extent) want to be able to handle "line comments". That is, a comment that occupies an entire line. It's not only faster to parse, but (more importantly) is a lot easier to manipulate. I'm still unsure if we should remove the "line comment" concept entirely though.

Mon Jul 04 02:51:59 2005 adamk [...] cpan.org - Given to ADAMK

Mon Jul 04 02:51:59 2005 adamk [...] cpan.org - Status changed from 'new' to 'resolved'

Mon Jul 04 02:51:59 2005 adamk [...] cpan.org - Correspondence added 30 min

Patch applied. Fixed some whitespace problems in the patch (your editor doesn't use hard tabs) and cleaned out the commented out stuff (CVS is good for preserving that stuff). I think I need to treat the line comment issue as a separate one from this bug. I'll have a think about it and make sure it gets resolved one way or the other before the 1.000 release. At the moment, I'm tending towards splitting it up into whitespace + comment + whitespace(newline) and adding a ->is_entire_line method to Token::Comment and some ->remove_line method to trim out not just the comment, but it's leading and trailing whitespace too.

Bug #13326 for PPI: index_location invalid for some newline and whitespace related stuff

Preferred bug tracker