Bug #30496 for PPI: calculate document char offset when indexing locations

Mon Nov 05 10:09:54 2007 jgangemi [...] gmail.com - Ticket created

Subject:

calculate document char offset when indexing locations

i've mentioned this to adam in a separate email, but wanted to file an enhancement request as well.

i'd like for ppi to calculate the character offset of tokens within a document, ie:

    package Foo;

    use strict;

    1;

the package token would start at an offset of 0, while the word token representing 'Foo' would start at offset 8. the 'use' statement would have an offset of 14, and so on.

attached you'll find a quick diff that illustrates the changes i made to the PPI::Document and PPI::Dumper modules to add this enhancement.
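As a rough illustration of the offsets described above (a sketch only, not the attached patch): the start offset of a token is the combined length of every token before it. The token list below is split by hand and assumes a blank line after the package statement, which is what the quoted offset of 14 implies. `token_offsets` is a hypothetical helper; nothing here is PPI API or PPI output.

```perl
use strict;
use warnings;

# Hand-split token contents for the example document. This is an
# illustration, not output captured from PPI.
my @tokens = (
    'package', ' ', 'Foo', ';', "\n", "\n",
    'use', ' ', 'strict', ';', "\n", "\n",
    '1', ';', "\n",
);

# Hypothetical helper: pair each token with its starting character offset
# by keeping a running total of the lengths seen so far.
sub token_offsets {
    my @contents = @_;
    my $offset   = 0;
    my @pairs;
    for my $content (@contents) {
        push @pairs, [ $offset, $content ];
        $offset += length $content;
    }
    return @pairs;
}

for my $pair ( token_offsets(@tokens) ) {
    my ( $offset, $content ) = @$pair;
    next if $content =~ /\A\s+\z/;    # leave whitespace tokens out of the listing
    printf "%3d  %s\n", $offset, $content;
}
```

With this token list the listing shows 'package' at 0, 'Foo' at 8 and 'use' at 14, matching the figures in the request.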

Subject:

Dumper.patch

--- Dumper.pm 2007-11-04 11:53:24.000000000 -0500
+++ Dumper.pm.OLD 2007-11-04 11:52:32.000000000 -0500
@@ -253,7 +253,7 @@
         if ( $Element->isa('PPI::Token') ) {
             my $location = $Element->location;
             if ($location) {
-                $loc_string = sprintf("[ % 4d, % 3d, % 3d, % 4d ] ", @$location);
+                $loc_string = sprintf("[ % 4d, % 3d, % 3d ] ", @$location);
             }
         }
         # Output location or pad with 20 spaces

Subject:

Document.patch

--- Document.pm.OLD 2007-11-04 11:48:31.000000000 -0500
+++ Document.pm 2007-11-04 12:00:49.000000000 -0500
@@ -146,6 +146,7 @@
     my $self = $class->SUPER::new;
     $self->{readonly} = ! 1;
     $self->{tab_width} = 1;
+    $self->{offset} = 0;
     return $self;
 }
@@ -585,7 +586,7 @@
         # Calculate the new location if needed.
         $location = $_
             ? $self->_add_location( $location, $Tokens[$_ - 1], \$heredoc )
-            : [ 1, 1, 1 ];
+            : [ 1, 1, 1, 0 ];
         $first = $_;
         last;
     }
@@ -609,6 +610,8 @@
     my ($self, $start, $Token, $heredoc) = @_;
     my $content = $Token->{content};
 
+    $self->{offset} += length($content);
+
     # Does the content contain any newlines
     my $newlines =()= $content =~ /\n/g;
     unless ( $newlines ) {
@@ -616,13 +619,14 @@
         return [
             $start->[0],
             $start->[1] + length($content),
-            $start->[2] + $self->_visual_length($content, $start->[2])
+            $start->[2] + $self->_visual_length($content, $start->[2]),
+            $self->{offset},
         ];
     }
 
     # This is the more complex case where we hit or
     # span a newline boundary.
-    my $location = [ $start->[0] + $newlines, 1, 1 ];
+    my $location = [ $start->[0] + $newlines, 1, 1, $self->{offset} ];
     if ( $heredoc and $$heredoc ) {
         $location->[0] += $$heredoc;
         $$heredoc = 0;
@@ -633,6 +637,7 @@
     if ( $content =~ /\n([^\n]+?)\z/ ) {
         $location->[1] += length($1);
         $location->[2] += $self->_visual_length($1, $location->[2]);
+        $location->[3] += length($1);
     }
 
     $location;

Mon Nov 05 18:37:01 2007 chris+rt [...] chrisdolan.net - Correspondence added

From:

cpan [...] chrisdolan.net

Jae,

I have not looked at your attachments yet, but I can say right away that the current PPI cannot reliably provide character offsets for all documents because it does not (yet) preserve \n vs. \r\n vs. \r line endings. So, a \n file on Windows or a \r\n file on Unix will count characters wrong, for example.

Chris
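To make the line-ending concern concrete, here is a small sketch in plain Perl (no PPI involved) showing that the same source text yields different character offsets depending on its line endings. The `offset_of` helper and the sample strings are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: character offset of the first occurrence of $needle.
sub offset_of {
    my ( $source, $needle ) = @_;
    return index( $source, $needle );
}

# The same three statements, once with \n and once with \r\n line endings.
my $lf = "package Foo;\n\nuse strict;\n\n1;\n";
( my $crlf = $lf ) =~ s/\n/\r\n/g;

printf "LF:   'use' starts at %d\n", offset_of( $lf,   'use' );
printf "CRLF: 'use' starts at %d\n", offset_of( $crlf, 'use' );
```

The first line reports 14 and the second 16. If the line endings are not preserved in the token contents, offsets summed from token lengths would follow the first figure even when the file on disk corresponds to the second.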

Mon Nov 05 18:37:13 2007 The RT System itself - Status changed from 'new' to 'open'

Wed Nov 07 09:52:22 2007 jgangemi [...] gmail.com - Correspondence added

the changes i made ask the token for its length, so once the line endings are preserved, i would think the calculations would still be correct.

i assume you'd have to tweak my heredoc/multiple line ending calculations.

well, right now all my dev work is being done on a mac, so the line endings are consistent. if nothing else, it would allow me an easier upgrade path to new versions of ppi.

it's a lot easier to figure that out as the document is being built vs when i dump the document out to build an xml representation of it.

On Mon Nov 05 18:37:01 2007, CDOLAN wrote:

> Jae,
>
> I have not looked at your attachments yet, but I can say right away that
> the current PPI cannot reliably provide character offsets for all
> documents because it does not (yet) preserve \n vs. \r\n vs. \r line
> endings. So, a \n file on Windows or a \r\n file on Unix will count
> characters wrong, for example.
>
> Chris

Wed Nov 07 20:16:25 2007 adamkennedybackup [...] gmail.com - Correspondence added

Subject:	Re: [rt.cpan.org #30496] calculate document char offset when indexing locations
Date:	Thu, 8 Nov 2007 12:16:04 +1100
To:	bug-PPI [...] rt.cpan.org
From:	"Adam Kennedy" <adamkennedybackup [...] gmail.com>

One solution might be to check the document for "correct" newlines at parse-time, and then set a $document->{local_newlines} = 1 flag of some sort.

If that flag is set, then we know that we can run the 4-element location parse safely.

Adam K

On 08/11/2007, Jae Gangemi via RT <bug-PPI@rt.cpan.org> wrote:

>
> Queue: PPI
> Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=30496 >
>
> the changes i made ask the token for its length, so once the line
> endings are preserved, i would think the calculations would still be
> correct.
>
> i assume you'd have to tweak my heredoc/multiple line ending
> calculations,
>
> well, right now all my dev work is being done on a mac, so the line
> endings are consistent. if nothing else, it would allow me an easier
> upgrade path to new versions of ppi.
>
> it's a lot easier to figure that out as the document is being built vs
> when i dump the document out to build an xml representation of it.
>
> On Mon Nov 05 18:37:01 2007, CDOLAN wrote:
> > Jae,
> >
> > I have not looked at your attachments yet, but I can say right away that
> > the current PPI cannot reliably provide character offsets for all
> > documents because it does not (yet) preserve \n vs. \r\n vs. \r line
> > endings. So, a \n file on Windows or a \r\n file on Unix will count
> > characters wrong, for example.
> >
> > Chris
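
A minimal sketch of the parse-time newline check suggested in the reply above, under the assumption that "correct" newlines means the source uses bare \n line endings only. The `has_local_newlines` helper and the plain hash standing in for the document object are hypothetical; only the `local_newlines` key name comes from the message.

```perl
use strict;
use warnings;

# Hypothetical check: true when the source contains no \r at all, so every
# line ending is a bare \n (the assumed meaning of "correct" newlines).
sub has_local_newlines {
    my ($source) = @_;
    return $source =~ /\r/ ? 0 : 1;
}

# A plain hash stands in for the document object in this sketch.
my %document = ( source => "package Foo;\n\nuse strict;\n" );
$document{local_newlines} = has_local_newlines( $document{source} );

# Only carry the fourth (offset) element when the flag says it is safe.
my $elements = $document{local_newlines} ? 4 : 3;
print "location arrays would carry $elements elements\n";
```

A stricter variant could compare against the platform's own convention instead of bare \n; which definition of "correct" is intended is left open in the thread.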
