
This queue is for tickets about the PPI CPAN distribution.

Report information
The Basics
Id: 30496
Status: open
Priority: 0/
Queue: PPI

People
Owner: Nobody in particular
Requestors: jgangemi@gmail.com (no email address)
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: calculate document char offset when indexing locations
i've mentioned this to adam in a separate email, but wanted to file an enhancement request as well.

i'd like for ppi to calculate the character offset of tokens within a document, ie: package Foo; use strict; 1;

the package token would start at offset 0, while the word token representing 'Foo' would start at offset 8. the 'use' statement would have an offset of 14, and so on.

attached you'll find a quick diff that illustrates the changes i made to the PPI::Document and PPI::Dumper modules to add this enhancement.
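The offsets requested above can be illustrated without PPI itself. The following is only a sketch: `token_offsets` is a hypothetical helper, and the token list is hand-written rather than produced by PPI's tokenizer. It also assumes a blank line follows `package Foo;`, which is what the quoted offsets (8 for 'Foo', 14 for 'use') imply.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PPI): pair each token's content with the
# character offset at which it starts, given the contents in document order.
sub token_offsets {
    my @contents = @_;
    my $offset   = 0;
    my @pairs;
    for my $content (@contents) {
        push @pairs, [ $content, $offset ];
        $offset += length $content;    # the next token starts where this one ends
    }
    return @pairs;
}

# Hand-written token list; assumes a blank line after "package Foo;".
my @tokens = (
    'package', ' ', 'Foo', ';', "\n", "\n",
    'use', ' ', 'strict', ';', "\n",
);

for my $pair ( token_offsets(@tokens) ) {
    my ( $content, $offset ) = @$pair;
    next if $content =~ /\A\s+\z/;     # skip whitespace tokens in the printout
    printf "%-8s starts at offset %d\n", $content, $offset;
}
```

Because every token's length is counted, whitespace tokens included, the offset is a plain running sum and needs no knowledge of lines or columns.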
Subject: Dumper.patch
--- Dumper.pm 2007-11-04 11:53:24.000000000 -0500
+++ Dumper.pm.OLD 2007-11-04 11:52:32.000000000 -0500
@@ -253,7 +253,7 @@
     if ( $Element->isa('PPI::Token') ) {
         my $location = $Element->location;
         if ($location) {
-            $loc_string = sprintf("[ % 4d, % 3d, % 3d, % 4d ] ", @$location);
+            $loc_string = sprintf("[ % 4d, % 3d, % 3d ] ", @$location);
         }
     }
     # Output location or pad with 20 spaces
Subject: Document.patch
--- Document.pm.OLD 2007-11-04 11:48:31.000000000 -0500
+++ Document.pm 2007-11-04 12:00:49.000000000 -0500
@@ -146,6 +146,7 @@
     my $self = $class->SUPER::new;
     $self->{readonly} = ! 1;
     $self->{tab_width} = 1;
+    $self->{offset} = 0;
     return $self;
 }
 
@@ -585,7 +586,7 @@
         # Calculate the new location if needed.
         $location = $_
             ? $self->_add_location( $location, $Tokens[$_ - 1], \$heredoc )
-            : [ 1, 1, 1 ];
+            : [ 1, 1, 1, 0 ];
         $first = $_;
         last;
     }
@@ -609,6 +610,8 @@
     my ($self, $start, $Token, $heredoc) = @_;
     my $content = $Token->{content};
 
+    $self->{offset} += length($content);
+
     # Does the content contain any newlines
     my $newlines =()= $content =~ /\n/g;
     unless ( $newlines ) {
@@ -616,13 +619,14 @@
         return [
             $start->[0],
             $start->[1] + length($content),
-            $start->[2] + $self->_visual_length($content, $start->[2])
+            $start->[2] + $self->_visual_length($content, $start->[2]),
+            $self->{offset},
         ];
     }
 
     # This is the more complex case where we hit or
     # span a newline boundary.
-    my $location = [ $start->[0] + $newlines, 1, 1 ];
+    my $location = [ $start->[0] + $newlines, 1, 1, $self->{offset} ];
     if ( $heredoc and $$heredoc ) {
         $location->[0] += $$heredoc;
         $$heredoc = 0;
@@ -633,6 +637,7 @@
     if ( $content =~ /\n([^\n]+?)\z/ ) {
         $location->[1] += length($1);
         $location->[2] += $self->_visual_length($1, $location->[2]);
+        $location->[3] += length($1);
     }
 
     $location;
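The location update that the patch extends can be sketched as a standalone function. This is not PPI's `_add_location`: `next_location` is a hypothetical name, the location here is a simplified `[ line, column, offset ]` triple, and both the visual (tab-aware) column and heredoc handling are deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for the location update (not PPI code).
# Given the [ line, column, offset ] where a token starts and the token's
# content, return the location where the following token starts.
sub next_location {
    my ( $start, $content ) = @_;
    my ( $line, $col, $offset ) = @$start;

    # Count the newlines inside the token
    my $newlines = () = $content =~ /\n/g;

    # The offset advances once, by the full length of the token
    $offset += length $content;

    # Simple case: the token stays on one line
    return [ $line, $col + length($content), $offset ] unless $newlines;

    # Newline case: the column restarts after the last newline
    my ($tail) = $content =~ /\n([^\n]*)\z/;
    return [ $line + $newlines, 1 + length($tail), $offset ];
}

my $after_package = next_location( [ 1, 1, 0 ], 'package' );
my $after_pod     = next_location( [ 1, 1, 0 ], "a\nbc" );
print join( ', ', @$after_package ), "\n";
print join( ', ', @$after_pod ),     "\n";
```

In this sketch the offset advances exactly once per token, by the token's full length; the text after the last newline affects only the column, not the offset.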
From: cpan [...] chrisdolan.net
Jae,

I have not looked at your attachments yet, but I can say right away that the current PPI cannot reliably provide character offsets for all documents because it does not (yet) preserve \n vs. \r\n vs. \r line endings. So, a \n file on Windows or a \r\n file on Unix will count characters wrong, for example.

Chris
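The effect described here can be seen with plain string operations. The sketch below does not use PPI; it only assumes that line endings are normalized to \n somewhere before offsets are counted, and shows how the offset of the same word then differs from its offset in the original \r\n text.

```perl
use strict;
use warnings;

# The same source text, once with \r\n line endings and once normalized to \n.
my $crlf_source = "package Foo;\r\n\r\nuse strict;\r\n";
( my $lf_source = $crlf_source ) =~ s/\r\n/\n/g;

# Offset of the word 'use' in each version of the text
my $crlf_offset = index $crlf_source, 'use';
my $lf_offset   = index $lf_source,   'use';

printf "offset in the \\r\\n text:      %d\n", $crlf_offset;
printf "offset after normalization: %d\n",   $lf_offset;
```

Each normalized line ending shifts all later offsets by one character, so the error grows with the line number.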
the changes i made ask the token for its length, so once the line endings are preserved, i would think the calculations would still be correct.

i assume you'd have to tweak my heredoc/multiple line ending calculations,

well, right now all my dev work is being done on a mac, so the line endings are consistent. if nothing else, it would allow me an easier upgrade path to new versions of ppi.

it's a lot easier to figure that out as the document is being built vs when i dump the document out to build an xml representation of it.

On Mon Nov 05 18:37:01 2007, CDOLAN wrote:
> Jae,
>
> I have not looked at your attachments yet, but I can say right away that
> the current PPI cannot reliably provide character offsets for all
> documents because it does not (yet) preserve \n vs. \r\n vs. \r line
> endings. So, a \n file on Windows or a \r\n file on Unix will count
> characters wrong, for example.
>
> Chris
Subject: Re: [rt.cpan.org #30496] calculate document char offset when indexing locations
Date: Thu, 8 Nov 2007 12:16:04 +1100
To: bug-PPI [...] rt.cpan.org
From: "Adam Kennedy" <adamkennedybackup [...] gmail.com>
One solution might be to check the document for "correct" newlines at parse-time, and then set a $document->{local_newlines} = 1 flag of some sort.

If that flag is set, then we know that we can run the 4-element location parse safely.

Adam K

On 08/11/2007, Jae Gangemi via RT <bug-PPI@rt.cpan.org> wrote:
>
> Queue: PPI
> Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=30496 >
>
> the changes i made ask the token for its length, so once the line
> endings are preserved, i would think the calculations would still be
> correct.
>
> i assume you'd have to tweak my heredoc/multiple line ending
> calculations,
>
> well, right now all my dev work is being done on a mac, so the line
> endings are consistent. if nothing else, it would allow me an easier
> upgrade path to new versions of ppi.
>
> it's a lot easier to figure that out as the document is being built vs
> when i dump the document out to build an xml representation of it.
>
> On Mon Nov 05 18:37:01 2007, CDOLAN wrote:
> > Jae,
> >
> > I have not looked at your attachments yet, but I can say right away that
> > the current PPI cannot reliably provide character offsets for all
> > documents because it does not (yet) preserve \n vs. \r\n vs. \r line
> > endings. So, a \n file on Windows or a \r\n file on Unix will count
> > characters wrong, for example.
> >
> > Chris
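The parse-time newline check suggested in the thread could look roughly like the sketch below. `uses_only_newline_style` is a hypothetical function, not an existing PPI method, and the `local_newlines` key is simply the flag name proposed above; the `%document` hash only stands in for a real document object.

```perl
use strict;
use warnings;

# Hypothetical check (not part of PPI): true if every line ending in $source
# is exactly $style, where $style is "\n", "\r\n" or "\r".
sub uses_only_newline_style {
    my ( $source, $style ) = @_;

    # Remove every occurrence of the expected line ending ...
    ( my $rest = $source ) =~ s/\Q$style\E//g;

    # ... any \r or \n left over belongs to a different style
    return $rest =~ /[\r\n]/ ? 0 : 1;
}

# Stand-in for a document object; only the proposed flag is modelled here.
my $source   = "package Foo;\n\nuse strict;\n";
my %document = (
    local_newlines => uses_only_newline_style( $source, "\n" ),
);

print $document{local_newlines}
    ? "newlines are consistent, offsets can be trusted\n"
    : "mixed or foreign newlines, skip the offset element\n";
```

A caller would then compute the fourth location element only when the flag is set, and fall back to the existing three-element location otherwise.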