Skip Menu |

This queue is for tickets about the CPAN-Site CPAN distribution.

Report information
The Basics
Id: 39831
Status: resolved
Worked: 4.3 hours (260 min)
Priority: 0/
Queue: CPAN-Site

People
Owner: Nobody in particular
Requestors: matisse [...] spamcop.net
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.21
Fixed in: (no value)



Subject: CPAN::Site::Index::package_inventory() has problem with Text-PDF-0.29a.tar.gz
The current release of the Text-PDF distro has something odd going on such that the inspect_entry() subroutine doesn't properly find all the package statements, in particular the Text::PDF package is not found. In CPAN::Site::Index::inspect_entry() there is a while loop: while($out_buf =~ s/^([^\n]*)\n//) I did some investigation about what the regex in that loop matches, and it seems like the Text-PDF-0.29a distro has a few cases where the "package" statement has a bunch of unusual characters right before - perhaps nulls? This could be the result of some odd tar format, but I don't know. I've attached the Text-PDF distro tarball. I also might have a patch to contribute, but I need to test it some more and get permission from $work to release it.
Subject: Text-PDF-0.29a.tar.gz
Download Text-PDF-0.29a.tar.gz
application/x-gzip 50.2k

Message body not shown because it is not plain text.

My mistake! I was using an older version of CPAN::Site (0.18) when I found this problem, and thought I was using the latest. This problem was fixed in 0.19. My apologies for the false report.
Looks like this problem has resurfaced, at least in CPAN::Site::Index 0.21 I'll do some testing tonight and see if I can understand why.
I have created a failing unit test for the problem, and created a patched version of Index.pm that fixes it. I have also requested clearance from my company to release these changes and hope to get sign-off next week, in which case I will attach a tarball to this ticket.
I've gotten clearance to post by proposed fix and unit tests. I've attached a tarball with the proposed changes.
Download CPAN-Site-0.21.tar.gz
application/x-gzip 63.6k

Message body not shown because it is not plain text.

Subject: Re: [rt.cpan.org #39831] CPAN::Site::Index::package_inventory() has problem with Text-PDF-0.29a.tar.gz
Date: Fri, 12 Dec 2008 21:52:17 +0100
To: Matisse Enzer via RT <bug-CPAN-Site [...] rt.cpan.org>
From: NLnet webmaster <webmaster [...] nlnet.nl>
* Matisse Enzer via RT (bug-CPAN-Site@rt.cpan.org) [081211 21:55]: Show quoted text
> Queue: CPAN-Site > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=39831 > > > I've gotten clearance to post by proposed fix and unit tests. > I've attached a tarball with the proposed changes.
You change if( m/^ (?:use\s+version\s*;\s*)? (?:our)? \s* \$ (?: \w+\:\:)* VERSION \s* \= \s* (.*)/x ) { register into if( m/^ (?:use\s+version\s*;\s*)? (?:our)? \s* \$ (?: \w+\:\:)* VERSION \s* \= \s* (.*)/x ) { $version = eval "my \$v = $1"; } else { print "VERSION not found in $file\n" if $debug; $version = 0; } register Doesn't that mean that you call register all the time? For each processed line? Ever ran your program with $debug on? Actually, CPANs package-list sees version '0' as different from version 'undef'. So, the default is incorrect. You changed: if( m/^\s* package \s* ((\w+\:\:)*\w+) \s* ;/x ) { $package = $1; print "package=$package\n" if $debug; next; } into if( m/^\s* package \s* ((\w+\:\:)*\w+) \s* ;/x ) { $package = $1 || 'unknown'; print "package=$package\n" if $debug; } "unknown" can never happen. "next" is a minor optimization. So no need to take those changes. I have changed original block into if( m/^\s* package \s* ((?:\w+\:\:)*\w+) \s* ;/x ) { $package = $1; warn "package=$package\n" if $debug; register $package, undef, $dist; next; } which first registers the package with an <undef> version. It solves the same problem as you encountered, but with much less run-time. The register() method needed some improvement as well. Attached is the new version to be... ok? -- Regards, MarkOv ------------------------------------------------------------------------ Mark Overmeer MSc MARKOV Solutions Mark@Overmeer.net solutions@overmeer.net http://Mark.Overmeer.net http://solutions.overmeer.net
Download CPAN-Site-0.22.tar.gz
application/x-gunzip 65.2k

Message body not shown because it is not plain text.

For some reason I didn't get the email from RT when you posted on Dec 12, so I just saw your response. Sorry for the delay. I will look this weekend. Your comments are probably correct (I do run the program with debug on, but probably did not pay attention to the effect of running register(0 so often.)
I've looked at your changes and they seem fine to me. Thank you for respond to this issue, and I apologize for missing your response at first. As one more tiny suggestion I've attached a patch for Index.pm that factors out two complex regular expressions from inspect_entry() into compiled regular expression, for example: my $PACKAGE_REGEX = qr/^\s* package \s* ((?:\w+\:\:)*\w+) \s* ;/ox; The only reasons for this change are to make the code a little more readable, possibly faster, and maybe in future add tests specifically for those regular expression (one would probably need to declare them with 'our' in that case.) As far as I am concerned you could mark this ticket "fixed in 0.22"
Index: /menzer/CPAN-Site/lib/CPAN/Site/Index.pm =================================================================== --- /menzer/CPAN-Site/lib/CPAN/Site/Index.pm (revision 3524) +++ /menzer/CPAN-Site/lib/CPAN/Site/Index.pm (working copy) @@ -135,6 +135,9 @@ || $subdir eq 'lib'; # inside lib } +my $PACKAGE_REGEX = qr/^\s* package \s* ((?:\w+\:\:)*\w+) \s* ;/ox; +my $VERSION_REGEX = qr/^ (?:use\s+version\s*;\s*)? + (?:our)? \s* \$ (?: \w+\:\:)* VERSION \s* \= \s* (.*)/ox; sub inspect_entry { my $fn = $File::Find::name; return if ! -f $fn || $fn !~ $tar_gz; @@ -207,7 +210,7 @@ local $_ = $1; $file && $file =~ m/\.pm$/ or next; - if( m/^\s* package \s* ((?:\w+\:\:)*\w+) \s* ;/x ) + if( $_ =~ $PACKAGE_REGEX ) { $package = $1; warn "package=$package\n" if $debug; register $package, undef, $dist; @@ -212,8 +215,7 @@ warn "package=$package\n" if $debug; register $package, undef, $dist; } - elsif( m/^ (?:use\s+version\s*;\s*)? - (?:our)? \s* \$ (?: \w+\:\:)* VERSION \s* \= \s* (.*)/x ) + elsif( $_ =~ $VERSION_REGEX ) { local $VERSION; # destroyed by eval $version = eval "my \$v = $1"; $version = $version->numify if ref $version;
Subject: Re: [rt.cpan.org #39831] CPAN::Site::Index::package_inventory() has problem with Text-PDF-0.29a.tar.gz
Date: Mon, 22 Dec 2008 15:31:30 +0100
To: Matisse Enzer via RT <bug-CPAN-Site [...] rt.cpan.org>
From: Mark Overmeer <solutions [...] overmeer.net>
* Matisse Enzer via RT (bug-CPAN-Site@rt.cpan.org) [081221 18:45]: Show quoted text
> Queue: CPAN-Site > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=39831 > > > my $PACKAGE_REGEX = qr/^\s* package \s* ((?:\w+\:\:)*\w+) \s* ;/ox; > > The only reasons for this change are to make the code a little more > readable, possibly faster, and maybe in future add tests specifically > for those regular expression (one would probably need to declare them > with 'our' in that case.)
The code just behind the regexes use $1, so in stead of improving the readability, this would reduce it. Besides, it is surely not an performance improvement (even if I cared about that). On the moment that we add tests, I may decide to follow your suggestion... not earlier. -- Thanks anyway, MarkOv ------------------------------------------------------------------------ Mark Overmeer MSc MARKOV Solutions Mark@Overmeer.net solutions@overmeer.net http://Mark.Overmeer.net http://solutions.overmeer.net
fixed in 0.22