Skip Menu |

This queue is for tickets about the SVN-Dump CPAN distribution.

Report information
The Basics
Id: 25467
Status: resolved
Priority: 0/
Queue: SVN-Dump

People
Owner: Nobody in particular
Requestors: jpeacock [...] cpan.org
RSCHUPP [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: (no value)
Fixed in: 0.05



Subject: Cannot process cvs2svn-created dumpfile
While trying to use the supplied svndump_replace_author.pl script on a CVS repository converted with cvs2svn, I discovered that SVN::Dump::Reader::read_record is chopping lines that are not empty from an otherwise legitimate dumpfile. I have a proposed patch which makes SVN::Dump::Reader a subclass of IO::File (instead of IO::Handle), then uses getpos() and setpos() to rewind the filehandle if the "blank" line isn't actually blank, e.g. @@ -65,10 +65,18 @@ # chop empty line after the record my $type = $headers->type(); - <$fh> if $type !~ /\A(?:format|uuid)\z/; + if ( $type !~ /\A(?:format|uuid)\z/ ) { + my $pos = $fh->getpos; + my $line = <$fh>; + $fh->setpos($pos) unless $line =~ /^$/; + } It is a much larger patch than would otherwise be necessary, because it changes the interface (the {fh} object is no longer a filehandle), so it has to update the test files as well. I think a much better fix would be to make read_record() leading blank line agnostic (i.e. an unknown number of blank lines will be consumed at the start of read_record(), rather than trying to chop trailing blank lines. I can produce a patch to do this instead if you would like. John
From: BOOK [...] cpan.org
On Thu Mar 15 11:12:30 2007, JPEACOCK wrote: Show quoted text
> While trying to use the supplied svndump_replace_author.pl script on a > CVS repository converted with cvs2svn, I discovered that > SVN::Dump::Reader::read_record is chopping lines that are not empty from > an otherwise legitimate dumpfile.
Can you send me your dump (or an extract of it that show the problem)? I'd like to add it to the test suite. Show quoted text
> I have a proposed patch which makes SVN::Dump::Reader a subclass of > IO::File (instead of IO::Handle), then uses getpos() and setpos() to > rewind the filehandle if the "blank" line isn't actually blank, e.g.
I want SVN::Dump to be able to read dumps from STDIN, to allow it to receive dumps from 'svnadmin dump'. Wouldn't its being a subclass of IO::File prevent this? Show quoted text
> I think a much better fix would be to make read_record() leading blank > line agnostic (i.e. an unknown number of blank lines will be consumed at > the start of read_record(), rather than trying to chop trailing blank > lines. I can produce a patch to do this instead if you would like.
When I first wrote SVN::Dump, my first test was to be able to produce dumps that were identical to the one I was reading (see t/25dump.t for example). In my opinion, being able to create the "identity" function would demonstrate I correctly reversed-engineered the dump format. Is the number of blank lines really not significant? Thanks for your report and patch proposal. -- BooK
Subject: Re: [rt.cpan.org #25467] Cannot process cvs2svn-created dumpfile
Date: Thu, 15 Mar 2007 12:28:03 -0400
To: bug-SVN-Dump [...] rt.cpan.org
From: John Peacock <jpeacock [...] rowman.com>
Philippe Bruhat via RT wrote: Show quoted text
> <URL: http://rt.cpan.org/Ticket/Display.html?id=25467 > > > On Thu Mar 15 11:12:30 2007, JPEACOCK wrote:
>> While trying to use the supplied svndump_replace_author.pl script on a >> CVS repository converted with cvs2svn, I discovered that >> SVN::Dump::Reader::read_record is chopping lines that are not empty from >> an otherwise legitimate dumpfile.
> > Can you send me your dump (or an extract of it that show the problem)? > I'd like to add it to the test suite.
Let me manufacture one - not that my data is necessarily secure, but I'd rather have a very straightforward reconstruction recipe (which makes for better tests). Show quoted text
>
>> I have a proposed patch which makes SVN::Dump::Reader a subclass of >> IO::File (instead of IO::Handle), then uses getpos() and setpos() to >> rewind the filehandle if the "blank" line isn't actually blank, e.g.
> > I want SVN::Dump to be able to read dumps from STDIN, to allow it to > receive dumps from 'svnadmin dump'. Wouldn't its being a subclass > of IO::File prevent this?
I'm not sure if it is possible to use IO::File to seek on a non-seekable handle (like STDIN) outside of using IO::Block (which uses layers to cache IO). This is the best argument for just ignoring leading blank lines at the beginning of read_header_block, etc., rather than trying to figure out when you can safely chop. Show quoted text
>
>> I think a much better fix would be to make read_record() leading blank >> line agnostic (i.e. an unknown number of blank lines will be consumed at >> the start of read_record(), rather than trying to chop trailing blank >> lines. I can produce a patch to do this instead if you would like.
> > When I first wrote SVN::Dump, my first test was to be able to produce > dumps that were identical to the one I was reading (see t/25dump.t > for example). In my opinion, being able to create the "identity" > function would demonstrate I correctly reversed-engineered the dump > format. > > Is the number of blank lines really not significant?
Not as far as I can tell (short of pulling up svndmin/load.c). It appears that the places where your code assumes that there will be *two* blank lines may not be required by the underlying dump format. I have to assume that cvs2svn is going to produce 100% compatible dumpfiles (and indeed I can load up either with or without the doubled blank lines). John -- John Peacock Director of Information Research and Technology Rowman & Littlefield Publishing Group 4501 Forbes Boulevard Suite H Lanham, MD 20706 301-459-3366 x.5010 fax 301-429-5748
Subject: Re: [rt.cpan.org #25467] Cannot process cvs2svn-created dumpfile
Date: Thu, 15 Mar 2007 12:45:45 -0400
To: bug-SVN-Dump [...] rt.cpan.org
From: John Peacock <jpeacock [...] rowman.com>
John Peacock via RT wrote: Show quoted text
> This is the best argument for just ignoring leading blank > lines at the beginning of read_header_block, etc., rather than trying to > figure out when you can safely chop.
Attached is a proof of concept patch which correctly reads a cvs2svn dumpfile (and let me correctly filter my svn:author names). It suffers from two immediate problems: 1) It misjudges the end of file test (either by reading off the end of the file or by throwing the wrong error when it does); 2) It fails tests: Failed Test Stat Wstat Total Fail List of Failed ------------------------------------------------------------------------------- t/23record.t 8 2048 20 8 4 6 8 10 12 14 16 18 t/24reader.t 255 65280 14 28 1-14 t/25dump.t 255 65280 14 28 1-14 t/29fail.t 1 256 20 1 2 mostly due to fallout from #1, AFAICT. It does show that it should be possible to skip blank lines at the start of read_header_block instead of blinding chopping lines after some blocks. I think that saving some state information (so you know when to throw an actual EOF error as opposed to just ceasing the read) should make this work again. John -- John Peacock Director of Information Research and Technology Rowman & Littlefield Publishing Group 4501 Forbes Boulevard Suite H Lanham, MD 20706 301-459-3366 x.5010 fax 301-429-5748
diff -ru SVN-Dump-0.03/lib/SVN/Dump/Reader.pm SVN-Dump-0.03.blanklines/lib/SVN/Dump/Reader.pm --- SVN-Dump-0.03/lib/SVN/Dump/Reader.pm 2006-11-20 19:15:21.000000000 -0500 +++ SVN-Dump-0.03.blanklines/lib/SVN/Dump/Reader.pm 2007-03-15 12:32:01.000000000 -0400 @@ -60,16 +60,8 @@ { my $included = $fh->read_record(); $record->set_included_record( $included ); - <$fh>; # chop the empty line that follows } - # chop empty line after the record - my $type = $headers->type(); - <$fh> if $type !~ /\A(?:format|uuid)\z/; - - # chop another one after a node with only a prop block - <$fh> if $type eq 'node' && $record->has_prop_only(); - # uuid and format record only contain headers return $record; } @@ -83,7 +75,8 @@ my $line = <$fh>; croak _eof() if !defined $line; chop $line; - last if $line eq ''; # stop on empty line + next if $line eq '' && ! keys %$headers; # ignore leading empty line + last if $line eq '' && keys %$headers; # stop on trailing empty line my ($key, $value) = split /: /, $line, 2; $headers->{$key} = $value;
Subject: Re: [rt.cpan.org #25467] Cannot process cvs2svn-created dumpfile
Date: Fri, 16 Mar 2007 14:38:07 -0400
To: bug-SVN-Dump [...] rt.cpan.org
From: John Peacock <jpeacock [...] rowman.com>
Attached is an example CVS repository (based on the test Subversion file dump/full/test123-r0-r10.svn). I included the cvs2svn optionfile and both the dump it created and the dump I created by using your svndump_rename_author.pl script to change my username to studly caps... ;-) If I use the svndump_identity.pl script, it is obvious that the extra line after a copy is not required by the Subversion dumpfile format: --- cvs2svn-dump 2007-03-16 14:23:44.000000000 -0400 +++ cvs2svn-dump.identity 2007-03-16 14:35:36.000000000 -0400 @@ -283,26 +283,31 @@ Node-copyfrom-rev: 3 Node-copyfrom-path: /project/trunk/crunchle.txt + Node-path: project/branches/initial/latin.txt Node-action: add Node-copyfrom-rev: 3 Node-copyfrom-path: /project/trunk/latin.txt + Node-path: project/branches/initial/loremipsum.txt Node-action: add Node-copyfrom-rev: 3 Node-copyfrom-path: /project/trunk/loremipsum.txt + ... HTH John
Download cvs2svn-example.tgz
application/octet-stream 15.5k

Message body not shown because it is not plain text.

On Thu Mar 15 11:12:30 2007, JPEACOCK wrote: Show quoted text
> While trying to use the supplied svndump_replace_author.pl script on a > CVS repository converted with cvs2svn, I discovered that > SVN::Dump::Reader::read_record is chopping lines that are not empty from > an otherwise legitimate dumpfile.
I've hit the same problem, and wrote my own patch to overcome it, before finding this ticket. The attached patch is against 0.04 and doesn't seem to cause any test failures.
--- lib/SVN/Dump/Reader.pm +++ lib/SVN/Dump/Reader.pm @@ -62,12 +62,10 @@ <$fh>; # chop the empty line that follows } - # chop empty line after the record - my $type = $headers->type(); - <$fh> if $type !~ /\A(?:format|uuid)\z/; - - # chop another one after a node with only a prop block - <$fh> if $type eq 'node' && $record->has_prop_only(); + # chop off empty lines + my $old_pos = tell $fh; + while (<$fh> eq $NL) { $old_pos = tell $fh }; + seek $fh, $old_pos, 0; # uuid and format record only contain headers return $record;
I have the same problem as described in the ticket. The SVN dump in question was produced by svndumpfilter. Attached my fix for the problem. The patches proposed so far tackle the problem in the wrong place, by diddling when to drop trailing empty lines. Instead, I ignore leading empty lines when looking for headers. Note that the patch also patches t/23record.t: now you must actually read the EOF record (i.e. read_record() returns undef) to make sure that the input is fully processed. Also t/dump/fail/h_blank.svn and t/dump/fail/h_empty.svn should be deleted: an empty file or a file containing only empty lines doesn't throw an exception, it simple produces no records.
diff -ubr SVN-Dump-0.04.orig/lib/SVN/Dump/Reader.pm SVN-Dump-0.04/lib/SVN/Dump/Reader.pm --- SVN-Dump-0.04.orig/lib/SVN/Dump/Reader.pm 2008-06-12 16:52:27.000000000 +0200 +++ SVN-Dump-0.04/lib/SVN/Dump/Reader.pm 2008-09-17 17:46:46.870185000 +0200 @@ -27,13 +27,10 @@ sub read_record { my ($fh) = @_; - # no more records? - return if eof($fh); - my $record = SVN::Dump::Record->new(); # first get the headers - my $headers = $fh->read_header_block(); + my $headers = $fh->read_header_block() or return; $record->set_headers_block( $headers ); # get the property block @@ -59,16 +56,8 @@ { my $included = $fh->read_record(); $record->set_included_record( $included ); - <$fh>; # chop the empty line that follows } - # chop empty line after the record - my $type = $headers->type(); - <$fh> if $type !~ /\A(?:format|uuid)\z/; - - # chop another one after a node with only a prop block - <$fh> if $type eq 'node' && $record->has_prop_only(); - # uuid and format record only contain headers return $record; } @@ -77,15 +66,25 @@ my ($fh) = @_; local $/ = $NL; - my $headers = SVN::Dump::Headers->new(); + + # skip empty lines + my $line; while(1) { - my $line = <$fh>; - croak _eof() if !defined $line; + $line = <$fh>; + return if !defined $line; chop $line; - last if $line eq ''; # stop on empty line + last unless $line eq ''; + } + my $headers = SVN::Dump::Headers->new(); + while(1) { my ($key, $value) = split /: /, $line, 2; $headers->{$key} = $value; + + $line = <$fh>; + croak _eof() if !defined $line; + chop $line; + last if $line eq ''; # stop on empty line } croak "Empty line found instead of a header block line $." diff -ubr SVN-Dump-0.04.orig/t/23record.t SVN-Dump-0.04/t/23record.t --- SVN-Dump-0.04.orig/t/23record.t 2008-06-12 16:52:27.000000000 +0200 +++ SVN-Dump-0.04/t/23record.t 2008-09-18 09:15:31.435651000 +0200 @@ -19,6 +19,7 @@ my $dump = SVN::Dump::Reader->new($fh); my $r = $dump->read_record(); is_same_string( $r->as_string(), $expected, "Read $f record" ); - is( tell($fh), -s $f, "Read all of $f" ); + $r = $dump->read_record(); + ok( !$r && tell($fh) == -s $f, "Read all of $f" ); }