Skip Menu |

This queue is for tickets about the Archive-Ar CPAN distribution.

Report information
The Basics
Id: 124061
Status: open
Priority: 0/
Queue: Archive-Ar

People
Owner: Nobody in particular
Requestors: PHILIPP [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Clobbering deliberately zero'd timestamps, etc. and deterministic archives
Date: Thu, 11 Jan 2018 15:44:32 -0700
To: bug-Archive-Ar [...] rt.cpan.org
From: Philip Prindeville <philipp [...] cpan.org>
Hi. I had a been trying to use Archive::Ar to fake the behavior of “ar D” on really old binutils toolchains which didn’t support it (basically, ‘ar D’ tells ‘ar’ to build “deterministic” archives by setting the uid, guid, and date all to the empty string, and the mode to ‘644’). Per the ar(1) man page: D Operate in deterministic mode. When adding files and the archive index use zero for UIDs, GIDs, timestamps, and use consistent file modes for all files. When this option is used, if ar is used with identical options and identical input files, multiple runs will create identical output files regardless of the input files' owners, groups, file modes, or modification times. I tried to write a trivial program to run through a library and fix it up, but turns out I wasn’t able to, mostly because of this one line: 224 $params->{date} ||= timelocal(localtime()); which I fixed by changing it to: 224 $params->{date} = timelocal(localtime()) unless (exists $params->{date}); so that’s my suggested fix. Some opportunistic comments/suggestions/improvements: The mangling of the filename and overwriting what’s in the hash: 217 (undef, undef, $filename) = File::Spec->splitpath($filename); 218 219 $params->{name} = $filename; seems like overkill. Especially since the ar spec requires that the filename be a basename only with slash appended to it, e.g. “myfile.o/“. If you pass this in, splitpath() will return “” as the filename. Safer to do: 217 (undef, undef, $params->{name}) = File::Spec->splitpath($filename) 218 unless (exists $params->{name}); or otherwise: 217 (undef, undef, $filename) = File::Spec->splitpath($filename); 218 219 $params->{name} = $filename . ‘/‘; per the Wiki article on AR: https://en.wikipedia.org/wiki/Ar_(Unix) System V ar uses a '/' character (0x2F) to mark the end of the filename; this allows for the use of spaces without the use of an extended filename. Not sure about this line: 226 $params->{mode} ||= "100644”; as “644” is the canonical value in binutils/bfd/archive.c. I think “100644” is a Debian-ism. And the pod here: 560 =item * C<add_data(I<"filename">, I<$filedata>)> 561 562 Takes an filename and a set of data to represent it. Unlike C<add_files>, C<add_data> 563 is a virtual add, and does not require data on disk to be present. The 564 data is a hash that looks like: is incorrect, as it omits the 2nd argument, $data: 205 sub add_data 206 { 207 my($this, $filename, $data, $params) = @_; and as a general comment, it’s an arbitrary restriction that you can’t add_data() over an existing file. Sometimes you want to rewrite a file in place, preserving its order in the archive… otherwise you have to remove it AND ALL THE FILES WHICH FOLLOW IT and add then back in the original order. That’s a little tedious. Looking at: 390 if(exists($this->{_filehash}->{$file->{name}})) 391 { 392 $this->_dowarn("Can't _addFile because virtual file already exists with that name in the archive"); 393 return; 394 } 395 396 push @{$this->{_files}}, $file->{name}; it’s not clear to me why this can’t just be: 390 push @{$this->{_files}}, $file->{name} unless (exists($this->{_filehash}->{$file->{name}})); instead. Attaching diffs for the patched version I’m using.

Message body is not shown because sender requested not to inline it.

Thanks for your patch and the careful description of your suggested changes. I agree with all your points. One question: your line numbers do not match the current release. Is your patch against version 2.02? John
On Sat Feb 17 01:36:27 2018, JBAZIK wrote: Show quoted text
> Thanks for your patch and the careful description of your suggested > changes. I agree with all your points. One question: your line > numbers do not match the current release. Is your patch against > version 2.02? > > John
No, it was against 1.12 because I was working on an old distro (CentOS 6.8 if I remember). I'll rework and retest the patch. Thanks
On Thu Jan 11 17:48:38 2018, PHILIPP wrote: Show quoted text
> The mangling of the filename and overwriting what’s in the hash: > > 217 (undef, undef, $filename) = File::Spec-
> >splitpath($filename);
> 218 > 219 $params->{name} = $filename; > > > seems like overkill. Especially since the ar spec requires that the > filename be a basename only with slash appended to it, e.g. > “myfile.o/“. If you pass this in, splitpath() will return “” as the > filename. Safer to do: > > 217 (undef, undef, $params->{name}) = File::Spec-
> >splitpath($filename)
> 218 unless (exists $params->{name}); > > or otherwise: > > 217 (undef, undef, $filename) = File::Spec-
> >splitpath($filename);
> 218 > 219 $params->{name} = $filename . ‘/‘; > > per the Wiki article on AR: https://en.wikipedia.org/wiki/Ar_(Unix) > > System V ar uses a '/' character (0x2F) to mark the end of the > filename; this allows for the use of spaces without the use of an > extended filename.
Actually, doing the right thing in all cases turns out to be non-trivial. There are certain Gnu/Linux toolchains that don't follow the System V standard of taking the basename of the file and appending slash. Let me think about how to best handle this. Guessing the correct name when adding files is hard, but not mangling the names of an existing archive should be trivial. If we do a: $attr = $ar->get_contents($file); and then edit the contents, and put it back as: $bytes = $ar->add_data($file, $attr->{data}, $attr); then the name we're given (either $file or $attr->{name}) should be considered "canonical" and not get modified in any way. Similarly Debian uses "100644" for the modes, but most other systems use "644" instead. I'll need to think about adding tests to infer the correct values in all cases.