Subject: | Problem setting newline |
Date: | Mon, 29 Feb 2016 20:29:07 +0000 |
To: | "bug-Archive-Zip [...] rt.cpan.org" <bug-Archive-Zip [...] rt.cpan.org> |
From: | "Smith, Daniel E (UK)" <daniel.smith [...] baesystems.com> |
Hi, I have a problem with Archive::Zip::MemberRead and a suggested solution.
I have a Perl program which reads zip files whose members may have Unix or Windows line endings. I determine which by looking at the first 10000 characters and counting each type of line ending. I then need to tell Archive::Zip::MemberRead which one to use (otherwise it assumes the newline is determined by the host machine type).
There is the handy method Archive::Zip::MemberRead->setLineEnd($eol_character), which my script then calls. This method sets the internal global variable $nl to the supplied character.
However, this does not change the behaviour of MemberRead, because the internal line-recognising regexp is cached in $self->{sep_as_re} inside the function input_record_separator, and that function is not called again when $nl changes.
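To illustrate the caching problem, here is a minimal self-contained sketch. It is not the real Archive::Zip::MemberRead code: the package StaleCacheDemo and its separator_re accessor are hypothetical stand-ins that only mimic the mechanism described above (a file-level $nl plus a per-object cached regex fragment).

```perl
use strict;
use warnings;

# Hypothetical stand-in for the reader class; NOT the real module code.
package StaleCacheDemo;

my $nl = "\n";    # file-level line ending, playing the role of the module's $nl

# Class method: changes $nl only, touches no object.
sub setLineEnd { shift; $nl = shift; }

sub new { return bless {}, shift; }

# Caches a regex fragment built from the value $nl has at call time.
sub input_record_separator {
    my $self = shift;
    if (@_) {
        $self->{sep} = shift;
        (my $re = $self->{sep}) =~ s/\n/$nl/g;
        $self->{sep_as_re} = quotemeta $re;
    }
    return $self->{sep};
}

# Hypothetical accessor so the cached fragment can be inspected.
sub separator_re { return $_[0]{sep_as_re}; }

package main;

my $reader = StaleCacheDemo->new;
$reader->input_record_separator("\n");    # cache built while $nl is "\n"
StaleCacheDemo->setLineEnd("\r\n");       # $nl changes, the cache does not
my $stale = $reader->separator_re;
$reader->input_record_separator("\n");    # rebuilding picks up the new $nl
my $fresh = $reader->separator_re;

printf "stale cache matches CRLF: %s\n", ("\r\n" =~ /\A$stale\z/ ? "yes" : "no");
printf "fresh cache matches CRLF: %s\n", ("\r\n" =~ /\A$fresh\z/ ? "yes" : "no");
```

In this sketch the cached fragment only reflects the new line ending after input_record_separator is called again, which is the behaviour I am seeing.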
As a workaround I call the method input_record_separator('') manually after setting $nl, but this also does not work on Windows, because for an empty separator it assumes that TWO OR MORE consecutive $nl sequences make up an input record separator, in the line:
return "(?:$nl){2,}";
Therefore, as well as calling two methods to get one thing done, I have had to hack the input_record_separator function so that the line says:
return "(?:$nl)";
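The difference between the two patterns can be seen with a small standalone sketch. The sample text and variable names are made up for illustration; only the two pattern strings come from the lines quoted above.

```perl
use strict;
use warnings;

my $nl   = "\r\n";                         # Windows line ending
my $text = "alpha\r\nbeta\r\ngamma\r\n";   # illustrative data, single line ends only

my $paragraph_re = "(?:$nl){2,}";   # pattern as shipped: needs two or more line ends
my $line_re      = "(?:$nl)";       # my local modification: one line end is enough

my @by_paragraph = split /$paragraph_re/, $text;
my @by_line      = split /$line_re/,      $text;

printf "paragraph pattern: %d record(s)\n", scalar @by_paragraph;
printf "line pattern:      %d record(s)\n", scalar @by_line;
```

With the first pattern the whole text comes back as a single record, because it never contains two line ends in a row. With the second it splits into three lines.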
I recommend that Archive::Zip::MemberRead should determine the line ending itself at startup by looking at the first N characters in the file, e.g.:
my $buffer;
my $bytes_read = $fh->read($buffer, 10000);   # read() returns the byte count; the data is in $buffer
And then:
if ($buffer =~ /\015\012/) {
    $self->{eol_char} = "\015\012";   # it's a PC file
} elsif ($buffer =~ /\015/) {
    $self->{eol_char} = "\015";       # it's a (classic) Mac file
} elsif ($buffer =~ /\012/) {
    $self->{eol_char} = "\012";       # it's a Unix file
}
And then the stored regex should be corrected:
$self->{eol_regexp} = qr/$self->{eol_char}/;
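A slightly more robust variant of the same idea, as a sketch only: count each type of line ending in the sample and pick the most frequent one, which is what my own script does. The helper name detect_eol and the sample strings are hypothetical and are not part of Archive::Zip.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the most frequent line ending found in
# $sample, or undef if the sample contains no line ending at all.
sub detect_eol {
    my ($sample) = @_;
    my %count = ("\015\012" => 0, "\015" => 0, "\012" => 0);

    # Alternation order matters: CRLF must be tried before a lone CR or LF.
    $count{$1}++ while $sample =~ /(\015\012|\015|\012)/g;

    my ($best) = sort { $count{$b} <=> $count{$a} } keys %count;
    return $count{$best} ? $best : undef;
}

# Illustrative sample: two CRLF line ends and one stray LF.
my $eol    = detect_eol("one\015\012two\015\012three\012");
my $eol_re = qr/\Q$eol\E/;
my @lines  = split $eol_re, "one\015\012two\015\012";

print join("|", @lines), "\n";
```

Two caveats with this sketch: a tie between counts is broken arbitrarily, and a fixed-size sample can end between the CR and the LF of a CRLF pair, which would add one spurious lone CR to the count.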
Hope that makes sense!
By the way, I am sorry I have sent this to Archive-Zip, but MemberRead itself seems to have a lot of unresolved bugs, so I thought a response there was unlikely.
Daniel Smith
Radar Systems Engineer, Sampson
BAE Systems Maritime Services,
Newport Road,
Cowes,
Isle of Wight, PO31 8PF, UK
Tel: +44 (0) 198320 2937
Email: daniel.smith@baesystems.com / www.baesystems.com
BAE Systems Surface Ships Limited
Registered Office: Warwick House, PO Box 87, Farnborough Aerospace Centre,
Farnborough, Hampshire, GU14 6YU, UK
Registered in England & Wales, Registration No: 06160534