Skip Menu |

This queue is for tickets about the File-Tail-App CPAN distribution.

Report information
The Basics
Id: 75606
Status: new
Priority: 0/
Queue: File-Tail-App

People
Owner: Nobody in particular
Requestors: kcason [...] yahoo.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in:
  • 0.4
  • v0.0.1
  • v0.0.2
  • v0.0.3
Fixed in: (no value)



Discovered this bug in v0.0.3 but it is also in v0.4. Environment: Linux 2.6.9-89.0.9.ELsmp #1 SMP Wed Aug 19 08:07:15 EDT 2009 i686 i686 i386 GNU/Linux This is perl, v5.8.5 built for i386-linux-thread-multi Situation: The lastrun file values, if zero, sometimes get the value "0 but true". This is a Perl feature that allows boolean comparisons but, in this case, causes the md5 check to process the first record in the input file over and over and over... There is a discussion of "0 but true" here: http://stackoverflow.com/questions/5318011/why-does-perl-think-0-but- true-is-a-number In this case, the lastrun file looks something like this: "0 but true"-1803087-145add8648ec9426016a019c3d0693aa-42 but was expected to look like this: 0-1803087-145add8648ec9426016a019c3d0693aa-42 The reason is the lastrun file values are expected to be integers, but "0 but true" is not an integer. Therefore, the following scenario presents itself: 1) A variable is set to zero, ex: fileStart=0; 2) That variable is inadvertantly written to the lastrun file as boolean, which is valid as "0 but true", but not an integer. 3) The next iteration through the program the lastrun file is read, ex: lastFileStart="0 but true"; 4) A comparison of: if (fileStart eq lastFileStart) is expected to be if (0 eq 0) <-- which is true is actually if (0 eq "0 but true") <-- which is false Proposal: Use the int() function to ensure lastrun values are integers instead of boolean. For instance, see line #4 below: 1 sub _set_lastrun_data { 2 my($new_posi, $file_ident, $md5_chk, $md5_len, $cur_file) = @_; 3 $md5_chk ||= 0; 4 $new_posi = int("$new_posi"); # avoid "0 but true" situation 5 6 open my $curpos_fh, '>', $cur_file 7 or Carp::croak "Could not write $cur_file: $!"; 8 print {$curpos_fh} qq($new_posi-$file_ident-$md5_chk-$md5_len); 9 close $curpos_fh; } I've tested this and it appears to fix the bug. There are several places in the code where I added the int() function, and that code is include as an attachement with the assignment of v0.5.
Subject: App.pm
package File::Tail::App; use strict; use warnings; use File::Tail; use Carp (); sub import { my $caller = caller(); no strict 'refs'; *{ $caller . '::tail_app' } = \&tail_app; } $File::Tail::App::VERSION = '0.5'; sub File::Tail::seek_to { my($tail, $seek_to) = @_; # work around possible "0 but true" value for $seek_to if ($seek_to) { $seek_to = int("$seek_to"); } Carp::croak 'argument to seek_to() must be all digits' if $seek_to !~ m/^\d+$/; $tail->{'curpos'} = sysseek $tail->{'handle'}, $seek_to, 0; } sub File::Tail::app { my($tail, $args_ref) = @_; $args_ref->{'line_handler'} = sub { print shift; } if !$args_ref->{'line_handler'}; Carp::croak '"line_handler" must be a code ref' if ref $args_ref->{'line_handler'} ne 'CODE'; $args_ref->{'verbose'} = 0 if !defined $args_ref->{'verbose'}; my $lastrun_file = $args_ref->{'lastrun_file'}; my $do_md5_check = $args_ref->{'do_md5_check'} || 0; my($previous_position, $file_ident, $md5_chk, $md5_len) = defined $lastrun_file ? _get_lastrun_data($tail->{'input'}, $lastrun_file, $do_md5_check, $tail) : ('','','',''); $args_ref->{'seek_on_zero'} = 1 if !exists $args_ref->{'seek_on_zero'}; if(exists $args_ref->{'seek_to'} && defined $args_ref->{'seek_to'}) { if($args_ref->{'seek_to'} eq '0' && $args_ref->{'seek_on_zero'}) { $tail->seek_to($args_ref->{'seek_to'}); } elsif($args_ref->{'seek_to'} =~ m/^\d+$/) { $tail->seek_to($args_ref->{'seek_to'}); } } my $start_size = -s $tail->{'input'}; my $start_handle = $tail->{'handle'}; $tail->seek_to($previous_position) if $previous_position ne $start_size && $previous_position ne ''; while( defined( my $line = $tail->read() ) ) { my @stat = stat $tail->{'input'}; my $replaced = 0; if(-s $tail->{'input'} < $start_size) { Carp::carp "$tail->{'input'} was truncated: " . sysseek($tail->{'handle'},0,1) if $args_ref->{'verbose'}; $tail->seek_to(length $line); $replaced++; } elsif($do_md5_check && $md5_chk ne _get_md5_info($tail, $md5_len, $do_md5_check)) { Carp::carp "MD5 Check changed: " . sysseek($tail->{'handle'},0,1) if $args_ref->{'verbose'}; $replaced++; } if($replaced) { $tail->seek_to(length $line); $start_size = $stat[7]; $md5_len = $stat[7] < 42 ? $stat[7] : 42; $md5_chk = _get_md5_info($tail, $md5_len, $do_md5_check); } # do simple checks then tell them about it & reset some vars if needed if($stat[1] ne $file_ident && $file_ident) { Carp::carp "$tail->{'input'} was replaced: " . sysseek($tail->{'handle'},0,1) if $args_ref->{'verbose'}; $file_ident = $stat[1]; } if($start_handle ne $tail->{'handle'}) { # checking descriptor via fileno() is same check but numerically Carp::carp "descriptor/handle changed: " . sysseek($tail->{'handle'},0,1) if $args_ref->{'verbose'}; $start_handle = $tail->{'handle'}; } $args_ref->{'line_handler'}->($line); _set_lastrun_data( sysseek($tail->{'handle'},0,1), $file_ident, $md5_chk, $md5_len, $lastrun_file ) if defined $lastrun_file; Carp::carp "$tail->{'input'} is at : " . sysseek($tail->{'handle'},0,1) if $args_ref->{'verbose'} > 1; } } sub tail_app { my ($args_ref) = @_; Carp::croak 'tail_app() requires a hashref as its first argument' if ref $args_ref ne 'HASH'; Carp::croak 'missing "new" key from tail_app arg' if !exists $args_ref->{'new'}; Carp::croak '"new" must be an array ref' if ref $args_ref->{'new'} ne 'ARRAY'; my $tail = File::Tail->new(@{ $args_ref->{'new'} }) or Carp::croak "Could not create File::Tail object: $!"; $tail->app({ 'line_handler' => $args_ref->{'line_handler'}, 'verbose' => $args_ref->{'verbose'}, 'seek_to' => $args_ref->{'seek_to'}, 'seek_on_zero' => $args_ref->{'seek_on_zero'}, 'lastrun_file' => $args_ref->{'lastrun_file'}, 'do_md5_check' => $args_ref->{'do_md5_check'}, }); } sub _get_lastrun_data { my($tail_file, $cur_file, $do_md5_check, $tail) = @_; my @stat = stat $tail_file; my $previous_position = 0; # start at zero if 1st time ... my $cur_tail_ident = $stat[1]; #:$stat[10]# or if file's changed my $_md5_len = $stat[7] < 42 ? $stat[7] : 42; my $_md5_chk = _get_md5_info($tail, $_md5_len, $do_md5_check); my($curpos, $logged_ident, $md5_chk, $md5_len) = ('','',$_md5_chk,$_md5_len); if(-e $cur_file) { open my $curat_fh, '<', $cur_file or Carp::croak "Could not read $cur_file: $!"; chomp(my $first_line = <$curat_fh>); close $curat_fh; ($curpos, $logged_ident, $md5_chk, $md5_len) = split /-/, $first_line; $curpos = 0 if $logged_ident ne $cur_tail_ident; $previous_position = int($curpos); if($do_md5_check) { $md5_len = $_md5_len if !defined $md5_len || !$md5_len; $md5_chk = $_md5_chk if !defined $md5_chk || !$md5_chk; } } return ($previous_position, $cur_tail_ident, $md5_chk, $md5_len); } sub _set_lastrun_data { my($new_posi, $file_ident, $md5_chk, $md5_len, $cur_file) = @_; $md5_chk ||= 0; $new_posi = int("$new_posi"); # avoid "0 but true" situation open my $curpos_fh, '>', $cur_file or Carp::croak "Could not write $cur_file: $!"; print {$curpos_fh} qq($new_posi-$file_ident-$md5_chk-$md5_len); close $curpos_fh; } sub _get_md5_info { my($tail, $md5_len, $do_md5_check) = @_; return if !$do_md5_check; require Digest::MD5; # only do the module if needed my $data_to_md5 = ''; # to avoid uninitialized value warnings my $origpos = sysseek($tail->{'handle'},0,1); $tail->seek_to(0); sysread $tail->{'handle'}, $data_to_md5, $md5_len; $tail->seek_to($origpos); return Digest::MD5::md5_hex($data_to_md5); } 1; __END__ =head1 NAME File::Tail::App - Perl extension for making apps that tail files =head1 SYNOPSIS use File::Tail::App; tail_app({ 'new' => ['logfile.log'], 'line_handler' => \&_wag_tail, }); sub _wag_tail { my($line) = @_; # do what you want with the $line } =head1 DESCRIPTION Adds two methods for a File::Tail object and one function. =head2 tail_app() As in the SYNOPSIS, creates an app that processes a file's tail line by line. Its only arg is a hashref with these keys: =over 4 =item new Required. This is an array ref of the array you give to File::Tail's new method. =item line_handler This is a code ref that takes the current line in a string as its only argument. if you do not specify this the line is simply printed out. =item seek_to Before you start processing the file you can sysseek to this part of the file. Like sysseek(), its value is in bytes. =item seek_on_zero If this is false and seek_to is 0, then don't $tail->seek_to() it, Default is 1. =item lastrun_file This is a file that is used to track the current position. It is recommended to use this so that if the program is terminated unexpectedly it can resume exactly where it left off so you will avoid duplicated or missing data. If you'd rather do this your self the you'll need to get the current position before the call to app and send it via the seek_to key and log the new length at the end of your line processing function. You'll also need to be able to tell if the file you're tailing changes drastically because say you are currently at 12345 and the script is killed and then the log file is truncated before it is restarted. You don't want to start at 12345 in that case, you want to start at 0. The lastrun_file parameter handles all of that and more for you. Feel free to take a look at how it works if you need to write your own for some reason, like you want to use SQL DB instead of a file, etc etc... =item do_md5_check If using lastrun_file also do a check on the MD5 sum of a small part of data in the beginning of the file to see if its been truncated and handle appropriately. (Thanks to Ben Thomas for the MD5 idea!!) =back =head2 $tail->app() Same args as tail_app except for the "new" key since you've already create the File::Tail object. =head2 $tail->seek_to() Given a digit, it will move to that position in the $tail object's handle. Useful for resuming where you left off. Like sysseek(), its argument is in bytes. =head2 EXPORT None by default. tail_app can be exported. =head1 EXAMPLE This example will process x.log as its updated. To avoid double logging it only allows one instance of itself to be run (See L<Unix::PID>). To avoid missing or double processing data if a problem arises it uses a lastrun file. So this is a pretty sturdy, reliable, and easy to code log processor: #!/usr/bin/perl use strict; use warnings; use Unix::PID '/var/run/xlogr.pid'; use File::Tail::App qw(tail_app); tail_app({ 'new' => ['x.log'], 'line_handler' => \&_wag_tail, 'lastrun_file' => 'x.lastrun', }); sub _wag_tail { my($line) = @_; # do what you want with the $line } =head1 BUGS This was stress tested pretty vigorously for various circumstances so it should be pretty solid. If you do come across a problem please contact me with the code and steps neccessary to reproduce the bug so I can release a fix ASAP. =head1 SEE ALSO L<File::Tail> =head1 AUTHOR Daniel Muey, L<http://drmuey.com/cpan_contact.pl> =head1 COPYRIGHT AND LICENSE Copyright 2005 by Daniel Muey This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. =cut