This queue is for tickets about the Statistics-Lite CPAN distribution.

Report information
The Basics
Id: 22697
Status: resolved
Priority: 0/
Queue: Statistics-Lite

People
Owner: brianiacus [...] yahoo.com
Requestors: cpan [...] clotho.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 2.0
Fixed in: 3.4



Subject: Tests fail
The tests are failing. However, because you are using Test.pm instead of Test::More, CPAN.pm does not detect the failures. I recommend that the tests be corrected (I believe the code is right and the tests are wrong) and that the module be updated to use Test::More, which is actively maintained.

Below is a snippet from the end of the tests. This is Perl 5.8.6 on MacOSX 10.4.

-- Chris

    ok 17
    ok 18
    not ok 19
    # Test 19 got: "0.666666666666667" (test.pl at line 49)
    #    Expected: "1"
    #  test.pl line 49 is: ok($stats{variance},1);
    not ok 20
    # Test 20 got: "0.816496580927726" (test.pl at line 50)
    #    Expected: "1"
    #  test.pl line 50 is: ok($stats{stddev},1);
From: Alexandr Ciornii <alexchorny [...] gmail.com>
On Oct 30 09:57:09 2006, CLOTHO wrote:

Made following changes:
- tests switched to Test::More
- Fixed bug in 'variance'
- Fixed tests for 'variance'

To author: You may simply upload attached distribution to PAUSE.

-------
Alexandr Ciornii, http://chorny.net
--- Lite.pm.dist Sun Mar 26 19:48:49 2006
+++ Lite.pm Tue Jan 23 14:18:12 2007
@@ -87,7 +87,7 @@
   return unless @_;
   return 0 unless @_ > 1;
   my $mean= mean @_;
-  return (sum map { ($_ - $mean)**2 } @_) / $#_;
+  return (sum map { ($_ - $mean)**2 } @_) / scalar(@_);
 }
 
 sub stddev
use strict;
use Test::More tests => 21;

use_ok('Statistics::Lite', ':all');

is(min(1,2,3),1,'min');
is(max(1,2,3),3,'max');
is(range(1,2,3),2,'range');
is(sum(1,2,3),6,'sum');
is(count(1,2,3),3);
is(mean(1,2,3),2);
is(median(1,2,3),2);
is(mode(1,2,3),2);
ok(abs(variance(1,2,3)-0.66666666666666)<0.0000000001,'variance');
ok(abs(stddev(1,2,3)-0.81649658092772)<0.0000000001,'stddev');

my %stats= statshash(1,2,3);
is($stats{min},1);
is($stats{max},3);
is($stats{range},2);
is($stats{sum},6);
is($stats{count},3);
is($stats{mean},2);
is($stats{median},2);
is($stats{mode},2);
ok(abs($stats{variance}-0.66666666666666)<0.0000000001,'variance');
ok(abs($stats{stddev}-0.81649658092772)<0.0000000001,'stddev');
Download Statistics-Lite-2.01.tar.gz
application/x-gzip 3.2k


package Statistics::Lite;
use strict;
use vars qw($VERSION @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS);
require Exporter;

$VERSION = '2.0';
@ISA = qw(Exporter);
@EXPORT = ();
@EXPORT_OK = qw(min max range sum count mean median mode variance stddev statshash statsinfo);
%EXPORT_TAGS=
(
  all   => [ @EXPORT_OK ],
  funcs => [qw<min max range sum count mean median mode variance stddev>],
  stats => [qw<statshash statsinfo>],
);

sub count
{ return scalar @_; }

sub min
{
  return unless @_;
  return $_[0] unless @_ > 1;
  my $min= shift;
  foreach(@_) { $min= $_ if $_ < $min; }
  return $min;
}

sub max
{
  return unless @_;
  return $_[0] unless @_ > 1;
  my $max= shift;
  foreach(@_) { $max= $_ if $_ > $max; }
  return $max;
}

sub range
{
  return unless @_;
  return 0 unless @_ > 1;
  return abs($_[1]-$_[0]) unless @_ > 2;
  my $min= shift; my $max= $min;
  foreach(@_) { $min= $_ if $_ < $min; $max= $_ if $_ > $max; }
  return $max - $min;
}

sub sum
{
  return unless @_;
  return $_[0] unless @_ > 1;
  my $sum;
  foreach(@_) { $sum+= $_; }
  return $sum;
}

sub mean
{
  return unless @_;
  return $_[0] unless @_ > 1;
  return sum(@_)/scalar(@_);
}

sub median
{
  return unless @_;
  return $_[0] unless @_ > 1;
  @_= sort{$a<=>$b}@_;
  return $_[$#_/2] if @_&1;
  my $mid= @_/2;
  return ($_[$mid-1]+$_[$mid])/2;
}

sub mode
{
  return unless @_;
  return $_[0] unless @_ > 1;
  my %count;
  foreach(@_) { $count{$_}++; }
  my $maxhits= max(values %count);
  foreach(keys %count)
  { delete $count{$_} unless $count{$_} == $maxhits; }
  return mean(keys %count);
}

sub variance
{
  return unless @_;
  return 0 unless @_ > 1;
  my $mean= mean @_;
  return (sum map { ($_ - $mean)**2 } @_) / scalar(@_);
}

sub stddev
{
  return unless @_;
  return 0 unless @_ > 1;
  return sqrt variance @_;
}

sub statshash
{
  return unless @_;
  return
  (
    count    => 1,
    min      => $_[0],
    max      => $_[0],
    range    => 0,
    sum      => $_[0],
    mean     => $_[0],
    median   => $_[0],
    mode     => $_[0],
    variance => 0,
    stddev   => 0,
  ) unless @_ > 1;
  my $count= scalar(@_);
  @_= sort{$a<=>$b}@_;
  my $median;
  if(@_&1) { $median= $_[$#_/2]; }
  else { my $mid= @_/2; $median= ($_[$mid-1]+$_[$mid])/2; }
  my $sum= 0;
  my %count;
  foreach(@_) { $sum+= $_; $count{$_}++; }
  my $mean= $sum/$count;
  my $variance= mean map { ($_ - $mean)**2 } @_;
  my $maxhits= max(values %count);
  foreach(keys %count)
  { delete $count{$_} unless $count{$_} == $maxhits; }
  return
  (
    count    => $count,
    min      => $_[0],
    max      => $_[-1],
    range    => ($_[-1] - $_[0]),
    sum      => $sum,
    mean     => $mean,
    median   => $median,
    mode     => mean(keys %count),
    variance => $variance,
    stddev   => sqrt($variance),
  );
}

sub statsinfo
{
  my %stats= statshash(@_);
  return <<".";
min = $stats{min}
max = $stats{max}
range = $stats{range}
sum = $stats{sum}
count = $stats{count}
mean = $stats{mean}
median = $stats{median}
mode = $stats{mode}
variance = $stats{variance}
stddev = $stats{stddev}
.
}

1;

__END__

=head1 NAME

Statistics::Lite - Small stats stuff.

=head1 SYNOPSIS

  use Statistics::Lite qw(:all);

  $min= min @data;
  $mean= mean @data;

  %data= statshash @data;
  print "sum= $data{sum} stddev= $data{stddev}\n";

  print statsinfo(@data);

=head1 DESCRIPTION

This module is a lightweight, functional alternative to larger, more complete,
object-oriented statistics packages.
As such, it is likely to be better suited, in general, to smaller data sets.

This is also a module for dilettantes.

When you just want something to give some very basic, high-school-level statistical values,
without having to set up and populate an object first, this module may be useful.

=over 6

=head2 NOTE

This version now uses unbiased estimators (previous versions used biased estimators)
for variance and standard deviation.

To get the same biased C<stddev()> and C<variance()> available in previous versions,
simply add a zero to the data set:

  $stddev_biased= stddev 0, @data;

=back

=head1 FUNCTIONS

=over 4

=item C<min(@data)>, C<max(@data)>, C<range(@data)>, C<sum(@data)>, C<count(@data)>

Return the minimum value, maximum value, range (max - min),
sum, or count of values in C<@data>.
(Count simply returns C<scalar(@data)>.)

=item C<mean(@data)>, C<median(@data)>, C<mode(@data)>

Calculates the mean, median, or mode average of the values in C<@data>.
(In the event of ties in the mode average, their mean is returned.)

=item C<variance(@data)>, C<stddev(@data)>

Return the standard deviation or variance of C<@data>.

=item C<statshash(@data)>

Returns a hash whose keys are the names of all the functions listed above,
with the corresponding values, calculated for the data set.

=item C<statsinfo(@data)>

Returns a string describing the data set, using the values detailed above.

=back

=head2 Import Tags

The C<:all> import tag imports all functions from this module
into the current namespace (use with caution).
To import the individual statistical functions, use the import tag C<:funcs>;
use C<:stats> to import C<statshash(@data)> and C<statsinfo(@data)>.

=head1 AUTHOR

Brian Lalonde E<lt>brian@webcoder.infoE<gt>

=head1 SEE ALSO

perl(1).

=cut
> --- Lite.pm.dist Sun Mar 26 19:48:49 2006
> +++ Lite.pm Tue Jan 23 14:18:12 2007
> @@ -87,7 +87,7 @@
>    return unless @_;
>    return 0 unless @_ > 1;
>    my $mean= mean @_;
> -  return (sum map { ($_ - $mean)**2 } @_) / $#_;
> +  return (sum map { ($_ - $mean)**2 } @_) / scalar(@_);
>  }
This actually depends on whether you are estimating the variance of a larger population from a sample, or computing the variance of the data set itself. That determines whether the denominator is N-1 or N. In the real world, you almost always want N-1. N is only appropriate when your data is the entire population rather than a sample drawn from it.

So, I recommend that this particular part of the patch be rejected. Alternatively, one could include both implementations of variance (N and N-1), but that kind of defeats the ::Lite part of the module.

More info: http://en.wikipedia.org/wiki/Standard_deviation#Estimating_population_standard_deviation_from_sample_standard_deviation

Chris
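For readers following the N versus N-1 argument, here is a minimal sketch of the two denominators applied to the data set used in the tests, (1, 2, 3). The helper names sum_sq_dev, variance_sample and variance_population are hypothetical and are not part of Statistics::Lite; the sketch only reproduces the two numbers seen in this ticket (1 and 0.666...).

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helpers for illustration only; not the module's API.

# Sum of squared deviations from the mean.
sub sum_sq_dev {
    my @data = @_;
    my $mean = sum(@data) / scalar(@data);
    return sum(map { ($_ - $mean) ** 2 } @data);
}

# Sample variance: divide by N-1 (needs at least two values).
sub variance_sample {
    my @data = @_;
    return unless @data > 1;
    return sum_sq_dev(@data) / (scalar(@data) - 1);
}

# Population variance: divide by N.
sub variance_population {
    my @data = @_;
    return unless @data;
    return sum_sq_dev(@data) / scalar(@data);
}

my @data = (1, 2, 3);
printf "N-1 denominator: %.6f\n", variance_sample(@data);      # 1.000000
printf "N denominator:   %.6f\n", variance_population(@data);  # 0.666667
```

The failing tests expected 1 (the N-1 result) and got 0.666666666666667 (the N result), which is why the disagreement is about which denominator the module should use rather than about arithmetic.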
From: sabol [...] alderaan.gsfc.nasa.gov
On Tue Jan 23 09:55:54 2007, CDOLAN wrote:
> This actually depends on whether you want your variance to include
> sampling as a dependent quantity (sampling variance) or not. That
> determines whether the denominator is N or N-1. In the real world, you
> almost always want N-1. N is only appropriate when you can be sure that
> your sample is representative of the entire population.
>
> So, I recommend that this particular part of the patch be rejected.
> Alternatively, one could include both implementations of variance (N and
> N-1), but that kind of defeats the ::Lite part of the module.
I agree with Chris that this patch should be rejected. The docs make it clear that the variance and standard deviation computed by Statistics::Lite are the unbiased (N-1) kind, and a trivial workaround for computing the biased kind is provided in the documentation.

The tests are correct; it is the statshash() code that is wrong. I recommend the following patch:

--- Lite.pm.orig Sun Mar 26 11:48:49 2006
+++ Lite.pm Thu Feb 1 03:00:08 2007
@@ -122,7 +122,7 @@
   my %count;
   foreach(@_) { $sum+= $_; $count{$_}++; }
   my $mean= $sum/$count;
-  my $variance= mean map { ($_ - $mean)**2 } @_;
+  my $variance= (sum map { ($_ - $mean)**2 } @_) / $#_;
   my $maxhits= max(values %count);
   foreach(keys %count)
   { delete $count{$_} unless $count{$_} == $maxhits; }

Also, I agree that it would be nice if the test suite were updated to use Test::More.
Going to mark this as fixed. If this is still happening, reopen or refile, please.