Subject: error code 25 returned with message Can't take sqrt of -2.22045e-16
Date: Tue, 8 Dec 2015 14:45:27 -0700
To: <bug-Statistics-Discrete [...] rt.cpan.org>
From: "Seth Williams" <seth.williams [...] galaxysemi.com>
I think the calculation of variance is incorrect. The variance should NEVER
be negative, yet the error in the subject line apparently comes from taking
the square root of a variance of -2.22045e-16. I'm no statistician, but below
are the edits I made, which I believe give the more correct calculation and
avoid a negative variance. I left the original lines in place, commented out.
Can someone verify my changes?
# Reason for changes:
From what I read online, especially at this site
(http://www.statcan.gc.ca/edu/power-pouvoir/ch12/5214891-eng.htm#a2), when
using a frequency table you should square the difference between each value
and the mean, then multiply it by that value's frequency. This avoids the
negative variance and seems to be the correct calculation.
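
For reference, here is a sketch of the two formulas involved. The symbols are my own notation, not names from the module: x_i are the distinct values, f_i their frequencies, N the total count, and \bar{x} the mean. On paper the two forms are equal. The second one subtracts two nearly equal numbers, so in floating point it can land a hair below zero (a value like -2.22045e-16 is about the size of double-precision rounding error). The first is a sum of non-negative terms, so it cannot go negative.

```latex
\[
  N = \sum_i f_i, \qquad
  \bar{x} = \frac{1}{N} \sum_i f_i \, x_i
\]
\[
  \sigma^2
  = \frac{1}{N} \sum_i f_i \,(x_i - \bar{x})^2
  = \frac{1}{N} \sum_i f_i \, x_i^2 \;-\; \bar{x}^2
\]
```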
# Summary of changes:
I changed this line:
    $cumul_value += ($v**2) * $self->{"data_frequency"}{$v};
To this:
    $cumul_value += (($v - $mean)**2) * $self->{"data_frequency"}{$v};
I changed this line:
    $self->{"stats"}{"Desc"}{"variance"} = $square_mean - ($mean**2);
To this:
    $self->{"stats"}{"Desc"}{"variance"} = $square_mean;
# Entire subroutine variance() from Discrete.pm:
sub variance {
    my $self = shift;
    # Compute once and cache the result in the object.
    if (!defined($self->{"stats"}{"Desc"}{"variance"})) {
        my $mean = $self->mean();
        my $count = $self->count();
        my $cumul_value = 0;
        my $square_mean = 0;
        my $v;
        # key is the measurement, value is the number of occurrences
        foreach $v (keys %{$self->{"data_frequency"}}) {
            #$cumul_value += ($v**2) * $self->{"data_frequency"}{$v};
            # Sum of squared deviations from the mean, weighted by frequency
            $cumul_value += (($v - $mean)**2) * $self->{"data_frequency"}{$v};
        }
        if ($count > 0) {
            # $square_mean now holds the mean squared deviation
            $square_mean = $cumul_value / $count;
        }
        #$self->{"stats"}{"Desc"}{"variance"} = $square_mean - ($mean**2);
        $self->{"stats"}{"Desc"}{"variance"} = $square_mean;
    }
    return $self->{"stats"}{"Desc"}{"variance"};
}
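
# Standalone check:
To make the change easier to verify outside the module, here is a minimal sketch of the same calculation as a free-standing script. The function name freq_variance and the sample data are made up for illustration and are not part of Discrete.pm. It assumes a population variance (dividing by the count), as the subroutine above does.

```perl
use strict;
use warnings;

# Hypothetical standalone helper, not part of Discrete.pm: population
# variance from a frequency table (value => number of occurrences).
sub freq_variance {
    my ($freq) = @_;

    # First pass: total count and mean.
    my ($count, $sum) = (0, 0);
    for my $v (keys %$freq) {
        $count += $freq->{$v};
        $sum   += $v * $freq->{$v};
    }
    return if $count == 0;    # no data, so no variance
    my $mean = $sum / $count;

    # Second pass: every term is >= 0, so the total cannot be negative.
    my $ss = 0;
    for my $v (keys %$freq) {
        $ss += (($v - $mean) ** 2) * $freq->{$v};
    }
    return $ss / $count;
}

# Made-up data: the values 1, 2, 2, 3 as a frequency table.
my %freq = (1 => 1, 2 => 2, 3 => 1);
my $var  = freq_variance(\%freq);
printf "variance = %g, sd = %g\n", $var, sqrt($var);
```

For this table the mean is 2 and the variance works out to 0.5. Because the result is never negative, the sqrt() call for the standard deviation should not fail the way it does in the subject line.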