Subject: Bug in frequency_distribution
Date: Thu, 17 Aug 2006 14:32:09 +0300
To: bug-Statistics-Descriptive [...] rt.cpan.org
From: "Offer Kaye" <offer.kaye [...] gmail.com>
Hi,
I've stumbled on a bug where I asked for 4 bins and got 5. This
happens when the max value has a trailing 0, but even then not always :(
Here, for example, is the output of a test script:
> ~/temp/test_case.pl
$VAR1 = {
'0.2873' => 104,
'0.14395' => 212,
'0.43065' => 38,
'0.574' => 0,
'0.5740' => 1
};
As you can see 2 keys are the same, except for the trailing 0.
I'm using S::D version 2.6 with perl 5.8.8 on an AMD station running RHEL 3.0.
While I don't understand exactly why the bug happens, I've narrowed it
down to the code that saves the max value, and written a fix that
seems to solve the problem (makes sure the max value is always dealt
with as a number, not a string). Here is the output of "diff -u":
********************* DIFF START **********************************
--- Descriptive.pm.ORIG 2006-08-17 13:49:58.000000000 +0300
+++ Descriptive.pm 2006-08-17 14:12:52.000000000 +0300
@@ -341,8 +341,8 @@
$bins{$iter} = 0;
push @k, $iter; ##Keep the "keys" unstringified
}
- $bins{$self->{max}} = 0;
- push @k, $self->{max};
+ $bins{0+$self->{max}} = 0;
+ push @k, (0+$self->{max});
}
ELEMENT: foreach $element (@{$self->{data}}) {
********************* DIFF END **********************************
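To illustrate what the "0+" in the patch changes, here is a minimal standalone sketch. It is not taken from Descriptive.pm, and the variable names are made up for illustration. It only assumes that the max value was stored as the string '0.5740', the way the test case passes it in:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The max value as passed to add_data(): a string with a trailing zero.
my $max_as_string = '0.5740';

# Adding 0 forces numeric context; the resulting number
# stringifies without the trailing zero.
my $max_as_number = 0 + $max_as_string;

# Hash keys are always strings, so the two forms are distinct keys.
my %bins;
$bins{$max_as_string} = 0;    # key is '0.5740'
$bins{$max_as_number} = 0;    # key is '0.574'

print join(', ', sort keys %bins), "\n";    # prints "0.574, 0.5740"
```

The two forms of the same value end up as separate hash keys, which matches the duplicated '0.574' and '0.5740' entries in the output above. This does not explain why a '0.574' key is created in the first place, only why coercing the max value to a number makes the two keys collapse into one.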
Here is the sample script that shows the problem. Sorry about the size
of the data set; when I make it smaller the problem goes away.
Weird...
********************* TEST-CASE START *******************************
#! /usr/bin/perl
use strict;
use warnings;
use Statistics::Descriptive;
my $data = Statistics::Descriptive::Full->new();
$data->add_data(
'0.5740',
'0.4045',
'0.4018',
'0.3856',
'0.3834',
'0.3825',
'0.3787',
'0.3735',
'0.3692',
'0.3679',
'0.3646',
'0.3630',
'0.3625',
'0.3615',
'0.3582',
'0.3553',
'0.3536',
'0.3533',
'0.3511',
'0.3477',
'0.3442',
'0.3428',
'0.3420',
'0.3403',
'0.3399',
'0.3385',
'0.3341',
'0.3312',
'0.3304',
'0.3187',
'0.3142',
'0.3133',
'0.3118',
'0.3045',
'0.2961',
'0.2939',
'0.2922',
'0.2918',
'0.2873',
'0.2854',
'0.2736',
'0.2735',
'0.2733',
'0.2720',
'0.2719',
'0.2712',
'0.2707',
'0.2696',
'0.2684',
'0.2656',
'0.2638',
'0.2567',
'0.2558',
'0.2546',
'0.2519',
'0.2504',
'0.2497',
'0.2457',
'0.2439',
'0.2429',
'0.2387',
'0.2372',
'0.2359',
'0.2343',
'0.2297',
'0.2287',
'0.2262',
'0.2256',
'0.2251',
'0.2230',
'0.2224',
'0.2204',
'0.2191',
'0.2153',
'0.2146',
'0.2140',
'0.2135',
'0.2126',
'0.2112',
'0.2112',
'0.2106',
'0.2097',
'0.2092',
'0.2089',
'0.2086',
'0.2083',
'0.2074',
'0.2009',
'0.1999',
'0.1984',
'0.1939',
'0.1925',
'0.1913',
'0.1909',
'0.1909',
'0.1901',
'0.1868',
'0.1865',
'0.1864',
'0.1853',
'0.1813',
'0.1811',
'0.1792',
'0.1781',
'0.1776',
'0.1771',
'0.1768',
'0.1763',
'0.1759',
'0.1745',
'0.1744',
'0.1702',
'0.1687',
'0.1686',
'0.1684',
'0.1682',
'0.1681',
'0.1680',
'0.1661',
'0.1660',
'0.1660',
'0.1631',
'0.1630',
'0.1616',
'0.1611',
'0.1605',
'0.1601',
'0.1599',
'0.1593',
'0.1588',
'0.1586',
'0.1586',
'0.1581',
'0.1564',
'0.1560',
'0.1559',
'0.1509',
'0.1499',
'0.1492',
'0.1491',
'0.1486',
'0.1458',
'0.1452',
'0.1423',
'0.1419',
'0.1406',
'0.1374',
'0.1366',
'0.1366',
'0.1346',
'0.1343',
'0.1332',
'0.1329',
'0.1322',
'0.1317',
'0.1313',
'0.1312',
'0.1307',
'0.1294',
'0.1294',
'0.1291',
'0.1288',
'0.1283',
'0.1275',
'0.1232',
'0.1214',
'0.1204',
'0.1197',
'0.1181',
'0.1174',
'0.1173',
'0.1165',
'0.1164',
'0.1164',
'0.1151',
'0.1147',
'0.1145',
'0.1124',
'0.1113',
'0.1113',
'0.1108',
'0.1105',
'0.1098',
'0.1091',
'0.1083',
'0.1069',
'0.1062',
'0.1061',
'0.1052',
'0.1043',
'0.1040',
'0.1040',
'0.1039',
'0.1038',
'0.1037',
'0.1034',
'0.1031',
'0.1031',
'0.1031',
'0.1031',
'0.1030',
'0.1027',
'0.1024',
'0.1023',
'0.1022',
'0.1013',
'0.1011',
'0.0978',
'0.0954',
'0.0954',
'0.0942',
'0.0940',
'0.0927',
'0.0921',
'0.0915',
'0.0903',
'0.0903',
'0.0881',
'0.0858',
'0.0851',
'0.0849',
'0.0845',
'0.0841',
'0.0839',
'0.0836',
'0.0833',
'0.0831',
'0.0822',
'0.0813',
'0.0813',
'0.0810',
'0.0809',
'0.0809',
'0.0808',
'0.0799',
'0.0796',
'0.0790',
'0.0785',
'0.0765',
'0.0765',
'0.0757',
'0.0731',
'0.0731',
'0.0730',
'0.0727',
'0.0719',
'0.0713',
'0.0706',
'0.0705',
'0.0704',
'0.0690',
'0.0679',
'0.0674',
'0.0670',
'0.0668',
'0.0657',
'0.0655',
'0.0643',
'0.0643',
'0.0642',
'0.0639',
'0.0628',
'0.0625',
'0.0623',
'0.0619',
'0.0612',
'0.0610',
'0.0606',
'0.0573',
'0.0569',
'0.0565',
'0.0559',
'0.0558',
'0.0555',
'0.0553',
'0.0552',
'0.0549',
'0.0538',
'0.0519',
'0.0501',
'0.0499',
'0.0484',
'0.0468',
'0.0458',
'0.0457',
'0.0456',
'0.0447',
'0.0445',
'0.0424',
'0.0402',
'0.0396',
'0.0384',
'0.0375',
'0.0370',
'0.0362',
'0.0361',
'0.0358',
'0.0354',
'0.0352',
'0.0348',
'0.0348',
'0.0347',
'0.0346',
'0.0327',
'0.0323',
'0.0308',
'0.0304',
'0.0302',
'0.0281',
'0.0278',
'0.0275',
'0.0257',
'0.0247',
'0.0235',
'0.0233',
'0.0225',
'0.0223',
'0.0218',
'0.0213',
'0.0206',
'0.0206',
'0.0206',
'0.0206',
'0.0206',
'0.0205',
'0.0204',
'0.0203',
'0.0199',
'0.0180',
'0.0177',
'0.0174',
'0.0171',
'0.0164',
'0.0156',
'0.0133',
'0.0133',
'0.0118',
'0.0117',
'0.0097',
'0.0094',
'0.0090',
'0.0085',
'0.0081',
'0.0077',
'0.0065',
'0.0061',
'0.0061',
'0.0045',
'0.0044',
'0.0040',
'0.0040',
'0.0039',
'0.0022',
'0.0018',
'0.0006'
);
my %freqs = $data->frequency_distribution(4);
use Data::Dumper;
print Dumper(\%freqs);
********************* TEST-CASE END **********************************
If you manage to analyze why this bug happens, I would appreciate a note :)
Regards,
--
Offer Kaye