Subject: kmeans does not always return same result
Date: Thu, 26 Jun 2014 11:41:40 -0600
To: bug-pdl-stats [...] rt.cpan.org
From: Derek Lamb <derek [...] boulder.swri.edu>
Hi Maggie,
I am trying to install PDL::Stats 0.6.5 on my Mac, with PDL-2.007_03. I got a test failure in stats-kmeans.t at line 180, which runs t_kmeans_bad(). Tracing it back, I found that kmeans does not always give the same results for the same input data.

In the example below, ATTEMPTs 1-3 give the same result (which would cause t_kmeans_bad() to fail), and then ATTEMPT 4 gives a different result, which would cause t_kmeans_bad() to pass. FWIW, ATTEMPT 4 took longer to execute: it seemed to hang for a second before returning the result. Maybe there's a memory issue or something.

When I ran a loop of 100 iterations of t_kmeans_bad(), I got a result of approximately 1E-8 (pass) every time, and then when I ran it one more time I got a result of approximately 0.772 (fail). Maybe this has to do with 64-bit support in PDL. The output of pdl -V is attached.
best,
Derek
derek@localhost:PDL-Stats-0.6.5-75bTju$ pdl -Mblib
pdl> use PDL::Stats::Basic
pdl> use PDL::Stats::Kmeans
pdl> $data = sequence 7, 3;
pdl> $data = $data->setbadat(4,0);
###########ATTEMPT 1:
pdl> %m = $data->kmeans({NCLUS=>2,NTRY=>10,V=>1});
CNTRD => Null
FULL => 0
NCLUS => 2
NSEED => 7
NTRY => 10
V => 1
overall ms: 4.15740740740741
iter 0 R2 [0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831]
pdl> p $m{ms}
[
[ 1.5555556 0.66666667]
[ 1.25 0.66666667]
[ 1.25 0.66666667]
]
###########ATTEMPT 2:
pdl> %m = $data->kmeans({NCLUS=>2,NTRY=>10,V=>1});
CNTRD => Null
FULL => 0
NCLUS => 2
NSEED => 7
NTRY => 10
V => 1
overall ms: 4.15740740740741
iter 0 R2 [0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831]
pdl> p $m{ms}
[
[ 1.5555556 0.66666667]
[ 1.25 0.66666667]
[ 1.25 0.66666667]
]
###########ATTEMPT 3:
pdl> %m = $data->kmeans({NCLUS=>2,NTRY=>10,V=>1});
CNTRD => Null
FULL => 0
NCLUS => 2
NSEED => 7
NTRY => 10
V => 1
overall ms: 4.15740740740741
iter 0 R2 [0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831]
pdl> p $m{ms}
[
[ 1.5555556 0.66666667]
[ 1.25 0.66666667]
[ 1.25 0.66666667]
]
###########ATTEMPT 4:
pdl> %m = $data->kmeans({NCLUS=>2,NTRY=>10,V=>1});
CNTRD => Null
FULL => 0
NCLUS => 2
NSEED => 7
NTRY => 10
V => 1
overall ms: 4.15740740740741
iter 0 R2 [0.78619154 0.78619154 0.78619154 0.78619154 0.78619154 0.78619154 0.78619154 0.78619154 0.78619154 0.78619154]
pdl> p $m{ms}
[
[ 1.25 0.25]
[ 1.25 0.66666667]
[ 1.25 0.66666667]
]
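
The repeated-run check described above can be sketched as a standalone script. This is only a minimal sketch under stated assumptions: the helper name count_distinct_ms is hypothetical, V => 0 is assumed to silence the status output, and the 'ms' key is taken from the session above. It runs kmeans many times on the same input and tallies how often each distinct 'ms' result appears.

```perl
use strict;
use warnings;
use PDL;
use PDL::Stats::Basic;
use PDL::Stats::Kmeans;

# Hypothetical helper: run kmeans $n times on the same data and
# count how many times each distinct 'ms' result is returned.
sub count_distinct_ms {
    my ($data, $n) = @_;
    my %count;
    for (1 .. $n) {
        # Same options as in the session above, except V => 0
        # (assumed to suppress the status printout).
        my %m = $data->kmeans({ NCLUS => 2, NTRY => 10, V => 0 });
        # Sort within each row so that a mere swap of cluster labels
        # is not counted as a different result, then round the values
        # so they can serve as a hash key.
        my $key = join ' ', map { sprintf '%.6f', $_ } $m{ms}->qsort->list;
        $count{$key}++;
    }
    return \%count;
}

# Same input as in the session above: 7 x 3 sequence with one bad value.
my $data  = sequence(7, 3)->setbadat(4, 0);
my $count = count_distinct_ms($data, 100);

# One line per distinct result, with the number of runs that produced it.
printf "%3d x [%s]\n", $count->{$_}, $_ for sort keys %$count;
```

If kmeans were deterministic for this input, the script would print a single line; more than one line means different runs returned different 'ms' values.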