
This queue is for tickets about the PDL-Stats CPAN distribution.

Report information
The Basics
Id: 96742
Status: resolved
Priority: 0
Queue: PDL-Stats

People
Owner: Nobody in particular
Requestors: derek [...] boulder.swri.edu
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: kmeans does not always return same result
Date: Thu, 26 Jun 2014 11:41:40 -0600
To: bug-pdl-stats [...] rt.cpan.org
From: Derek Lamb <derek [...] boulder.swri.edu>
Hi Maggie,

I am trying to install PDL::Stats 0.6.5 on my Mac, with PDL-2.007_03. I got a test failure in stats-kmeans.t at line 180, which runs t_kmeans_bad(). Tracing it back, I found that kmeans does not always give the same results for the same input data. In the example below, ATTEMPTs 1-3 give the same result (which would cause t_kmeans_bad() to fail), and then ATTEMPT 4 gives a different result, which would cause t_kmeans_bad() to pass.

FWIW, the execution time of ATTEMPT 4 was longer; it seemed to hang for a second before returning the result. Maybe there’s a memory issue or something. When I ran a loop of 100 iterations of t_kmeans_bad(), I got a result that was approx 1E-8 (pass) every time, and then when I ran it one more time I got a result that was approximately 0.772 (fail). Maybe this has to do with 64-bit support in PDL. $pdl -V attached.

best,
Derek

derek@localhost:PDL-Stats-0.6.5-75bTju$ pdl -Mblib
pdl> use PDL::Stats::Basic
pdl> use PDL::Stats::Kmeans
pdl> $data = sequence 7, 3;
pdl> $data = $data->setbadat(4,0);

###########ATTEMPT 1:
pdl> %m = $data->kmeans({NCLUS=>2,NTRY=>10,V=>1});
CNTRD => Null
FULL => 0
NCLUS => 2
NSEED => 7
NTRY => 10
V => 1
overall ms: 4.15740740740741
iter 0 R2 [0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831]
pdl> p $m{ms}

[
 [ 1.5555556 0.66666667]
 [ 1.25 0.66666667]
 [ 1.25 0.66666667]
]

###########ATTEMPT 2:
pdl> %m = $data->kmeans({NCLUS=>2,NTRY=>10,V=>1});
CNTRD => Null
FULL => 0
NCLUS => 2
NSEED => 7
NTRY => 10
V => 1
overall ms: 4.15740740740741
iter 0 R2 [0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831]
pdl> p $m{ms}

[
 [ 1.5555556 0.66666667]
 [ 1.25 0.66666667]
 [ 1.25 0.66666667]
]

###########ATTEMPT 3:
pdl> %m = $data->kmeans({NCLUS=>2,NTRY=>10,V=>1});
CNTRD => Null
FULL => 0
NCLUS => 2
NSEED => 7
NTRY => 10
V => 1
overall ms: 4.15740740740741
iter 0 R2 [0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831 0.75723831]
pdl> p $m{ms}

[
 [ 1.5555556 0.66666667]
 [ 1.25 0.66666667]
 [ 1.25 0.66666667]
]

###########ATTEMPT 4:
pdl> %m = $data->kmeans({NCLUS=>2,NTRY=>10,V=>1});
CNTRD => Null
FULL => 0
NCLUS => 2
NSEED => 7
NTRY => 10
V => 1
overall ms: 4.15740740740741
iter 0 R2 [0.78619154 0.78619154 0.78619154 0.78619154 0.78619154 0.78619154 0.78619154 0.78619154 0.78619154 0.78619154]
pdl> p $m{ms}

[
 [ 1.25 0.25]
 [ 1.25 0.66666667]
 [ 1.25 0.66666667]
]

Message body is not shown because sender requested not to inline it.

Thanks so much for checking, Derek!

The classical K-means is non-deterministic in nature (the initial assignment of elements to clusters is done randomly), so sometimes it can give a non-optimal solution. I should change the test so it doesn't fail like that though.

Best,
Maggie
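
For anyone landing on this ticket later, the point in the reply about random initial assignment can be illustrated with a small pure-Perl sketch. This is not the PDL::Stats implementation: kmeans_1d and best_of are hypothetical names, the data is made up, and it only handles 1-D values. It assumes a Lloyd-style loop that starts from a random assignment of points to clusters, which is the behaviour the reply describes.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: Lloyd-style k-means on 1-D data, starting from a
# random assignment of points to clusters. Returns the sum of squared
# distances from each point to its nearest centroid.
sub kmeans_1d {
    my ($data, $k, $max_iter) = @_;
    my @assign = map { int(rand($k)) } @$data;    # random initial clusters
    my $ss;
    for (1 .. $max_iter) {
        # Centroid of each cluster; an empty cluster is reseeded from a
        # randomly chosen data point.
        my @centroid;
        for my $c (0 .. $k - 1) {
            my @members = map { $data->[$_] }
                          grep { $assign[$_] == $c } 0 .. $#$data;
            $centroid[$c] = @members ? sum(@members) / @members
                                     : $data->[ int(rand(@$data)) ];
        }
        # Reassign every point to its nearest centroid.
        my $changed = 0;
        $ss = 0;
        for my $i (0 .. $#$data) {
            my ($best, $best_d);
            for my $c (0 .. $k - 1) {
                my $d = ($data->[$i] - $centroid[$c]) ** 2;
                ($best, $best_d) = ($c, $d)
                    if !defined $best_d || $d < $best_d;
            }
            $changed++ if $best != $assign[$i];
            $assign[$i] = $best;
            $ss += $best_d;
        }
        last unless $changed;    # converged: assignments are stable
    }
    return $ss;
}

# Hypothetical helper: repeat with fresh random starts, keep the lowest SS.
sub best_of {
    my ($data, $k, $ntry) = @_;
    my $best;
    for (1 .. $ntry) {
        my $ss = kmeans_1d($data, $k, 100);
        $best = $ss if !defined $best || $ss < $best;
    }
    return $best;
}

my @data = (1, 2, 3, 11, 12, 13, 21, 22, 23);    # made-up example data

# Unseeded runs may settle on different solutions from call to call.
print "single runs: ", join(" ", map { kmeans_1d(\@data, 3, 100) } 1 .. 5), "\n";

# Re-seeding Perl's generator replays the same sequence of runs.
srand(42);
my @first = map { kmeans_1d(\@data, 3, 100) } 1 .. 5;
srand(42);
my @second = map { kmeans_1d(\@data, 3, 100) } 1 .. 5;
print "repeatable with a fixed seed: ", ("@first" eq "@second" ? "yes" : "no"), "\n";

print "best of 10 restarts: ", best_of(\@data, 3, 10), "\n";
```

Two things follow for a test like t_kmeans_bad(). A fixed seed makes this sketch repeatable, but whether calling srand also pins the random draws inside PDL::Stats depends on how the installed PDL generates random numbers, so that needs checking rather than assuming. Repeated restarts (the role NTRY appears to play in the session above) lower the chance of a poor solution without removing it, so a test that compares against one exact clustering can still fail occasionally.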