Bug #88784 for PDL-Stats: t/stats_glm.t failures with PDL-2.006

Wed Sep 18 10:26:29 2013 chm [...] cpan.org - Ticket created

Subject:

t/stats_glm.t failures with PDL-2.006_07

The current PDL::Stats distribution fails to test successfully with the
latest CPAN Developers release. This is probably due to the clean up
and unification of slice() functionality which causes what appears to
be existing bad code to now fail.

Attached are patches against the 0.6.4 release GLM.pm and
Kmeans.pm files generated on my system. The changes are
basically adding (my $tmp = slice()) to .= and *= lines to work
under the perl debugger (required for perls before 5.16.x).

The rest were changes needed to allow tests to pass (GLM.pm
line 2306 needed a () around $comp, and the next problem is
at line 1393 in the same file where the arguments to zeroes()
look funny and result in an Empty piddle which breaks in the
slice op at line 1398.

I've attached patch file for GLM.pm and Kmeans.pm which should
allow you to recreate this current status. Because of the problems
I am unable to determine if there are any issues from the new
64bit index support.

Subject:

patch-GLMpm.txt

--- GLM/GLM.pm 2013-09-18 10:08:43.036709400 -0400 +++ /home/chris.h.marshall/pdl/GLM.pm 2013-09-18 10:04:55.167709400 -0400 @@ -651,7 +651,7 @@ # Internally normalise data # (double) it or ushort y and sequence iv won't work right my $ymean = $y->abs->sumover->double / $y->dim(0); - $ymean->where( $ymean==0 ) .= 1; + (my $tmp = $ymean->where( $ymean==0 )) .= 1; my $y2 = $y / $ymean->dummy(0); # Do the fit @@ -1063,8 +1063,8 @@ for (@$ivs_ref) { my $last = zeroes $_->dim(0); my $i_neg = which $_( ,0) == -1; - $last($i_neg) .= 1; - $_->where($_ == -1) .= 0; + (my $tmp = $last($i_neg)) .= 1; + (my $tmp = $_->where($_ == -1)) .= 0; $_ = $_->glue(1, $last); my @v = split ' ~ ', $ids->[$i]; @@ -1382,10 +1382,10 @@ # something not treated as BAD by _array_to_pdl to start off marking group membership # if no $opt->{BTWN}, everyone ends up in the same grp my $s = '_'; - $s .= $_->($n) + (my $tmp = $s) .= $_->($n) for (@$raw_ivs[@{ $opt->{BTWN} }]); push @grp, $s; # group membership - $s .= $subj($n); # keep track of total uniq subj + (my $tmp = $s) .= $subj($n); # keep track of total uniq subj $grp_s{$s} = 1; } my $grp = PDL::Stats::Kmeans::iv_cluster \@grp; @@ -1395,7 +1395,8 @@ for my $g (0 .. $grp->dim(1)-1) { my $gsub = $subj( which $grp( ,$g) )->effect_code; my ($nobs, $nsub) = $gsub->dims; - $spdl($d0:$d0+$nobs-1, $d1:$d1+$nsub-1) .= $gsub; + $DB::single = 1; + (my $tmp = $spdl($d0:$d0+$nobs-1, $d1:$d1+$nsub-1)) .= $gsub; $d0 += $nobs; $d1 += $nsub; } @@ -1471,7 +1472,7 @@ } @se; for my $i (0 .. $#se) { - $cm_ref->{"# $se[$i] # se"} + (my $tmp = $cm_ref->{"# $se[$i] # se"}) .= sqrt( $ret->{"| $se[$i] || err ms"} / $n_obs[$i] ); } @@ -1503,7 +1504,7 @@ my $var_e = effect_code( $var_ref ); - $var_e->where( $var_e == -1 ) .= 0; + (my $tmp = $var_e->where( $var_e == -1 )) .= 0; return $var_e; } @@ -1553,13 +1554,13 @@ for my $l (0 .. $var->max - 1) { my $v = $var_e( ,$l); - $v->index( which $var == $l ) .= 1; - $v->index( which $var == $var->max ) .= -1; + (my $tmp = $v->index( which $var == $l )) .= 1; + (my $tmp = $v->index( which $var == $var->max )) .= -1; } if ($var->badflag) { my $ibad = which $var->isbad; - $var_e($ibad, ) .= -99; + (my $tmp = $var_e($ibad, )) .= -99; $var_e = $var_e->setvaltobad(-99); } @@ -1600,7 +1601,7 @@ my $pos = which $factor == 1; my $neg = which $factor == -1; my $w = $pos->nelem / $neg->nelem; - $factor($neg) *= $w; + (my $tmp = $factor($neg)) *= $w; } return wantarray? ($var_e, $map_ref) : $var_e; @@ -1773,7 +1774,7 @@ my $se_b = ones( $coeff->dims? $coeff->dims : 1 ); $opt{CONST} and - $se_b(-1) .= sqrt( $ret{ss_residual} / $ret{F_df}->(1) * $C(-1,-1) ); + (my $tmp = $se_b(-1)) .= sqrt( $ret{ss_residual} / $ret{F_df}->(1) * $C(-1,-1) ); # get the se for bs by successivly regressing each iv by the rest ivs if ($ivs->dim(1) > 1) { @@ -1786,11 +1787,11 @@ my $ss_res_k = $ivs( ,$k)->squeeze->sse( sumover($b_G * $G->transpose) ); - $se_b($k) .= sqrt( $ret{ss_residual} / $ret{F_df}->(1) / $ss_res_k ); + (my $tmp = $se_b($k)) .= sqrt( $ret{ss_residual} / $ret{F_df}->(1) / $ss_res_k ); } } else { - $se_b(0) + (my $tmp = $se_b(0)) .= sqrt( $ret{ss_residual} / $ret{F_df}->(1) / sum( $ivs( ,0)**2 ) ); } @@ -1906,7 +1907,7 @@ my $iv = $s->glue(1, @ivs[ @i_rest ]); my $b = $y->ols_t($iv); $pred = sumover($b(0:-2) * $iv->transpose) + $b(-1); - $r{ss}->($i) .= $y->sse($pred) - $ss_pe; + (my $tmp = $r{ss}->($i)) .= $y->sse($pred) - $ss_pe; } # STEP 3: get precitor x subj interaction as error term @@ -1926,7 +1927,7 @@ my $iv = $iv_p->glue(1, $e_rest); my $b = $y->ols_t($iv); my $pred = sumover($b(0:-2) * $iv->transpose) + $b(-1); - $r{ss_err}->($i) .= $y->sse($pred) - $r{'(ss_residual)'}; + (my $tmp = $r{ss_err}->($i)) .= $y->sse($pred) - $r{'(ss_residual)'}; } # Finally, get MS, F, etc @@ -2070,19 +2071,19 @@ = PDL::Fit::LM::lmfit( $G, $self, $opt{WT}, \&_logistic, $init, { Maxiter=>$opt{MAXIT}, Eps=>$opt{EPS} } ); - $coeff_chisq($k) .= $self->dm( $y_G ) - $ret{Dm}; + (my $tmp = $coeff_chisq($k)) .= $self->dm( $y_G ) - $ret{Dm}; } } else { # d0 is, by definition, the deviance with only intercept - $coeff_chisq(0) .= $ret{D0} - $ret{Dm}; + (my $tmp = $coeff_chisq(0)) .= $ret{D0} - $ret{Dm}; } my $y_c = PDL::Fit::LM::lmfit( $ivs, $self, $opt{WT}, \&_logistic_no_intercept, $opt{INITP}->(0:-2)->copy, { Maxiter=>$opt{MAXIT}, Eps=>$opt{EPS} } ); - $coeff_chisq(-1) .= $self->dm( $y_c ) - $ret{Dm}; + (my $tmp = $coeff_chisq(-1)) .= $self->dm( $y_c ) - $ret{Dm}; $ret{b} = $coeff; $ret{b_chisq} = $coeff_chisq; @@ -2107,18 +2108,18 @@ # independent variable $x, and fit parameters as specified above. # Use the .= (dot equals) assignment operator to express the equality # (not just a plain equals) - $ym .= 1 / ( 1 + exp( -1 * (sumover($b * $x->transpose) + $c) ) ); + (my $tmp = $ym) .= 1 / ( 1 + exp( -1 * (sumover($b * $x->transpose) + $c) ) ); my (@dy) = map {$dyda -> slice(",($_)") } (0 .. $par->dim(0)-1); # Partial derivative of the function with respect to each slope # fit parameter ($b in this case). Again, note .= assignment # operator (not just "equals") - $dy[$_] .= $x( ,$_) * $ym * (1 - $ym) + (my $tmp = $dy[$_]) .= $x( ,$_) * $ym * (1 - $ym) for (0 .. $b->dim(0)-1); # Partial derivative of the function re intercept par - $dy[-1] .= $ym * (1 - $ym); + (my $tmp = $dy[-1]) .= $ym * (1 - $ym); } sub _logistic_no_intercept { @@ -2130,14 +2131,14 @@ # independent variable $x, and fit parameters as specified above. # Use the .= (dot equals) assignment operator to express the equality # (not just a plain equals) - $ym .= 1 / ( 1 + exp( -1 * sumover($b * $x->transpose) ) ); + (my $tmp = $ym) .= 1 / ( 1 + exp( -1 * sumover($b * $x->transpose) ) ); my (@dy) = map {$dyda -> slice(",($_)") } (0 .. $par->dim(0)-1); # Partial derivative of the function with respect to each slope # fit parameter ($b in this case). Again, note .= assignment # operator (not just "equals") - $dy[$_] .= $x( ,$_) * $ym * (1 - $ym) + (my $tmp = $dy[$_]) .= $x( ,$_) * $ym * (1 - $ym) for (0 .. $b->dim(0)-1); } @@ -2303,8 +2304,8 @@ # sort within comp my $ic = $icomp($ivar_sort)->iv_cluster; for my $comp (0 .. $ic->dim(1)-1) { - my $i = $self(which($ic( ,$comp)), $comp)->qsorti->(-1:0); - $ivar_sort(which $ic( ,$comp)) + my $i = $self(which($ic( ,$comp)), ($comp))->qsorti->(-1:0); + (my $tmp = $ivar_sort(which $ic( ,$comp))) .= $ivar_sort(which $ic( ,$comp))->($i)->sever; } return wantarray? ($ivar_sort, pdl(0 .. $ic->dim(1)-1)) : $ivar_sort; @@ -2406,7 +2407,7 @@ $p ++; my $tl = ''; $tl = $opt{IVNM}->[$iD[2]] . " $x" if $self->dim($iD[2]) > 1; - $tl.= ' ' . $opt{IVNM}->[$iD[3]] . " $y" if $self->dim($iD[3]) > 1; + $tl .= ' ' . $opt{IVNM}->[$iD[3]] . " $y" if $self->dim($iD[3]) > 1; $w->env( 0, $self->dim($iD[0])-1, $min - 2*$range/5, $max + $range/5, { XTitle=>$opt{IVNM}->[$iD[0]], YTitle=>$opt{DVNM}, Title=>$tl, PANEL=>$p, AXIS=>['BCNT', 'BCNST'], Border=>1, } ) @@ -2708,4 +2709,4 @@ 1; - \ No newline at end of file +

Subject:

patch-Kmeanspm.txt

--- Kmeans/Kmeans.pm 2013-09-18 10:08:51.380709400 -0400 +++ /home/chris.h.marshall/pdl/Kmeans.pm 2013-09-18 10:05:01.726709400 -0400 @@ -319,8 +319,8 @@ croak "1D pdl only please"; my $a = zeroes 2, $self->nelem; - $a((0), ) .= sequence $self->nelem; - $a((1), ) .= $self; + (my $tmp = $a((0), )) .= sequence $self->nelem; + (my $tmp = $a((1), )) .= $self; my $d = _d_point2line( $a, $a( ,(0)), $a( ,(-1)) ); @@ -624,12 +624,12 @@ for my $l (0 .. $var->max) { my $v = $var_a( ,$l); - $v->index( which $var == $l ) .= 1; + (my $tmp = $v->index( which $var == $l )) .= 1; } if ($var->badflag) { my $ibad = which $var->isbad; - $var_a($ibad, ) .= -1; + (my $tmp = $var_a($ibad, )) .= -1; $var_a->inplace->setvaltobad(-1); } @@ -737,4 +737,4 @@ 1; - \ No newline at end of file +

Wed Sep 18 23:16:45 2013 MAGGIEXYZ [...] cpan.org - Correspondence added

Thanks for the patch! I'll try to get out a new release tomorrow. On 2013-09-18 10:26:29, CHM wrote: Show quoted text

> The current PDL::Stats distribution fails to test successfully with the > latest CPAN Developers release. This is probably due to the clean up > and unification of slice() functionality which causes what appears to > be existing bad code to now fail. > > Attached are patches against the 0.6.4 release GLM.pm and > Kmeans.pm files generated on my system. The changes are > basically adding (my $tmp = slice()) to .= and *= lines to work > under the perl debugger (required for perls before 5.16.x). > > The rest were changes needed to allow tests to pass (GLM.pm > line 2306 needed a () around $comp, and the next problem is > at line 1393 in the same file where the arguments to zeroes() > look funny and result in an Empty piddle which breaks in the > slice op at line 1398. > > I've attached patch file for GLM.pm and Kmeans.pm which should > allow you to recreate this current status. Because of the problems > I am unable to determine if there are any issues from the new > 64bit index support.

Wed Sep 18 23:16:45 2013 The RT System itself - Status changed from 'new' to 'open'

Wed Sep 18 23:16:45 2013 MAGGIEXYZ [...] cpan.org - Taken

Thu Sep 19 08:15:04 2013 chm [...] cpan.org - Correspondence added

On Wed Sep 18 23:16:45 2013, MAGGIEXYZ wrote:
Show quoted text

> Thanks for the patch! I'll try to get out a new release tomorrow.

Hi Maggie-

The patch is just to get you to where I stopped, it only
has an actual fix for one problem and the next problem
is at 1398 where the args to zeroes were not valid and
resulting in an Empty piddle.

Also, did you get my e-mail regarding 64bit index checks
for PDL-based distributions? Once the bugs are fixed,
the question is whether your code will work with a PDL
larger than 2**32 elements. The key in your code is to
use PDL_Indx as the C type for dimension indexes and
dimension extents/offsets rather than PDL_Long.

Regards,
Chris
Show quoted text

> On 2013-09-18 10:26:29, CHM wrote:

> > The current PDL::Stats distribution fails to test successfully with the
> > latest CPAN Developers release. This is probably due to the clean up
> > and unification of slice() functionality which causes what appears to
> > be existing bad code to now fail.
> >
> > Attached are patches against the 0.6.4 release GLM.pm and
> > Kmeans.pm files generated on my system. The changes are
> > basically adding (my $tmp = slice()) to .= and *= lines to work
> > under the perl debugger (required for perls before 5.16.x).
> >
> > The rest were changes needed to allow tests to pass (GLM.pm
> > line 2306 needed a () around $comp, and the next problem is
> > at line 1393 in the same file where the arguments to zeroes()
> > look funny and result in an Empty piddle which breaks in the
> > slice op at line 1398.
> >
> > I've attached patch file for GLM.pm and Kmeans.pm which should
> > allow you to recreate this current status. Because of the problems
> > I am unable to determine if there are any issues from the new
> > 64bit index support.

>
>

Thu Sep 19 22:14:00 2013 MAGGIEXYZ [...] cpan.org - Correspondence added

Chris, Thank you so much for going through the code! That was quite a bit of manual work there. I really appreciate it. I'll look into the 64bit index situation after the build is fixed. Right now it looks like I'm missing some commits from another box that didn't get pushed to github. I'll have to work through the changes this weekend (amid a block party!). Best, Maggie On 2013-09-19 08:15:04, CHM wrote: Show quoted text

> On Wed Sep 18 23:16:45 2013, MAGGIEXYZ wrote:

> > Thanks for the patch! I'll try to get out a new release tomorrow.

> > Hi Maggie- > > The patch is just to get you to where I stopped, it only > has an actual fix for one problem and the next problem > is at 1398 where the args to zeroes were not valid and > resulting in an Empty piddle. > > Also, did you get my e-mail regarding 64bit index checks > for PDL-based distributions? Once the bugs are fixed, > the question is whether your code will work with a PDL > larger than 2**32 elements. The key in your code is to > use PDL_Indx as the C type for dimension indexes and > dimension extents/offsets rather than PDL_Long. > > Regards, > Chris

> > On 2013-09-18 10:26:29, CHM wrote:

> > > The current PDL::Stats distribution fails to test successfully with the > > > latest CPAN Developers release. This is probably due to the clean up > > > and unification of slice() functionality which causes what appears to > > > be existing bad code to now fail. > > > > > > Attached are patches against the 0.6.4 release GLM.pm and > > > Kmeans.pm files generated on my system. The changes are > > > basically adding (my $tmp = slice()) to .= and *= lines to work > > > under the perl debugger (required for perls before 5.16.x). > > > > > > The rest were changes needed to allow tests to pass (GLM.pm > > > line 2306 needed a () around $comp, and the next problem is > > > at line 1393 in the same file where the arguments to zeroes() > > > look funny and result in an Empty piddle which breaks in the > > > slice op at line 1398. > > > > > > I've attached patch file for GLM.pm and Kmeans.pm which should > > > allow you to recreate this current status. Because of the problems > > > I am unable to determine if there are any issues from the new > > > 64bit index support.

> > > >

Sat Sep 21 21:46:43 2013 MAGGIEXYZ [...] cpan.org - Status changed from 'open' to 'resolved'

Sat Sep 21 21:46:43 2013 MAGGIEXYZ [...] cpan.org - Fixed in 0.6.4_1 added

Bug #88784 for PDL-Stats: t/stats_glm.t failures with PDL-2.006_07