Bug #52035 for Statistics-Test-WilcoxonRankSum: question: if the sign of the Z is defined properly

Tue Nov 24 13:23:22 2009 eroshkin [...] burnham.org - Ticket created

CC:	"bug-statistics-test-wilcoxonranksum [...] rt.cpan.org" <bug-statistics-test-wilcoxonranksum [...] rt.cpan.org>
Subject:	question: if the sign of the Z is defined properly
Date:	Tue, 24 Nov 2009 10:22:58 -0800
To:	"Iingrid.falk [...] loria.fr" <Iingrid.falk [...] loria.fr>
From:	Alexey Eroshkin <eroshkin [...] burnham.org>

Hello Ingrid,

I have tried your module Statistics-Test-WilcoxonRankSum-0.0.6 and found it very useful.

One question: is the sign of Z defined properly?

I ran your example three times, and for the last two runs I swapped the order of the datasets (1 and 2). Those two runs reported, respectively:

    Ranks of dataset 1 are lower than expected
    Ranks of dataset 1 are higher than expected

but the Z score was the same in both: z: -2.283606

Is this the correct behavior of the program, or should the sign of Z change?

Thanks
Alexey
Burnham Institute

    use Statistics::Test::WilcoxonRankSum;

    my $wilcox_test = Statistics::Test::WilcoxonRankSum->new();

    my @dataset_1 = (4.6, 4.7, 4.9, 5.1, 5.2, 5.5, 5.8, 6.1, 6.5, 6.5, 7.2);
    my @dataset_2 = (5.2, 5.3, 5.4, 5.6, 6.2, 6.3, 6.8, 7.7, 8.0, 8.1);

    $wilcox_test->load_data(\@dataset_1, \@dataset_2);
    my $prob = $wilcox_test->probability();

    my $pf = sprintf '%f', $prob; # prints 0.091022

    print $wilcox_test->probability_status();

[root@oneleg Statistics-Test-WilcoxonRankSum-0.0.6]# /home/eroshkin/bin/WilcoxonRankSum.pl

    Probability: 0.091022, normal approx w. mean: 121.000000, std deviation: 14.200939, z: -1.690029
    prob = 0.091022
    ----------------------------------------------------------------
    dataset | n | rank sum: observed / expected
    ----------------------------------------------------------------
    1 | 11 | 96 / 115
    ----------------------------------------------------------------
    2 | 10 | 134 / 105
    ----------------------------------------------------------------
    N (size of both datasets): 21
    Probability: 0.091022, normal approx w. mean: 121.000000, std deviation: 14.200939, z: -1.690029
    Not significant (at 0.05 level)

[root@oneleg Statistics-Test-WilcoxonRankSum-0.0.6]# vi /home/eroshkin/bin/WilcoxonRankSum.pl

Changed the datasets (added 1, 2, 3 to #1):

    my @dataset_1 = (4.6, 4.7, 4.9, 5.1, 5.2, 5.5, 5.8, 6.1, 6.5, 6.5, 7.2,1,2,3);
    my @dataset_2 = (5.2, 5.3, 5.4, 5.6, 6.2, 6.3, 6.8, 7.7, 8.0, 8.1);

[root@oneleg Statistics-Test-WilcoxonRankSum-0.0.6]# /home/eroshkin/bin/WilcoxonRankSum.pl

    Probability: 0.022394, normal approx w. mean: 175.000000, std deviation: 17.078251, z: -2.283606
    prob = 0.022394
    ----------------------------------------------------------------
    dataset | n | rank sum: observed / expected
    ----------------------------------------------------------------
    1 | 14 | 135 / 168
    ----------------------------------------------------------------
    2 | 10 | 164 / 120
    ----------------------------------------------------------------
    N (size of both datasets): 24
    Probability: 0.022394, normal approx w. mean: 175.000000, std deviation: 17.078251, z: -2.283606
    Significant (at 0.05 level)
    Ranks of dataset 1 are lower than expected

[root@oneleg Statistics-Test-WilcoxonRankSum-0.0.6]# vi /home/eroshkin/bin/WilcoxonRankSum.pl

Changed the order of the datasets:

    my @dataset_2 = (4.6, 4.7, 4.9, 5.1, 5.2, 5.5, 5.8, 6.1, 6.5, 6.5, 7.2,1,2,3);
    my @dataset_1 = (5.2, 5.3, 5.4, 5.6, 6.2, 6.3, 6.8, 7.7, 8.0, 8.1);

[root@oneleg Statistics-Test-WilcoxonRankSum-0.0.6]# /home/eroshkin/bin/WilcoxonRankSum.pl

    Probability: 0.022394, normal approx w. mean: 175.000000, std deviation: 17.078251, z: -2.283606
    prob = 0.022394
    ----------------------------------------------------------------
    dataset | n | rank sum: observed / expected
    ----------------------------------------------------------------
    1 | 10 | 164 / 120
    ----------------------------------------------------------------
    2 | 14 | 135 / 168
    ----------------------------------------------------------------
    N (size of both datasets): 24
    Probability: 0.022394, normal approx w. mean: 175.000000, std deviation: 17.078251, z: -2.283606
    Significant (at 0.05 level)
    Ranks of dataset 1 are higher than expected

Sat Dec 05 12:12:34 2009 INGRIF [...] cpan.org - Correspondence added

Hello Alexey,

sorry for the late answer ;-) I'm rather busy right now...

On Tue Nov 24 13:23:22 2009, eroshkin@burnham.org wrote:

> Hello Ingrid,
>
> I have tried your module Statistics-Test-WilcoxonRankSum-0.0.6 and
> found it very useful.
>
> One question: if the sign of the Z is defined properly.
>

Yes, I think it is. I used the following formula for computing z (I don't remember why or where I got it from - I'm not an expert in statistics):

    my $mean = $nA*($N+1)/2;
    my $deviation = sqrt($nA*$nB*($N+1)/12.0);
    my $continuity = (($W - $mean) >= 0) ? -0.5 : +0.5;
    my $z = ($W - $mean + $continuity)/$deviation;

where $nA is the size of the dataset with the smaller rank sum, $nB is the size of the other one, $N = $nA + $nB, and $W is the smaller of the two rank sums (of datasets 1 and 2). So $W, and therefore z, will be the same even if you switch the datasets.

Best regards,
Ingrid
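
The invariance described above can be checked without the module. The following is a minimal sketch in plain Perl, not the module's actual code: rank_sums and z_score are hypothetical helpers, and the ranking assumes that tied values share the average of their positions. It applies the formula to the 14- and 10-element datasets from the report, in both orders.

```perl
use strict;
use warnings;

# Hypothetical helper: rank the pooled values (ties get the average of
# their 1-based positions) and return the rank sum of each dataset.
sub rank_sums {
    my ($set_a, $set_b) = @_;
    my @all = ( (map { [$_, 0] } @$set_a), (map { [$_, 1] } @$set_b) );
    @all = sort { $a->[0] <=> $b->[0] } @all;
    my @sums = (0, 0);
    my $i = 0;
    while ($i < @all) {
        my $j = $i;
        $j++ while $j + 1 < @all && $all[$j + 1][0] == $all[$i][0];
        my $avg_rank = ($i + $j + 2) / 2;    # mean of positions i+1 .. j+1
        $sums[ $all[$_][1] ] += $avg_rank for $i .. $j;
        $i = $j + 1;
    }
    return @sums;
}

# Hypothetical helper: z computed from the smaller rank sum, using the
# formula quoted in the reply above.
sub z_score {
    my ($set_a, $set_b) = @_;
    my @sums  = rank_sums($set_a, $set_b);
    my @sizes = (scalar @$set_a, scalar @$set_b);
    my $small = $sums[0] <= $sums[1] ? 0 : 1;
    my ($W, $nA, $nB) = ($sums[$small], $sizes[$small], $sizes[1 - $small]);
    my $N          = $nA + $nB;
    my $mean       = $nA * ($N + 1) / 2;
    my $deviation  = sqrt($nA * $nB * ($N + 1) / 12.0);
    my $continuity = (($W - $mean) >= 0) ? -0.5 : +0.5;
    return ($W - $mean + $continuity) / $deviation;
}

my @dataset_1 = (4.6, 4.7, 4.9, 5.1, 5.2, 5.5, 5.8, 6.1, 6.5, 6.5, 7.2, 1, 2, 3);
my @dataset_2 = (5.2, 5.3, 5.4, 5.6, 6.2, 6.3, 6.8, 7.7, 8.0, 8.1);

printf "z (1, 2): %f\n", z_score(\@dataset_1, \@dataset_2);
printf "z (2, 1): %f\n", z_score(\@dataset_2, \@dataset_1);
```

Both calls select the same smaller rank sum (135.5 with this tie handling) and the same $nA, so both print -2.283606, the value shown in the report.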


Sat Dec 05 12:12:35 2009 The RT System itself - Status changed from 'new' to 'open'

Sat Dec 05 12:12:37 2009 INGRIF [...] cpan.org - Status changed from 'open' to 'resolved'

Mon Dec 07 12:45:37 2009 eroshkin [...] burnham.org - Correspondence added

Subject:	RE: [rt.cpan.org #52035] question: if the sign of the Z is defined properly
Date:	Mon, 7 Dec 2009 09:45:12 -0800
To:	<bug-Statistics-Test-WilcoxonRankSum [...] rt.cpan.org>
From:	Alexey Eroshkin <eroshkin [...] burnham.org>

Hi Ingrid,

Thanks so much. This means that I cannot use the sign of Z to conclude whether a dataset's ranks are higher or lower than expected. I will use the conclusion that your module reports:

    Ranks of dataset 1 are lower than expected

or

    Ranks of dataset 1 are higher than expected

Thanks again
Alexey


Mon Dec 07 12:45:38 2009 The RT System itself - Status changed from 'resolved' to 'open'

Mon Dec 07 14:13:04 2009 INGRIF [...] cpan.org - Correspondence added

Hi Alexey,

On Mon Dec 07 12:45:37 2009, eroshkin@burnham.org wrote:

> This means that I cannot use sign of Z to conclude which datasets
> ranks are higher or lower than expected. I will use this
> conclusion you report:
>
> Ranks of dataset 1 are lower than expected
> Or
> Ranks of dataset 1 are higher than expected
>

You've got a point there - it doesn't seem right that you have to do it like this. Here's another way:

    my $wilcox_test = Statistics::Test::WilcoxonRankSum->new();

    my @dataset_1 = qw(12 15 18 24 88);
    my @dataset_2 = qw(3 3 13 27 33);

    $wilcox_test->load_data(\@dataset_1, \@dataset_2);

    # get rank sum for dataset 1, returns 31
    my $rank_sum_dataset1 = $wilcox_test->rank_sum_for('dataset1');

    # get expected rank sum for dataset 1, returns 25
    my $rank_sum_expected_dataset1 = $wilcox_test->get_expected_rank_sum_dataset1();

Maybe I'll have to update the documentation; I'll check one of these days.

Ingrid
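
If a z whose sign follows dataset 1 is wanted, the same formula can be applied to dataset 1's own rank sum instead of the smaller one. This is only a sketch of that idea and not something the module provides: signed_z_for_dataset1 is a hypothetical helper, and the rank sums 135.5 and 164.5 were worked out by hand for the 14- and 10-element datasets from the report, assuming tied values share an average rank (the report itself prints 135 and 164).

```perl
use strict;
use warnings;

# Hypothetical helper: z taken from the point of view of dataset 1, so a
# negative value means its ranks are lower than expected and a positive
# value means they are higher.
sub signed_z_for_dataset1 {
    my ($rank_sum_1, $n1, $n2) = @_;
    my $N          = $n1 + $n2;
    my $mean       = $n1 * ($N + 1) / 2;    # mean rank sum of dataset 1
    my $deviation  = sqrt($n1 * $n2 * ($N + 1) / 12.0);
    my $diff       = $rank_sum_1 - $mean;
    my $continuity = $diff >= 0 ? -0.5 : +0.5;
    return ($diff + $continuity) / $deviation;
}

# Original order: dataset 1 has 14 values, dataset 2 has 10.
printf "original order: %+f\n", signed_z_for_dataset1(135.5, 14, 10);

# Swapped order: dataset 1 has 10 values, dataset 2 has 14.
printf "swapped order:  %+f\n", signed_z_for_dataset1(164.5, 10, 14);
```

With these inputs the magnitude stays at 2.283606 while the sign flips with the order, which agrees with the "lower than expected" and "higher than expected" lines in the report.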