
This queue is for tickets about the Statistics-Test-WilcoxonRankSum CPAN distribution.

Report information
The Basics
Id: 52035
Status: open
Priority: 0/
Queue: Statistics-Test-WilcoxonRankSum

People
Owner: Nobody in particular
Requestors: eroshkin [...] burnham.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



CC: "bug-statistics-test-wilcoxonranksum [...] rt.cpan.org" <bug-statistics-test-wilcoxonranksum [...] rt.cpan.org>
Subject: question: if the sign of the Z is defined properly
Date: Tue, 24 Nov 2009 10:22:58 -0800
To: "Iingrid.falk [...] loria.fr" <Iingrid.falk [...] loria.fr>
From: Alexey Eroshkin <eroshkin [...] burnham.org>
Hello Ingrid,

I have tried your module Statistics-Test-WilcoxonRankSum-0.0.6 and found it very useful.

One question: is the sign of Z defined properly?

I ran your example three times. In the last two runs the datasets (1 and 2) are in opposite order, and the module reports:

  Ranks of dataset 1 are lower than expected
  Ranks of dataset 1 are higher than expected

The Z score was the same in both runs: z: -2.283606

Is this the correct behavior of the program, or should the sign of Z change?

Thanks,
Alexey
Burnham Institute

    use Statistics::Test::WilcoxonRankSum;

    my $wilcox_test = Statistics::Test::WilcoxonRankSum->new();

    my @dataset_1 = (4.6, 4.7, 4.9, 5.1, 5.2, 5.5, 5.8, 6.1, 6.5, 6.5, 7.2);
    my @dataset_2 = (5.2, 5.3, 5.4, 5.6, 6.2, 6.3, 6.8, 7.7, 8.0, 8.1);

    $wilcox_test->load_data(\@dataset_1, \@dataset_2);
    my $prob = $wilcox_test->probability();

    my $pf = sprintf '%f', $prob; # prints 0.091022

    print $wilcox_test->probability_status();

[root@oneleg Statistics-Test-WilcoxonRankSum-0.0.6]# /home/eroshkin/bin/WilcoxonRankSum.pl
Probability: 0.091022, normal approx w. mean: 121.000000, std deviation: 14.200939, z: -1.690029
prob = 0.091022
----------------------------------------------------------------
dataset | n  | rank sum: observed / expected
----------------------------------------------------------------
1       | 11 | 96 / 115
----------------------------------------------------------------
2       | 10 | 134 / 105
----------------------------------------------------------------
N (size of both datasets): 21
Probability: 0.091022, normal approx w. mean: 121.000000, std deviation: 14.200939, z: -1.690029
Not significant (at 0.05 level)
[root@oneleg Statistics-Test-WilcoxonRankSum-0.0.6]# vi /home/eroshkin/bin/WilcoxonRankSum.pl

Changed the datasets (added 1, 2, 3 to dataset 1):

    my @dataset_1 = (4.6, 4.7, 4.9, 5.1, 5.2, 5.5, 5.8, 6.1, 6.5, 6.5, 7.2, 1, 2, 3);
    my @dataset_2 = (5.2, 5.3, 5.4, 5.6, 6.2, 6.3, 6.8, 7.7, 8.0, 8.1);

[root@oneleg Statistics-Test-WilcoxonRankSum-0.0.6]# /home/eroshkin/bin/WilcoxonRankSum.pl
Probability: 0.022394, normal approx w. mean: 175.000000, std deviation: 17.078251, z: -2.283606
prob = 0.022394
----------------------------------------------------------------
dataset | n  | rank sum: observed / expected
----------------------------------------------------------------
1       | 14 | 135 / 168
----------------------------------------------------------------
2       | 10 | 164 / 120
----------------------------------------------------------------
N (size of both datasets): 24
Probability: 0.022394, normal approx w. mean: 175.000000, std deviation: 17.078251, z: -2.283606
Significant (at 0.05 level)
Ranks of dataset 1 are lower than expected
[root@oneleg Statistics-Test-WilcoxonRankSum-0.0.6]# vi /home/eroshkin/bin/WilcoxonRankSum.pl

Changed the order of the datasets:

    my @dataset_2 = (4.6, 4.7, 4.9, 5.1, 5.2, 5.5, 5.8, 6.1, 6.5, 6.5, 7.2, 1, 2, 3);
    my @dataset_1 = (5.2, 5.3, 5.4, 5.6, 6.2, 6.3, 6.8, 7.7, 8.0, 8.1);

[root@oneleg Statistics-Test-WilcoxonRankSum-0.0.6]# /home/eroshkin/bin/WilcoxonRankSum.pl
Probability: 0.022394, normal approx w. mean: 175.000000, std deviation: 17.078251, z: -2.283606
prob = 0.022394
----------------------------------------------------------------
dataset | n  | rank sum: observed / expected
----------------------------------------------------------------
1       | 10 | 164 / 120
----------------------------------------------------------------
2       | 14 | 135 / 168
----------------------------------------------------------------
N (size of both datasets): 24
Probability: 0.022394, normal approx w. mean: 175.000000, std deviation: 17.078251, z: -2.283606
Significant (at 0.05 level)
Ranks of dataset 1 are higher than expected
Hello Alexey,

sorry for the late answer ;-) I'm rather busy right now...

On Tue Nov 24 13:23:22 2009, eroshkin@burnham.org wrote:
> Hello Ingrid,
>
> I have tried your module Statistics-Test-WilcoxonRankSum-0.0.6 and
> found it very useful.
>
> One question: is the sign of Z defined properly?
Yes, I think it is. I used the following formula for computing z (I don't remember why or where I got it from; I'm not an expert in statistics):

    my $mean       = $nA*($N+1)/2;
    my $deviation  = sqrt($nA*$nB*($N+1)/12.0);
    my $continuity = (($W - $mean) >= 0) ? -0.5 : +0.5;
    my $z          = ($W - $mean + $continuity)/$deviation;

where $nA is the size of the dataset with the smaller rank sum, $nB is the size of the other one, $N = $nA + $nB, and $W is the smaller of the two rank sums (of datasets 1 and 2). So $W, and therefore z, will be the same even if you switch the datasets.

Best regards,
Ingrid
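To see why the sign of z does not follow the dataset order, here is a minimal Perl sketch that reimplements the formula quoted above outside the module. It is not the module's own code: the helper names rank_sums, z_from and z_smaller_rank_sum are hypothetical, and tied values are assumed to receive average ranks. With the data from the second and third runs, dataset 1 gets rank sum 135.5 and dataset 2 gets 164.5 (printed as 135 and 164 in the summaries), so W = 135.5 with nA = 14 whichever way round the datasets are passed, and z comes out as -2.283606 both times.

```perl
use strict;
use warnings;

# Rank sums of two samples over the pooled data. Tied values get the
# average of the positions they occupy (e.g. two values at positions 8
# and 9 both get rank 8.5). Returns (rank sum of dataset 1, rank sum of dataset 2).
sub rank_sums {
    my ($d1, $d2) = @_;
    my @pooled = sort { $a->[0] <=> $b->[0] }
                 ((map { [$_, 1] } @$d1), (map { [$_, 2] } @$d2));
    my %sum = (1 => 0, 2 => 0);
    my $i = 0;
    while ($i < @pooled) {
        # extend $j to the last element of the current tie group
        my $j = $i;
        $j++ while $j + 1 < @pooled && $pooled[$j + 1][0] == $pooled[$i][0];
        my $avg = ($i + $j) / 2 + 1;    # 1-based average rank of the group
        $sum{ $_->[1] } += $avg for @pooled[$i .. $j];
        $i = $j + 1;
    }
    return ($sum{1}, $sum{2});
}

# Normal approximation with continuity correction, as quoted in the ticket.
sub z_from {
    my ($W, $nA, $nB) = @_;
    my $N          = $nA + $nB;
    my $mean       = $nA * ($N + 1) / 2;
    my $deviation  = sqrt($nA * $nB * ($N + 1) / 12.0);
    my $continuity = (($W - $mean) >= 0) ? -0.5 : +0.5;
    return ($W - $mean + $continuity) / $deviation;
}

# W is the smaller rank sum and nA the size of the dataset it belongs to,
# so swapping the two arguments normally yields the same W, nA and z.
sub z_smaller_rank_sum {
    my ($d1, $d2) = @_;
    my ($r1, $r2) = rank_sums($d1, $d2);
    return $r1 <= $r2 ? z_from($r1, scalar @$d1, scalar @$d2)
                      : z_from($r2, scalar @$d2, scalar @$d1);
}

my @d1 = (4.6, 4.7, 4.9, 5.1, 5.2, 5.5, 5.8, 6.1, 6.5, 6.5, 7.2, 1, 2, 3);
my @d2 = (5.2, 5.3, 5.4, 5.6, 6.2, 6.3, 6.8, 7.7, 8.0, 8.1);
printf "z = %.6f\n", z_smaller_rank_sum(\@d1, \@d2);    # z = -2.283606
printf "z = %.6f\n", z_smaller_rank_sum(\@d2, \@d1);    # z = -2.283606
```

Because W is chosen by size of rank sum rather than by position, the sign of z describes the dataset with the smaller rank sum, not dataset 1.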
Subject: RE: [rt.cpan.org #52035] question: if the sign of the Z is defined properly
Date: Mon, 7 Dec 2009 09:45:12 -0800
To: <bug-Statistics-Test-WilcoxonRankSum [...] rt.cpan.org>
From: Alexey Erohskin <eroshkin [...] burnham.org>
Hi Ingrid,

Thanks so much. This means that I cannot use the sign of Z to conclude which dataset's ranks are higher or lower than expected. I will use the conclusion your module reports instead:

  Ranks of dataset 1 are lower than expected
  or
  Ranks of dataset 1 are higher than expected

Thanks again,
Alexey
-----Original Message-----
From: Ingrid Falk via RT [mailto:bug-Statistics-Test-WilcoxonRankSum@rt.cpan.org]
Sent: Saturday, December 05, 2009 9:13 AM
To: Alexey Eroshkin
Subject: [rt.cpan.org #52035] question: if the sign of the Z is defined properly

<URL: https://rt.cpan.org/Ticket/Display.html?id=52035 >
On Mon Dec 07 12:45:37 2009, eroshkin@burnham.org wrote:

Hi Alexey,
> This means that I cannot use the sign of Z to conclude which dataset's
> ranks are higher or lower than expected. I will use the
> conclusion your module reports instead:
>
> Ranks of dataset 1 are lower than expected
> or
> Ranks of dataset 1 are higher than expected
You've got a point there; it doesn't seem right that you have to do it like this. Here's another way:

    use Statistics::Test::WilcoxonRankSum;

    my $wilcox_test = Statistics::Test::WilcoxonRankSum->new();

    my @dataset_1 = qw(12 15 18 24 88);
    my @dataset_2 = qw(3 3 13 27 33);

    $wilcox_test->load_data(\@dataset_1, \@dataset_2);

    # get rank sum for dataset 1, returns 31
    my $rank_sum_dataset1 = $wilcox_test->rank_sum_for('dataset1');

    # get expected rank sum for dataset 1, returns 25
    my $rank_sum_expected_dataset1 = $wilcox_test->get_expected_rank_sum_dataset1();

Maybe I'll have to update the documentation; I'll check one of these days.

Ingrid
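Ingrid's suggestion amounts to comparing dataset 1's observed rank sum with its expected value. The Perl sketch below makes that comparison without the module and also derives a z whose sign is tied to dataset 1, by plugging dataset 1's own rank sum and size into the same formula. It assumes average ranks for ties and the standard null expectation n1(N+1)/2; rank_sums and z_dataset1 are hypothetical helper names, not part of Statistics::Test::WilcoxonRankSum. For Ingrid's example this gives an observed rank sum of 31 and an expected value of 27.5, not the 25 quoted above. The expected column in the earlier summaries (115/105 for N = 21, 168/120 for N = 24) likewise appears to match a truncated n*N/2 rather than n(N+1)/2, so the module's expected values may be worth double-checking.

```perl
use strict;
use warnings;

# Rank sums of two samples over the pooled data, ties getting the average
# of the positions they occupy. Returns (rank sum 1, rank sum 2).
sub rank_sums {
    my ($d1, $d2) = @_;
    my @pooled = sort { $a->[0] <=> $b->[0] }
                 ((map { [$_, 1] } @$d1), (map { [$_, 2] } @$d2));
    my %sum = (1 => 0, 2 => 0);
    my $i = 0;
    while ($i < @pooled) {
        my $j = $i;
        $j++ while $j + 1 < @pooled && $pooled[$j + 1][0] == $pooled[$i][0];
        my $avg = ($i + $j) / 2 + 1;    # 1-based average rank of the tie group
        $sum{ $_->[1] } += $avg for @pooled[$i .. $j];
        $i = $j + 1;
    }
    return ($sum{1}, $sum{2});
}

# Hypothetical helper: same mean, deviation and continuity correction as
# quoted in the ticket, but with W = rank sum of dataset 1 and nA = n1.
# Negative z: dataset 1 ranks lower than expected; positive: higher.
sub z_dataset1 {
    my ($d1, $d2) = @_;
    my ($W) = rank_sums($d1, $d2);
    my ($n1, $n2) = (scalar @$d1, scalar @$d2);
    my $N          = $n1 + $n2;
    my $expected   = $n1 * ($N + 1) / 2;
    my $deviation  = sqrt($n1 * $n2 * ($N + 1) / 12.0);
    my $continuity = (($W - $expected) >= 0) ? -0.5 : +0.5;
    return ($W, $expected, ($W - $expected + $continuity) / $deviation);
}

my @dataset_1 = qw(12 15 18 24 88);
my @dataset_2 = qw(3 3 13 27 33);

my ($observed, $expected, $z) = z_dataset1(\@dataset_1, \@dataset_2);
printf "observed %g, expected %g, z %.3f\n", $observed, $expected, $z;
# observed 31, expected 27.5, z 0.627

($observed, $expected, $z) = z_dataset1(\@dataset_2, \@dataset_1);
printf "observed %g, expected %g, z %.3f\n", $observed, $expected, $z;
# observed 24, expected 27.5, z -0.627
```

Swapping the arguments flips the sign of z, which is the behavior Alexey expected; the two-sided probability is unaffected because |z| stays the same.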