
This queue is for tickets about the Statistics-Descriptive CPAN distribution.

Report information
The Basics
Id: 47948
Status: resolved
Priority: 0/
Queue: Statistics-Descriptive

People
Owner: SHLOMIF [...] cpan.org
Requestors: DJIBEL [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 3.0000
Fixed in: (no value)



Subject: percentile method, quantile method
Hello,

Your module works well and provides good methods, but if you want to calculate Q1 (the 25th percentile) or Q3 (the 75th percentile), the percentile method returns a value from the array.

It would be a good idea to add a quantile method that calculates and returns an array with (Q0, Q1, Q2, Q3 and Q4).

For example, if you have an array:

    my @array = (1..7);
    $stat->add_data(@array);
    my @Quantile = $stat->quantile();

    Quantile_0 = 1    # => min
    Quantile_4 = 7    # => max
    Quantile_2 = 4    # => median
    Quantile_1 = 2    # the calculation of percentile 25
    Quantile_3 = 5.5  # the calculation of percentile 75

Here is a patch that you can add to your module. I have tested it and it works fine.

    sub quantile {
      my $self = shift;

      # sort data
      my $count = $self->count();
      $self->sort_data();

      my ( $Quantile0, $Quantile1, $Quantile2, $Quantile3, $Quantile4 );
      my ( @BottomTab, @UpTab );

      my $IndexCenter = ( $count / 2 ) - 1;

      # Number of data pair
      if ( $count % 2 ) {
        @BottomTab = @{$self->_data}[ 0 .. $IndexCenter ];
        @UpTab     = @{$self->_data}[ $IndexCenter + 1 .. $count - 1 ];
      }

      # Number of data impair
      else {
        $IndexCenter = ( $count - 1 ) / 2;
        @BottomTab   = @{$self->_data}[ 0 .. $IndexCenter - 1 ];
        @UpTab       = @{$self->_data}[ $IndexCenter + 1 .. $count - 1 ];
      }

      $Quantile0 = $self->min();

      my $stat = Statistics::Descriptive::Full->new();
      $stat->add_data(@BottomTab);
      $Quantile1 = $stat->median();

      $Quantile2 = $self->_median();

      undef $stat;
      $stat = Statistics::Descriptive::Full->new();
      $stat->add_data(@UpTab);
      $Quantile3 = $stat->median();

      $Quantile4 = $self->max();

      return ( $Quantile0, $Quantile1, $Quantile2, $Quantile3, $Quantile4);
    }

Best Regards,

Djibril Ousmanou
On Thu Jul 16 14:17:00 2009, DJIBEL wrote:
> # Number of data impair
> else {
>   $IndexCenter = ( $count - 1 ) / 2;
>   @BottomTab = @{$self->_data}[ 0 .. $IndexCenter - 1 ];
>   @UpTab = @{$self->_data}[ $IndexCenter + 1 .. $count - 1 ];
> }

==> mistake <==

    # Number of data impair
    else {
      $IndexCenter = ( $count - 1 ) / 2;
      @BottomTab   = @{$self->_data}[ 0 .. $IndexCenter ];
      @UpTab       = @{$self->_data}[ $IndexCenter .. $count - 1 ];
    }
On Thu Jul 16 14:17:00 2009, DJIBEL wrote:
> Hello,
>
> Your module works well and provides good methods, but if you want to
> calculate Q1 (the 25th percentile) or Q3 (the 75th percentile), the
> percentile method returns a value from the array.
>
> It would be a good idea to add a quantile method that calculates and
> returns an array with (Q0, Q1, Q2, Q3 and Q4).
Hi!

I like this. However, here are some comments.

Please convert your code into a patch against:

http://svn.berlios.de/svnroot/repos/web-cpan/Statistics-Descriptive/trunk

and make sure you have added regression tests (using Test::More under t/*.t) and POD. Also please make sure you return a single array ref instead of them flattened into the return list.

Furthermore, you used ->median() once and ->_median() once. Please use ->median() in both cases.

Regards,

Shlomi Fish
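To illustrate the two review comments (one array ref as the return value, and a single median routine used throughout), here is a minimal standalone sketch. It is not the module's code: `median_of_sorted` and `quartiles_by_halves` are hypothetical helpers, and the rule of counting the middle element in both halves when the count is odd is an assumption about the intended "median of each half" convention.

```perl
use strict;
use warnings;

# Median of an already sorted, non-empty list passed as an array reference.
sub median_of_sorted {
    my ($sorted) = @_;
    my $n   = scalar @{$sorted};
    my $mid = int( $n / 2 );
    return $n % 2
        ? $sorted->[$mid]
        : ( $sorted->[ $mid - 1 ] + $sorted->[$mid] ) / 2;
}

# Hypothetical helper (not part of Statistics::Descriptive).
# Returns a single array ref [ Q0, Q1, Q2, Q3, Q4 ], or nothing for empty input.
sub quartiles_by_halves {
    my ($data) = @_;
    my @sorted = sort { $a <=> $b } @{$data};
    my $n      = scalar @sorted;
    return if $n == 0;

    my $half = int( $n / 2 );

    # Assumption: with an odd count the middle element belongs to both halves.
    my $lower_end = $n % 2 ? $half : $half - 1;
    my @lower     = @sorted[ 0 .. $lower_end ];
    my @upper     = @sorted[ $half .. $n - 1 ];

    return [
        $sorted[0],
        median_of_sorted( \@lower ),
        median_of_sorted( \@sorted ),
        median_of_sorted( \@upper ),
        $sorted[-1],
    ];
}

my $q = quartiles_by_halves( [ 1 .. 7 ] );
print "@{$q}\n";    # prints "1 2.5 4 5.5 7"
```

Returning a reference keeps the five values together when the call appears inside a larger list, which is presumably the reason for the request. Other quartile conventions give different Q1 and Q3 values for the same data.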
Subject: percentile method, quantile method
On Thu Jul 16 18:15:40 2009, SHLOMIF wrote:
> Hi!
>
> I like this. However, here are some comments.
>
> Please convert your code into a patch against:
>
> http://svn.berlios.de/svnroot/repos/web-cpan/Statistics-Descriptive/trunk
>
> and make sure you have added regression tests (using Test::More under
> t/*.t) and POD. Also please make sure you return a single array ref
> instead of them flattened into the return list.
>
> Furthermore, you used ->median() once and ->_median() once. Please use
> ->median() in both cases.
>
> Regards,
>
> Shlomi Fish
Hi.

Well, the quantile method is OK now. I just have to write a POD section.

Could you please explain how to create a patch? I don't know how to create one.

Thank you,

Djibril Ousmanou
Hi!

I have added a new method called quantile.

I have attached 2 files:
1- quantile.t, which you can add to the t/ directory
2- patch djibel.txt for Descriptive.pm

I have added the new quantile method, added POD text for it, and run perltidy on the Descriptive.pm file.

I have tested it and everything works well.

You can test it and update your module.

Don't forget to tell me when you update it.

Best Regards,

Djibril Ousmanou

Message body is not shown because it is too large.

#!/usr/bin/perl
#==================================================================
# Author    : Djibril Ousmanou
# Copyright : 2009
# Update    : 17/07/2009 18:47:16
# AIM       : Test the type 7 quantile calculation for quantiles 0 to 4
#==================================================================
use strict;
use warnings;
use Carp;

use Test::More tests => 15;
use Statistics::Descriptive;

my @data1 = ( 1 .. 10 );
my @data2 = (
    601, 449, 424, 568, 569, 447, 425, 621, 616, 573, 584, 635, 480, 437,
    724, 711, 717, 576, 724, 585, 458, 752, 753, 709, 584, 748, 628, 483,
    739, 747, 694, 601, 758, 653, 487, 720, 750, 660, 588, 719, 631, 492,
    584, 647, 548, 585, 649, 532, 492, 598, 653, 524, 567, 570, 506, 475,
    640, 725, 688, 567, 634, 520, 488, 718, 769, 739, 576, 718, 527, 497,
    698, 736, 785, 581, 733, 540, 537, 683, 691, 785, 588, 733, 531, 564,
    581, 554, 765, 580, 626, 510, 533, 495, 470, 713, 571, 573, 476, 526,
    441, 431, 686, 563, 496, 447, 518
);
my @data3 = qw/-9 2 3 44 -10 6 7/;

my %DataTest = (
    'First sample test' => {
        'Data' => \@data1,
        'Test' => {
            '0' => '1',
            '1' => '3.25',
            '2' => '5.5',
            '3' => '7.75',
            '4' => '10',
        },
    },
    'Second sample test' => {
        'Data' => \@data2,
        'Test' => {
            '0' => '424',
            '1' => '526',
            '2' => '584',
            '3' => '698',
            '4' => '785',
        },
    },
    'Third sample test' => {
        'Data' => \@data3,
        'Test' => {
            '0' => '-10',
            '1' => '-3.5',
            '2' => '3',
            '3' => '6.5',
            '4' => '44',
        },
    }
);

# Test each quantile (0 to 4) of every sample
foreach my $MessageTest ( sort keys %DataTest ) {
    my $stat = Statistics::Descriptive::Full->new();
    $stat->add_data( @{ $DataTest{$MessageTest}->{Data} } );
    for ( 0 .. 4 ) {
        is( $stat->quantile($_),
            $DataTest{$MessageTest}->{Test}{$_},
            $MessageTest . ", Q$_" );
    }
}
On Fri Jul 17 14:01:20 2009, DJIBEL wrote:
> Hi!
>
> I have added a new method called quantile.
>
> I have attached 2 files:
> 1- quantile.t, which you can add to the t/ directory
> 2- patch djibel.txt for Descriptive.pm
>
> I have added the new quantile method, added POD text for it, and run
> perltidy on the Descriptive.pm file.
Because you've run perltidy on the module, the patch is incredibly large, and I cannot tell whether it's OK or whether you've added something malicious. Please avoid running perltidy, and submit a smaller patch that just adds the extra method (with its POD).

Also, please use the "svn diff" command to generate the patch, run from the root of the repository. You can also do "svn add" on the new test file, and then do "svn diff".

Regards,

-- Shlomi Fish
On Sat Jul 18 03:21:07 2009, SHLOMIF wrote:
> On Fri Jul 17 14:01:20 2009, DJIBEL wrote:
> Due to the fact you've ran perltidy on the module, the patch is
> incredibly large, and I cannot tell if it's OK, or if you've added
> something malicious. Please avoid running perltidy, and submit a smaller
> patch that just adds the extra method (with its POD).
>
> Also, please use the "svn diff" command to generate the patch, when run
> from the root of the repository. You can also do "svn add" to the new
> test file, and then do "svn diff".
>
> Regards,
>
> -- Shlomi Fish
Hi.

You can download the patch. I hope everything is OK now for your module update.

I used svn to download your module. I added the quantile.t test file and updated Descriptive.pm (new method and POD). Then I created the patch file using svn diff.

Regards,

Djibril Ousmanou

Message body is not shown because it is too large.

Please don't use the patch from my last post. Use this one.

Thank you.

Djibril Ousmanou
Index: quantile.t
===================================================================
--- quantile.t	(revision 0)
+++ quantile.t	(revision 0)
@@ -0,0 +1,72 @@
+#!/usr/bin/perl
+#==================================================================
+# Author    : Djibril Ousmanou
+# Copyright : 2009
+# Update    : 20/07/2009
+# AIM       : Test quantile type 7 calculation
+#==================================================================
+use strict;
+use warnings;
+use Carp;
+
+use Test::More tests => 15;
+use Statistics::Descriptive;
+
+my @data1 = ( 1 .. 10 );
+my @data2 = (
+    601, 449, 424, 568, 569, 447, 425, 621, 616, 573, 584, 635, 480, 437,
+    724, 711, 717, 576, 724, 585, 458, 752, 753, 709, 584, 748, 628, 483,
+    739, 747, 694, 601, 758, 653, 487, 720, 750, 660, 588, 719, 631, 492,
+    584, 647, 548, 585, 649, 532, 492, 598, 653, 524, 567, 570, 506, 475,
+    640, 725, 688, 567, 634, 520, 488, 718, 769, 739, 576, 718, 527, 497,
+    698, 736, 785, 581, 733, 540, 537, 683, 691, 785, 588, 733, 531, 564,
+    581, 554, 765, 580, 626, 510, 533, 495, 470, 713, 571, 573, 476, 526,
+    441, 431, 686, 563, 496, 447, 518
+);
+my @data3 = qw/-9 2 3 44 -10 6 7/;
+
+my %DataTest = (
+    'First sample test' => {
+        'Data' => \@data1,
+        'Test' => {
+            '0' => '1',
+            '1' => '3.25',
+            '2' => '5.5',
+            '3' => '7.75',
+            '4' => '10',
+        },
+    },
+    'Second sample test' => {
+        'Data' => \@data2,
+        'Test' => {
+            '0' => '424',
+            '1' => '526',
+            '2' => '584',
+            '3' => '698',
+            '4' => '785',
+        },
+    },
+    'Third sample test' => {
+        'Data' => \@data3,
+        'Test' => {
+            '0' => '-10',
+            '1' => '-3.5',
+            '2' => '3',
+            '3' => '6.5',
+            '4' => '44',
+        },
+    }
+);
+
+# Test Quantile,
+foreach my $MessageTest ( sort keys %DataTest ) {
+    my $stat = Statistics::Descriptive::Full->new();
+    $stat->add_data( @{ $DataTest{$MessageTest}->{Data} } );
+    for ( 0 .. 4 ) {
+        is(
+            $stat->quantile($_),
+            $DataTest{$MessageTest}->{Test}{$_},
+            $MessageTest . ", Q$_"
+        );
+    }
+}
Index: Descriptive.pm
===================================================================
--- Descriptive.pm	(revision 3709)
+++ Descriptive.pm	(working copy)
@@ -411,6 +411,38 @@
     return $self->_median();
 }
 
+sub quantile {
+    my ( $self, $QuantileNumber ) = @_;
+
+    unless ( defined $QuantileNumber and $QuantileNumber =~ m/^0|1|2|3|4$/ ) {
+        carp("Bad quartile type, must be 0, 1, 2, 3 or 4\n");
+        return;
+    }
+
+    $self->sort_data();
+
+    return $self->_data->[0] if ( $QuantileNumber == 0 );
+
+    my $count = $self->count();
+
+    return $self->_data->[ $count - 1 ] if ( $QuantileNumber == 4 );
+
+    my $K_quantile = ( ( $QuantileNumber / 4 ) * ( $count - 1 ) + 1 );
+    my $F_quantile = $K_quantile - POSIX::floor($K_quantile);
+    $K_quantile = POSIX::floor($K_quantile);
+
+    # interpolation
+    my $aK_quantile = $self->_data->[ $K_quantile - 1 ];
+    return $aK_quantile if ( $F_quantile == 0 );
+    my $aKPlus_quantile = $self->_data->[$K_quantile];
+
+    # Compute the quantile
+    my $quantile = $aK_quantile
+      + ( $F_quantile * ( $aKPlus_quantile - $aK_quantile ) );
+
+    return $quantile;
+}
+
 sub _real_calc_trimmed_mean {
     my $self = shift;
 
@@ -916,6 +948,35 @@
 If the percentile method is called in a list context then it will
 also return the index of the percentile.
 
+=item $x = $stat->quantile($Type);
+
+Sorts the data and returns estimates of underlying distribution quantiles based on one
+or two order statistics from the supplied elements.
+
+This method uses the same algorithm as Excel and R language (quantile B<type 7>).
+
+The generic function quantile produces sample quantiles corresponding to the given probabilities.
+
+B<$Type> is an integer value between 0 to 4 :
+
+  0 => zero quartile (Q0) : minimal value
+  1 => first quartile (Q1) : lower quartile = lowest cut off (25%) of data = 25th percentile
+  2 => second quartile (Q2) : median = it cuts data set in half = 50th percentile
+  3 => third quartile (Q3) : upper quartile = highest cut off (25%) of data, or lowest 75% = 75th percentile
+  4 => fourth quartile (Q4) : maximal value
+
+Example :
+
+  my @data = (1..10);
+  my $stat = Statistics::Descriptive::Full->new();
+  $stat->add_data(@data);
+  print $stat->quantile(0); # => 1
+  print $stat->quantile(1); # => 3.25
+  print $stat->quantile(2); # => 5.5
+  print $stat->quantile(3); # => 7.75
+  print $stat->quantile(4); # => 10
+
+
 =item $stat->median();
 
 Sorts the data and returns the median value of the data.
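
The interpolation in the patch can be tried outside the module with a short standalone sketch. This is an illustration only, not the released code: `quartile_type7` is a hypothetical function that works on a plain array ref instead of the object's internal data. It also validates its argument with an anchored character class, whereas the pattern `m/^0|1|2|3|4$/` anchors only its first and last alternatives, so a value such as 15 would match it.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical standalone helper (not the module's method).
# $data   : array ref of numbers
# $number : quartile number, 0 to 4
sub quartile_type7 {
    my ( $data, $number ) = @_;

    # Accept exactly one digit between 0 and 4.
    return unless defined $number and $number =~ m/\A[0-4]\z/;

    my @sorted = sort { $a <=> $b } @{$data};
    my $count  = scalar @sorted;
    return if $count == 0;

    # 1-based fractional position: p * (n - 1) + 1, with p = number / 4.
    my $position = ( $number / 4 ) * ( $count - 1 ) + 1;
    my $k        = floor($position);
    my $fraction = $position - $k;

    # Exact hit on a data point: no interpolation needed.
    # This also covers quartiles 0 (minimum) and 4 (maximum).
    my $lower = $sorted[ $k - 1 ];
    return $lower if $fraction == 0;

    # Linear interpolation between the two neighbouring order statistics.
    my $upper = $sorted[$k];
    return $lower + $fraction * ( $upper - $lower );
}

print quartile_type7( [ 1 .. 10 ], 1 ), "\n";    # prints 3.25
print quartile_type7( [ 1 .. 10 ], 3 ), "\n";    # prints 7.75
```

For the data 1..10, the first quartile sits at position 0.25 * 9 + 1 = 3.25, a quarter of the way from the third value (3) to the fourth (4), which gives 3.25 as in the POD example.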
This patch was integrated into the newly released Statistics-Descriptive-3.0100. Thanks. Please don't reply to this message, as it will re-open the bug.