Skip Menu |

This queue is for tickets about the List-MoreUtils CPAN distribution.

Report information
The Basics
Id: 91991
Status: resolved
Priority: 0/
Queue: List-MoreUtils

People
Owner: Nobody in particular
Requestors: belg4mit [...] pthbb.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: 0.423



Subject: Request: mode
Date: Thu, 09 Jan 2014 13:07:41 -0500
To: bug-List-MoreUtils [...] rt.cpan.org
From: belg4mit [...] pthbb.org
A mode() function to return the mode of a list-- in the statistical sense, the most common value-- could be handy, specifically one that operates on text too.
Subject: Re: [rt.cpan.org #91991] AutoReply: Request: mode
Date: Thu, 09 Jan 2014 13:39:10 -0500
To: bug-List-MoreUtils [...] rt.cpan.org
From: Jerrad Pierce <belg4mit [...] pthbb.org>
Here is a sample implementation sub mode{ my %freq; $freq{$_}++ foreach @_; my @GRT = map { "$freq{$_}\000$_" } uniq @_; my @mode = sort @GRT; my $mode = pop @mode; my $multi = pop @mode; #Multi-modal test, uses Perl's autocast to number ;-) return undef if $mode == $multi; return (split("\000", $mode))[1]; }
Based on https://en.wikipedia.org/wiki/Mode_(statistics) the implementation is incomplete and very inefficient. However - when mode should be added, the distinguish between unimodal, bimodal and multimodal results must be possible and reasonable - either by different functions (name them reasonable) or by reasonable way of returning the mode.
On Tue Jul 18 13:38:19 2017, REHSACK wrote: Show quoted text
> Based on https://en.wikipedia.org/wiki/Mode_(statistics) the > implementation is incomplete and very inefficient. > > However - when mode should be added, the distinguish between unimodal, > bimodal and multimodal results must be possible and reasonable - > either by different functions (name them reasonable) or by reasonable > way of returning the mode.
Would you care to expand on your comments please? The link to a general encyclopedia entry does not offer much insight in to your claims regarding efficiency (Guttman Rosler Transform is quite efficient)* or "necessity" of detecting local maxima. The fact that such an implementation is global only should probaly be highlighted in any documentation, but more sophisticated outputs are likely more appropriate for a statistics module whereas the need to find the most common item(s) in a list is suitably general. Lastly, multi did not mean multi-modal in the general statistical sense, but rather a multi-value mode (multiple equal height peaks). * Although I did make a common mistake with the GRT that fails to sort some values correctly, Perl's COW meant any reptitive variables were a non-issue. This version corrects the GRT error though, and reports all modal values. sub mode{ my %freq; $freq{$_}++ foreach @_; my @GRT =3D reverse sort map { pack "LA*", $freq{$_}, $_ } keys %freq; my(@mode, $modefreq); foreach my $elem ( @GRT ){ my($freq, $term) =3D unpack("LA*", $elem); last unless $freq =3D=3D ($modefreq ||=3D $freq); push @mode, $term; } return \@mode if $#mode; return $mode[0]; }
Subject: Re: [rt.cpan.org #91991] Request: mode
Date: Tue, 18 Jul 2017 14:43:00 -0400
To: bug-List-MoreUtils [...] rt.cpan.org
From: Jerrad Pierce <belg4mit [...] pthbb.org>
Would you care to expand on your comments please? The link to a general encyclopedia entry does not offer much insight in to your claims regarding efficiency (Guttman Rosler Transform is quite efficient)* or "necessity" of detecting local maxima. The fact that such an implementation is global only should probaly be highlighted in any documentation, but more sophisticated outputs are likely more appropriate for a statistics module whereas the need to find the most common item(s) in a list is suitably general. Lastly, multi did not mean multi-modal, but rather a multi-value mode. * Although I did make a common mistake with the GRT that fails to sort some values correctly, Perl's COW meant any reptitive variables were a non-issue. This version corrects the GRT error though, and reports all modal values. sub mode{ my %freq; $freq{$_}++ foreach @_; my @GRT = reverse sort map { pack "LA*", $freq{$_}, $_ } keys %freq; my(@mode, $modefreq); foreach my $elem ( @GRT ){ my($freq, $term) = unpack("LA*", $elem); last unless $freq == ($modefreq ||= $freq); push @mode, $term; } return \@mode if $#mode; return $mode[0]; }
On Tue Jul 18 14:55:01 2017, JPIERCE wrote: Show quoted text
> On Tue Jul 18 13:38:19 2017, REHSACK wrote:
> > Based on https://en.wikipedia.org/wiki/Mode_(statistics) the > > implementation is incomplete and very inefficient. > > > > However - when mode should be added, the distinguish between > > unimodal, > > bimodal and multimodal results must be possible and reasonable - > > either by different functions (name them reasonable) or by reasonable > > way of returning the mode.
> > > Would you care to expand on your comments please? > The link to a general encyclopedia entry does not offer much insight > in to your claims regarding efficiency (Guttman Rosler Transform is > quite efficient)* or "necessity" of detecting local maxima. The fact > that such an implementation is global only should probaly be > highlighted in any documentation, but more sophisticated outputs are > likely more appropriate for a statistics module whereas the need to > find the most common item(s) in a list is suitably general. Lastly, > multi did not mean multi-modal in the general statistical sense, but > rather a multi-value mode (multiple equal height peaks). > > * Although I did make a common mistake with the GRT that fails to sort > some values correctly, Perl's COW meant any reptitive variables were a > non-issue. This version corrects the GRT error though, and reports all > modal values. > > sub mode{ > my %freq; > $freq{$_}++ foreach @_;
That's what uniq/singleton/duplicates do ... beside ignoring that <undef> and '' evaluates to the same hash key but represent different samples. Show quoted text
> my @GRT = reverse sort map { pack "LA*", $freq{$_}, $_ } keys > %freq;
We're just lurking for the maximum value - this is in worst case O(n). Sorting is O(n * log(n)) - with an incredible constant when leaving 1st/2nd level cache... But thinking about a PurePerl implementation, it looks more reasonable to use builtin sort over Perl code looping over an array. OTOH it can be reasonable to memorize the max-occur during count (line 3 of your code). Then grep {} can be favoured over sort... Show quoted text
> my(@mode, $modefreq); > foreach my $elem ( @GRT ){ > my($freq, $term) = unpack("LA*", $elem); > last unless $freq == ($modefreq ||= $freq); > push @mode, $term; > }
This loop is unnecessary when the sort is eliminated. Don't worry about that ;) Show quoted text
> return \@mode if $#mode; > return $mode[0]; > }
My claim was not intended to force you provide better code - I just cared for the API. I simply know - implementing the intial provided unimodal variant - releasing and getting wishes for bimodal and multimodal will be at the same time :) And your implementation lacks the "typical" G_SCALAR support (wantwarry is false) - then just the value of mode should be returned. And in list context - for each sample the mode value is the same. I know - there're a lot of (implicit) questions and they should be answered before ;) The quick response of you guides me to a proposal: I provide an implementation (XS first, since I currently work on List::MoreUtils::XS) and ask you for review on the API (I give you the link to the code and the link to the test file). Does is sound ok for you?
On Tue Jul 18 15:15:14 2017, REHSACK wrote: Show quoted text
> On Tue Jul 18 14:55:01 2017, JPIERCE wrote:
> > my @GRT = reverse sort map { pack "LA*", $freq{$_}, $_ } keys > > %freq;
> > We're just lurking for the maximum value - this is in worst case O(n). > Sorting is O(n * log(n)) - with an incredible constant when leaving > 1st/2nd level cache...
Fair enough, this was implemented more for brevity and perlishness. Eating one's own dog food and using max makes sense, although one must later iterate over %freq to find all keys with the same value (frequency). Show quoted text
> And your implementation lacks the "typical" G_SCALAR support > (wantwarry is false) - then just the value of mode should be returned. > > And in list context - for each sample the mode value is the same.
Ah, yes: return @mode if wantarray(); Show quoted text
> The quick response of you guides me to a proposal: I provide an > implementation (XS first, since I currently work on > List::MoreUtils::XS) and ask you for review on the API (I give you the > link to the code and the link to the test file). > > Does is sound ok for you?
Sounds good.
I added mode to List::MoreUtils::XS with the latest commit today (https://github.com/perl5-utils/List-MoreUtils-XS). The test code is in https://github.com/perl5-utils/Test-List-MoreUtils.
On Thu Jul 27 13:26:13 2017, REHSACK wrote: Show quoted text
> I added mode to List::MoreUtils::XS with the latest commit today > (https://github.com/perl5-utils/List-MoreUtils-XS). The test code is > in https://github.com/perl5-utils/Test-List-MoreUtils.
Not sure why the tests are separate, but after working my way through the not explicitly listed requirements for the module (Config::AutoConf, File::Find::Rule, Test::WriteVariants) I get the following when I run inline/mode.pm Array found where operator expected at inline/mode.pm line 15, at end of line (Missing operator before ?) Array found where operator expected at inline/mode.pm line 18, at end of line (Missing operator before ?) Array found where operator expected at inline/mode.pm line 27, at end of line (Missing operator before ?) Array found where operator expected at inline/mode.pm line 30, at end of line (Missing operator before ?) Array found where operator expected at inline/mode.pm line 39, at end of line (Missing operator before ?) Array found where operator expected at inline/mode.pm line 45, at end of line (Missing operator before ?) Array found where operator expected at inline/mode.pm line 364, at end of line (Missing operator before ?) Array found where operator expected at inline/mode.pm line 368, at end of line (Missing operator before ?) syntax error at inline/mode.pm line 15, near "mode @lorem" syntax error at inline/mode.pm line 18, near "mode @lorem" syntax error at inline/mode.pm line 27, near "mode @probes" syntax error at inline/mode.pm line 30, near "mode @probes" syntax error at inline/mode.pm line 39, near "mode @probes" syntax error at inline/mode.pm line 45, near "mode @probes" syntax error at inline/mode.pm line 352, near "mode values" syntax error at inline/mode.pm line 357, near "mode values" syntax error at inline/mode.pm line 364, near "mode @probes" syntax error at inline/mode.pm line 368, near "mode @probes" inline/mode.pm has too many errors. This is perl 5, version 18, subversion 1 (v5.18.1) built for i686-linux Copyright 1987-2013, Larry Wall And I got the following warning when building the XS WARNING: MAGICXS is not a known parameter. 'MAGICXS' is not a known MakeMaker parameter name. I also read through the source and am not sure I understand the results mode is returning. Why does mode(@probes)=7 in scalar context and mode( values %radio_ukw_nrw)=14 in scalar context? For that matter, it seems odd that it returns both WDR 5 (N=20) and WDR Eins Live (N=21) in list context... let alone give the result with the smaller number of occurrences first?
On Thu Jul 27 14:21:01 2017, JPIERCE wrote: Show quoted text
> On Thu Jul 27 13:26:13 2017, REHSACK wrote:
> > I added mode to List::MoreUtils::XS with the latest commit today > > (https://github.com/perl5-utils/List-MoreUtils-XS). The test code is > > in https://github.com/perl5-utils/Test-List-MoreUtils.
> > Not sure why the tests are separate,
Because they're exactly the same tests for List::MoreUtils and List::MoreUtils::XS. It's easier to find regressions in one of them sharing the tests. Show quoted text
> but after working my way through > the not explicitly listed requirements for the module > (Config::AutoConf, File::Find::Rule, Test::WriteVariants)
Building from repository requires having developer dependencies installed. See MAINTAINER.md ;) But I expected rather you read the tests for proving the API ;) Show quoted text
> I get the following when I run inline/mode.pm
You have to run t/xs/mode.t (again: see MAINTAINER.md) Show quoted text
> Array found where operator expected at inline/mode.pm line 15, at end > of line > (Missing operator before ?) > Array found where operator expected at inline/mode.pm line 18, at end > of line > (Missing operator before ?) > Array found where operator expected at inline/mode.pm line 27, at end > of line > (Missing operator before ?) > Array found where operator expected at inline/mode.pm line 30, at end > of line > (Missing operator before ?) > Array found where operator expected at inline/mode.pm line 39, at end > of line > (Missing operator before ?) > Array found where operator expected at inline/mode.pm line 45, at end > of line > (Missing operator before ?) > Array found where operator expected at inline/mode.pm line 364, at end > of line > (Missing operator before ?) > Array found where operator expected at inline/mode.pm line 368, at end > of line > (Missing operator before ?) > syntax error at inline/mode.pm line 15, near "mode @lorem" > syntax error at inline/mode.pm line 18, near "mode @lorem" > syntax error at inline/mode.pm line 27, near "mode @probes" > syntax error at inline/mode.pm line 30, near "mode @probes" > syntax error at inline/mode.pm line 39, near "mode @probes" > syntax error at inline/mode.pm line 45, near "mode @probes" > syntax error at inline/mode.pm line 352, near "mode values" > syntax error at inline/mode.pm line 357, near "mode values" > syntax error at inline/mode.pm line 364, near "mode @probes" > syntax error at inline/mode.pm line 368, near "mode @probes" > inline/mode.pm has too many errors. > > This is perl 5, version 18, subversion 1 (v5.18.1) built for i686- > linux > > Copyright 1987-2013, Larry Wall > > And I got the following warning when building the XS > > WARNING: MAGICXS is not a known parameter. > 'MAGICXS' is not a known MakeMaker parameter name.
You'd need a more recent ExtUtils::MakeMaker to get rid of the warning. But it doesn't affect the build. Show quoted text
> I also read through the source and am not sure I understand the > results mode is returning. Why does mode(@probes)=7 in scalar context > and mode( values %radio_ukw_nrw)=14 in scalar context?
In scalar context, it's impossible to support multimodal results, thus the only reasonable value is the highest occurance. Show quoted text
> For that > matter, it seems odd that it returns both WDR 5 (N=20) and WDR Eins > Live (N=21) in list context... let alone give the result with the > smaller number of occurrences first?
They're both 14 times in the list - not 20 nor 21 times ;) In doubt, use occurances or frequency function to visualize ;) Cheers, Jens
And btw - I found a typo and force-pushed it, so sync against upstream before running tests.
On Thu Jul 27 15:31:25 2017, REHSACK wrote: Show quoted text
> On Thu Jul 27 14:21:01 2017, JPIERCE wrote:
> > On Thu Jul 27 13:26:13 2017, REHSACK wrote:
> > > I added mode to List::MoreUtils::XS with the latest commit today > > > (https://github.com/perl5-utils/List-MoreUtils-XS). The test code > > > is > > > in https://github.com/perl5-utils/Test-List-MoreUtils.
> > > > Not sure why the tests are separate,
> > Because they're exactly the same tests for List::MoreUtils and > List::MoreUtils::XS. It's easier to find regressions in one of them > sharing the tests. >
> > but after working my way through > > the not explicitly listed requirements for the module > > (Config::AutoConf, File::Find::Rule, Test::WriteVariants)
> > Building from repository requires having developer dependencies > installed. See MAINTAINER.md ;) > But I expected rather you read the tests for proving the API ;)
I tried both, did not see or read MAINTAINER, checked README. Show quoted text
> > I get the following when I run inline/mode.pm
> > You have to run t/xs/mode.t (again: see MAINTAINER.md)
Nothing in MAINTAINER or README about tests other than "make test" :-P I'd tried downloading and using the zipfile of the project, but apparently there are too many cross-project dependencies for that to work out of the box and git itself must be used. Show quoted text
> > Array found where operator expected at inline/mode.pm line 15, at end > > of line > > (Missing operator before ?) > > Array found where operator expected at inline/mode.pm line 18, at end > > of line > > (Missing operator before ?) > > Array found where operator expected at inline/mode.pm line 27, at end > > of line > > (Missing operator before ?) > > Array found where operator expected at inline/mode.pm line 30, at end > > of line > > (Missing operator before ?) > > Array found where operator expected at inline/mode.pm line 39, at end > > of line > > (Missing operator before ?) > > Array found where operator expected at inline/mode.pm line 45, at end > > of line > > (Missing operator before ?) > > Array found where operator expected at inline/mode.pm line 364, at > > end > > of line > > (Missing operator before ?) > > Array found where operator expected at inline/mode.pm line 368, at > > end > > of line > > (Missing operator before ?) > > syntax error at inline/mode.pm line 15, near "mode @lorem" > > syntax error at inline/mode.pm line 18, near "mode @lorem" > > syntax error at inline/mode.pm line 27, near "mode @probes" > > syntax error at inline/mode.pm line 30, near "mode @probes" > > syntax error at inline/mode.pm line 39, near "mode @probes" > > syntax error at inline/mode.pm line 45, near "mode @probes" > > syntax error at inline/mode.pm line 352, near "mode values" > > syntax error at inline/mode.pm line 357, near "mode values" > > syntax error at inline/mode.pm line 364, near "mode @probes" > > syntax error at inline/mode.pm line 368, near "mode @probes" > > inline/mode.pm has too many errors. > > > > This is perl 5, version 18, subversion 1 (v5.18.1) built for i686- > > linux > > > > Copyright 1987-2013, Larry Wall > > > > And I got the following warning when building the XS > > > > WARNING: MAGICXS is not a known parameter. > > 'MAGICXS' is not a known MakeMaker parameter name.
> > You'd need a more recent ExtUtils::MakeMaker to get rid of the > warning. But it doesn't affect the build.
I suspected as much, but wondered if it might be the source of the syntax errors. Show quoted text
>
> > I also read through the source and am not sure I understand the > > results mode is returning. Why does mode(@probes)=7 in scalar context > > and mode( values %radio_ukw_nrw)=14 in scalar context?
> > In scalar context, it's impossible to support multimodal results, thus > the only reasonable value is the highest occurance. >
> > For that > > matter, it seems odd that it returns both WDR 5 (N=20) and WDR Eins > > Live (N=21) in list context... let alone give the result with the > > smaller number of occurrences first?
> > They're both 14 times in the list - not 20 nor 21 times ;) > In doubt, use occurances or frequency function to visualize ;) > > Cheers, > Jens
Oh, I see, you have several duplicate definitions of stations "87,6" => "WDR Eins Live", "87,7" => "WDR 5", "87,7" => "Welle Niederrhein", "87,7" => "WDR 5", I had simply done a count of occurrences in the file, not the hash. I'm still drawn to the idea of not having to deal with arrays for single mode results, but there's no good way to handle this without throwing an exception if the data is multi-modal, which would require a second invocation in list context. I guess one could always do: $mode = (mode(@data))[0]; But this seems to work, and getting ready access to the frequency is handy too. Thanks!
On Thu Jul 27 17:21:59 2017, JPIERCE wrote: Show quoted text
> On Thu Jul 27 15:31:25 2017, REHSACK wrote:
> > On Thu Jul 27 14:21:01 2017, JPIERCE wrote:
> > > On Thu Jul 27 13:26:13 2017, REHSACK wrote:
> > > > I added mode to List::MoreUtils::XS with the latest commit today > > > > (https://github.com/perl5-utils/List-MoreUtils-XS). The test code > > > > is > > > > in https://github.com/perl5-utils/Test-List-MoreUtils.
> > > > > > Not sure why the tests are separate,
> > > > Because they're exactly the same tests for List::MoreUtils and > > List::MoreUtils::XS. It's easier to find regressions in one of them > > sharing the tests. > >
> > > but after working my way through > > > the not explicitly listed requirements for the module > > > (Config::AutoConf, File::Find::Rule, Test::WriteVariants)
> > > > Building from repository requires having developer dependencies > > installed. See MAINTAINER.md ;) > > But I expected rather you read the tests for proving the API ;)
> > I tried both, did not see or read MAINTAINER, checked README.
https://github.com/perl5-utils/List-MoreUtils-XS/blob/master/MAINTAINER.md Should be included when you pull ;) Show quoted text
> > > I get the following when I run inline/mode.pm
> > > > You have to run t/xs/mode.t (again: see MAINTAINER.md)
> > Nothing in MAINTAINER or README about tests other than "make test" :-P > > I'd tried downloading and using the zipfile of the project, but > apparently there are too many cross-project dependencies for that to > work out of the box and git itself must be used.
Better do a git clone - don't know how the zipfile works. Show quoted text
> > > Array found where operator expected at inline/mode.pm line 15, at > > > end > > > of line > > > (Missing operator before ?) > > > Array found where operator expected at inline/mode.pm line 18, at > > > end > > > of line > > > (Missing operator before ?) > > > Array found where operator expected at inline/mode.pm line 27, at > > > end > > > of line > > > (Missing operator before ?) > > > Array found where operator expected at inline/mode.pm line 30, at > > > end > > > of line > > > (Missing operator before ?) > > > Array found where operator expected at inline/mode.pm line 39, at > > > end > > > of line > > > (Missing operator before ?) > > > Array found where operator expected at inline/mode.pm line 45, at > > > end > > > of line > > > (Missing operator before ?) > > > Array found where operator expected at inline/mode.pm line 364, at > > > end > > > of line > > > (Missing operator before ?) > > > Array found where operator expected at inline/mode.pm line 368, at > > > end > > > of line > > > (Missing operator before ?) > > > syntax error at inline/mode.pm line 15, near "mode @lorem" > > > syntax error at inline/mode.pm line 18, near "mode @lorem" > > > syntax error at inline/mode.pm line 27, near "mode @probes" > > > syntax error at inline/mode.pm line 30, near "mode @probes" > > > syntax error at inline/mode.pm line 39, near "mode @probes" > > > syntax error at inline/mode.pm line 45, near "mode @probes" > > > syntax error at inline/mode.pm line 352, near "mode values" > > > syntax error at inline/mode.pm line 357, near "mode values" > > > syntax error at inline/mode.pm line 364, near "mode @probes" > > > syntax error at inline/mode.pm line 368, near "mode @probes" > > > inline/mode.pm has too many errors. > > > > > > This is perl 5, version 18, subversion 1 (v5.18.1) built for i686- > > > linux > > > > > > Copyright 1987-2013, Larry Wall > > > > > > And I got the following warning when building the XS > > > > > > WARNING: MAGICXS is not a known parameter. > > > 'MAGICXS' is not a known MakeMaker parameter name.
> > > > You'd need a more recent ExtUtils::MakeMaker to get rid of the > > warning. But it doesn't affect the build.
> > I suspected as much, but wondered if it might be the source of the > syntax errors.
It's because the prototypes are missing because noone included List::MoreUtils::XS ;) Show quoted text
> >
> > > I also read through the source and am not sure I understand the > > > results mode is returning. Why does mode(@probes)=7 in scalar > > > context > > > and mode( values %radio_ukw_nrw)=14 in scalar context?
> > > > In scalar context, it's impossible to support multimodal results, > > thus > > the only reasonable value is the highest occurance. > >
> > > For that > > > matter, it seems odd that it returns both WDR 5 (N=20) and WDR Eins > > > Live (N=21) in list context... let alone give the result with the > > > smaller number of occurrences first?
> > > > They're both 14 times in the list - not 20 nor 21 times ;) > > In doubt, use occurances or frequency function to visualize ;) > > > > Cheers, > > Jens
> > Oh, I see, you have several duplicate definitions of stations > > "87,6" => "WDR Eins Live", > "87,7" => "WDR 5", > "87,7" => "Welle Niederrhein", > "87,7" => "WDR 5", > > I had simply done a count of occurrences in the file, not the hash. > > I'm still drawn to the idea of not having to deal with arrays for > single mode results, but there's no good way to handle this without > throwing an exception if the data is multi-modal, which would require > a second invocation in list context.
I depends - exceptions shall be thrown in case of errors. And Perl is in general made for list processing - so while I discourage scalar context for List::MoreUtils in general - it's part of it's history and I try to make the best out of that ;) Show quoted text
> I guess one could always do: > > $mode = (mode(@data))[0]; > > But this seems to work, and getting ready access to the frequency is > handy too.
I see that I do a test release tomorrow and start on PP implementation next week. Let's see whether a new combo is ready before Perl Conference in Amsterdam... Cheers, Jens
From: paul [...] city-fan.org
On Thu Jul 27 15:31:25 2017, REHSACK wrote: Show quoted text
> In doubt, use occurances or frequency function to visualize ;)
Please consider adding 'occurrences' as a synonym of 'occurances' in a future release, or, better still, make 'occurrences' the main function and add back 'occurances' as a synonym for backwards compatibility.
On Tue Aug 15 04:35:29 2017, paul@city-fan.org wrote: Show quoted text
> On Thu Jul 27 15:31:25 2017, REHSACK wrote:
> > In doubt, use occurances or frequency function to visualize ;)
> > Please consider adding 'occurrences' as a synonym of 'occurances' in a > future release, or, better still, make 'occurrences' the main function > and add back 'occurances' as a synonym for backwards compatibility.
This is a separate task. Please open a new ticket for it.