Bug #98583 for List-MoreUtils: Feature: new routine count()

Wed Sep 03 07:55:58 2014 EDAVIS [...] cpan.org - Ticket created

Subject:

Feature: new routine count()

It would be useful to have a routine count() which counts the number of elements in a list. The simplest implementation is sub count { scalar @_ }. Further optimizations are possible with XS code, but the above should be enough. You may wonder why this is necessary; there is some discussion at <http://thread.gmane.org/gmane.comp.lang.perl.perl5.porters/134968/>. Long story short, most of the idioms or tricks suggested to get the number of elements in a list don't actually work for all cases. This can lead to fragile code when beginners pick up scalar(...) or other not-quite-working constructs and think that this is the way to count elements. A guaranteed working way to count a list would be a most welcome addition to List::MoreUtils.
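As a minimal sketch of the difference described above (the function foo() and its return values are hypothetical, chosen only for illustration), compare scalar() with the proposed count() on a function that returns a literal list:

```perl
use strict;
use warnings;

# The one-line implementation proposed in this ticket.
sub count { return scalar @_ }

# Hypothetical function that returns a literal three-element list.
sub foo { return ('x', 'y', 'z') }

# In scalar context the return expression is a comma expression,
# so this yields the last element, not the number of elements.
my $via_scalar = scalar(foo());

# The "countof" idiom: list assignment evaluated in scalar context.
my $via_idiom = () = foo();

# count() calls foo() in list context and counts what arrives in @_.
my $via_count = count(foo());

print "scalar: $via_scalar, idiom: $via_idiom, count: $via_count\n";
# prints "scalar: z, idiom: 3, count: 3"
```

Only the last two forms give the element count regardless of how foo() builds its return value.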

Wed Sep 03 08:03:47 2014 REHSACK [...] cpan.org - Correspondence added

I do not agree. I do not understand why using scalar context isn't sufficient, and having glanced at https://rt.cpan.org/Ticket/Display.html?id=96596 I tend to reject this wish. You would need much better reasoning for such a misleading routine.

Wed Sep 03 08:03:47 2014 The RT System itself - Status changed from 'new' to 'open'

Wed Sep 03 08:34:34 2014 EDAVIS [...] cpan.org - Correspondence added

I used to think that using scalar() was sufficient and I have written lots of code like that. But it is not, and this can trip you up. For example, suppose you have the following code:

    my @got = (@first_part, @second_part);
    if (scalar(@got) != scalar(@expected_lines)) { die 'wrong number of lines' }

That works - but if you then decide to remove the intermediate @got variable the code becomes wrong:

    if (scalar(@first_part, @second_part) != scalar(@expected_lines)) { ... }

This is because scalar() is not, in fact, a routine that returns the number of elements in a list. Many programmers may assume that it is, but that is incorrect. It just returns the result in scalar context, whatever that may be. If you want to get the count of elements in a list then you have to write something like scalar( ()= foo ) which, to my eyes, is just the kind of thing that would usefully be wrapped up in a one-word List::MoreUtils function.
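The refactoring trap described above can be reproduced with a short sketch. The array contents are made up for illustration. With a parenthesized list, scalar() behaves as a comma expression: all but the last element are evaluated in void context and the last one in scalar context.

```perl
use strict;
use warnings;

my @first_part  = ('a', 'b');
my @second_part = ('c', 'd', 'e');

# With the intermediate array, scalar() gives the element count.
my @got = (@first_part, @second_part);
my $with_temp = scalar(@got);

# Without it, only @second_part is evaluated in scalar context.
my $without_temp;
{
    no warnings 'void';   # perl would otherwise flag @first_part as useless here
    $without_temp = scalar(@first_part, @second_part);
}

# The idiom mentioned above counts the flattened list correctly.
my $countof = scalar( () = (@first_part, @second_part) );

print "$with_temp $without_temp $countof\n";   # prints "5 3 5"
```

Leaving warnings enabled around the faulty line is one way to catch this mistake early, although it only helps when the discarded operands are things perl recognizes as useless in void context.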

Wed Sep 03 08:42:30 2014 REHSACK [...] cpan.org - Correspondence added

But this leads to count(\@;\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@) - and will fail when one list too many is given. Further: count(@) ... doesn't save anything but creates a temporary list - one can do it by hand either. I disagree that the concept matches List::MoreUtils.

Wed Sep 03 08:55:41 2014 EDAVIS [...] cpan.org - Correspondence added

I believe that count() does not require any prototype, or simply a prototype of (@).
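To illustrate the point about prototypes, here is a small sketch (array names and contents are invented for the example) of count() declared with a (@) prototype. The arguments are evaluated in list context and flattened into @_, so no per-array reference prototype is needed and any number of lists can be passed:

```perl
use strict;
use warnings;

# Sketch of count() with the (@) prototype suggested above.
sub count(@) { return scalar @_ }

my @a = (1, 2);
my @b = (3, 4, 5);
my @c = ();

# All arguments are flattened into one list before the call.
my $n_arrays = count(@a, @b, @c);
my $n_mixed  = count(@a, 'x', @b);

print "$n_arrays $n_mixed\n";   # prints "5 6"
```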

>Further: count(@) ... doesn't save anything but creates a temporary list - one can do it by hand either.

That's right; although an XS implementation can be used to avoid the temporary. However, the motivation is not for efficiency but as a way to say what you mean and have code which will work without surprises.

What is scalar()? It coerces its argument list to scalar context. Sometimes this gives the count of items in the list, but sometimes it doesn't. Surely it is not a good idea to say 'scalar' when what you really want is not the behaviour of scalar() but just a count of items.

I would suggest that count() is a cousin of true() and false(), just without the first BLOCK argument. Then, for example, you have a recipe to see the proportion of items matching some test:

    my $p = (true { $_ > 0 } foo) / count(foo);
    say int($p * 100), '% of the items are positive';

If you use scalar() instead of count() then the code will work for some values of foo but not for others. Note that true() and false() do not have any problem with arbitrary lists:

    my $count = true { 1 } @a, @b, @c; # works correctly

Whereas scalar(@a, @b, @c) does not do what you expected, if you were hoping to use scalar() as the way to count a list.
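The proportion recipe above can be tried out with core Perl only. In this sketch, grep in scalar context stands in for List::MoreUtils' true() so that nothing needs to be installed, and readings() is a hypothetical function with made-up data:

```perl
use strict;
use warnings;

sub count { return scalar @_ }

# Hypothetical data source returning a literal list.
sub readings { return (3, -1, 4, 2) }

# grep in scalar context returns the number of matching items.
my $positive = grep { $_ > 0 } readings();

# Dividing by count() uses the real number of items (4).
my $p_count = $positive / count(readings());

# Dividing by scalar() uses the last element (2) instead.
my $p_scalar = $positive / scalar(readings());

printf "count():  %d%% of the items are positive\n", int($p_count * 100);
printf "scalar(): %d%% of the items are positive\n", int($p_scalar * 100);
# prints 75% for count() and 150% for scalar()
```

Had readings() returned an array variable rather than a literal list, scalar() would have given the right answer, which is exactly the "works for some values of foo but not for others" behaviour described above.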

Wed Sep 03 08:58:56 2014 REHSACK [...] cpan.org - Correspondence added

All questions aside why scalar(@L1,@L2) won't behave as expected - you want a scalar_but_merges_lists(@L1,@L2). But what you lurk for is not a list util - you lurk for a syntax helper!

Wed Sep 03 09:06:30 2014 EDAVIS [...] cpan.org - Correspondence added

On Wed Sep 03 08:58:56 2014, REHSACK wrote:

>you want a scalar_but_merges_lists(@L1,@L2).

That is exactly right but I think that 'count' is a better name. However, if you think that 'count' is misleading it could be called something else.

>But what you lurk for is not a list util - you lurk for a syntax helper!

I agree that a source filter or the equivalent would let this be implemented without losing efficiency. It could convert count(foo) into scalar( ()= foo ) or something equivalent. However, I think that efficiency is secondary and the first priority is to write code that clearly expresses what's wanted, and not to use the wrong tool for the job (as scalar() is the wrong tool to use for counting elements in a list). Clearly the definition of count() is trivial and could be implemented separately in each project, or even in each source file. But there is value in having an agreed name and common location. The same is true of most of the routines in List::MoreUtils, they are often one-liners (in pure-Perl implementation at least), but having them collected together with known names and documentation means that different programmers can read each others' code and know exactly what is intended. That is what makes List::MoreUtils such a valuable module.

Wed Sep 03 09:25:16 2014 REHSACK [...] cpan.org - Correspondence added

My point is: this is not a list util - it doesn't improve dealing with lists, it improves dealing with Perl(5) syntax. I'm open to other arguments - but at this point I reject the wish.

Wed Sep 03 09:25:19 2014 REHSACK [...] cpan.org - Status changed from 'open' to 'rejected'

Wed Sep 03 09:55:49 2014 EDAVIS [...] cpan.org - Correspondence added

But isn't 'true BLOCK foo' just a more convenient syntax for 'scalar grep BLOCK foo'? Again, it is something which could equally be implemented as a source filter (and might even be a bit faster that way, in the absence of XS code) but it is useful to have as a named function. To my mind, 'count the number of elements in list L' is just as much a list function as 'count the number of elements in list L matching condition P'... don't you think?
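The comparison can be made concrete with a sketch. Here true_count() is a hypothetical pure-Perl stand-in written for this example (it is not the List::MoreUtils implementation), defined as a thin wrapper around grep and placed next to the proposed count():

```perl
use strict;
use warnings;

# Hypothetical stand-in for true(): a named wrapper around scalar grep.
sub true_count(&@) {
    my $test = shift;
    return scalar grep { $test->() } @_;
}

# The proposed count(): the same idea without the BLOCK argument.
sub count(@) { return scalar @_ }

my @a = (1, -2, 3);
my @b = (-4, 5);

my $positive = true_count { $_ > 0 } @a, @b;   # elements matching a condition
my $all      = true_count { 1 } @a, @b;        # every element matches
my $total    = count(@a, @b);                  # no condition at all

print "$positive $all $total\n";   # prints "3 5 5"
```

Both routines are one-liners over built-in behaviour; the sketch only shows that they sit at the same level of abstraction.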

Thu Sep 04 01:31:39 2014 REHSACK [...] cpan.org - Correspondence added

No - I don't think ;) I rate some functions as not so useful placeholders - but I do not intend to waste names for shortcuts with questionable value.

Thu Sep 04 09:19:55 2014 EDAVIS [...] cpan.org - Correspondence added

Thanks for your reply. You are right that a naive implementation sub count { scalar @_ } does not add much value, just because it is so easy to write for those who want it - and also because it imposes a performance penalty. Using count(foo) will require making a copy of the list foo, whereas scalar(@x) does not take a copy of @x. It is possible to write a more efficient implementation of count() in XS code, so it can be just as fast as scalar() while not having scalar()'s gotchas. Then it becomes a bit more useful I think. If I can provide a patch with a fast XS implementation, would you consider adding count()?

Sun Sep 07 09:46:26 2014 REHSACK [...] cpan.org - Correspondence added

The most powerful implementation is the right use of scalar. Nothing can beat a native operator. I explained a bit above why every XS implementation is worse than scalar and where the bottlenecks are.

Mon Sep 08 09:49:33 2014 EDAVIS [...] cpan.org - Correspondence added

>Nothing can beat a native operator. I explained a bit above why every XS implementation is worse than scalar and where the bottlenecks are.

You make a valid point about needing to take a copy of the array (if called on a single array). On p5p, Zefram suggested a solution to this:

http://article.gmane.org/gmane.comp.lang.perl.perl5.porters/135045

Following this approach, I believe it is possible to make a count() routine which

- always returns the number of items in the list, with no funny cases

- is just as fast as scalar(@x) for the common case of a single array (in fact, compiles to the same bytecode)

- lets the programmer say what he or she means - 'I want the count of items in this list' - rather than relying on Perl's scalar conversion rules, which do not do what's wanted when the list is something other than a single array.

In the same thread on p5p, a couple of other contributors suggested that the routine belonged in List::Util or List::MoreUtils. It would not need to be called count() if that name is too general. What do you think?

Mon Sep 08 11:01:36 2014 REHSACK [...] cpan.org - Correspondence added

On Mon Sep 08 09:49:33 2014, EDAVIS wrote:

> > Nothing can beat a native operator. I explained a bit above why every
> > XS implementation is worse than scalar and where the bottlenecks are.
>
> You make a valid point about needing to take a copy of the array (if
> called on a single array).
> On p5p, Zefram suggested a solution to this:
>
> http://article.gmane.org/gmane.comp.lang.perl.perl5.porters/135045
>
> Following this approach, I believe it is possible to make a count()
> routine which
>
> - always returns the number of items in the list, with no funny cases
>
> - is just as fast as scalar(@x) for the common case of a single array
> (in fact, compiles to the same bytecode)
>
> - lets the programmer say what he or she means - 'I want the count of
> items in this list' - rather than relying on Perl's scalar conversion
> rules, which do not do what's wanted when the list is something other
> than a single array.
>
> In the same thread on p5p, a couple of other contributors suggested
> that the routine belonged in List::Util or List::MoreUtils. It would
> not need to be called count() if that name is too general. What do
> you think?

Patches welcome! Including tests and XS ;) Cheers

Tue Sep 09 05:13:03 2014 EDAVIS [...] cpan.org - Correspondence added

Great, I'll get started.