Skip Menu |

This queue is for tickets about the Email-Find CPAN distribution.

Report information
The Basics
Id: 79271
Status: open
Priority: 0/
Queue: Email-Find

People
Owner: Nobody in particular
Requestors: ALUCAS [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Unimportant
Broken in: (no value)
Fixed in: (no value)



Subject: Specify a maximum number of emails to find
Hi, A user recently asked for help with this module on #perl at freenode. They wanted to stop processing after a specific number of matches. I had a look at your module and noticed that there's no easy way to bail out of a find early. I had to suggest that they subclass and override your find method in order to do it. Would you be interested in implementing such a feature in your module? The normal way to do it would be to walk the string with while(m//g) {} and using substr with @- etc. for the replacement, instead of the s///ge, but I think that would be slower and messier than simply returning the find routine early from within the regexp replacement block, since it is quite a short routine anyway. I've attached a simple patch to illustrate.
Subject: find_match_limit_support.patch
40c40 < my($self, $r_text) = @_; --- > my($self, $r_text, $match_limit) = @_; 45,47c45,53 < my($replace, $found) = $self->validate($1); < $emails_found += $found; < $replace; --- > #return early if match limit reached > if( defined $match_limit && $emails_found >= $match_limit ) { > return $emails_found; > } > > #otherwise process match > my($replace, $found) = $self->validate($1); > $emails_found += $found; > $replace; 79,80c85,86 < sub find_emails(\$&) { < my($r_text, $callback) = @_; --- > sub find_emails(\$&$) { > my($r_text, $callback, $match_limit) = @_; 82c88 < $finder->find($r_text); --- > $finder->find($r_text, $match_limit); 129a136 > $num_emails_found = $finder->find(\$text, 10); #limit to 10 addresses 131a139,140 > Takes an optional integer as a second argument, which indicates the maximum number of email > addresses that will be matched.
Subject: Re: [rt.cpan.org #79271] Specify a maximum number of emails to find
Date: Tue, 28 Aug 2012 15:11:46 -0700
To: bug-Email-Find [...] rt.cpan.org
From: Tatsuhiko Miyagawa <miyagawa [...] gmail.com>
On Aug 28, 2012, at 2:51 PM, Anthony J Lucas via RT wrote: Show quoted text
> > Hi, > > A user recently asked for help with this module on #perl at freenode. > They wanted to stop processing after a specific number of matches. I had a look at your module > and noticed that there's no easy way to bail out of a find early. > > I had to suggest that they subclass and override your find method in order to do it. > > Would you be interested in implementing such a feature in your module?
No. you can easily die from the callback to stop processing once collected address is bigger than what you want. Show quoted text
> The normal way to do it would be to walk the string with while(m//g) {} and using substr with @- > etc. for the replacement, instead of the s///ge, but I think that would be slower and messier > than simply returning the find routine early from within the regexp replacement block, since it is > quite a short routine anyway. > > I've attached a simple patch to illustrate. > > <find_match_limit_support.patch>
Try this my @addr; my $f = Email::Find->new(sub { die if @addr >= $max; push @addr, $_[1] }); eval { $f->find(\$text) };
On Tue Aug 28 19:55:59 2012, MIYAGAWA wrote: Show quoted text
> Try this > > my @addr; > my $f = Email::Find->new(sub { die if @addr >= $max; push @addr, $_[1]
}); Show quoted text
> eval { $f->find(\$text) }; > >
Thanks for replying. Hmmm, I was just thinking about this. The problem with using die in this way is that it limits the usefulness of having the callback functionality. Beyond using the callback as a closure for pushing to an array, it gets a bit more complicated. In order to handle real errors in functions called inside the callback you end up having to parse $@, like below. for example, just to find 2 email addresses: + my $success = UNIQUE_SUCCESS_STRING; + eval{ + find_email($text, sub { + state $count = 0; + + die $success if( ++$count > 1 ); + my($email_obj, $email) = @_; + $my_obj->do_stuff($email_obj); + return $email; + }); + }; + if ($@ =~ /^$success/) { + #we found stuff + } else { + die "It was really an error from somewhere inside ->do_stuff: $@"; + } It would be nice to cover this common case with some simpler syntax. At the moment, using die to abort means the callback can only be used for extremely simple operations (ones that won't die themselves).
Sorry, it opened the case. I was just following up. Set back to rejected.
Subject: Re: [rt.cpan.org #79271] Specify a maximum number of emails to find
Date: Tue, 28 Aug 2012 17:21:39 -0700
To: "bug-Email-Find [...] rt.cpan.org" <bug-Email-Find [...] rt.cpan.org>
From: Tatsuhiko Miyagawa <miyagawa [...] gmail.com>
On Aug 28, 2012, at 5:10 PM, Anthony J Lucas via RT <bug-Email-Find@rt.cpan.org> wrote: Show quoted text
> Queue: Email-Find > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=79271 > > > On Tue Aug 28 19:55:59 2012, MIYAGAWA wrote:
>> Try this >> >> my @addr; >> my $f = Email::Find->new(sub { die if @addr >= $max; push @addr, $_[1]
> });
>> eval { $f->find(\$text) }; >> >>
> > Thanks for replying. Hmmm, I was just thinking about this. > > The problem with using die in this way is that it limits the usefulness > of having the callback functionality. > > Beyond using the callback as a closure for pushing to an array, it gets > a bit more complicated. In order to handle real errors in functions > called inside the callback you end up having to parse $@, like below.
It is common to allow die in callbacks to stop further processing. Commonly used in LWP handlers for instance. Show quoted text
> > for example, just to find 2 email addresses: > > + my $success = UNIQUE_SUCCESS_STRING; > + eval{ > + find_email($text, sub { > + state $count = 0; > + > + die $success if( ++$count > 1 ); > + my($email_obj, $email) = @_; > + $my_obj->do_stuff($email_obj); > + return $email; > + }); > + }; > + if ($@ =~ /^$success/) { > + #we found stuff > + } else { > + die "It was really an error from somewhere inside ->do_stuff: > $@"; > + } > > It would be nice to cover this common case with some simpler syntax.
I don't consider it a common case given this module has been on CPAN for 10 years and you're almost the first to request it to be built into the module. Having said that you can always subclass or fork a module and ship to CPAN as a separate module. Nobody would stop you from doing it. Show quoted text
> At the moment, using die to abort means the callback can only be used > for extremely simple operations (ones that won't die themselves).