This queue is for tickets about the MCE CPAN distribution.

Report information
The Basics
Id: 91778
Status: resolved
Priority: 0/
Queue: MCE

People
Owner: Nobody in particular
Requestors: jeff [...] stratopan.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Feature Request: Support for lazy arrays of input data
Date: Wed, 1 Jan 2014 00:35:18 -0800
To: bug-MCE [...] rt.cpan.org
From: Jeffrey Ryan Thalhammer <jeff [...] stratopan.com>
I've been using MCE as part of Stratopan.com and it has been wonderful. Thanks so much for your great work! To further optimize performance and memory consumption, I'd like to pass MCE->process() a lazy array that is filled only when each element is accessed (such as Tie::Array::Lazy). This currently won't work because MCE wants to know the total length of the input data array, but the length would be unknown for a lazy array. Or perhaps I could tie the data to a filehandle instead? Happy New Year!
Subject: Re: [rt.cpan.org #91778] Feature Request: Support for lazy arrays of input data
Date: Wed, 1 Jan 2014 04:29:07 -0500
To: bug-MCE [...] rt.cpan.org
From: Mario Roy <marioeroy [...] gmail.com>
Hi Jeffrey,

Yes, that's correct. MCE needs the length when processing data via the input_data option or via the process method.

However, one can create a lazy array from the manager process. Workers can call MCE->do('callback_func', ...) to retrieve values from the lazy array. In this case, input_data is not specified and MCE->run(0) is used versus MCE->process(...).

The MCE->do(...) method is bi-directional.

    my @list = MCE->do('get_items', $optional_arg1, $optional_argN);
    my $next = MCE->do('get_next');

The MCE->do method can be called as often as needed. The worker will need to know when to break out of a loop.

    # Define the lazy array here.

    sub get_next {
       # return item(s) from the lazy array
    }

    MCE->new(
       ...
       user_func => sub {
          my ($self) = @_;
          while (1) {
             my $next = MCE->do('get_next');
             last unless defined $next;
             ...
          }
       }
    );

    MCE->run(0);

Regards,
Mario
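The pull pattern described above can be sketched as a small runnable script. This is only an illustration under stated assumptions: get_next, the 1..10 source, and @seen are hypothetical names, the gather option is assumed to accept an array reference, and the loop is written in list context so that an empty return ends it.

```perl
use strict;
use warnings;
use MCE;

# Hypothetical lazy source: yields 1..10 on demand, then nothing.
my $n = 0;
sub get_next {
    return if $n >= 10;    # empty list signals "no more items"
    return ++$n;
}

my @seen;                  # filled in the manager process via MCE->gather

my $mce = MCE->new(
    max_workers => 3,
    gather      => \@seen,    # assumption: gather accepts an array reference
    user_func   => sub {
        # get_next() runs in the manager, so $n is advanced in one place
        # only; each worker just receives the returned value.
        while ( my ($item) = MCE->do('get_next') ) {
            MCE->gather( $item * 2 );
        }
    },
);

$mce->run;

# Arrival order depends on worker scheduling, so sort before printing.
print join( ' ', sort { $a <=> $b } @seen ), "\n";
```

Because every item is fetched through the manager, the source is consumed exactly once no matter how many workers are pulling from it.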
Subject: Re: [rt.cpan.org #91778] Feature Request: Support for lazy arrays of input data
Date: Wed, 1 Jan 2014 05:03:33 -0500
To: bug-MCE [...] rt.cpan.org
From: Mario Roy <marioeroy [...] gmail.com>
Saving this URL for review later on.

http://stackoverflow.com/questions/109880/is-there-a-perl-solution-for-lazy-lists-this-side-of-perl-6

Perhaps I will enhance input_data or add a new input_iterator option. In the meantime, MCE->do(...) can be used.

Unrelated: I'm currently working on a script to wrap MCE around the grep, egrep, fgrep, agrep and tre-agrep C binaries. I will look into lazy arrays and/or custom iterators afterwards.

Happy New Year,
-mario
Subject: Re: [rt.cpan.org #91778] Feature Request: Support for lazy arrays of input data
Date: Fri, 10 Jan 2014 15:48:06 -0500
To: bug-MCE [...] rt.cpan.org
From: Mario Roy <marioeroy [...] gmail.com>
Looking into this now. If feasible, it will be included in the upcoming MCE 1.505 release.

Regards,
Mario
Subject: Re: [rt.cpan.org #91778] Feature Request: Support for lazy arrays of input data
Date: Mon, 13 Jan 2014 01:46:35 -0500
To: bug-MCE [...] rt.cpan.org
From: Mario Roy <marioeroy [...] gmail.com>
Hi Jeffrey,

Btw, I want to help you with the memory consumption aspect. Are workers consuming a lot of memory, or just the manager process, or both? Are you calling the spawn method before creating the array?

The upcoming MCE 1.505 will allow one to specify a code reference for input_data. The lazy array may be declared inside the iterator closure below (I was not sure if @a is needed after running MCE).

The code snippet below behaves similarly to other input_data types, with full support for chunk_size => 1 or greater, MCE->abort, MCE->next and MCE->last.

    use Tie::Array::Lazy;
    use MCE;

    tie my @a, 'Tie::Array::Lazy', [], sub {
       $_[0]->index;
    };

    sub _iterator {
       my $j = 0; my $max = 1000;

       return sub {
          my $i = $j; $j += MCE->chunk_size;

          return if $i > $max;
          return $j <= $max ? @a[$i .. $j - 1] : @a[$i .. $max];
       };
    }

    MCE->new(
       max_workers => 4, chunk_size => 15, input_data => _iterator(),

       user_func => sub {
          my ($self, $chunk_ref, $chunk_id) = @_;

          ## $_ = $chunk_ref->[0] when chunk_size => 1, otherwise $_ = $chunk_ref
          ## MCE->print($_, "\n");

          MCE->print("$chunk_id: ", join(' ', @{ $chunk_ref }), "\n");
       }

    )->run;

I will commit the update to MCE into SVN in the next couple of days.

In the meantime, the following can be done using MCE 1.504 (chunk_size being 1 by default). Length is used inside the while loop because "defined" does not work there; this will be fixed in 1.505 (an update to the do method to support both "" and undef properly).

    use Tie::Array::Lazy;
    use MCE;

    tie my @a, 'Tie::Array::Lazy', [], sub {
       $_[0]->index
    };

    {
       my $max = 1000; my $j = 0;

       sub _iterator {
          return if $j > $max;
          return $a[$j++];
       }
    }

    my $mce = MCE->new(
       max_workers => 4,
       user_func => sub {
          my ($self) = @_;
          while (length (my $next = MCE->do('_iterator'))) {
             MCE->print($next . "\n");
          }
       }
    )->run;

Perhaps you want to chunk as well. The code below works with MCE 1.504 while waiting for 1.505 to be released soon.

    use Tie::Array::Lazy;
    use MCE;

    tie my @a, 'Tie::Array::Lazy', [], sub {
       $_[0]->index
    };

    {
       my $max = 1000; my $chunk_size = 15; my $j = 0;

       sub _iterator {
          my $i = $j; $j += $chunk_size;
          return if $i > $max;
          return ($j <= $max ? @a[$i .. $j - 1] : @a[$i .. $max]);
       }
    }

    MCE->new(
       max_workers => 4,
       user_func => sub {
          my ($self) = @_;
          while (my @next = MCE->do('_iterator')) {
             MCE->print(join(' ', @next), "\n");
          }
       }
    )->run;

Perhaps chunk_id is needed too.

    use Tie::Array::Lazy;
    use MCE;

    tie my @a, 'Tie::Array::Lazy', [], sub {
       $_[0]->index
    };

    {
       my $max = 1000; my $chunk_size = 15; my $chunk_id = 0; my $j = 0;

       sub _iterator {
          my $i = $j; $j += $chunk_size;
          return if $i > $max;
          return (++$chunk_id, $j <= $max ? @a[$i .. $j - 1] : @a[$i .. $max]);
       }
    }

    MCE->new(
       max_workers => 4,
       user_func => sub {
          my ($self) = @_;
          while (my ($chunk_id, @next) = MCE->do('_iterator')) {
             MCE->print("$chunk_id: ", join(' ', @next), "\n");
          }
       }
    )->run;

Again, the MCE 1.505 release is coming soon.

Regards,
Mario
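To make the index arithmetic in the chunking closures above easier to follow, here is a minimal sketch that applies the same logic to a plain list, so it runs without MCE or Tie::Array::Lazy. The helper name make_chunker and the small sizes (41 items, chunks of 15) are illustrative assumptions, not values from the thread.

```perl
use strict;
use warnings;

# Same arithmetic as the chunking iterators above: $i marks the start of
# the current chunk, $j the start of the next one, and the final chunk is
# clamped to $max (the last valid index).
sub make_chunker {
    my ( $data, $chunk_size ) = @_;
    my $max = $#{$data};
    my $j   = 0;

    return sub {
        my $i = $j;
        $j += $chunk_size;
        return if $i > $max;                        # no more chunks
        my $last = $j <= $max ? $j - 1 : $max;      # clamp the final chunk
        return @{$data}[ $i .. $last ];
    };
}

my $next_chunk = make_chunker( [ 0 .. 40 ], 15 );

my @sizes;
while ( my @chunk = $next_chunk->() ) {
    push @sizes, scalar @chunk;
    print "@chunk\n";
}
```

The loop ends when the closure returns an empty list, which is the same stop condition the worker loops above rely on.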
Subject: Re: [rt.cpan.org #91778] Feature Request: Support for lazy arrays of input data
Date: Mon, 13 Jan 2014 01:52:20 -0500
To: bug-MCE [...] rt.cpan.org
From: Mario Roy <marioeroy [...] gmail.com>
I apologize for all the white-space in my previous email. It wasn't like that when creating the reply. Going forward, I know to enable "plain text mode" when replying to rt.cpan.org.

Best,
Mario
Subject: Re: [rt.cpan.org #91778] Feature Request: Support for lazy arrays of input data
Date: Mon, 13 Jan 2014 19:17:00 -0800
To: bug-MCE [...] rt.cpan.org
From: Jeffrey Thalhammer <jeff [...] stratopan.com>
I didn't quite follow your examples. In my case, the chunk_size is always 1. And I don't know the $max in advance (because it is lazy). So I'd like the manager to stop feeding workers when the iterator returns undef (much like when a filehandle reaches EOF). But I can see how that may not be desirable in other cases. I'll look for the next release. Perhaps it will be clear to me then. Thanks for being so responsive! -Jeff
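The stop-on-undef behaviour requested here can be sketched in plain Perl, without MCE. The names make_source and @fed are hypothetical; the point is only that the consumer never learns the length in advance and stops at the first undef, much like reading a filehandle until EOF.

```perl
use strict;
use warnings;

# Hypothetical lazy source: items are handed out on demand and the
# consumer never sees the total count.
sub make_source {
    my @pending = @_;
    return sub {
        return unless @pending;    # undef in scalar context: exhausted
        return shift @pending;
    };
}

my $next = make_source(qw(alpha beta gamma));

my @fed;
while ( defined( my $item = $next->() ) ) {
    push @fed, $item;              # stand-in for handing $item to a worker
}

print scalar(@fed), " items fed: @fed\n";
```

One consequence of this convention is that undef cannot itself be a data item, which may be why it is not desirable in every case.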
Subject: Re: [rt.cpan.org #91778] Feature Request: Support for lazy arrays of input data
Date: Tue, 14 Jan 2014 01:19:32 -0500
To: bug-MCE [...] rt.cpan.org
From: Mario Roy <marioeroy [...] gmail.com>
Hi Jeff,

Thank you for your response. I have committed an update to MCE for the feature request in SVN revision 423 (main one) and 424 (small update based on your last feedback).

Here, undef is returned inside the code block for the lazy array. I believe this is what you were after, now that I understand better.

    use Tie::Array::Lazy;
    use MCE;

    tie my @a, 'Tie::Array::Lazy', [], sub {
       my $i = $_[0]->index;
       return ($i < 10) ? $i : undef;
    };

    sub make_iterator {
       my $i = 0; my $a_ref = shift;
       return sub {
          return $a_ref->[$i++];
       };
    }

    MCE->new(
       max_workers => 4,
       input_data => make_iterator(\@a),
       user_func => sub {
          my ($self, $chunk_ref, $chunk_id) = @_;
          MCE->print($_, "\n");
       }
    )->run;

-- Output (output order is not guaranteed, simply for demonstration)

    0
    1
    2
    3
    4
    6
    7
    8
    5
    9

Regards,
Mario
Subject: Re: [rt.cpan.org #91778] Feature Request: Support for lazy arrays of input data
Date: Mon, 13 Jan 2014 22:57:51 -0800
To: bug-MCE [...] rt.cpan.org
From: Jeffrey Thalhammer <jeff [...] stratopan.com>
I *think* that is right. Here's a more concrete example of what I'm trying to do: https://gist.github.com/thaljef/8414230
Subject: Re: [rt.cpan.org #91778] Feature Request: Support for lazy arrays of input data
Date: Tue, 14 Jan 2014 02:59:09 -0500
To: bug-MCE [...] rt.cpan.org
From: Mario Roy <marioeroy [...] gmail.com>
Hi Jeff,

That works quite well. Passing $iterator is all that's needed using the latest MCE in svn r424. You may know this already: chunk size defaults to 1, and $_ always contains the value when chunk_size is 1. Max workers defaults to 1 as well.

    use strict;
    use warnings;

    use Path::Iterator::Rule;
    use MCE;

    my $start_dir = shift or die "Must specify a starting directory";
    -d $start_dir or die "$start_dir is not a directory or does not exist";

    my $rule = Path::Iterator::Rule->new->file->name( qr/[.](pm)$/ );
    my $iterator = $rule->iter($start_dir, {follow_symlinks => 0, depthfirst => 1});

    MCE->new(
       max_workers => 'auto',
       input_data => $iterator,
       user_func => sub { print "$_\n" }
    )->run;

-- Output

    $ ./mce-finder.pl ../.
    .././lib/MCE/Core/Input/Generator.pm
    .././lib/MCE/Core/Input/Handle.pm
    .././lib/MCE/Core/Input/Iterator.pm
    .././lib/MCE/Core/Input/Request.pm
    .././lib/MCE/Core/Input/Sequence.pm
    .././lib/MCE/Core/Manager.pm
    .././lib/MCE/Core/Validation.pm
    .././lib/MCE/Core/Worker.pm
    .././lib/MCE/Flow.pm
    .././lib/MCE/Grep.pm
    .././lib/MCE/Loop.pm
    .././lib/MCE/Map.pm
    .././lib/MCE/Queue.pm
    .././lib/MCE/Signal.pm
    .././lib/MCE/Stream.pm
    .././lib/MCE/Subs.pm
    .././lib/MCE/Util.pm
    .././lib/MCE.pm

It is really cool to be able to many-core an iterator like this.

Regards,
Mario
Subject: Re: [rt.cpan.org #91778] Feature Request: Support for lazy arrays of input data
Date: Tue, 14 Jan 2014 00:03:18 -0800
To: bug-MCE [...] rt.cpan.org
From: Jeffrey Thalhammer <jeff [...] stratopan.com>
That's great to hear! I look forward to seeing the release. I did not know that about the defaults. Thanks for the protip! -Jeff
Subject: Re: [rt.cpan.org #91778] Feature Request: Support for lazy arrays of input data
Date: Tue, 14 Jan 2014 03:40:39 -0500
To: bug-MCE [...] rt.cpan.org
From: Mario Roy <marioeroy [...] gmail.com>
Ok, great. I added MCE->wid to the output just to see this in action. Notice the many parallel workers.

    MCE->new(
       max_workers => 'auto',
       input_data => $iterator,
       user_func => sub { MCE->print(MCE->wid, ": $_\n") }
    )->run;

-- Output

    [mario@localhost examples]$ ./mce-finder.pl ../.
    5: .././lib/MCE/Core/Input/Generator.pm
    1: .././lib/MCE/Core/Input/Handle.pm
    2: .././lib/MCE/Core/Input/Iterator.pm
    3: .././lib/MCE/Core/Input/Request.pm
    7: .././lib/MCE/Core/Manager.pm
    8: .././lib/MCE/Core/Input/Sequence.pm
    6: .././lib/MCE/Core/Validation.pm
    4: .././lib/MCE/Core/Worker.pm
    5: .././lib/MCE/Flow.pm
    1: .././lib/MCE/Grep.pm
    2: .././lib/MCE/Loop.pm
    3: .././lib/MCE/Map.pm
    7: .././lib/MCE/Queue.pm
    8: .././lib/MCE/Signal.pm
    6: .././lib/MCE/Stream.pm
    4: .././lib/MCE/Subs.pm
    5: .././lib/MCE/Util.pm
    1: .././lib/MCE.pm

The MCE->print method (there's also MCE->say and MCE->printf) is useful when you want to serialize output from many workers. Otherwise, STDOUT may appear garbled when many workers are writing simultaneously. MCE provides a chunking engine (not needed above), a many-core engine, and a serializing engine.

Therefore, if you ever need to serialize an action such as logging, you have MCE->do('callback'), MCE->sendto("file:/path/to/log_file.log", $log_data) and the output methods MCE->say(...), MCE->print(...), and MCE->printf(...). All of these can be called as many times as needed. There is also MCE->gather(...); one specifies the gather option to say where the data should go (an array or a subroutine). The MCE::Loop pod doc has many usage examples for MCE->gather.

There are also the 5 models in MCE 1.5x. MCE::Loop is shown below. It does the same thing as above with much less MCE code. The 5 models configure max_workers => 'auto' by default. However, we want a chunk_size of 1, which can be set on the same line when including the module. Notice the mce_loop line. That's it.

    use strict;
    use warnings;

    use Path::Iterator::Rule;
    use MCE::Loop CHUNK_SIZE => 1;

    my $start_dir = shift or die "Must specify a starting directory";
    -d $start_dir or die "$start_dir is not a directory or does not exist";

    my $rule = Path::Iterator::Rule->new->file->name( qr/[.](pm)$/ );
    my $iterator = $rule->iter($start_dir, {follow_symlinks => 0, depthfirst => 1});

    mce_loop { MCE->print(MCE->wid, ": $_\n") } $iterator;

Some folks prefer using the Core API while others enjoy using the new models. It doesn't matter which one folks use. MCE was created to help folks make the most of all available cores.

Regards,
Mario
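The MCE->gather method mentioned above has no example in this thread, so here is a minimal sketch using the Core API. It assumes the gather option accepts an array reference (the "array" destination described above); the input list 1..8 and the name @squares are made up for illustration.

```perl
use strict;
use warnings;
use MCE;

my @squares;    # destination for gathered values, owned by the manager

MCE->new(
    max_workers => 4,
    chunk_size  => 1,             # with chunk_size 1, $_ is the item itself
    input_data  => [ 1 .. 8 ],
    gather      => \@squares,     # assumption: an array reference is accepted
    user_func   => sub {
        MCE->gather( $_ * $_ );   # sent back to the manager process
    },
)->run;

# Workers finish in no particular order, so sort before printing.
print join( ' ', sort { $a <=> $b } @squares ), "\n";
```

Gathering into the manager avoids having workers write to a shared destination themselves, which is the same serialization idea as MCE->print.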
Subject: Re: [rt.cpan.org #91778] Feature Request: Support for lazy arrays of input data
Date: Tue, 14 Jan 2014 03:59:49 -0500
To: bug-MCE [...] rt.cpan.org
From: Mario Roy <marioeroy [...] gmail.com>
For completeness and fairness to Tie::Array::Lazy (the output did not match previously): it needs the make_iterator function below. Although Tie::Array::Lazy isn't needed here, I wanted to see this work. Notice the $mce->process line below.

    use MCE 1.505;

    use Tie::Array::Lazy;
    use Path::Iterator::Rule;

    my $start_dir = shift or die "Must specify a starting directory";
    -d $start_dir or die "$start_dir is not a directory or does not exist";

    my $rule = Path::Iterator::Rule->new->file->name( qr/[.](pm)$/ );
    my $iterator = $rule->iter($start_dir, {follow_symlinks => 0, depthfirst => 1});

    tie my @paths, 'Tie::Array::Lazy', [], sub {$iterator->()};

    sub make_iterator {
       my $i = 0; my $a_ref = shift;
       return sub {
          return $a_ref->[$i++];
       };
    }

    my $mce = MCE->new(
       user_func => sub { print "$_\n" }
    );

    $mce->process(make_iterator(\@paths));
    $mce->shutdown;

It looks like MCE handles input iterators quite well. I will close off the feature request. MCE 1.505 will be released this coming weekend.

Regards,
Mario
Subject: Re: [rt.cpan.org #91778] Feature Request: Support for lazy arrays of input data
Date: Tue, 14 Jan 2014 17:58:00 -0500
To: bug-MCE [...] rt.cpan.org
From: Mario Roy <marioeroy [...] gmail.com>
Hi Jeff,

Thanks for your time. I do not believe I would have succeeded with the new input type (support for iterator objects) on my first attempt without it. I am thankful for your time and have therefore added you to the CREDITS file.

This work is complete. The svn r425 commit consists entirely of documentation changes. It is clear now even to me :).

Regards,
Mario
Subject: Re: [rt.cpan.org #91778] Feature Request: Support for lazy arrays of input data
Date: Tue, 14 Jan 2014 16:41:56 -0800
To: bug-MCE [...] rt.cpan.org
From: Jeffrey Thalhammer <jeff [...] stratopan.com>
Mario- It has been my pleasure. But you did all the work -- I was just a provocateur! I'll be happy to write a review of MCE on cpanratings.perl.org after I try the new release. -Jeff
Subject: Re: [rt.cpan.org #91778] Feature Request: Support for lazy arrays of input data
Date: Tue, 21 Jan 2014 05:14:42 -0500
To: bug-MCE [...] rt.cpan.org
From: Mario Roy <marioeroy [...] gmail.com>
MCE 1.505 has been released and includes support for an iterator reference as input data.

https://metacpan.org/pod/release/MARIOROY/MCE-1.505/lib/MCE.pod

For a very fast file finder, one can utilize egrep's directory recursion capabilities. MCE also supports a file handle (pipe) as input data. The egrep binary under Linux supports --include=SPEC, --exclude=SPEC, --exclude-from=SPEC, and --exclude-dir=SPEC. These options may be specified multiple times.

In the demonstration below, egrep does not parse the entire file. With -l it stops at the first match, and '^' matches at the start of the first line, so it goes on to the next file right away. Anyway, this is just another way if speed is important.

    use MCE;

    my $start_dir = shift or die "Please specify a starting directory";
    -d $start_dir or die "cannot open '$start_dir': No such file or directory";

    my $mce = MCE->new(
        max_workers => 'auto',
        user_func   => sub { chomp; MCE->print(MCE->wid, ": $_\n") }
    )->spawn;

    # List-form pipe open: egrep prints one matching file name per line.
    open my $egrep_fh, '-|', 'egrep', '-lsr', '--include=*.pm', '^', $start_dir
        or die "cannot run egrep: $!";

    $mce->process( $egrep_fh );
    close $egrep_fh;

Best,
Mario