Bug #94091 for Apache-Solr: possible improvement to apache::solr ?

Fri Mar 21 16:16:40 2014 simon.rosenthal [...] yahoo.com - Ticket created

Subject:	possible improvement to apache::solr ?
Date:	Fri, 21 Mar 2014 13:16:29 -0700 (PDT)
To:	"bug-Apache-Solr [...] rt.cpan.org" <bug-Apache-Solr [...] rt.cpan.org>
From:	Simon Rosenthal <simon.rosenthal [...] yahoo.com>

Hi: not a bug this time, but a suggestion for an improvement.

Solr 4.7 introduced cursor-based paging to page through large result sets more efficiently, and I was wondering if you had thought about enhancing Apache::Solr::Document to use this new feature? I'd be quite interested in taking a stab at this, as I am starting work on a project where I'll be downloading large numbers of query results, but I didn't want to reinvent the wheel if this was something you had planned to do.

best
-Simon

Fri Mar 21 16:29:22 2014 Mark [...] Overmeer.net - Correspondence added

Subject:	Re: [rt.cpan.org #94091] possible improvement to apache::solr ?
Date:	Fri, 21 Mar 2014 21:29:04 +0100
To:	Simon Rosenthal via RT <bug-Apache-Solr [...] rt.cpan.org>
From:	Mark Overmeer <mark [...] overmeer.net>

* Simon Rosenthal via RT (bug-Apache-Solr@rt.cpan.org) [140321 20:16]:

> Fri Mar 21 16:16:40 2014: Request 94091 was acted upon.
> Transaction: Ticket created by simon.rosenthal@yahoo.com
> Queue: Apache-Solr
> Subject: possible improvement to apache::solr ?
>
> Solr 4.7 introduced cursor based paging to more efficiently page
> through large result sets, and I was wondering if you had thought about
> enhancing Apache::Solr::Document to use this new feature ? I'd be quite
> interested in taking a stab at this as I am starting work on a project
> where I'll be downloading large numbers of query results, but I didn't
> want to reinvent the wheel if this was something you had planned to do.

The ::Result object already implements transparent follow-up queries. The first query asks for a number of rows, but when you ask for selected rows outside that first set, it will automatically request more. So, it already implements paging; see Apache::Solr::Result::selected().

Is Solr's feature better? Should it replace the queries I make now?

--
Regards,
MarkOv

------------------------------------------------------------------------
Mark Overmeer MSc                                MARKOV Solutions
Mark@Overmeer.net                          solutions@overmeer.net
http://Mark.Overmeer.net              http://solutions.overmeer.net
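For illustration, the transparent follow-up behaviour described above can be sketched roughly as follows. This is a small Python simulation, not the Apache::Solr implementation: the class `LazyResult`, the callback `fetch_page`, the stand-in `fake_server`, and the page size are all hypothetical names chosen for the example. Only the idea (fetch a page the first time a row outside the loaded set is requested) comes from the message.

```python
from typing import Callable, Dict, List


class LazyResult:
    """Hypothetical result object that fetches pages of rows on demand."""

    def __init__(self, fetch_page: Callable[[int, int], List[str]], rows: int = 10):
        self.fetch_page = fetch_page          # (start, rows) -> list of documents
        self.rows = rows                      # page size used for every request
        self.pages: Dict[int, List[str]] = {} # pages already downloaded
        self.requests = 0                     # number of follow-up queries sent

    def selected(self, rank: int) -> str:
        """Return the document at position `rank`, fetching its page if needed."""
        page_nr = rank // self.rows
        if page_nr not in self.pages:
            self.requests += 1
            self.pages[page_nr] = self.fetch_page(page_nr * self.rows, self.rows)
        return self.pages[page_nr][rank % self.rows]


# Stand-in for the search server: 35 documents, served by offset.
DOCS = [f"doc-{i}" for i in range(35)]


def fake_server(start: int, rows: int) -> List[str]:
    return DOCS[start:start + rows]


result = LazyResult(fake_server, rows=10)
first = result.selected(3)    # triggers the request for rows 0..9
later = result.selected(27)   # outside the loaded rows: triggers rows 20..29
again = result.selected(5)    # already loaded, no new request
```

In this sketch the caller never sees the paging; the cost of each follow-up request is hidden inside `fetch_page`, which is where offset-based and cursor-based paging would differ.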

Fri Mar 21 16:29:22 2014 The RT System itself - Status changed from 'new' to 'open'

Fri Mar 21 17:11:35 2014 simon.rosenthal [...] yahoo.com - Correspondence added

Subject:	Re: [rt.cpan.org #94091] possible improvement to apache::solr ?
Date:	Fri, 21 Mar 2014 14:11:11 -0700 (PDT)
To:	"bug-Apache-Solr [...] rt.cpan.org" <bug-Apache-Solr [...] rt.cpan.org>
From:	Simon Rosenthal <simon.rosenthal [...] yahoo.com>

It's considerably more efficient for large result sets. Take a look at this blog post: "Coming Soon to Solr: Efficient Cursor Based Iteration of Large Result Sets" (SearchHub, Lucene/Solr Open Source Search).

Since I'm anticipating some large result sets (100K docs), the performance improvement would definitely be useful.
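A rough way to see where the difference comes from: with offset paging (`start` plus `rows`), the server has to collect and rank the first `start + rows` hits for every request and then throw the first `start` away, whereas a cursor lets it skip everything that sorts before the last document already returned. The Python sketch below is a simulation under that assumption, not Solr code; `IDS`, `offset_page`, `cursor_page`, and the "collected" counter are made-up names used only to count the work per request.

```python
import bisect

IDS = list(range(1000))   # sorted unique keys, standing in for the matching documents
ROWS = 100                # page size


def offset_page(start, rows):
    """Offset paging: rank the first start+rows hits, return only the last `rows`."""
    collected = IDS[:start + rows]
    return collected[start:], len(collected)


def cursor_page(after_id, rows):
    """Cursor paging: seek past the last id seen, then take the next `rows` hits."""
    pos = 0 if after_id is None else bisect.bisect_right(IDS, after_id)
    page = IDS[pos:pos + rows]
    return page, len(page)


def walk_with_offsets():
    seen, cost = [], 0
    for start in range(0, len(IDS), ROWS):
        page, collected = offset_page(start, ROWS)
        seen.extend(page)
        cost += collected
    return seen, cost


def walk_with_cursor():
    seen, cost, last_id = [], 0, None
    while True:
        page, collected = cursor_page(last_id, ROWS)
        if not page:
            break
        seen.extend(page)
        cost += collected
        last_id = page[-1]
    return seen, cost


offset_seen, offset_cost = walk_with_offsets()
cursor_seen, cursor_cost = walk_with_cursor()
print(offset_cost, cursor_cost)   # prints "5500 1000"
```

Both walks return the same 1000 documents, but the offset walk handles 100 + 200 + ... + 1000 = 5500 entries in total while the cursor walk handles 1000. The gap grows quadratically with the depth of the result set, which is why it matters at 100K documents.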


Fri Mar 21 17:37:25 2014 Mark [...] Overmeer.net - Correspondence added

Subject:	Re: [rt.cpan.org #94091] possible improvement to apache::solr ?
Date:	Fri, 21 Mar 2014 22:37:10 +0100
To:	Simon Rosenthal via RT <bug-Apache-Solr [...] rt.cpan.org>
From:	Mark Overmeer <mark [...] overmeer.net>

* Simon Rosenthal via RT (bug-Apache-Solr@rt.cpan.org) [140321 21:11]:

> Queue: Apache-Solr
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=94091 >
>
> it's considerably more efficient for large result sets. Take a look
> at this blog - Coming Soon to Solr: Efficient Cursor Based Iteration of
> Large Result Sets | SearchHub | Lucene/Solr Open Source Search
>
> Since I'm anticipating some large result sets (100K docs) the
> performance improvement would definitely be useful.

I have tried to understand why the new algorithm is faster, but I cannot find any clue in the article. Of course, if we can offer a light interface to a performance miracle, we should do it.

Yes, if you are willing to try implementing it, you are welcome. Be aware that I *will* rewrite contributed code when it does not follow my own (rigid/stupid) code standards. On the other hand, I welcome contributions even when they are not perfect (yet), and will give them a finishing touch myself.

I expect it should be sufficient to add some code to ::Result. The old interface must be kept for existing Solr instances (there are users of this Perl module, and upgrading is not always a choice).

It is best to start working from my raw source, from which I generate my releases and documentation. See http://perl.overmeer.net/apache-solr/raw/

Looking forward to your ideas.

--
Regards,
MarkOv

------------------------------------------------------------------------
Mark Overmeer MSc                                MARKOV Solutions
Mark@Overmeer.net                          solutions@overmeer.net
http://Mark.Overmeer.net              http://solutions.overmeer.net
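As a starting point for such an addition, the cursor protocol itself is small. In Solr 4.7 and later, the first request sends `cursorMark=*` together with a `sort` that includes the unique key field; each reply carries a `nextCursorMark`, which is sent back as `cursorMark` in the next request; iteration is finished when the returned mark equals the one that was sent. The Python sketch below shows only that loop. `fake_solr_select`, `iterate_all`, and `TOTAL` are hypothetical stand-ins, and the fake server uses plain ids as marks where real cursor marks are opaque strings.

```python
TOTAL = 250     # number of matching documents in the simulated index
requests = []   # cursor marks sent, recorded so the loop can be inspected


def fake_solr_select(params):
    """Stand-in for a Solr 4.7+ select handler (assumption: marks are plain ids)."""
    mark, rows = params["cursorMark"], params["rows"]
    requests.append(mark)
    pos = 0 if mark == "*" else int(mark) + 1
    docs = [{"id": i} for i in range(pos, min(pos + rows, TOTAL))]
    # When nothing is left, the same mark is handed back unchanged.
    next_mark = str(docs[-1]["id"]) if docs else mark
    return {"response": {"docs": docs}, "nextCursorMark": next_mark}


def iterate_all(select, query, rows=100):
    """Yield every matching document by following nextCursorMark."""
    mark = "*"
    while True:
        reply = select({
            "q": query,
            "sort": "id asc",      # the sort must include the unique key
            "rows": rows,
            "cursorMark": mark,
        })
        yield from reply["response"]["docs"]
        next_mark = reply["nextCursorMark"]
        if next_mark == mark:      # unchanged mark: end of the result set
            break
        mark = next_mark


docs = list(iterate_all(fake_solr_select, "*:*", rows=100))
```

Servers older than 4.7 do not understand `cursorMark`, so an implementation that keeps the old interface would presumably continue to use `start` and `rows` there and switch to the loop above only when the server supports it.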

Wed Apr 09 11:47:28 2014 simon.rosenthal [...] yahoo.com - Correspondence added

Subject:	Re: [rt.cpan.org #94091] possible improvement to apache::solr ?
Date:	Wed, 9 Apr 2014 08:47:18 -0700 (PDT)
To:	"bug-Apache-Solr [...] rt.cpan.org" <bug-Apache-Solr [...] rt.cpan.org>
From:	Simon Rosenthal <simon.rosenthal [...] yahoo.com>

After looking at the Apache::Solr code, I decided to take an alternative path: adapting a Perl client for another search engine we have been using so that it talks to Solr instead, which was a bit more straightforward.

-Simon


Wed Jun 17 04:20:16 2015 MARKOV [...] cpan.org - Status changed from 'open' to 'rejected'