
This queue is for tickets about the Apache-Solr CPAN distribution.

Report information
The Basics
Id: 94091
Status: rejected
Priority: 0/
Queue: Apache-Solr

People
Owner: Nobody in particular
Requestors: simon.rosenthal [...] yahoo.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: possible improvement to apache::solr ?
Date: Fri, 21 Mar 2014 13:16:29 -0700 (PDT)
To: "bug-Apache-Solr [...] rt.cpan.org" <bug-Apache-Solr [...] rt.cpan.org>
From: Simon Rosenthal <simon.rosenthal [...] yahoo.com>
Hi: not a bug this time, but a suggestion for an improvement. Solr 4.7 introduced cursor-based paging to page through large result sets more efficiently, and I was wondering whether you had thought about enhancing Apache::Solr::Document to use this new feature? I'd be quite interested in taking a stab at this, as I am starting work on a project where I'll be downloading large numbers of query results, but I didn't want to reinvent the wheel if this was something you had planned to do.

best
-Simon
Subject: Re: [rt.cpan.org #94091] possible improvement to apache::solr ?
Date: Fri, 21 Mar 2014 21:29:04 +0100
To: Simon Rosenthal via RT <bug-Apache-Solr [...] rt.cpan.org>
From: Mark Overmeer <mark [...] overmeer.net>
* Simon Rosenthal via RT (bug-Apache-Solr@rt.cpan.org) [140321 20:16]:
> Fri Mar 21 16:16:40 2014: Request 94091 was acted upon.
> Transaction: Ticket created by simon.rosenthal@yahoo.com
>       Queue: Apache-Solr
>     Subject: possible improvement to apache::solr ?
>
> Solr 4.7 introduced cursor based paging to more efficiently page
> through large result sets, and I was wondering if you had thought about
> enhancing Apache::Solr::Document to use this new feature ? I'd be quite
> interested in taking a stab at this as I am starting work on a project
> where I'll be downloading large numbers of query results, but I didn't
> want to reinvent the wheel if this was something you had planned to do.
The ::Result object already implements transparent follow-up queries. The first query asks for a number of rows, but when you ask for selected rows outside that first set, it will automatically request more. So, it already implements paging; see Apache::Solr::Result::selected().

Is Solr's feature better? Should it replace the queries I make now?
-- 
Regards,
MarkOv

------------------------------------------------------------------------
Mark Overmeer MSc                          MARKOV Solutions
Mark@Overmeer.net                          solutions@overmeer.net
http://Mark.Overmeer.net                   http://solutions.overmeer.net
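The "transparent follow-up query" idea described above can be sketched as follows. This is a self-contained Python illustration, not the code of Apache::Solr::Result: `fake_search`, `LazyResult`, and the page size of 10 are assumptions made for the example, and the method name `selected` is borrowed from the message only to make the connection visible.

```python
from typing import Callable, Dict, List, Optional


def fake_search(start: int, rows: int, total: int = 95) -> List[str]:
    """Hypothetical backend: returns the documents ranked [start, start+rows)."""
    end = min(start + rows, total)
    return [f"doc-{i}" for i in range(start, end)]


class LazyResult:
    """Fetches result pages on demand and caches them (illustrative only)."""

    def __init__(self, fetch: Callable[[int, int], List[str]], rows: int = 10):
        self.fetch = fetch
        self.rows = rows
        self.pages: Dict[int, List[str]] = {}  # page number -> documents
        self.requests = 0                      # number of backend round trips

    def selected(self, rank: int) -> Optional[str]:
        page_nr = rank // self.rows
        if page_nr not in self.pages:
            # Rank lies outside the cached pages: issue a follow-up query.
            self.requests += 1
            self.pages[page_nr] = self.fetch(page_nr * self.rows, self.rows)
        page = self.pages[page_nr]
        offset = rank % self.rows
        return page[offset] if offset < len(page) else None


result = LazyResult(fake_search, rows=10)
first = result.selected(0)      # triggers the first request
same_page = result.selected(7)  # served from the cached first page
later = result.selected(42)     # outside the cache: one follow-up request
```

The caller only ever asks for a rank; whether that costs a round trip is hidden behind the cache, which is what makes the paging "transparent".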
Subject: Re: [rt.cpan.org #94091] possible improvement to apache::solr ?
Date: Fri, 21 Mar 2014 14:11:11 -0700 (PDT)
To: "bug-Apache-Solr [...] rt.cpan.org" <bug-Apache-Solr [...] rt.cpan.org>
From: Simon Rosenthal <simon.rosenthal [...] yahoo.com>
It's considerably more efficient for large result sets. Take a look at this blog post: "Coming Soon to Solr: Efficient Cursor Based Iteration of Large Result Sets" (SearchHub, Lucene/Solr Open Source Search). Since I'm anticipating some large result sets (100K docs), the performance improvement would definitely be useful.
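For readers unfamiliar with the feature, the client side of Solr's cursor paging is a simple loop: send `cursorMark=*` on the first request (with a sort that includes the unique key field), then resend the `nextCursorMark` value from each response until the server returns the same mark that was sent. The Python sketch below shows only that loop. `fake_cursor_query` is a hypothetical in-memory stand-in for a Solr server, and its marks are plain ids, whereas real Solr marks are opaque strings.

```python
from typing import Dict, List, Tuple

# In-memory stand-in for a collection, already sorted by its unique key "id".
DOCS: List[Dict[str, int]] = [{"id": i} for i in range(1, 26)]


def fake_cursor_query(cursor_mark: str, rows: int) -> Tuple[List[Dict[str, int]], str]:
    """Hypothetical server: here the mark encodes the last id already returned."""
    last_id = 0 if cursor_mark == "*" else int(cursor_mark)
    page = [d for d in DOCS if d["id"] > last_id][:rows]
    # When nothing is left, echo the mark back unchanged, as Solr does.
    next_mark = str(page[-1]["id"]) if page else cursor_mark
    return page, next_mark


def fetch_all(rows: int = 10) -> List[Dict[str, int]]:
    collected: List[Dict[str, int]] = []
    cursor_mark = "*"  # Solr's documented starting value
    while True:
        page, next_mark = fake_cursor_query(cursor_mark, rows)
        collected.extend(page)
        if next_mark == cursor_mark:  # unchanged mark: no more results
            break
        cursor_mark = next_mark
    return collected


all_docs = fetch_all(rows=10)
```

Unlike `start`/`rows` paging, the client never computes an offset; it only carries the mark from one response to the next request.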
Subject: Re: [rt.cpan.org #94091] possible improvement to apache::solr ?
Date: Fri, 21 Mar 2014 22:37:10 +0100
To: Simon Rosenthal via RT <bug-Apache-Solr [...] rt.cpan.org>
From: Mark Overmeer <mark [...] overmeer.net>
* Simon Rosenthal via RT (bug-Apache-Solr@rt.cpan.org) [140321 21:11]:
>  Queue: Apache-Solr
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=94091 >
>
> it's considerably more efficient for large result sets. Take a look
> at this blog - Coming Soon to Solr: Efficient Cursor Based Iteration of
> Large Result Sets | SearchHub | Lucene/Solr Open Source Search
>
> Since I'm anticipating some large result sets (100K docs) the
> performance improvement would definitely be useful.
I tried to understand why the new algorithm is faster, but did not find any clue in the article. Of course, if we can offer a light interface to a performance miracle, we should do it.

Yes, if you are willing to try implementing it, you are welcome. Be aware that I *will* rewrite contributed code when it does not follow my own (rigid/stupid) code standards. On the other hand, I welcome contributions even when they are not perfect (yet), and will give them a finishing touch myself.

I expect it should be sufficient to add some code to ::Result. The old interface must be kept for existing Solr instances (there are users of this Perl module, and upgrading is not always a choice).

It is best to start working from my raw source; I generate my releases and documentation from it. See http://perl.overmeer.net/apache-solr/raw/

Looking forward to your ideas.
-- 
Regards,
MarkOv

------------------------------------------------------------------------
Mark Overmeer MSc                          MARKOV Solutions
Mark@Overmeer.net                          solutions@overmeer.net
http://Mark.Overmeer.net                   http://solutions.overmeer.net
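On the question of why cursors can be faster: the usual explanation is that with `start`/`rows` paging the server has to collect and sort the top `start + rows` documents for every request, so pages get more expensive the deeper they go, whereas a cursor request only needs to collect `rows` documents past the mark. The Python sketch below puts rough numbers on that for the 100K-document case mentioned in the thread. It is a simplified cost model, not a measurement, and the page size of 100 is an assumption.

```python
def offset_paging_cost(total: int, rows: int) -> int:
    """Documents collected across all requests when each request needs start + rows."""
    return sum(min(start + rows, total) for start in range(0, total, rows))


def cursor_paging_cost(total: int, rows: int) -> int:
    """Documents collected across all requests when each request needs only rows."""
    return sum(min(rows, total - start) for start in range(0, total, rows))


# 100K documents (from the thread), fetched 100 per request (assumed page size).
offset_cost = offset_paging_cost(100_000, 100)
cursor_cost = cursor_paging_cost(100_000, 100)
ratio = offset_cost / cursor_cost
```

Under this model the offset approach grows quadratically with the size of the result set while the cursor approach grows linearly, which is why the gap matters mainly for deep paging rather than for the first few pages.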
Subject: Re: [rt.cpan.org #94091] possible improvement to apache::solr ?
Date: Wed, 9 Apr 2014 08:47:18 -0700 (PDT)
To: "bug-Apache-Solr [...] rt.cpan.org" <bug-Apache-Solr [...] rt.cpan.org>
From: Simon Rosenthal <simon.rosenthal [...] yahoo.com>
After looking at the Apache::Solr code, I decided to take an alternative path: adapting a Perl client for another search engine we have been using to talk to Solr instead, which was a bit more straightforward.

-Simon