Skip Menu |

This queue is for tickets about the TheSchwartz CPAN distribution.

Report information
The Basics
Id: 34843
Status: resolved
Priority: 0/
Queue: TheSchwartz

People
Owner: Jeff.Fearn [...] gmail.com
Requestors: jesper [...] krogh.cc
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 1.04
Fixed in: 1.11



Subject: Using an ordering on the select.
If possible, please use an ordering when selecting from the database. In PostgreSQL this makes queries on large set enormously faster: # explain analyze SELECT job.jobid, job.funcid, job.arg, job.uniqkey, job.insert_time, job.run_after, job.grabbed_until, job.priority, job.coalesce from workqueue.job where (job.funcid in (3)) and (job.run_after <= 1207833398 ) and (job.grabbed_until <= 1207833398) limit 50; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..2.16 rows=50 width=88) (actual time=1513.607..1513.745 rows=50 loops=1) -> Seq Scan on job (cost=0.00..303726.95 rows=7035367 width=88) (actual time=1513.603..1513.663 rows=50 loops=1) Filter: ((funcid = 3) AND (run_after <= 1207833398) AND (grabbed_until <= 1207833398)) Total runtime: 1513.863 ms (4 rows) # explain analyze SELECT job.jobid, job.funcid, job.arg, job.uniqkey, job.insert_time, job.run_after, job.grabbed_until, job.priority, job.coalesce from workqueue.job where (job.funcid in (3)) and (job.run_after <= 1207833398 ) and (job.grabbed_until <= 1207833398) order by run_after limit 50; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..3.42 rows=50 width=88) (actual time=43.736..44.059 rows=50 loops=1) -> Index Scan using func_id_run_after_grabbed_until on job (cost=0.00..480655.21 rows=7035367 width=88) (actual time=43.732..43.982 rows=50 loops=1) Index Cond: ((funcid = 3) AND (run_after <= 1207833398) AND (grabbed_until <= 1207833398)) Total runtime: 44.173 ms (4 rows) The queue is currently around 7m.. I just have trouble tweaking Data::ObjectDriver to accept ordering on the data.
From: asjo [...] koldfront.dk
Here is a quick patch that adds the ORDER BY to TheSchwartz.pm.
--- TheSchwartz.pm.orig 2008-04-11 11:58:11.000000000 +0200 +++ TheSchwartz.pm 2008-04-11 11:59:32.000000000 +0200 @@ -24,6 +24,7 @@ ## Number of jobs to fetch at a time in find_job_for_workers. our $FIND_JOB_BATCH_SIZE = 50; +our $FIND_JOB_SORT='funcid'; sub new { my TheSchwartz $client = shift; @@ -165,12 +166,12 @@ } $driver->search('TheSchwartz::Job' => { funcid => $funcid, @options - }, { limit => $limit }); + }, { limit => $limit, sort=>$FIND_JOB_SORT }); } else { push @jobs, $driver->search('TheSchwartz::Job' => { funcid => $funcid, @options - }, { limit => $limit }); + }, { limit => $limit, sort=>$FIND_JOB_SORT }); } } return @jobs; @@ -213,7 +214,7 @@ run_after => \ "<= $unixtime", grabbed_until => \ "<= $unixtime", coalesce => { op => $op, value => $coval }, - }, { limit => $FIND_JOB_BATCH_SIZE }); + }, { limit => $FIND_JOB_BATCH_SIZE, sort=>$FIND_JOB_SORT }); }; if ($@) { unless (OK_ERRORS->{ $driver->last_error || 0 }) { @@ -252,7 +253,7 @@ funcid => \@ids, run_after => \ "<= $unixtime", grabbed_until => \ "<= $unixtime", - }, { limit => $FIND_JOB_BATCH_SIZE }); + }, { limit => $FIND_JOB_BATCH_SIZE, sort=>$FIND_JOB_SORT }); }; if ($@) { unless (OK_ERRORS->{ $driver->last_error || 0 }) {
Hi, I made it default to sort by jobid as it's the only field with an index, and defaulting to sort by an un-indexed column has made things really slow on other systems I've worked on. To git@github.com:jfearn/TheSchwartz.git 02c14c9..9bb4120 master -> master
A fix for this issue shipped in Version 1.11.