Skip Menu |

This queue is for tickets about the DBD-Pg CPAN distribution.

Report information
The Basics
Id: 105492
Status: open
Priority: 0/
Queue: DBD-Pg

People
Owner: greg [...] turnstep.com
Requestors: mtyson [...] redhat.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Poor performance with large amounts of placeholders.
Date: Fri, 26 Jun 2015 14:52:53 +1000
To: bug-DBD-Pg [...] rt.cpan.org
From: Matt Tyson <mtyson [...] redhat.com>
Hi, I've found a rather extreme case where our application will generate a query with approx 20,000 '?' placeholders. When DBD::Pg gets such a query, it spends a large amount of time (3 - 5 seconds) processing the query before dispatching it on to postgres. I found this was due to the way placeholders are stored in linked lists. It essentially is a quadratic algorithm that becomes quite slow with large lists. I've had a stab at optimizing the code to use an array and would like your feedback on the patch. It passes the test suite and I'm not aware of any changed behavior with the patch applied. There's probably some more optimizing that can be done, however this is a first pass. The patch is against version 3.5.1 I've attached a reproducer script that generates a query with 50,000 placeholders. Without the patch, execution time is approx 20 seconds With the patch, execution time is approx 900 milliseconds Regards, Matt.

Message body is not shown because sender requested not to inline it.

Message body is not shown because sender requested not to inline it.

Subject: Re: [rt.cpan.org #105492] AutoReply: Poor performance with large amounts of placeholders.
Date: Mon, 29 Jun 2015 09:56:23 +1000
To: bug-DBD-Pg [...] rt.cpan.org
From: Matt Tyson <mtyson [...] redhat.com>
Hi, I've made another pass and found another optimization that can be applied. The way DBD::Pg builds the postgres version of the placeholder string (I.E $1, $2 etc) is done using inefficient C string operations such as using strcat and strchr. I've written a trivial string library that keeps track of where the end of the string is, and the amount of memory that has been allocated. It will then append directly to the end of the string, and allocate more memory as needed. This roughly halves execution time again. With both patches applied I get approx 400 milliseconds run time for the reproducer script. Regards, Matt.

Message body is not shown because sender requested not to inline it.

Subject: Re: [rt.cpan.org #105492] AutoReply: Poor performance with large amounts of placeholders.
Date: Mon, 29 Jun 2015 10:36:14 +1000
To: bug-DBD-Pg [...] rt.cpan.org
From: Matt Tyson <mtyson [...] redhat.com>
Hi, This applies the string optimization to the prepared statement creation. The speedup here isn't as dramatic, maybe 50 milliseconds. I didn't notice this part of the code when I optimized strings the first time, but I've updated it to use the new string routines for completeness sake. Regards Matt.

Message body is not shown because sender requested not to inline it.

That is a lot of placeholders! Thanks for the work on this, I will take a look soon when I get a few more tuits.
Subject: Re: [rt.cpan.org #105492] Poor performance with large amounts of placeholders.
Date: Fri, 20 Nov 2015 13:58:54 +1000
To: bug-DBD-Pg [...] rt.cpan.org
From: Matt Tyson <mtyson [...] redhat.com>
On 29/06/15 10:58, Greg Sabino Mullane via RT wrote: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=105492 > > > That is a lot of placeholders! Thanks for the work on this, I will take a look soon when I get a few more tuits. >
Hi Greg, I've updated the patch to apply to 3.5.3, and consolidated everything into this one patch. Please consider the previous patches obsolete. If there's anything I can do to help, please let me know. Regards, Matt.

Message body is not shown because sender requested not to inline it.

Hi Matt, On 2015-11-20 03:59:12, mtyson@redhat.com wrote: Show quoted text
> Hi Greg, > > I've updated the patch to apply to 3.5.3, and consolidated everything > into this one patch.
Thank you very much for the patch, that's an impressive improvement. However, it doesn't apply cleanly to git master (https://github.com/bucardo/dbdpg/) due to some nearby changes. I've applied it manually and adjusted a few things, please review the attached. It might also be worth adding a comment noting that ph_array_append() copies the ph_t it gets a pointer to, that threw me a bit when reading the loops adding placeholders. Show quoted text
> If there's anything I can do to help, please let me know.
I have a few issues wrt the string library that I didn't address myself: 1) string_append_char() can overflow if the appended string is longer than str->memory * 2 - str->length. 2) It should use size_t for the lengths, not unsigned int (I don't think PostgreSQL would accept a query > 4GiB in length, but it's the principle of the thing ;) Show quoted text
> Regards, > Matt.
Thanks, Ilmari.
Subject: 0001-DBD-Pg-Optimization-patch.patch

Message body is not shown because it is too large.

Subject: Re: [rt.cpan.org #105492] Poor performance with large amounts of placeholders.
Date: Mon, 1 Aug 2016 16:33:48 +1000
To: bug-DBD-Pg [...] rt.cpan.org
From: Matt Tyson <mtyson [...] redhat.com>
Hi Ilmari, On 30/07/16 01:17, Dagfinn Ilmari Mannsaker via RT wrote: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=105492 > > > Hi Matt, > > > I have a few issues wrt the string library that I didn't address myself: > > 1) string_append_char() can overflow if the appended string is longer than str->memory * 2 - str->length.
Can you provide a reproducer for this? I haven't been able to reproduce it with a quick check of my own. The logic for the if test makes me a bit suspicious though. I'll examine this in more detail when I get some more time later this month. Show quoted text
> 2) It should use size_t for the lengths, not unsigned int (I don't think PostgreSQL would accept a query > 4GiB in length, but it's the principle of the thing ;)
Not to mention the time spent on the network copying a query that large :). I'll fix that up in the next patch. I personally wouldn't consider this patch 'complete'. It was intended as a first pass to get some more feedback from someone more familiar with the driver code. I didn't get any reply until now, so I never worked on it further after solving the immediate performance problem. My memory is a bit fuzzy, but IIRC DBD::Pg (sans-patch) uses two linked lists to do what's needed with placeholders. My patch only replaces one of these linked lists with an array, as this is where the performance bottleneck is. My testing didn't show a performance problem with the other linked list, but you're doing O(N) malloc() and free() operations rather than just O(1) (with the odd realloc()). Also as far as I'm aware, the driver doesn't do any insertions into the middle of the list, which is the main use case of a linked list. I'd like to clean it up a bit more to be arrays only. I can probably take the existing array code and make it a bit more generic (maybe). Personally I think having a mix of an array and a linked list is a bit messy. What are your thoughts on this? I don't have time to work on this right now, but I can get back to this towards the end of August. Cheers, Matt.
Either of you feel like giving this issue some attention?
Subject: Re: [rt.cpan.org #105492] Poor performance with large amounts of placeholders.
Date: Mon, 5 Feb 2018 09:10:58 +1000
To: bug-DBD-Pg [...] rt.cpan.org
From: Matt Tyson <mtyson [...] redhat.com>
Hi, I've uploaded a new patch set onto github here: https://github.com/bucardo/dbdpg/pull/21 I don't believe this new patch set has been given an upstream review yet. I'm happy to discuss and resolve any issues arising from a review. Regards, Matt. On 05/02/18 02:08, Greg Sabino Mullane via RT wrote: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=105492 > > > Either of you feel like giving this issue some attention? >