Bug #33310 for SQL-Interp: Output depends on vagaries of hash enumeration

Sat Feb 16 20:51:37 2008 andykirk [...] ubermonkey.net - Ticket created

Subject:

Output depends on vagaries of hash enumeration

If two hashes are built to have the same contents, but the contents are added in a different order, then keys(), values() and each() may return the contents in a different order. This makes SQL::Interp produce SQL strings that vary, yet do the same thing, causing unnecessary prepared statement handle cache misses. For example ($sql not printed for brevity): perl -MList::Util=shuffle -MSQL::Interp=sql_interp -e '@a=qw(one two three four five); for my $i (0..5) { %h = map { $_ => $_ } shuffle @a; ($sql, @bind) = sql_interp("SELECT * FROM t WHERE ", \%h); print join(", ", @bind), "\n"; }' Output: three, five, one, two, four three, one, five, two, four three, five, one, two, four three, one, five, two, four three, one, five, two, four three, five, one, two, four I suggest prefixing calls to keys() in list context with a sort. The shortness of parameter lists makes this sort cheap compared to a round trip to the DB to prepare a new statement. I've tried this (and replacing a call to values() with a map) but it made some tests fail. I'm trying to fix those tests at the moment but the test suite is a bit cryptic. I believe that some of the tests rely on the order that keys() returns after a given order of insertion, which may change in future versions of perl. Its debatable whether this should be considered a bug, I dare say most times a hash is passed to this module, it was built in a consistent order and so you'd get a consistent SQL string. Also the worst case scenario is that prepared statement handles are wasted. I'll send in a patch if I fix this for myself.

Sun Feb 17 05:57:53 2008 andykirk [...] ubermonkey.net - Correspondence added

From:

andykirk [...] ubermonkey.net

Attached is a patch that implements the key sorting as an option, if SORT_KEYS => 1 is passed at import time. I've changed the tests to match (though now they don't work without this option turned on), and changed a print STDERR to carp so that I could track on which line the tests were failing. I hope that this patch proves useful but if not I'll just maintain it in my own branch. Cheers

diff -rubB SQL-Interp-1.06/lib/SQL/Interp.pm SQL-Interp-1.06-sortkeys/lib/SQL/Interp.pm --- SQL-Interp-1.06/lib/SQL/Interp.pm 2007-12-21 04:41:59.000000000 +1030 +++ SQL-Interp-1.06-sortkeys/lib/SQL/Interp.pm 2008-02-17 21:07:34.000000000 +1030 @@ -48,6 +48,10 @@ # [local to sql_interp functions] my @bind; +# whether to sort all hash key sequences +# may improve prepared statement cache hit ratio +my $sort_keys = undef; + sub import { my $class = shift; my @params = @_; @@ -59,6 +63,7 @@ TRACE_SQL => sub { $trace_sql_enabled = shift @params; print STDERR "TRACE_SQL enabled\n" if $trace_sql_enabled; }, + SORT_KEYS => sub { $sort_keys = shift @params; }, __WRAP => sub { $is_wrapped = shift @params; } ); @_ = ($class); # unprocessed params @@ -189,7 +194,7 @@ my $val = $item->{$key}; "$key=" . _sql_interp_data($val); - } keys %$item); + } ($sort_keys ? sort keys %$item : keys %$item)); } elsif ($sql =~ /\bINSERT[\w\s]*\sINTO\s*$id_match\s*$/si) { $item = [ $$item ] if ref $item eq 'SCALAR'; @@ -199,11 +204,12 @@ } @$item) . ")"; } elsif (ref $item eq 'HASH') { + my @keyseq = $sort_keys ? sort keys %$item : keys %$item; $sql .= - " (" . join(', ', keys %$item) . ")" . + " (" . join(', ', @keyseq) . ")" . " VALUES(" . join(', ', map { - _sql_interp_data($_); - } values %$item) . ")"; + _sql_interp_data($item->{$_}); + } @keyseq) . ")"; } else { _error_item($idx, \@items); } } @@ -248,7 +254,7 @@ "$key=" . _sql_interp_data($val); } - } keys %$item; + } ($sort_keys ? sort keys %$item : keys %$item); $s = "($s)" if keys %$item > 1; $s = " $s"; $sql .= $s; @@ -350,7 +356,7 @@ my $sql3 = _sql_interp_data($val); $sql3 .= " AS $key" if $is_first_row; $sql3; - } keys %$first_row); + } ($sort_keys ? sort keys %$first_row : keys %$first_row)); } } else { @@ -707,6 +713,30 @@ bind value. Note that a single sql_type holding an aggregate (arrayref or hashref) may generate multiple bind values. +=head1 Sorting hash keys + +When a hash reference is passed into this module, the order of terms in +the interpolated parts of the generated SQL depends on the hash key +enumeration behaviour of the Perl interpreter (see L<perlrun> and +L<perlsec>). This means that although they may be logically equivalent, +the SQL statements generated may have terms in a different order for +otherwise equivalent parameters. + +In particular this can happen if the hashes passed in have had their +elements inserted in a different order. The result can be that prepared +DBI statements are wasted, because in order to be reused the SQL +strings of the statments must be identical. + +In order to avoid this, do the following: + + use SQL::Interp SORT_KEYS => 1; + +When imported this way, the enumeration of keys will be sorted for any +hash references passed in for interpolation. This yields less variable +SQL output and a higher prepared statement cache hit ratio (an +occasional large time saving), but involves more sort calls (a freqent +small cost). + =head1 Enabling debugging output To have the generated SQL and bind variables sent to STDOUT, diff -rubB SQL-Interp-1.06/t/lib.pl SQL-Interp-1.06-sortkeys/t/lib.pl --- SQL-Interp-1.06/t/lib.pl 2007-12-21 04:41:59.000000000 +1030 +++ SQL-Interp-1.06-sortkeys/t/lib.pl 2008-02-17 11:02:42.000000000 +1030 @@ -4,6 +4,7 @@ use strict; use SQL::Interp qw(:all); use Data::Dumper; +use Carp; #our $fake_mysql_dbh = # bless {Driver => {Name => 'mysql'}}, 'DBI::db'; @@ -30,7 +31,7 @@ local $SIG{__WARN__} = sub { warn $_[0] if $_[0] !~ /isn't numeric in numeric eq/; }; - is_deeply(@_) or print STDERR Dumper(@_); + is_deeply(@_) or carp Dumper(@_); } 1 diff -rubB SQL-Interp-1.06/t/sql_interp.t SQL-Interp-1.06-sortkeys/t/sql_interp.t --- SQL-Interp-1.06/t/sql_interp.t 2007-12-21 04:41:59.000000000 +1030 +++ SQL-Interp-1.06-sortkeys/t/sql_interp.t 2008-02-17 20:57:52.000000000 +1030 @@ -3,7 +3,7 @@ use strict; use warnings; use Test::More 'no_plan'; -use SQL::Interp ':all'; +use SQL::Interp SORT_KEYS => 1, ':all'; use Data::Dumper; BEGIN {require 't/lib.pl';} @@ -118,8 +118,8 @@ @{$hi->{values}}], 'INSERT hashref of size > 0'); interp_test(['INSERT INTO mytable', $h2i->{hashref}], - ["INSERT INTO mytable ($h2i->{keys}[0], $h2i->{keys}[1], $h2i->{keys}[2]) " . - "VALUES($h2i->{places}->[0], $h2i->{places}->[1], $h2i->{places}->[2])", + ["INSERT INTO mytable ($h2i->{keys}[1], $h2i->{keys}[0], $h2i->{keys}[2]) " . + "VALUES($h2i->{places}->[1], $h2i->{places}->[0], $h2i->{places}->[2])", @{$h2i->{binds}}], 'INSERT hashref of sql_type + sql()'); interp_test(['INSERT INTO mytable', {one => 1, two => sql(\$x, '*', \$x)}], @@ -175,7 +175,7 @@ 'SET hashref'); interp_test(['UPDATE mytable SET', {one => 1, two => $var2, three => sql('3')}], - ['UPDATE mytable SET three=3, one=?, two= ?', + ['UPDATE mytable SET one=?, three=3, two= ?', [1, sql_type(\1)], [${$var2->{value}}, $var2]], 'SET hashref of sql_type types, sql()'); #FIX--what if size of hash is zero? error? @@ -193,7 +193,7 @@ {x => [1]} ); interp_test(['WHERE', $h2bi->{hashref}], - ["WHERE ($h2bi->{places}[0] AND $h2bi->{places}[1])", @{$h2bi->{binds}}], + ["WHERE ($h2bi->{places}[1] AND $h2bi->{places}[0])", @{$h2bi->{binds}}], 'WHERE hashref sql()'); my $h2ci = make_hash_info( {x => 1, y => undef}, @@ -201,7 +201,7 @@ {x => [1]} ); interp_test(['WHERE', $h2ci->{hashref}], - ["WHERE ($h2ci->{places}[0] AND $h2ci->{places}[1])", @{$h2ci->{binds}}], + ["WHERE ($h2ci->{places}[1] AND $h2ci->{places}[0])", @{$h2ci->{binds}}], 'WHERE hashref of NULL'); # WHERE x= @@ -217,8 +217,8 @@ ['WHERE x= ? AND y= ?', [$x, sql_type(\$x)], [${$var2->{value}}, $var2]], 'WHERE \$x, sql_type typed'); interp_test(['WHERE', {x => $x, y => $var2}, 'AND z=', \$x], - ['WHERE (y= ? AND x=?) AND z= ?', - [${$var2->{value}}, $var2], [$x, sql_type(\$x)], [$x, sql_type(\$x)]], + ['WHERE (x=? AND y= ?) AND z= ?', + [$x, sql_type(\$x)], [${$var2->{value}}, $var2], [$x, sql_type(\$x)]], 'WHERE hashref of \$x, sql_type typed'); my $h5i = make_hash_info( {x => $x, y => [3, $var2]}, @@ -226,10 +226,10 @@ {x => [[$x, sql_type(\$x)]], y => [[3, sql_type(\3)], [${$var2->{value}}, $var2]]} ); interp_test(['WHERE', $h5i->{hashref}], - ["WHERE ($h5i->{places}[0] AND $h5i->{places}[1])", @{$h5i->{binds}}], + ["WHERE ($h5i->{places}[1] AND $h5i->{places}[0])", @{$h5i->{binds}}[2,0,1]], 'WHERE hashref of arrayref of sql_type typed'); interp_test(['WHERE', {x => $x, y => sql('z')}], - ['WHERE (y=z AND x=?)', $x], + ['WHERE (x=? AND y=z)', $x], 'WHERE hashref of \$x, sql()'); # table references

Sun Jun 28 21:17:50 2009 MARKSTOS [...] cpan.org - Taken

Sun Jun 28 21:20:10 2009 MARKSTOS [...] cpan.org - Correspondence added

Thanks for this patch, and sorry it took so long to respond. I'm accepting the patch, and releasing it to tonight, with credit to you. I did motify the patch to be "always on" rather than optional. I can't see any reason *not* to provide this kind of consistency in the SQL generation. Mark

Sun Jun 28 21:20:12 2009 The RT System itself - Status changed from 'new' to 'open'

Sun Jun 28 21:20:14 2009 MARKSTOS [...] cpan.org - Status changed from 'open' to 'resolved'