Subject: "dot" token being lost in versions 0.10 and above
Date: Fri, 11 Apr 2008 15:40:18 -0500
To: <bug-SQL-Tokenizer [...] rt.cpan.org>
From: "Charlie Hills" <hillsc [...] ncsoft.com>
I've been using version 0.09 for about a month or more. I had some unit
tests written that suddenly failed when going from 0.09 to 0.10 (and
0.11). Apparently when using table.* notation in the select statement,
the "." token is getting eaten, which is bad.
For example:
"select * from table"
Discounting whitespace:
This is four tokens in 0.09: "select", "*", "from", "table"
In version 0.10 and above: "select", "*", "from", "table"
So far, so good. :) Now this example:
"select t.* from table t"
This is five tokens in 0.09: "select", "t.*", "from", "table", "t"
In version 0.10 and above it is six tokens, with the "." gone: "select", "t", "*", "from", "table", "t"
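The point where the two token lists diverge can be located mechanically. This is only a minimal sketch: first_difference is a hypothetical helper, not part of the module, and the token lists are hard-coded from the results reported above rather than produced by calling the tokenizer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the index of the first position where two
# token lists differ, or -1 if the lists are identical.
sub first_difference {
    my ($left, $right) = @_;
    my $max = @$left > @$right ? scalar @$left : scalar @$right;
    for my $i (0 .. $max - 1) {
        my ($l, $r) = ($left->[$i], $right->[$i]);
        # A missing element on either side also counts as a difference.
        return $i if !defined $l || !defined $r || $l ne $r;
    }
    return -1;
}

# Token lists as reported above (whitespace tokens left out).
my @tokens09 = ('select', 't.*', 'from', 'table', 't');
my @tokens11 = ('select', 't', '*', 'from', 'table', 't');

my $i = first_difference(\@tokens09, \@tokens11);
print "first mismatch at index $i: '$tokens09[$i]' vs '$tokens11[$i]'\n"
    if $i >= 0;
```

For these two lists the mismatch is at index 1, where "t.*" in 0.09 lines up with "t" in 0.10 and above.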
In my code, when I reassemble the tokens after processing, I end up
with:
"select t* from table t"
It's a wonderfully simple Perl module and I'd like to keep up with the
current version. Here's a test script I put together to run the two
versions side by side:
#!/usr/bin/perl -w
use strict;
use Tokenizer09;
use Tokenizer11;
use Data::Dumper;
# Case 1: plain "select *"; both versions produce the same tokens.
my $query = "select * from table";
my @tokens09 = Tokenizer09->tokenize($query);
my @tokens11 = Tokenizer11->tokenize($query);
print Dumper(\@tokens09);
print Dumper(\@tokens11);
# Case 2: table.* notation; the newer version loses the "." token.
$query = "select t.* from table t";
@tokens09 = Tokenizer09->tokenize($query);
@tokens11 = Tokenizer11->tokenize($query);
print Dumper(\@tokens09);
print Dumper(\@tokens11);
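The reassembly problem could also be turned into an automated round-trip check: joining the tokens should give back the original query once whitespace is ignored. The following is a sketch under stated assumptions, not the module's own test. tokens_preserve_query is a hypothetical helper, and the token lists are hard-coded from the results reported above so the example runs without either version installed.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the tokens, concatenated in order,
# reproduce the query text once all whitespace is stripped from both.
sub tokens_preserve_query {
    my ($query, @tokens) = @_;
    (my $expected = $query) =~ s/\s+//g;
    (my $actual = join '', @tokens) =~ s/\s+//g;
    return $expected eq $actual;
}

my $query = "select t.* from table t";

# Token lists as reported for each version (whitespace tokens left out).
my %reported = (
    '0.09' => ['select', 't.*', 'from', 'table', 't'],
    '0.11' => ['select', 't', '*', 'from', 'table', 't'],
);

for my $version (sort keys %reported) {
    my $ok = tokens_preserve_query($query, @{ $reported{$version} });
    printf "%s: round trip %s\n", $version, $ok ? 'ok' : 'FAILED';
}
```

With the lists above, the 0.09 tokens pass and the 0.11 tokens fail, because the joined text is missing the ".". In a real test the hard-coded lists would be replaced by the tokenizer's actual output.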
Thanks for your time and effort,
Charlie Hills