Skip Menu |

This queue is for tickets about the SQL-Tokenizer CPAN distribution.

Report information
The Basics
Id: 27797
Status: resolved
Priority: 0/
Queue: SQL-Tokenizer

People
Owner: izut [...] cpan.org
Requestors: jamesml [...] planetolsen.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.07
Fixed in: 0.08



Subject: Fails to tokenize SQL segments containing empty strings (single quotes with nothing inbetween)
Thank you for a handy module! It seems like using empty strings (two single-quote marks with nothing inbetween) seems to cause a hiccup in the parser and it fails to properly tokenize the string. The following portion of a SQL statement does not get processed into tokens properly: nvl(reward_type,'')='' and group_code = 'XXXX' It gets incorrectly processed into (notice the 4th line, where there are several SQL components all inside of the same "token"): '(', 'reward_type', ',', '\'\')=\'\' and group_code = \'', 'XXXX\'', But if I try to parse this: nvl(reward_type,'x')='x' and group_code = 'XXXX' It gets properly parsed into: 'nvl', '(', 'reward_type', ',', '\'x\'', ')', '=', '\'x\'', ' ', 'and', ' ', 'group_code', ' ', '=', ' ', '\'XXXX\'', I've looked at the regex, and it'll take me some digesting to figure it out so I thought I might throw a bug report your way in case you might be able to figure it out more quickly. Thanks again.. --James Other info: SQL::Tokenizer 0.07 Perl 5.8.6 SunOS 5.10 Generic sun4u sparc SUNW,Sun-Fire-V440
From: igor.sutton [...] gmail.com
On Thu Jun 28 10:51:37 2007, samej71 wrote: Show quoted text
> Thank you for a handy module! > > It seems like using empty strings (two single-quote marks with nothing > inbetween) seems to cause a hiccup in the parser and it fails to > properly tokenize the string. > > The following portion of a SQL statement does not get processed into > tokens properly: > > nvl(reward_type,'')='' and group_code = 'XXXX' > > It gets incorrectly processed into (notice the 4th line, where there are > several SQL components all inside of the same "token"): > > '(', > 'reward_type', > ',', > '\'\')=\'\' and group_code = \'', > 'XXXX\'', > > But if I try to parse this: > > nvl(reward_type,'x')='x' and group_code = 'XXXX' > > It gets properly parsed into: > > 'nvl', > '(', > 'reward_type', > ',', > '\'x\'', > ')', > '=', > '\'x\'', > ' ', > 'and', > ' ', > 'group_code', > ' ', > '=', > ' ', > '\'XXXX\'', > > I've looked at the regex, and it'll take me some digesting to figure it > out so I thought I might throw a bug report your way in case you might > be able to figure it out more quickly. > > Thanks again.. > > --James > > Other info: > SQL::Tokenizer 0.07 > Perl 5.8.6 > SunOS 5.10 Generic sun4u sparc SUNW,Sun-Fire-V440
I'll be adding a test for it. Thanks for your response.
From: jamesml [...] planetolsen.com
On Thu Jun 28 10:51:37 2007, samej71 wrote: Show quoted text
> I've looked at the regex, and it'll take me some digesting to figure it > out so I thought I might throw a bug report your way in case you might > be able to figure it out more quickly.
It seems like it was much more simple to fix this than I originally thought. I believe the following simple addition to the regex may fix it: | '' # empty single quoted string | "" # empty double quoted string This segment needs to be added before the other quoted string checks. I inserted them at line 94 to give me: my @query = $query =~ m{ ( (?:>=|<=|==) # >=, <= and == operators | [\(\),=;] # punctuation (parenthesis, comma) | '' # empty single quoted string | "" # empty double quoted string | ".*?(?:(?:""){1,}"|(?<!["\\])"(?!")|\\"{2}) # anything inside double quotes, ungreedy | '.*?(?:(?:''){1,}'|(?<!['\\])'(?!')|\\'{2}) # anything inside single quotes, ungreedy. | --[\ \t\S]* # comments | /\*[\ \t\n\S]*?\*/ # C style comments | [^\s\(\),=;]+ # everything that doesn't matches with above | \n # newline | [\t\ ]+ # any kind of white spaces ) }smxgo;
Subject: Re: [rt.cpan.org #27797] Fails to tokenize SQL segments containing empty strings (single quotes with nothing inbetween)
Date: Thu, 28 Jun 2007 12:53:54 -0300
To: bug-SQL-Tokenizer [...] rt.cpan.org
From: Igor Sutton Lopes <igor.sutton [...] gmail.com>
On Jun 28, 2007, at 12:09 PM, James via RT wrote: Show quoted text
> > Queue: SQL-Tokenizer > Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=27797 > > > On Thu Jun 28 10:51:37 2007, samej71 wrote:
>> I've looked at the regex, and it'll take me some digesting to >> figure it >> out so I thought I might throw a bug report your way in case you >> might >> be able to figure it out more quickly.
> > It seems like it was much more simple to fix this than I originally > thought. I believe the following simple addition to the regex may > fix it: > > | > '' # empty single quoted string > | > "" # empty double quoted string > > This segment needs to be added before the other quoted string > checks. I > inserted them at line 94 to give me: > > my @query = $query =~ m{ > ( > (?:>=|<=|==) # >=, <= and == operators > | > [\(\),=;] # punctuation (parenthesis, comma) > | > '' # empty single quoted string > | > "" # empty double quoted string > | > ".*?(?:(?:""){1,}"|(?<!["\\])"(?!")|\\"{2}) > # anything inside double quotes, ungreedy > | > '.*?(?:(?:''){1,}'|(?<!['\\])'(?!')|\\'{2}) > # anything inside single quotes, ungreedy. > | > --[\ \t\S]* # comments > | > /\*[\ \t\n\S]*?\*/ # C style comments > | > [^\s\(\),=;]+ # everything that doesn't matches with above > | > \n # newline > | > [\t\ ]+ # any kind of white spaces > ) > }smxgo; >
Unfortunately it breaks the other kind of quotation, like '''word''' - since it is a valid quotation scheme on some databases. I'm checking some way to provide this correctly.
Download PGP.sig
application/pgp-signature 186b

Message body not shown because it is not plain text.

Subject: Re: [rt.cpan.org #27797] Fails to tokenize SQL segments containing empty strings (single quotes with nothing inbetween)
Date: Thu, 28 Jun 2007 13:19:52 -0300
To: bug-SQL-Tokenizer [...] rt.cpan.org
From: Igor Sutton Lopes <igor.sutton [...] gmail.com>
On Jun 28, 2007, at 12:09 PM, James via RT wrote: Show quoted text
> This segment needs to be added before the other quoted string > checks. I > inserted them at line 94 to give me: > > my @query = $query =~ m{ > ( > (?:>=|<=|==) # >=, <= and == operators > | > [\(\),=;] # punctuation (parenthesis, comma) > | > '' # empty single quoted string > | > "" # empty double quoted string > | > ".*?(?:(?:""){1,}"|(?<!["\\])"(?!")|\\"{2}) > # anything inside double quotes, ungreedy > | > '.*?(?:(?:''){1,}'|(?<!['\\])'(?!')|\\'{2}) > # anything inside single quotes, ungreedy. > | > --[\ \t\S]* # comments > | > /\*[\ \t\n\S]*?\*/ # C style comments > | > [^\s\(\),=;]+ # everything that doesn't matches with above > | > \n # newline > | > [\t\ ]+ # any kind of white spaces > ) > }smxgo;
I figured out what needed, and now I'm uploading the new version - 0.08. At this moment, PAUSE is indexing the new file and it is supposed to be finished soon.
Download PGP.sig
application/pgp-signature 186b

Message body not shown because it is not plain text.

Modified regex to solve issue with single and doubled quotes empty strings.