Skip Menu |

This queue is for tickets about the SQL-Tokenizer CPAN distribution.

Report information
The Basics
Id: 70774
Status: new
Priority: 0/
Queue: SQL-Tokenizer

People
Owner: Nobody in particular
Requestors: alf.lacis [...] logica.com
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: 0.24
Fixed in: (no value)



Subject: Colons are a valid part of a combined field name
Colons are being seen as a separator, not a valid part of a field name. For Example: select IOGroup:GPElement.Address, Number, IOGroup:GPElement.GPElemType:Type, Point:Equip:Location:Name, Point:Equip:Plant:Name, Point:Equip:Equip.Desig, Point:Quantity:Name, Point:Equip:Equip.Description, Point from Chan where 'IOGroup:DACUnit:Name='\''ST32'\''' The field IOGroup:GPElement.GPElemType:Type is wrongly broken into two tokens: IOGroup:GPElement.GPElemType :Type I'm using SQL:Tokenizer to add format "%s|" to each field, and yes, the last field needs a trailing | too. The error occurs for tokens jj=6 and jj=7 below. ODDLY enough, token 17 is parsed correctly. Here's debug output from my script: (note it stops processing fields after the last from, and yes, it will handle recursion, too, though this example is not recursive.) query was 'select IOGroup:GPElement.Address,Number,IOGroup:GPElement.GPElemType:Type,Point:Equip:Location:Name,Point:Equip:Plant:Name,Point:Equip:Equi p.Desig,Point:Quantity:Name,Point:Equip:Equip.Description,Point from Chan where IOGroup:DACUnit:Name='ST32';' jj=0 selectCounter=0 token is 'select' jj=1 selectCounter=1 token is ' ' jj=2 selectCounter=1 token is 'IOGroup:GPElement.Address' jj=2 selectCounter=1 token now 'IOGroup:GPElement.Address format "%s|"' jj=3 selectCounter=1 token is ',' jj=4 selectCounter=1 token is 'Number' jj=4 selectCounter=1 token now 'Number format "%s|"' jj=5 selectCounter=1 token is ',' jj=6 selectCounter=1 token is 'IOGroup:GPElement.GPElemType' jj=6 selectCounter=1 token now 'IOGroup:GPElement.GPElemType format "%s|"' jj=7 selectCounter=1 token is ':Type' jj=7 selectCounter=1 token now ':Type format "%s|"' jj=8 selectCounter=1 token is ',' jj=9 selectCounter=1 token is 'Point:Equip:Location:Name' jj=9 selectCounter=1 token now 'Point:Equip:Location:Name format "%s|"' jj=10 selectCounter=1 token is ',' jj=11 selectCounter=1 token is 'Point:Equip:Plant:Name' jj=11 selectCounter=1 token now 'Point:Equip:Plant:Name format "%s|"' jj=12 selectCounter=1 token is ',' jj=13 selectCounter=1 token is 'Point:Equip:Equip.Desig' jj=13 selectCounter=1 token now 'Point:Equip:Equip.Desig format "%s|"' jj=14 selectCounter=1 token is ',' jj=15 selectCounter=1 token is 'Point:Quantity:Name' jj=15 selectCounter=1 token now 'Point:Quantity:Name format "%s|"' jj=16 selectCounter=1 token is ',' jj=17 selectCounter=1 token is 'Point:Equip:Equip.Description' jj=17 selectCounter=1 token now 'Point:Equip:Equip.Description format "%s|"' jj=18 selectCounter=1 token is ',' jj=19 selectCounter=1 token is 'Point' jj=19 selectCounter=1 token now 'Point format "%s|"' jj=20 selectCounter=1 token is ' ' jj=21 selectCounter=1 token is 'from' query now 'select IOGroup:GPElement.Address format "%s|",Number format "%s|",IOGroup:GPElement.GPElemType format "%s|":Type format "%s|",Point:Equip:Location:Name format "%s|",Point:Equip:Plant:Name format "%s|",Point:Equip:Equip.Desig format "%s|",Point:Quantity:Name format "%s|",Point:Equip:Equip.Description format "%s|",Point format "%s|" from Chan where IOGroup:DACUnit:Name='ST32';'
From: alf.lacis [...] logica.com
On Wed Sep 07 02:37:42 2011, alflacis wrote: Modified Tokenizer.pm is attached for your distribution.
Subject: Tokenizer.pm
package SQL::Tokenizer; use warnings; use strict; use 5.006002; use Exporter; our @ISA = qw(Exporter); our @EXPORT_OK= qw(tokenize_sql); our $VERSION= '0.24'; my $re= qr{ ( (?:--|\#)[\ \t\S]* # single line comments | (?:<>|<=>|>=|<=|==|=|!=|!|<<|>>|<|>|\|\||\||&&|&|-|\+|\*(?!/)|/(?!\*)|\%|~|\^|\?) # operators and tests | [\[\]\(\),;.] # punctuation (parenthesis, comma) | \'\'(?!\') # empty single quoted string | \"\"(?!\"") # empty double quoted string | "(?>(?:(?>[^"\\]+)|""|\\.)*)+" # anything inside double quotes, ungreedy | `(?>(?:(?>[^`\\]+)|``|\\.)*)+` # anything inside backticks quotes, ungreedy | '(?>(?:(?>[^'\\]+)|''|\\.)*)+' # anything inside single quotes, ungreedy. | /\*[\ \t\r\n\S]*?\*/ # C style comments | (?:[\w:@]+(?:[\.\:](?:\w+|\*)?)*) # (?:[\w:@]+(?:\.(?:\w+|\*)?)*) << old # ?? another literal? # ?? # ) end group # \* literal? * # or # another word character, 1 or more # colon # (? another group? # \. literal . # is this a literal colon? # (? another group? # [set of \w=a word character, : or @], 1 or more # is this a literal colon? # (? something to do with groups? # words, standard named placeholders, db.table.*, db.* | (?: \$_\$ | \$\d+ | \${1,2} ) # dollar expressions - eg $_$ $3 $$ | \n # newline | [\t\ ]+ # any kind of white spaces ) }smx; sub tokenize_sql { my ( $query, $remove_white_tokens )= @_; my @query= $query =~ m{$re}smxg; if ($remove_white_tokens) { @query= grep( !/^[\s\n\r]*$/, @query ); } return wantarray ? @query : \@query; } sub tokenize { my $class= shift; return tokenize_sql(@_); } 1; =pod =head1 NAME SQL::Tokenizer - A simple SQL tokenizer. =head1 VERSION 0.20 =head1 SYNOPSIS use SQL::Tokenizer qw(tokenize_sql); my $query= q{SELECT 1 + 1}; my @tokens= SQL::Tokenizer->tokenize($query); # @tokens now contains ('SELECT', ' ', '1', ' ', '+', ' ', '1') @tokens= tokenize_sql($query); # procedural interface =head1 DESCRIPTION SQL::Tokenizer is a simple tokenizer for SQL queries. It does not claim to be a parser or query verifier. It just creates sane tokens from a valid SQL query. It supports SQL with comments like: -- This query is used to insert a message into -- logs table INSERT INTO log (application, message) VALUES (?, ?) Also supports C<''>, C<""> and C<\'> escaping methods, so tokenizing queries like the one below should not be a problem: INSERT INTO log (application, message) VALUES ('myapp', 'Hey, this is a ''single quoted string''!') =head1 API =over 4 =item tokenize_sql use SQL::Tokenizer qw(tokenize_sql); my @tokens= tokenize_sql($query); my $tokens= tokenize_sql($query); $tokens= tokenize_sql( $query, $remove_white_tokens ); C<tokenize_sql> can be imported to current namespace on request. It receives a SQL query, and returns an array of tokens if called in list context, or an arrayref if called in scalar context. =item tokenize my @tokens= SQL::Tokenizer->tokenize($query); my $tokens= SQL::Tokenizer->tokenize($query); $tokens= SQL::Tokenizer->tokenize( $query, $remove_white_tokens ); This is the only available class method. It receives a SQL query, and returns an array of tokens if called in list context, or an arrayref if called in scalar context. If C<$remove_white_tokens> is true, white spaces only tokens will be removed from result. =back =head1 ACKNOWLEDGEMENTS =over 4 =item Evan Harris, for implementing Shell comment style and SQL operators. =item Charlie Hills, for spotting a lot of important issues I haven't thought. =item Jonas Kramer, for fixing MySQL quoted strings and treating dot as punctuation character correctly. =item Emanuele Zeppieri, for asking to fix SQL::Tokenizer to support dollars as well. =item Nigel Metheringham, for extending the dollar signal support. =item Devin Withers, for making it not choke on CR+LF in comments. =item Luc Lanthier, for simplifying the regex and make it not choke on backslashes. =back =head1 AUTHOR Copyright (c) 2007, 2008, 2009, 2010, 2011 Igor Sutton Lopes "<IZUT@cpan.org>". All rights reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. =cut