
This queue is for tickets about the Regexp-Common CPAN distribution.

Report information
The Basics
Id: 68718
Status: rejected
Priority: 0/
Queue: Regexp-Common

People
Owner: Nobody in particular
Requestors: cpan [...] barely3am.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 2011041701
Fixed in: (no value)



Subject: url obfuscation
http://www.symantec.com/connect/blogs/dotted-decimal-url-obfuscation

Below are some of the IP address numeral system obfuscation techniques Symantec has observed of spammers. (All of the samples below are just different numeral representations of the IP address for Symantec.com)

An IP address converted to hexadecimal format. (Hexadecimal is a base-16 numeral system.)

http://0xD80C9114

An IP address converted to dotted hexadecimal format.

http://0xD8.0x0C.0x91.0x14

An IP address converted to dotted octal format. (Octal is a base-8 numeral system.)

http://0330.0014.0221.0024

A combination of Hexadecimal and Octal

http://0xd8.000000014.0x9114

{{{

#!/usr/bin/perl -w

use Regexp::Common qw/URI/;

use strict;

my @array = (
    'http://0xD80C9114',
    'http://0xD8.0x0C.0x91.0x14',
    'http://0330.0014.0221.0024',
    'http://0xd8.000000014.0x9114'
);

foreach (@array){
    if(/^$RE{'URI'}/){
        warn 'valid url: '.$_;
    } else {
        warn 'invalid url: '.$_;
    }
}
}}}

$ perl url.pl
invalid url: http://0xD80C9114 at url.pl line 13.
invalid url: http://0xD8.0x0C.0x91.0x14 at url.pl line 13.
valid url: http://0330.0014.0221.0024 at url.pl line 11.
invalid url: http://0xd8.000000014.0x9114 at url.pl line 13.

I haven't looked much further than that. just an fyi.
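For readers who want to see how the four sample hosts relate to one another, here is a minimal Perl sketch that decodes them into dotted decimal. It assumes the hosts are interpreted the way the classic inet_aton() routine reads numeric addresses ("0x" prefix for hexadecimal, leading "0" for octal, and a final part that fills all remaining bytes). The helper names parse_part and host_to_dotted_decimal are hypothetical and are not part of Regexp::Common.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one component: "0x..." is hexadecimal, a leading "0" is octal,
# anything else must be plain decimal. Returns undef if it is not a number.
sub parse_part {
    my ($part) = @_;
    return hex($part) if $part =~ /\A0[xX][0-9a-fA-F]+\z/;
    return oct($part) if $part =~ /\A0[0-7]*\z/;
    return $part + 0  if $part =~ /\A[1-9][0-9]*\z/;
    return undef;
}

# Convert a host such as "0xD8.0x0C.0x91.0x14" to dotted decimal.
# Returns undef if the host is not a numeric IPv4 address in this notation.
sub host_to_dotted_decimal {
    my ($host) = @_;
    my @parts = split /\./, $host, -1;
    return undef if @parts < 1 || @parts > 4;

    my @values;
    for my $part (@parts) {
        my $value = parse_part($part);
        return undef unless defined $value;
        push @values, $value;
    }

    # Every part except the last is a single byte; the last part fills
    # whatever bytes remain (so "a.b.c" ends in a 16-bit value).
    my $last       = pop @values;
    my $last_bytes = 4 - @values;
    return undef if grep { $_ > 255 } @values;
    return undef if $last > 256**$last_bytes - 1;

    my $address = 0;
    $address = $address * 256 + $_ for @values;
    $address = $address * 256**$last_bytes + $last;

    return join '.', unpack 'C4', pack 'N', $address;
}

for my $url (qw(
    http://0xD80C9114
    http://0xD8.0x0C.0x91.0x14
    http://0330.0014.0221.0024
    http://0xd8.000000014.0x9114
)) {
    my ($host) = $url =~ m{\Ahttp://([^/:?#]+)};
    my $ip = defined $host ? host_to_dotted_decimal($host) : undef;
    printf "%-32s -> %s\n", $url, defined $ip ? $ip : 'not a numeric host';
}
```

Under these assumptions, all four samples decode to the same address, 216.12.145.20. A check like this would have to run separately from the URI pattern, since it describes what lenient clients accept rather than what the URL grammar allows.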
CC: undisclosed-recipients: ;
Subject: Re: [rt.cpan.org #68718] url obfuscation
Date: Thu, 9 Jun 2011 15:45:16 +0200
To: Wes Young via RT <bug-Regexp-Common [...] rt.cpan.org>
From: Abigail <abigail [...] abigail.be>
On Thu, Jun 09, 2011 at 09:34:30AM -0400, Wes Young via RT wrote:
> Thu Jun 09 09:34:27 2011: Request 68718 was acted upon.
> Transaction: Ticket created by SAXJAZMAN
> Queue: Regexp-Common
> Subject: url obfuscation
> Broken in: 2011041701
> Severity: Normal
> Owner: Nobody
> Requestors: cpan@barely3am.com
> Status: new
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=68718 >
>
>
> http://www.symantec.com/connect/blogs/dotted-decimal-url-obfuscation
>
> Below are some of the IP address numeral system obfuscation techniques Symantec has observed of spammers. (All of the samples below are just different numeral representations of the IP
> address for Symantec.com)
>
> An IP address converted to hexadecimal format. (Hexadecimal is a base-16 numeral system.)
>
> http://0xD80C9114
> An IP address converted to dotted hexadecimal format.
>
> http://0xD8.0x0C.0x91.0x14
> An IP address converted to dotted octal format. (Octal is a base-8 numeral system.)
>
> http://0330.0014.0221.0024
> A combination of Hexadecimal and Octal
>
> http://0xd8.000000014.0x9114
>
> {{{
>
> #!/usr/bin/perl -w
>
> use Regexp::Common qw/URI/;
>
> use strict;
>
> my @array = (
>     'http://0xD80C9114',
>     'http://0xD8.0x0C.0x91.0x14',
>     'http://0330.0014.0221.0024',
>     'http://0xd8.000000014.0x9114'
> );
>
> foreach (@array){
>     if(/^$RE{'URI'}/){
>         warn 'valid url: '.$_;
>     } else {
>         warn 'invalid url: '.$_;
>     }
> }
> }}}
>
>
> $ perl url.pl
> invalid url: http://0xD80C9114 at url.pl line 13.
> invalid url: http://0xD8.0x0C.0x91.0x14 at url.pl line 13.
> valid url: http://0330.0014.0221.0024 at url.pl line 11.
> invalid url: http://0xd8.000000014.0x9114 at url.pl line 13.
>
The regexp isn't designed to exactly match what a particular browser or tool accepts as a URL, or to catch anything it manages to convert into an IP address. That browsers are liberal in what they accept in their navigation bar, or inside a "src" or "href" attribute, is a (valuable!) service to the user. But it would be very hard to reverse-engineer such a pattern, and you'd need different patterns for different browsers.

The pattern matches what the various RFCs say is valid. Anything that cannot be derived from the BNF for HTTP URLs will be rejected.

Regards,

Abigail
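To illustrate the point about the BNF, the following Perl sketch transcribes the host rules from RFC 2396 (hostname, domainlabel, toplabel, IPv4address) into regexes and applies them to the four reported hosts. It is a simplified approximation for illustration only: whether Regexp::Common's HTTP pattern is built from exactly this grammar is an assumption, and host_is_rfc2396 is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Building blocks transcribed from the RFC 2396 host grammar:
#   host        = hostname | IPv4address
#   hostname    = *( domainlabel "." ) toplabel [ "." ]
#   domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
#   toplabel    = alpha | alpha *( alphanum | "-" ) alphanum
#   IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
my $alpha       = qr/[a-zA-Z]/;
my $alphanum    = qr/[a-zA-Z0-9]/;
my $domainlabel = qr/$alphanum(?:(?:$alphanum|-)*$alphanum)?/;
my $toplabel    = qr/$alpha(?:(?:$alphanum|-)*$alphanum)?/;
my $hostname    = qr/(?:$domainlabel\.)*$toplabel\.?/;
my $ipv4        = qr/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/;
my $host        = qr/(?:$hostname|$ipv4)/;

# True if the whole candidate string can be derived from the host rule.
sub host_is_rfc2396 {
    my ($candidate) = @_;
    return $candidate =~ /\A$host\z/ ? 1 : 0;
}

for my $candidate (qw(
    0xD80C9114
    0xD8.0x0C.0x91.0x14
    0330.0014.0221.0024
    0xd8.000000014.0x9114
)) {
    printf "%-24s %s\n", $candidate,
        host_is_rfc2396($candidate) ? 'derivable' : 'not derivable';
}
```

In this grammar the final label of a hostname must begin with a letter, so a host ending in "0x14" or "0x9114" cannot be derived, while four dot-separated digit runs satisfy the IPv4address rule regardless of leading zeros. That is consistent with the output in the report, where only the dotted octal form was accepted.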