This queue is for tickets about the URI-Find CPAN distribution.

Report information
The Basics
Id: 57032
Status: resolved
Priority: 0/
Queue: URI-Find

People
Owner: Nobody in particular
Requestors: avar [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 20100211
Fixed in: (no value)



Subject: Doesn't find a valid URI following a "\w.http", e.g. "club...http://bit.ly/9QfKVL"
While working on Bot::Twatterhose I noticed that URI::Find fails on ~0.5% of URLs posted on Twitter (16/2583 in my tests from /public_timeline). This is because it doesn't grok $schema://[..] directly following e.g. "foo...", i.e. "foo...http://x.org".

Here's a real world example of a Twitter user expressing concern over what he perceives to be an effeminate dance act:

    #letsbeclear this dance is super gay & I bet not ever see a n*gga doin it in the club...http://bit.ly/9QfKVL

Another example:

    The technology of magnetic energy has become so powerful an entire house can...http://bit.ly/8yEdeb

Due to this serious bug I've been missing out on the latest dance moves, and I've apparently been paying too much for my energy.

On Thu Apr 29 05:15:33 2010, AVAR wrote:
> While working on Bot::Twatterhose I noticed that URI::Find fails on
> ~0.5% of URLs posted on Twitter (16/2583 in my tests from
> /public_timeline). This is because it doesn't grok $schema://[..]
> directly following e.g. "foo...", i.e. "foo...http://x.org".

The issue is that "foo...http" is a validly formatted scheme, so URI::Find picks up the whole of "foo...http://x.org" as one candidate but then rejects it because it's not a recognized scheme. Since nobody actually uses non-alphanumerics in schemes, I'll just remove them from the scheme regex. The fix will be in the next release shortly.
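
To make the failure mode concrete, here is a minimal sketch in Python (not URI::Find's actual Perl code). It assumes the permissive scheme pattern follows the RFC 3986 grammar, which allows "+", "-" and "." after the first letter, and that the stricter pattern keeps only letters and digits, as the reply proposes. The names `PERMISSIVE_SCHEME`, `STRICT_SCHEME`, `KNOWN_SCHEMES` and `find_uris` are hypothetical and exist only for this illustration.

```python
import re

# RFC 3986 scheme grammar: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
PERMISSIVE_SCHEME = r"[A-Za-z][A-Za-z0-9+.\-]*"
# Assumed stricter variant: letters and digits only.
STRICT_SCHEME = r"[A-Za-z][A-Za-z0-9]*"

# Illustrative whitelist; the real module's list of schemes is not shown in the ticket.
KNOWN_SCHEMES = {"http", "https", "ftp"}


def find_uris(text, scheme_pattern):
    """Return candidates of the form scheme://... whose scheme is recognized."""
    candidate = re.compile(r"(" + scheme_pattern + r')://[^\s<>"]+')
    found = []
    for match in candidate.finditer(text):
        # A candidate such as "can...http://..." is consumed whole and then
        # dropped here, so the embedded "http://..." is never examined.
        if match.group(1).lower() in KNOWN_SCHEMES:
            found.append(match.group(0))
    return found


text = "an entire house can...http://bit.ly/8yEdeb"
print(find_uris(text, PERMISSIVE_SCHEME))  # []
print(find_uris(text, STRICT_SCHEME))      # ['http://bit.ly/8yEdeb']
```

With the permissive pattern the regex engine matches "can...http" as the scheme, so the only candidate is rejected. With the stricter pattern the dots cannot be part of a scheme, and the match starts at "http".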