

This queue is for tickets about the URI-Find CPAN distribution.

Report information
The Basics
Id: 629
Status: resolved
Priority: 0/
Queue: URI-Find

People
Owner: ROSCH [...] cpan.org
Requestors: davh [...] davh.dk
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.12
Fixed in: 0.13



Subject: May find false hits
I have seen several strings matched that should not have been validated as correct URIs. For example, 'www.marselisl' is found in some of my text as a URL, as is 'www.info@skive-hallerne.dk'.
[guest - Tue May 21 14:05:02 2002]:
> I have experienced several things that should not have been validated as a correct URI. eg. 'www.marselisl' is found in some of my text as a url, so is 'www.info@skive-hallerne.dk'
Both are valid schemeless URLs. URI::Find picks up schemeless URIs (those starting with www or ftp), but the heuristic could be tuned. Better yet, it should probably be removed from URI::Find, since we now have URI::Find::Schemeless, which does a much better job (the schemeless code in URI::Find predates URI::Find::Schemeless). URI::Find has always strived to be a "perfect" URI finder. Sloppiness should go into URI::Find::Schemeless.
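To illustrate the trade-off described above, here is a minimal Python sketch contrasting a strict finder with a schemeless heuristic. The strict finder only accepts text with an explicit scheme. The heuristic guesses that anything starting with "www." or "ftp." is a URL. The patterns and the function names find_strict and find_schemeless are hypothetical simplifications. They are not the regular expressions used by URI::Find or URI::Find::Schemeless. In particular, the strict pattern assumes a "scheme://" form, which ignores schemes such as mailto:.

```python
import re

# Hypothetical, simplified patterns for illustration only; these are NOT the
# regular expressions used by URI::Find or URI::Find::Schemeless.

# Strict finder: only accept text with an explicit "scheme://" prefix.
# (Simplification: real URIs such as mailto: have no "//".)
STRICT_URI = re.compile(r"\b[a-zA-Z][a-zA-Z0-9+.-]*://[^\s<>\"']+")

# Schemeless heuristic: guess that any token starting with "www." or "ftp."
# is a URL. The lookbehind skips a "www." that is already part of a longer
# token, e.g. the host inside "http://www.cpan.org/".
SCHEMELESS_GUESS = re.compile(r"(?<![\w/.@-])(?:www|ftp)\.[^\s<>\"']+")

# Punctuation that usually ends a sentence rather than a URL.
TRAILING_PUNCT = ".,;:!?"


def find_strict(text):
    """Return only candidates that carry an explicit scheme."""
    return [m.rstrip(TRAILING_PUNCT) for m in STRICT_URI.findall(text)]


def find_schemeless(text):
    """Return www./ftp. guesses; prone to false hits like 'www.marselisl'."""
    return [m.rstrip(TRAILING_PUNCT) for m in SCHEMELESS_GUESS.findall(text)]


if __name__ == "__main__":
    sample = ("See http://www.cpan.org/ or mail "
              "www.info@skive-hallerne.dk about www.marselisl today.")
    print(find_strict(sample))      # ['http://www.cpan.org/']
    print(find_schemeless(sample))  # ['www.info@skive-hallerne.dk', 'www.marselisl']
```

The strict finder never reports the two strings from the original report, but it also misses every genuine schemeless mention. That is why the guessing is better kept in a separate, opt-in finder.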
I've uploaded a new version of URI::Find, 0.13. URI::Find itself will no longer pull out any schemeless URIs, which will fix the problem you reported (false hits for strings which contain www.*).

Roderick Schertler