This queue is for tickets about the URI-Find CPAN distribution.

Report information
The Basics
Id: 20483
Status: resolved
Priority: 0/
Queue: URI-Find

People
Owner: Nobody in particular
Requestors: hiranotaka [...] zng.info
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: (no value)
Fixed in: (no value)



Subject: extra whitespace should be ignored
RFC 3986 Appendix C, which obsoletes RFC 2396 Appendix E, suggests that "extra whitespace (spaces, line-breaks, tabs, etc.) may have to be added to break a long URI across lines. The whitespace should be ignored when the URI is extracted." URI::Find has no option to accommodate this suggestion.
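
To illustrate the behaviour being requested, here is a minimal Python sketch of the Appendix C suggestion for URIs delimited by angle brackets. It is not URI::Find code: the helper name `extract_delimited_uris` and the example URI are hypothetical, and the pattern naively treats every `<...>` span as a URI without checking what is inside it.

```python
import re

# Hypothetical helper, not part of URI::Find.
# Assumption: every <...> span is a delimited URI (no validation is done).
ANGLE_SPAN = re.compile(r"<([^<>]+)>")


def extract_delimited_uris(text):
    """Return the contents of each <...> span with all whitespace removed."""
    return ["".join(m.group(1).split()) for m in ANGLE_SPAN.finditer(text)]


wrapped = "See <http://example.com/a/very/long/\n    path?query=1> for details."
print(extract_delimited_uris(wrapped))
# prints ['http://example.com/a/very/long/path?query=1']
```

Because the sketch does no validation, it would also collapse whitespace inside any other angle-bracketed text, so a real implementation would need a stricter test for what counts as a URI.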
Done.

There are certain bits of Appendix C I didn't try to implement. This is one:

> No whitespace should be introduced after a hyphen ("-") character. Because some typesetters and printers may (erroneously) introduce a hyphen at the end of line when breaking it, the interpreter of a URI containing a line break immediately after a hyphen should ignore all whitespace around the line break and should be aware that the hyphen may or may not actually be part of the URI.

I'm not sure how URI::Find is supposed to figure out whether the hyphen is part of the URI. The best guess would probably be that it is not.

There's no specific code to handle URIs inside double quotes. The code for handling whitespace folding is messy right now and I didn't want to screw it up further. I'll open another ticket for that.
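
One way to live with the hyphen ambiguity quoted from Appendix C is to return both readings and let the caller decide. The Python sketch below is only an illustration of that idea, not something URI::Find does; `hyphen_candidates` is a hypothetical name, and the ordering (hyphen dropped first) simply follows the "best guess" stated in the comment above.

```python
import re

# Hypothetical helper, not part of URI::Find.
# Matches a hyphen followed by a line break and any surrounding whitespace.
HYPHEN_BREAK = re.compile(r"-\s*\n\s*")


def hyphen_candidates(folded):
    """Return the possible URIs for folded text, best guess first.

    Assumption: a hyphen right before a line break was probably added by
    the typesetter, so the reading without it is listed first.
    """
    without_hyphen = "".join(HYPHEN_BREAK.sub("", folded).split())
    with_hyphen = "".join(HYPHEN_BREAK.sub("-", folded).split())
    if without_hyphen == with_hyphen:
        return [with_hyphen]  # no hyphen at a line break, nothing ambiguous
    return [without_hyphen, with_hyphen]


print(hyphen_candidates("http://example.com/long-\n    name"))
# prints ['http://example.com/longname', 'http://example.com/long-name']
```

A caller could then try each candidate in order, for example by checking which one resolves.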
The fix for this ticket broke finding URLs inside HTML text. For instance, a URL that looks like this:

<a style="color:#ffffff" href="http://plusthree.com/">P3</a>

comes out looking like this:

astyle="color:#ffffff"href="http://plusthree.com/"

which is completely unusable. If this problem is to be fixed, it should probably only strip the whitespace between <...> when there's nothing there but the URL and the whitespace.
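
The rule proposed above (only strip whitespace between <...> when the span holds nothing but a URL and whitespace) can be sketched in Python as follows. This is an illustration, not the patch that was applied to URI::Find. The names `unfold_angle_uris` and `URI_ONLY` are hypothetical, and the test for "nothing but a URL" is an assumed heuristic: the span must start with a URI scheme and contain no quote characters.

```python
import re

# Hypothetical sketch, not the actual URI::Find patch.
ANGLE_SPAN = re.compile(r"<([^<>]+)>")

# Assumed heuristic for "nothing but a URL and whitespace":
# optional leading whitespace, a scheme such as "http:", and no quotes.
URI_ONLY = re.compile(r"\s*[A-Za-z][A-Za-z0-9+.\-]*:[^\"']*\Z")


def unfold_angle_uris(text):
    """Remove whitespace inside <...> only when the span looks like a bare URI."""

    def replace(match):
        inner = match.group(1)
        if URI_ONLY.match(inner):
            return "<" + "".join(inner.split()) + ">"
        return match.group(0)  # leave HTML tags and other spans untouched

    return ANGLE_SPAN.sub(replace, text)


html = '<a style="color:#ffffff" href="http://plusthree.com/">P3</a>'
folded = "<http://example.com/long/\n    path>"
print(unfold_angle_uris(html) == html)
print(unfold_angle_uris(folded))
```

With this heuristic the HTML anchor from the report is left as it is, because its span starts with a tag name rather than a scheme, while the folded URI is joined back together.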
Looks like your patch fixed it fine.