On Fri Aug 22 15:38:23 2008, MSCHWERN wrote:
> This is due to the "decrufting" process which prevents probable
> punctuation from being considered part of the URL.
>
> One possible way to make that process smarter is to scan before the URI
> for a matching delimiter. For example, if a URL ends with a ) it would
> scan before the URL for a ( before it hits another ). If it finds one,
> it's probably inside a () and can strip the ). Otherwise the ) is
> probably part of the URL and it can leave it.
>
> This will only work with ] and ); ' and " will probably generate too
> many false positives.
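The proposed "look before the URL" rule can be sketched in Python as follows. This is a minimal illustration, not URI::Find's implementation. The helper name `strip_wrapping_closer` and its `(text, start, end)` interface are hypothetical. As suggested, it only considers `)` and `]`.

```python
# Hypothetical helper illustrating the proposed rule; not URI::Find's API.
CLOSER_TO_OPENER = {")": "(", "]": "["}


def strip_wrapping_closer(text: str, start: int, end: int) -> int:
    """Return a possibly shortened end index for the URL text[start:end]."""
    if end <= start:
        return end
    closer = text[end - 1]
    opener = CLOSER_TO_OPENER.get(closer)
    if opener is None:
        return end          # only ) and ] are handled; ' and " are not
    # Scan backwards from just before the start of the URL.
    for ch in reversed(text[:start]):
        if ch == opener:
            return end - 1  # probably "(... URL)": the closer is punctuation
        if ch == closer:
            break           # another closer came first: stop looking
    return end              # no opener found: keep the closer in the URL
```

For example, in `(http://en.wikipedia.org/wiki/Foo_(bar))` only the outer `)` is stripped, because the `(` found just before the URL claims exactly one closer.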
Yeah, that's basically how you have to do it: see if there are any
existing open parens inside the URL and, if so, try to match them up.
I've always been quite happy with how Gnus in Emacs does it. For
reference, here's a patch to rcirc.el that I sent to emacs-devel because
they were having the same problem. It does just what you suggest and
matches up parens that are already open:
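The "match up parens already open inside the URL" idea can be approximated with simple counting. The following is a rough Python sketch with a hypothetical helper `trim_unbalanced_closers`. It is not the Gnus regexp. It keeps a trailing `)` or `]` only when the URL contains more openers than closers of that kind before the final character, and it ignores nesting between different bracket types.

```python
# Hypothetical helper: keep a trailing ) or ] only if the URL itself opened it.
CLOSER_TO_OPENER = {")": "(", "]": "["}


def trim_unbalanced_closers(url: str) -> str:
    """Strip trailing ')' / ']' characters that have no opener inside url."""
    while url and url[-1] in CLOSER_TO_OPENER:
        closer = url[-1]
        body = url[:-1]
        # More openers than closers before the last char means a bracket is
        # still open, so the final closer belongs to the URL.
        if body.count(CLOSER_TO_OPENER[closer]) > body.count(closer):
            break
        url = body
    return url
```

With this rule, `http://en.wikipedia.org/wiki/Foo_(bar)` is left intact, while `http://example.com/foo)` loses its stray `)`.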
----
From: avar@cpan.org (Ævar Arnfjörð Bjarmason)
Subject: [PATCH] Make rcirc.el rcirc-url-regexp use the gnus-button-url-regexp regexp
Newsgroups: gmane.emacs.devel
To: emacs-devel@gnu.org
Date: Sun, 23 Dec 2007 03:03:09 +0000
I was having an issue with rcirc including at the end of URIs. I fixed
it by using the regex gnus uses.
Perhaps it's better to amend the old one.
Index: net/rcirc.el
===================================================================
RCS file: /sources/emacs/emacs/lisp/net/rcirc.el,v
retrieving revision 1.40
diff -u -r1.40 rcirc.el
--- net/rcirc.el 1 Nov 2007 03:51:47 -0000 1.40
+++ net/rcirc.el 23 Dec 2007 02:36:07 -0000
@@ -2121,24 +2121,26 @@
(rcirc-add-face 0 (length string) face string)
string))
+;; The regexp is copied from gnus-button-url-regexp in gnus-art.el
(defvar rcirc-url-regexp
- (rx-to-string
- `(and word-boundary
- (or (and
- (or (and (or "http" "https" "ftp" "file" "gopher" "news"
- "telnet" "wais" "mailto")
- "://")
- "www.")
- (1+ (char "-a-zA-Z0-9_."))
- (1+ (char "-a-zA-Z0-9_"))
- (optional ":" (1+ (char "0-9"))))
- (and (1+ (char "-a-zA-Z0-9_."))
- (or ".com" ".net" ".org")
- word-boundary))
- (optional
- (and "/"
- (1+ (char "-a-zA-Z0-9_='!?#$\@~`%&*+|\\/:;.,{}[]()"))
- (char "-a-zA-Z0-9_=#$\@~`%&*+|\\/:;{}[]()")))))
+ (concat
+ "\\b\\(\\(www\\.\\|\\(s?https?\\|ftp\\|file\\|gopher\\|"
+ "nntp\\|news\\|telnet\\|wais\\|mailto\\|info\\):\\)"
+ "\\(//[-a-z0-9_.]+:[0-9]*\\)?"
+ (if (string-match "[[:digit:]]" "1") ;; Support POSIX?
+ (let ((chars "-a-z0-9_=#$@~%&*+\\/[:word:]")
+ (punct "!?:;.,"))
+ (concat
+ "\\(?:"
+ ;; Match paired parentheses, e.g. in Wikipedia URLs:
+ "[" chars punct "]+" "(" "[" chars punct "]+" "[" chars "]*)" "[" chars "]"
+ "\\|"
+ "[" chars punct "]+" "[" chars "]"
+ "\\)"))
+ (concat ;; XEmacs 21.4 doesn't support POSIX.
+ "\\([-a-z0-9_=!?#$@~%&*+\\/:;.,]\\|\\w\\)+"
+ "\\([-a-z0-9_=#$@~%&*+\\/]\\|\\w\\)"))
+ "\\)")
"Regexp matching URLs. Set to nil to disable URL features in rcirc.")
(defun rcirc-browse-url (&optional arg)
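In the non-parenthesized branch of the regexp above, a URL may contain punctuation such as `.`, `,` or `?`, but its last character must come from the non-punctuation set. A rough Python port of that branch only is sketched below. The names `CHARS`, `PUNCT`, `URL_RE` and `find_urls` are hypothetical. Emacs's `[:word:]` class is approximated by Python's `\w`, which is an assumption because Emacs resolves it through the syntax table.

```python
import re

# Character sets copied from the patch; [:word:] approximated by \w.
CHARS = r"-a-z0-9_=#$@~%&*+\\/\w"   # may appear anywhere, including the end
PUNCT = r"!?:;.,"                   # allowed inside a URL, never at its end

URL_RE = re.compile(
    r"\b(?:www\.|(?:s?https?|ftp|file|gopher|nntp|news|telnet|wais|mailto|info):)"
    r"(?://[-a-z0-9_.]+:[0-9]*)?"
    rf"[{CHARS}{PUNCT}]+[{CHARS}]",  # backtracks until the last char is in CHARS
    re.IGNORECASE,
)


def find_urls(text: str) -> list[str]:
    """Return URL-like substrings with trailing punctuation dropped."""
    return [m.group(0) for m in URL_RE.finditer(text)]
```

Because `(` and `)` belong to neither set, this branch alone stops at the first parenthesis. That is why the patch tries a separate paired-parentheses branch before it.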