Bug #105675 for Mail-VRFY: Suggested Improvements for Mail::VRFY

Subject:	Suggested Improvements for Mail::VRFY
Date:	Sun, 5 Jul 2015 15:38:31 -0700
To:	bug-Mail-VRFY [...] rt.cpan.org
From:	Steve James <4stevejames [...] gmail.com>

I am using Mail::VRFY to check email addresses for a free CMS-style system called FreeToastHost, which produces free websites and is used by Toastmasters public speaking clubs (over 10,000 clubs worldwide are registered in the system). The system also provides email distribution lists and forwarding email lists for the clubs that use it.

I have long suspected that we have junk and outdated email addresses in the system, and I regularly see the resulting problems in our email logs when people send emails to the various lists. I am trying to deploy Mail::VRFY as part of a more rigorous email address checking strategy. My experience with it has been mixed, and I have had to make some changes to it and use some workarounds to get the strategy to work. I am not very steeped in SMTP, but I have tried to learn it quickly out of necessity during the last year or so.

I have a few suggestions for improvements, and I would be more than happy to collaborate with you on them:

1. Noting that you state *"Email address syntax checking does not conform to RFC2822"*, the regular expression used to validate email addresses is flawed and could be improved with little effort. In my case, it would not accept an email address with an apostrophe in it (an Irish name). On further review, I also discovered that the regular expression allows email addresses whose username part starts with a period, which is not valid. I believe the regular expression on this page, http://www.regular-expressions.info/email.html (the one that matches 99%), is a better approach, as it is much closer to the standard. I modified the VRFY.pm file I am using to use this regular expression, and I have fewer headaches with that now.

2. The whole "misbehaving" result seems a bit arbitrary and ambiguous. It appears to be a result that your code infers and assigns a result code to, rather than one tied to a specific SMTP reply code. What if your code is missing some of the possible scenarios or use cases and therefore misinterprets what is going on? Some more context variables (and methods to fetch them) are needed that can be returned to the calling code so it can assess what is going on. (*Not* using a debug mode.)

4. I would like to see a method to fetch the last SMTP reply code. This would help a lot, and would provide additional context when needed, beyond your simple result codes.

5. I would like to be able to specify a maximum number of retries (default 3) for connection attempts. I know from experience that when establishing a connection, sometimes you need to try more than once. In my email system code, I have a retry loop for sending email in which I progressively double a delay (via the sleep command) between connection attempts. This increases the odds that I will get connected, even during heavy system loads, and it works really well for us. (We handle over 300,000 emails in a given week.) *Since you do not try to connect more than once, it is possible that the code may misinterpret that result as a non-working email address.*

I would be happy to contribute or collaborate on implementing code for any of the above. Let me know.

Regards,

--
*Steve James*
*FreeToastHost System Developer*
http://support.toastmastersclubs.org
*Email:* 4stevejames@gmail.com
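
To make item 1 concrete, here is a minimal sketch of a syntax check in the spirit of the practical pattern discussed on the regular-expressions.info page. It is an assumption-laden illustration, not the exact expression patched into VRFY.pm: the helper name `looks_like_email` is hypothetical, and quoted local parts and IP-literal domains are deliberately not handled.

```perl
use strict;
use warnings;

# One run of characters allowed in the username part. The apostrophe is
# included; the period is not, so a period can only appear BETWEEN runs.
my $atom  = qr/[a-z0-9!#\$%&'*+\/=?^_`{|}~-]+/i;

# One domain label: letters/digits, with hyphens allowed only inside.
my $label = qr/[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/i;

# username = atom(.atom)*   domain = label.label(.label)*
my $email_re = qr/\A$atom(?:\.$atom)*\@(?:$label\.)+$label\z/;

# Hypothetical helper, for illustration only (not part of Mail::VRFY).
sub looks_like_email {
    my ($addr) = @_;
    return $addr =~ $email_re ? 1 : 0;
}

# The two cases from item 1: an apostrophe and a leading period.
for my $addr ("o'brien\@example.com", ".leading.dot\@example.com") {
    printf "%-28s %s\n", $addr, looks_like_email($addr) ? "accepted" : "rejected";
}
```

Because the username is built from period-separated runs, a leading, trailing, or doubled period fails to match without any extra special-casing.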
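
The retry loop described in item 5 could be sketched roughly as follows. This is only an outline under stated assumptions: `connect_with_retries`, `max_tries`, `initial_delay`, and `sleeper` are hypothetical names and are not part of Mail::VRFY, and the one-second starting delay is an arbitrary choice.

```perl
use strict;
use warnings;

# Call $connect up to max_tries times, doubling the delay after each failure.
# $connect is any code ref that returns a true value on success.
sub connect_with_retries {
    my ($connect, %opt) = @_;
    my $max_tries = $opt{max_tries}     // 3;   # default of 3, as suggested in item 5
    my $delay     = $opt{initial_delay} // 1;   # seconds (assumed starting value)
    my $sleeper   = $opt{sleeper}       // sub { sleep $_[0] };

    for my $attempt (1 .. $max_tries) {
        my $conn = $connect->($attempt);
        return $conn if $conn;
        last if $attempt == $max_tries;         # do not sleep after the final try
        $sleeper->($delay);
        $delay *= 2;                            # progressively double the delay
    }
    return;                                     # every attempt failed
}

# Simulated connection that fails twice and then succeeds. The sleeper is
# replaced so the example records the delays instead of actually sleeping.
my @delays;
my $conn = connect_with_retries(
    sub { $_[0] >= 3 ? "connected" : undef },
    sleeper => sub { push @delays, $_[0] },
);
print defined $conn ? "$conn (delays used: @delays)\n" : "gave up\n";
```

In real use, the code ref would wrap the actual connection attempt to the mail server, and only a failure after the last attempt would be reported to the caller as a connection problem.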