Subject: Intermittent overkill after <>
Date: Tue, 15 Aug 2006 21:13:12 -0400
To: <bug-HTML-Strip [...] rt.cpan.org>
From: "Bellaire,Adam P" <bellaire [...] ad.ufl.edu>
Hi there,

I've been using HTML::Strip for some time now, and it's great. However, I've recently found a problem that seems to be caused by a single <> in the text being stripped. What's more, it only happens intermittently.
I'm using HTML::Strip to remove tags from a series of chunks of text, all of the form:
    From: <>
    a. Title: some text
    b. Etc..
These are emails submitted through a simple HTML form. They contain no real HTML tags, only the single lonely <> sequence. When I view the stripped text, some of the emails are intact (minus the <>), while others contain only the From: line and none of the text that followed the <>. Moreover, which emails are affected changes from run to run. When I view a set of 10 emails, one run might truncate six of them and leave the rest intact, while another run truncates only two. There seems to be no correlation between the content of an email and whether it gets truncated.
I've worked around the problem by removing the <> sequence with a Perl regex before handing the text to HTML::Strip, and this solves the problem completely. Still, I'd much rather not need the regex, and I thought I should report this bug in case anyone else is affected.
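A minimal sketch of that workaround, assuming the email chunks are already in a Perl list. The helper names `remove_empty_angle_pairs` and `strip_chunks` are hypothetical. The sketch uses HTML::Strip's `new`, `parse`, and `eof` methods. Calling `eof` after each chunk resets the parser's state between independent documents. That reset is an extra precaution, not part of the regex workaround, and I haven't confirmed that it avoids the truncation on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every literal "<>" so HTML::Strip never sees it.
# Real tags such as <b> are left alone, since they do not contain "<>".
sub remove_empty_angle_pairs {
    my ($text) = @_;
    (my $copy = $text) =~ s/<>//g;
    return $copy;
}

# Hypothetical helper: strip a list of independent chunks.
# HTML::Strip is loaded at runtime, so the helper above works without it.
sub strip_chunks {
    my @chunks = @_;
    require HTML::Strip;
    my $hs = HTML::Strip->new();
    my @stripped;
    for my $chunk (@chunks) {
        push @stripped, $hs->parse( remove_empty_angle_pairs($chunk) );
        $hs->eof;    # reset parser state before the next chunk
    }
    return @stripped;
}
```

Usage would then be something like `my @clean = strip_chunks(@emails);`, where `@emails` is a hypothetical array holding the raw form submissions.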
Thanks again for this terrific module!