
This queue is for tickets about the HTML-Parser CPAN distribution.

Report information
The Basics
Id: 5472
Status: resolved
Priority: 0/
Queue: HTML-Parser

People
Owner: Nobody in particular
Requestors: html-parser [...] duffek.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 3.35
Fixed in: (no value)



Subject: Feature request + patch: return first attribute occurrence instead of last
Hi,

This is more of a behavior change request than a bug report.

When the same HTML attribute is specified multiple times in a single element, Internet Explorer and Mozilla both honor the first occurrence, but HTML::Parser honors the last. For example, if a spammer specifies

<body background=white text=white text=black>random garbage<font color=black>advertisement</font></body>

in an HTML-formatted email message, most Windows users won't see the random garbage, but my Perl-based anti-spam filter will.

The attached patch emulates IE/Mozilla behavior by storing the first rather than the last occurrence of each attribute in the hash passed as the "attr" argument to event handlers.

Incidentally, I didn't find any mention of this ambiguity in a quick scan of the HTML 4.01 spec.

Thanks!

Nick Duffek
html-parser@duffek.com
diff -r -u -p HTML-Parser-3.35.orig/hparser.c HTML-Parser-3.35/hparser.c
--- HTML-Parser-3.35.orig/hparser.c 2003-10-27 16:14:24.000000000 -0500
+++ HTML-Parser-3.35/hparser.c 2004-02-27 14:20:59.000000000 -0500
@@ -414,7 +414,8 @@ report_event(PSTATE* p_state,
         sv_lower(aTHX_ attrname);
         if (argcode == ARG_ATTR) {
-            if (!hv_store_ent(hv, attrname, attrval, 0)) {
+            if (hv_exists_ent(hv, attrname, 0) ||
+                !hv_store_ent(hv, attrname, attrval, 0)) {
                 SvREFCNT_dec(attrval);
             }
             SvREFCNT_dec(attrname);
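
The difference between the two policies can be illustrated outside the XS code. The following is only a minimal sketch in Python, not part of the patch or of HTML::Parser: the function name `collect_attrs` and its `keep` parameter are hypothetical, and the list of pairs stands in for the attributes of the `<body>` tag from the report, in source order.

```python
def collect_attrs(pairs, keep="first"):
    """Build an attribute dict from ordered (name, value) pairs.

    keep="first" keeps the first occurrence of a repeated attribute
    (the browser behavior described in the report).
    keep="last" lets later occurrences overwrite earlier ones
    (the HTML::Parser 3.35 behavior described in the report).
    """
    attrs = {}
    for name, value in pairs:
        name = name.lower()  # HTML attribute names are case-insensitive
        if keep == "first" and name in attrs:
            continue  # an earlier occurrence already claimed this name
        attrs[name] = value
    return attrs


# Attributes of the <body> tag from the report, in source order.
body = [("background", "white"), ("text", "white"), ("text", "black")]

first = collect_attrs(body, keep="first")
last = collect_attrs(body, keep="last")

print(first["text"])  # white
print(last["text"])   # black
```

With the first-occurrence policy the text color matches the background, so the "random garbage" is invisible to the reader; with the last-occurrence policy a filter would treat it as visible black text. The existence check before the store in the patch plays the same role as the `name in attrs` test here.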