
This queue is for tickets about the HTML-HTML5-Entities CPAN distribution.

Report information
The Basics
Id: 97659
Status: resolved
Priority: 0/
Queue: HTML-HTML5-Entities

People
Owner: Nobody in particular
Requestors: pagenyon [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: doesn't actually work for most characters
test case attached demonstrating encoding and decoding don't work for majority of characters:

# Looks like you failed 3958 tests of 4250.
Subject: a.pl
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::HTML5::Entities;
use Test::More;

while (my ($ent, $chr) = each %HTML::HTML5::Entities::entity2char) {
    next unless ';' eq substr $ent, -1, 1;
    $ent = "&$ent";
    is decode_entities($ent), $chr, "decoding entity";
    is encode_entities($chr), $ent, "encoding character";
}

done_testing;
From: pagenyon [...] gmail.com
On Thu Jul 31 17:50:15 2014, pagenyon wrote:
> test case attached demonstrating encoding and decoding don't work for
> majority of characters:
>
> # Looks like you failed 3958 tests of 4250.
So the encoding is working, it's just returning numeric entities instead of the named entities.

But the decoding is really broken. I attached a script that shows that the decode_entities from HTML::Entities can properly replace the entities using your %entity2char hash, whereas the decode_entities from your module doesn't have any effect.
Subject: b.pl
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::Entities;
use HTML::HTML5::Entities ();
use Test::More;

*HTML::Entities::entity2char = \ %HTML::HTML5::Entities::entity2char;

while (my ($ent, $chr) = each %HTML::HTML5::Entities::entity2char) {
    next unless ';' eq substr $ent, -1, 1;
    $ent = "&$ent";
    is decode_entities($ent), $chr, "decoding entity";
}

done_testing;
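The observation above that encoding "works" while the string comparison fails can be illustrated without either module: a numeric reference and a named reference are two spellings of the same character. The following is only a minimal sketch; `numeric_entity` is a hypothetical helper, not part of HTML::HTML5::Entities, and the decimal form is an assumption (the module may emit a different numeric form).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: spell a character as a decimal numeric reference.
sub numeric_entity {
    my ($chr) = @_;
    return sprintf '&#%d;', ord $chr;
}

my $chr     = "\x{2026}";             # HORIZONTAL ELLIPSIS
my $named   = '&hellip;';             # named spelling of the same character
my $numeric = numeric_entity($chr);   # numeric spelling

# A plain string comparison fails, as in the "encoding character" tests.
print "same spelling: ", ($numeric eq $named ? "yes" : "no"), "\n";

# Decoding the numeric form still recovers the original character.
(my $decoded = $numeric) =~ s/&#(\d+);/chr $1/e;
print "round trip ok: ", ($decoded eq $chr ? "yes" : "no"), "\n";
```

A test that wants to accept either spelling could compare the decoded characters rather than the encoded strings.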
From: pagenyon [...] gmail.com
On Sat Aug 02 02:13:26 2014, pagenyon wrote:
> On Thu Jul 31 17:50:15 2014, pagenyon wrote:
> > test case attached demonstrating encoding and decoding don't work for
> > majority of characters:
> >
> > # Looks like you failed 3958 tests of 4250.
>
> So the encoding is working, it's just returning numeric entities
> instead of the named entities.
>
> But the decoding is really broken. I attached a script that shows that
> the decode_entities from HTML::Entities can properly replace the
> entities using your %entity2char hash, whereas the decode_entities
> from your module doesn't have any effect.
The problem is in your regular expression. I have a patch for decode_entities; I didn't look at _decode_entities:

--- lib/HTML/HTML5/Entities.pm 2012-06-26 20:35:25.000000000 +0000
+++ /tmp/Entities.pm 2014-08-10 23:44:27.000000000 +0000
@@ -2526,7 +2526,7 @@
 for (@$array)
 {
 s/
- (&
+ &(
 (?: \#(\d+) | \#[xX]([0-9a-fA-F]+) | (\w+) )
@@ -2538,7 +2538,7 @@
 elsif (defined $3)
 { chr(hex $3); }
 else
- { $entity2char{$4} || $1; }
+ { $entity2char{"$4;"} || $1; }
 /xeg;
 }
From: pagenyon [...] gmail.com
On Sun Aug 10 19:51:39 2014, pagenyon wrote:

this

    { $entity2char{"$4;"} || $1; }

should actually be this:

    { $entity2char{"$4;"} || "&$1"; }
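For readers following the patch and its correction, here is a self-contained sketch of how the patched substitution would behave. It is not the module's actual code: the small `%entity2char` table is a hypothetical stand-in for `%HTML::HTML5::Entities::entity2char` (keys ending in a semicolon, as the test scripts assume), `decode_sketch` is an illustrative name, and the optional trailing `;?` is an assumption because the diff does not show the end of the pattern.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for the module's table; keys end in ";".
my %entity2char = (
    'amp;'    => '&',
    'lt;'     => '<',
    'gt;'     => '>',
    'hellip;' => "\x{2026}",
);

# Sketch of the patched substitution (assumed shape, not the real source).
sub decode_sketch {
    my ($text) = @_;
    $text =~ s/
        &(                              # group 1: text after the ampersand
            (?: \#(\d+)                 # group 2: decimal reference
              | \#[xX]([0-9a-fA-F]+)    # group 3: hexadecimal reference
              | (\w+)                   # group 4: entity name
            )
        ;?)                             # assumption: semicolon is optional
    /
        defined $2 ? chr($2)
      : defined $3 ? chr(hex $3)
      : ($entity2char{"$4;"} || "&$1")  # unknown names are left untouched
    /xeg;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print decode_sketch('&lt;p&gt; &#38; &#x26; &hellip; &bogus;'), "\n";
```

With the capture group moved after the ampersand, the fallback has to put the `&` back, which is why the corrected version returns `"&$1"` rather than `$1`: otherwise an unrecognised entity such as `&bogus;` would lose its leading ampersand.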
Fixed in 0.004 I think.