
This queue is for tickets about the XML-Twig CPAN distribution.

Report information
The Basics
Id: 22854
Status: resolved
Priority: 0/
Queue: XML-Twig

People
Owner: Nobody in particular
Requestors: bfaist [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 3.26
Fixed in: 3.28



Subject: regex to remove entities not spec compliant
The regex that removes entity declarations is not spec compliant: it only accepts \w characters in the entity name. The patch adds the other characters allowed in a name to the matched character set.

http://www.w3.org/TR/2006/REC-xml-20060816/#NT-Name

I was trying to delete all the entities from a file, but the output still had all the entities.

    my $entity_list = $xTwig->entity_list();

    foreach my $entity ($entity_list->list()) {
        my $ent_name = $entity->name();

        $entity_list->delete($ent_name);
    }

    open OUTFILE, ">$out_file";
    print OUTFILE $xTwig->sprint( Update_DTD => 1 );
    close OUTFILE;
Subject: Twig.patch
--- OrigTwig.pm 2006-09-18 14:23:48.000000000 -0400
+++ Twig.pm 2006-11-06 16:55:02.636673500 -0500
@@ -2591,7 +2591,7 @@ sub prolog
 # awfull hack, but at least it works a little better that what was there before
 if( $internal)
 { # remove entity declarations (they will be re-generated from the updated entity list)
- $internal=~ s{<! \s* ENTITY \s+ \w+ \s+ ( ("[^"]*"|'[^']*') \s* | SYSTEM [^>]*) >\s*}{}xg;
+ $internal=~ s{<! \s* ENTITY \s+ [\w\.\-\:]+ \s+ ( ("[^"]*"|'[^']*') \s* | SYSTEM [^>]*) >\s*}{}xg;
 $internal=~ s{^\n}{};
 }
 $internal .= $t->entity_list->text ||'' if( $t->entity_list);
Here is some sample XML. I left out the content, but here are the doctype and entities.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE maintwp PUBLIC "USA-DOD//DTD MAINTWP MIL-STD-2361 TM Assembly REV 3.12 20050127//EN" "maintwp.dtd" [
<!ENTITY obj.7 SYSTEM "file://U:/EMS2/HEMTT/GRAPHICS/04/0426A.bmp" NDATA bmp>
<!ENTITY obj.8 SYSTEM "file://U:/EMS2/HEMTT/GRAPHICS/04/0426B.bmp" NDATA bmp>
<!ENTITY obj.9 SYSTEM "file://U:/EMS2/HEMTT/GRAPHICS/04/0427A.bmp" NDATA bmp>
<!ENTITY obj.10 SYSTEM "file://U:/EMS2/HEMTT/GRAPHICS/04/0428A.bmp" NDATA bmp>
<!ENTITY obj.11 SYSTEM "file://U:/EMS2/HEMTT/GRAPHICS/04/0428B.bmp" NDATA bmp>
<!ENTITY obj.12 SYSTEM "file://U:/EMS2/HEMTT/GRAPHICS/04/0429A.bmp" NDATA bmp>
<!ENTITY obj.13 SYSTEM "file://U:/EMS2/HEMTT/GRAPHICS/04/0429B.bmp" NDATA bmp>
<!ENTITY obj.14 SYSTEM "file://U:/EMS2/HEMTT/GRAPHICS/04/0429C.bmp" NDATA bmp>
<!ENTITY obj.15 SYSTEM "file://U:/EMS2/HEMTT/GRAPHICS/04/0430A.bmp" NDATA bmp>
<!ENTITY obj.5 SYSTEM "file://U:/EMS2/HEMTT/GRAPHICS/04/0416A.bmp" NDATA bmp>
<!ENTITY obj.6 SYSTEM "file://U:/EMS2/HEMTT/GRAPHICS/04/0425A.bmp" NDATA bmp>
]>
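To see why names such as obj.7 slip past the original substitution, here is a minimal sketch that runs both regexes from the patch over two declarations. The first declaration is copied from the sample above; the second (entity name "plain", file "plain.bmp") is a hypothetical control case added only for illustration. The variable names are made up for this sketch and are not taken from Twig.pm.

```perl
use strict;
use warnings;

# One declaration from the sample DTD plus a hypothetical control case
# whose name contains only \w characters.
my $internal = join "\n",
    q{<!ENTITY obj.7 SYSTEM "file://U:/EMS2/HEMTT/GRAPHICS/04/0426A.bmp" NDATA bmp>},
    q{<!ENTITY plain SYSTEM "plain.bmp" NDATA bmp>},
    '';

# Regex before the patch: the entity name may only contain \w characters.
my $old = qr{<! \s* ENTITY \s+ \w+ \s+ ( ("[^"]*"|'[^']*') \s* | SYSTEM [^>]*) >\s*}x;

# Regex after the patch: '.', '-' and ':' are also accepted in the name.
my $new = qr{<! \s* ENTITY \s+ [\w\.\-\:]+ \s+ ( ("[^"]*"|'[^']*') \s* | SYSTEM [^>]*) >\s*}x;

# Apply each substitution to its own copy of the internal subset.
(my $after_old = $internal) =~ s{$old}{}g;
(my $after_new = $internal) =~ s{$new}{}g;

print "old regex leaves: [$after_old]\n";
print "new regex leaves: [$after_new]\n";
```

With the old regex, \w+ stops at the dot in obj.7 and the following \s+ cannot match, so that declaration survives while the control case is removed. With the patched regex both declarations are removed.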
Subject: Re: [rt.cpan.org #22854] regex to remove entities not spec compliant
Date: Wed, 08 Nov 2006 15:00:41 +0100
To: bug-XML-Twig [...] rt.cpan.org
From: Michel Rodriguez <mirod [...] xmltwig.com>
via RT wrote:
> Mon Nov 06 17:03:30 2006: Request 22854 was acted upon.
> Transaction: Ticket created by BFAIST
> Queue: XML-Twig
> Subject: regex to remove entities not spec compliant
> Broken in: 3.26
> Severity: Important
> Owner: Nobody
> Requestors: BFAIST@cpan.org
> Status: new
> Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=22854 >
>
>
> Regex to remove entities is not spec compliant. The patch adds the
> other possible characters to the matched character set.
>
> http://www.w3.org/TR/2006/REC-xml-20060816/#NT-Name
>
> I was testing trying to delete all the entities from a file and then the
> output still had all the entities.
>
> my $entity_list = $xTwig->entity_list();
>
> foreach my $entity ($entity_list->list()) {
> my $ent_name = $entity->name();
>
> $entity_list->delete($ent_name);
> }
>
> open OUTFILE, ">$out_file";
> print OUTFILE $xTwig->sprint( Update_DTD => 1 );
> close OUTFILE;
OK, it makes sense, good catch.

I ended up using $REG_NAME to match the entity name. It is not completely rigorous, but it is hard to specify a match on the full range of possible characters in pre-5.8.1 perl.

The development version at http://xmltwig.com/xmltwig/ has the fix.

Thanks

--
mirod
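As a rough illustration of the trade-off described above, the sketch below uses an ASCII-only approximation of the XML Name production: a letter, '_' or ':' first, then letters, digits, '.', '-', '_' or ':'. This is an assumption made for the example and is not the actual definition of $REG_NAME in XML::Twig; it also ignores the non-ASCII letters, combining characters and extenders that the spec allows. The names $ascii_name and is_ascii_name are hypothetical.

```perl
use strict;
use warnings;

# ASCII-only approximation of an XML Name (assumption, not XML::Twig's $REG_NAME):
# the first character is restricted, later characters may also be digits, '.' or '-'.
my $ascii_name = qr{[A-Za-z_:][A-Za-z0-9_.:\-]*};

# Returns true if the whole string looks like an ASCII XML Name.
sub is_ascii_name {
    my ($candidate) = @_;
    return $candidate =~ /\A$ascii_name\z/ ? 1 : 0;
}

for my $candidate (qw(obj.7 ns:item _x 7obj -x)) {
    my $verdict = is_ascii_name($candidate) ? 'accepted' : 'rejected';
    print "$candidate: $verdict\n";
}
```

Unlike the character class in the submitted patch, this pattern rejects names that start with a digit or a hyphen, which the Name production does not allow.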