This queue is for tickets about the XML-SemanticDiff CPAN distribution.

Report information
The Basics
Id: 18491
Status: resolved
Priority: 0/
Queue: XML-SemanticDiff

People
Owner: SHLOMIF [...] cpan.org
Requestors: cpan [...] clotho.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.95
Fixed in: (no value)



Subject: Same tag name in different locations causes spurious change
Attached below are two variations of the same XML file. Both have <Description> tags at the root level and also one level deeper, inside the <TimeZone> tag. They should be considered identical, but this short example program erroneously reports that they are different.

    use XML::SemanticDiff;
    my $diff = XML::SemanticDiff->new();
    foreach my $change ($diff->compare('derived.xml', 'orig.xml')) {
        print "$change->{message}\n";
    }

    % perl compare.pl
    Child element 'Description' missing from element '/LocalPresentationManifest[1]/Properties[1]/TimeZone[1]'.
    Rogue element 'Description' in element '/LocalPresentationManifest[1]/Properties[1]/TimeZone[1]'.
Subject: derived.xml
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<LocalPresentationManifest>
  <Properties>
    <TimeZone>
      <Description>(GMT-06:00) Central Time (US &amp; Canada)</Description>
      <Abbreviation>CST</Abbreviation>
      <Identifier>19</Identifier>
      <Name>Central Time</Name>
    </TimeZone>
    <Description>This presentation is a brief overview of the MediaLandscape Product and Serivce Offerings.</Description>
  </Properties>
</LocalPresentationManifest>
Subject: orig.xml
<LocalPresentationManifest xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Properties>
    <Description>This presentation is a brief overview of the MediaLandscape Product and Serivce Offerings.</Description>
    <TimeZone>
      <Identifier>19</Identifier>
      <Name>Central Time</Name>
      <Description>(GMT-06:00) Central Time (US &amp; Canada)</Description>
      <Abbreviation>CST</Abbreviation>
    </TimeZone>
  </Properties>
</LocalPresentationManifest>
Subject: [PATCH] Same tag name in different locations causes spurious change
I've looked at the code a little more deeply, and have created a patch that fixes the problem. The following diagnostic shows where the error comes from:

    use XML::SemanticDiff;
    use Data::Dumper;
    my $diff = XML::SemanticDiff->new();
    for my $file ('derived.xml', 'orig.xml') {
        print "$file\n";
        print " $_\n" for keys %{$diff->read_xml($file)};
    }

    % perl showdoc.pl
    derived.xml
     /LocalPresentationManifest[1]
     /LocalPresentationManifest[1]/Properties[1]/TimeZone[1]
     /LocalPresentationManifest[1]/Properties[1]/TimeZone[1]/Description[1]
     /LocalPresentationManifest[1]/Properties[1]/TimeZone[1]/Identifier[1]
     /LocalPresentationManifest[1]/Properties[1]/TimeZone[1]/Abbreviation[1]
     /LocalPresentationManifest[1]/Properties[1]/Description[1]
     /LocalPresentationManifest[1]/Properties[1]/TimeZone[1]/Name[1]
     /LocalPresentationManifest[1]/Properties[1]
    orig.xml
     /LocalPresentationManifest[1]
     /LocalPresentationManifest[1]/Properties[1]/TimeZone[1]
     /LocalPresentationManifest[1]/Properties[1]/TimeZone[1]/Description[2]
     /LocalPresentationManifest[1]/Properties[1]/TimeZone[1]/Abbreviation[1]
     /LocalPresentationManifest[1]/Properties[1]/TimeZone[1]/Identifier[1]
     /LocalPresentationManifest[1]/Properties[1]/TimeZone[1]/Name[1]
     /LocalPresentationManifest[1]/Properties[1]/Description[1]
     /LocalPresentationManifest[1]/Properties[1]

The key information is the "Description[2]" from orig.xml. It seems that the $position_index hash is erroneously only looking at the tag name, not the xpath, when incrementing. The attached patch uses the full xpath as the key to $position_index.

-- Chris
--- /Users/chris/perl/lib/perl5/site_perl/XML/SemanticDiff.pm 2006-03-31 23:35:27.000000000 -0600
+++ lib/XML/SemanticDiff.pm 2006-04-01 00:52:54.000000000 -0600
@@ -150,14 +150,15 @@
     my $context_length = scalar (@context);
     my $parent = $context[$context_length -1];
     push (@{$descendents->{$parent}}, $element) if $parent;
-    $position_index->{"$element"}++;
-    my $test_context;
+    my $test_context = '';
-    if (@context){
-        $test_context = '/' . join ('/', map { $_ . '[' . $position_index->{$_} . ']' } @context);
+    for (@context){
+        $test_context .= '/' . $_;
+        $test_context .= '[' . $position_index->{$test_context} . ']';
     }
-    $test_context .= '/' . $element . '[' . $position_index->{$element} . ']';
+    $test_context .= '/' . $element;
+    $test_context .= '[' . ++$position_index->{$test_context} . ']';
     $doc->{"$test_context"}->{NamespaceURI} = $expat->namespace($element) || "";
     $doc->{"$test_context"}->{Attributes} = \%attrs || {};
@@ -170,13 +171,15 @@
     my @context = $expat->context;
-    my $test_context;
+    my $test_context = '';
-    if (@context){
-        $test_context = '/' . join ('/', map { $_ . '[' . $position_index->{$_} . ']' } @context);
+    for (@context){
+        $test_context .= '/' . $_;
+        $test_context .= '[' . $position_index->{$test_context} . ']';
     }
-    $test_context .= '/' . $element . '[' . $position_index->{$element} . ']';
+    $test_context .= '/' . $element;
+    $test_context .= '[' . $position_index->{$test_context} . ']';
     my $text;
     if ( defined( $char_accumulator->{$element} )) {
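The difference between the two indexing schemes can be seen without the module installed. What follows is only a minimal standalone sketch, not the module's code: the hard-coded event list (the element chains of orig.xml in document order) and the helper names paths_by_tag_name and paths_by_full_path are hypothetical and exist purely for illustration. The first helper counts positions per bare tag name, as the report describes the old behaviour; the second counts per accumulated path, as the patch does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each entry is the chain of element names from the root down to the
# element being opened, in document order (mirrors orig.xml).
my @orig_events = (
    [qw(LocalPresentationManifest)],
    [qw(LocalPresentationManifest Properties)],
    [qw(LocalPresentationManifest Properties Description)],
    [qw(LocalPresentationManifest Properties TimeZone)],
    [qw(LocalPresentationManifest Properties TimeZone Identifier)],
    [qw(LocalPresentationManifest Properties TimeZone Name)],
    [qw(LocalPresentationManifest Properties TimeZone Description)],
    [qw(LocalPresentationManifest Properties TimeZone Abbreviation)],
);

# Scheme described in the report: one counter per bare tag name,
# shared by every element with that name anywhere in the document.
sub paths_by_tag_name {
    my @events = @_;
    my (%position_index, @paths);
    for my $chain (@events) {
        my @context = @$chain;
        my $element = pop @context;
        $position_index{$element}++;
        my $path = '';
        for my $name (@context, $element) {
            $path .= '/' . $name . '[' . $position_index{$name} . ']';
        }
        push @paths, $path;
    }
    return @paths;
}

# Scheme used by the patch: one counter per accumulated path, so
# same-named elements under different parents are counted separately.
sub paths_by_full_path {
    my @events = @_;
    my (%position_index, @paths);
    for my $chain (@events) {
        my @context = @$chain;
        my $element = pop @context;
        my $path = '';
        for my $name (@context) {
            $path .= '/' . $name;
            $path .= '[' . $position_index{$path} . ']';
        }
        $path .= '/' . $element;
        $path .= '[' . ++$position_index{$path} . ']';
        push @paths, $path;
    }
    return @paths;
}

print "keyed by tag name:\n";
print "  $_\n" for paths_by_tag_name(@orig_events);
print "keyed by full path:\n";
print "  $_\n" for paths_by_full_path(@orig_events);
```

With the tag-name counter, the <Description> inside <TimeZone> comes out as Description[2] because the root-level <Description> was seen first; with the full-path counter it comes out as Description[1], which matches what derived.xml produces.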
On Sat Apr 01 01:56:00 2006, CLOTHO wrote:
> I've looked at the code a little more deeply, and have created a patch
> that fixes the problem. The following diagnostic shows where the error
> comes from:
>
> use XML::SemanticDiff;
> use Data::Dumper;
> my $diff = XML::SemanticDiff->new();
> for my $file ('derived.xml', 'orig.xml') {
>     print "$file\n";
>     print " $_\n" for keys %{$diff->read_xml($file)};
> }
>
> % perl showdoc.pl
> derived.xml
>  /LocalPresentationManifest[1]
>  /LocalPresentationManifest[1]/Properties[1]/TimeZone[1]
>  /LocalPresentationManifest[1]/Properties[1]/TimeZone[1]/Description[1]
>  /LocalPresentationManifest[1]/Properties[1]/TimeZone[1]/Identifier[1]
>  /LocalPresentationManifest[1]/Properties[1]/TimeZone[1]/Abbreviation[1]
>  /LocalPresentationManifest[1]/Properties[1]/Description[1]
>  /LocalPresentationManifest[1]/Properties[1]/TimeZone[1]/Name[1]
>  /LocalPresentationManifest[1]/Properties[1]
> orig.xml
>  /LocalPresentationManifest[1]
>  /LocalPresentationManifest[1]/Properties[1]/TimeZone[1]
>  /LocalPresentationManifest[1]/Properties[1]/TimeZone[1]/Description[2]
>  /LocalPresentationManifest[1]/Properties[1]/TimeZone[1]/Abbreviation[1]
>  /LocalPresentationManifest[1]/Properties[1]/TimeZone[1]/Identifier[1]
>  /LocalPresentationManifest[1]/Properties[1]/TimeZone[1]/Name[1]
>  /LocalPresentationManifest[1]/Properties[1]/Description[1]
>  /LocalPresentationManifest[1]/Properties[1]
>
> The key information is the "Description[2]" from orig.xml. It seems
> that the $position_index hash is erroneously only looking at the tag
> name, not the xpath, when incrementing. The attached patch uses the full
> xpath as the key to $position_index.
>
Hi CLOTHO!

Thanks for your input. I translated your example into a test file and, together with a modified fix (adapted to earlier modifications), applied it to my development line of XML-SemanticDiff here:

http://svn.berlios.de/svnroot/repos/web-cpan/XML-SemanticDiff/trunk/

The changelog message reads:

<<<<<<<<<<<<<
- Applied a modified version of:
  http://rt.cpan.org/Ticket/Display.html?id=18491
- Fixes a case where the same tags in different places, with identical contents, are not considered semantically identical.
- Thanks to CLOTHO for reporting it and suggesting a fix.
- t/11tag-in-different-locations.t
>>>>>>>>>>>>>
Note that this development was not approved by the XML-SemanticDiff originator (who has been very unresponsive as of late).

Regards,

Shlomi Fish
Fixed in CPAN version - 0.96.