Subject: | <script/> leads to ignoring <script> events |
First of all, thank you very much for integrating the fix for bug 18936
so quickly into release 3.53.
This bug applies to both the 3.52 and the 3.53 releases (the input form
just does not offer the new version number yet).
*** Problem ***
After declaring
$p->empty_element_tags(1); $p->ignore_elements("script","x");
the tag <x/> works correctly like <x></x>.
However, the tag <script/> confuses the parser: a following <script> tag
is ignored and left in the text event!
The attached test script runs a few sample strings through the parser
with the above settings and prints the text, tag and event values.
The first example demonstrates the bug. The following examples
demonstrate that the <x/> and <y/> tags work correctly according to the
documentation:
*** Tests with version 3.53 ***
www@kranich:~/111$ perl test.pl
================ Parse: <script/>A<script>B</script>C ================
'' start_document
'A<script>B' text
'C' text
'' end_document
================ Parse: <x/>A<x>B</x>C ================
'' start_document
'A' text
'C' text
'' end_document
================ Parse: <y/>A<y>B</y>C ================
'' start_document
'<y/>' <y> start
'' </y> end
'A' text
'<y>' <y> start
'B' text
'</y>' </y> end
'C' text
'' end_document
================ Parse: </x>A ================
'' start_document
'A' text
'' end_document
For your reference, I ran the same script with version 3.52. The two
bugs turn out to be unrelated: the output shows both the effects of
this bug and the effects of bug 18936:
www@kranich:~/111$ perl -Mblib=HTML-Parser-3.52/lib/ test.pl
================ Parse: <script/>A<script>B</script>C ================
'' start_document
'A<script>B' text
'' end_document
================ Parse: <x/>A<x>B</x>C ================
'' start_document
'A' text
'C' text
'' end_document
================ Parse: <y/>A<y>B</y>C ================
'' start_document
'<y/>' <y> start
'' </y> end
'A' text
'<y>' <y> start
'B' text
'</y>' </y> end
'C' text
'' end_document
================ Parse: </x>A ================
'' start_document
'' end_document
This time, I don't have a fix.
Best regards,
Yaakov Belch
Subject: | test.pl |
#!/usr/bin/perl -w
use strict;
use HTML::Parser ();

# Parser with the settings that trigger the bug.
my $p = HTML::Parser->new(api_version => 3);
$p->empty_element_tags(1);
$p->ignore_elements("script", "x");

# Print the text, tag and event name of every event.
$p->handler(
    default => sub {
        my ($event, $text, $tag) = @_;
        $tag = $tag ? "<$tag>" : "";
        print "'$text'\t$tag\t$event\n";
    },
    "event,text,tag"
);

for my $text (
    '<script/>A<script>B</script>C',
    '<x/>A<x>B</x>C',
    '<y/>A<y>B</y>C',
    '</x>A',
) {
    print "\n================ Parse: $text ================\n";
    $p->parse($text)->eof;
}