Subject: Patch - replace error + misc
Hi Bruno,
I came across a bug in the replace attribute today which I think is due to an old split regex in Petal::Canonicalizer::XML. I replaced it with the one in _content which was working. I updated the 031 test which will fail if the patch is not applied.
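To illustrate the failure mode, here is a minimal standalone sketch, not Petal's actual code. It runs the old pattern from the patch against the expression used in the new test data. The `split_unescaped` helper is a hypothetical stand-in for `_split_expression`, assuming that routine refuses to split on a backslash-escaped semicolon; the real implementation may differ.

```perl
use strict;
use warnings;

# The pattern removed by the patch. It treats every ';' as a separator,
# including the ';' of an escaped '\;'.
sub old_split {
    my ($expr) = @_;
    return split /(\s|\r|\n)*\;(\s|\r|\n)*/ms, $expr;
}

# Hypothetical stand-in for _split_expression (an assumption, not the
# real method): split only on semicolons NOT preceded by a backslash,
# then drop undefined or empty pieces.
sub split_unescaped {
    my ($expr) = @_;
    return grep { defined && length } split /\s*(?<!\\);\s*/, $expr;
}

# Same expression as the petal-meta:replace line added to the test data.
my $expr = 'string:Replace\;';

my @old = old_split($expr);
my @new = split_unescaped($expr);

print "old: [$_]\n" for @old;   # escaped semicolon is consumed as a separator
print "new: [$_]\n" for @new;   # expression survives intact
```

With the old pattern the escaped semicolon is eaten and a dangling backslash is left behind, so `_encode_backslash_semicolon` never sees the `\;` it is meant to handle.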
The attached file is a cumulative patch. It also includes some changes to Cookbook.pod, and to t/041_Entities.t and its data file, which you may or may not want to use. I made those while tracking down issues with nbsp entities. I'm not sure these tests really add anything useful to the test suite.
William
Index: lib/Petal/Cookbook.pod
===================================================================
RCS file: /var/spool/cvs/Petal/lib/Petal/Cookbook.pod,v
retrieving revision 1.6
diff -u -u -r1.6 Cookbook.pod
--- lib/Petal/Cookbook.pod 9 Mar 2005 15:24:15 -0000 1.6
+++ lib/Petal/Cookbook.pod 21 Mar 2005 15:42:16 -0000
@@ -30,6 +30,57 @@
=head2 Template naming
+<<<<<<< Cookbook.pod
+Petal is indifferent about the name of the template files. Personally, I like
+to name my templates with the .tmpl extension to help myself and designers
+distinguish templates from static html. Some GUI editors, though, will not
+open files without a htm/html extension (esp. under Windows).
+
+
+=head2 Fixing invalid templates (Is this XML well-formed?)
+
+If you are getting a parse_error when trying to process your template, you
+will need to clean up your XHTML template in order for Petal to process it.
+Two tools will be of great assistance in taking the step towards better
+standards compliance--HTML Tidy (L<http://tidy.sf.net>) and xmllint. In
+addition, you can use the page validation services at W3C
+(L<http://validator.w3.org/>). Alternatively, you could use the
+L<Petal::Parser::HTB> module which will parse non well-formed HTML documents
+using L<HTML::TreeBuilder>.
+
+HTML Tidy will rewrite your document into valid XHTML and, if requested, even
+replace legacy formatting tags with their CSS counterparts. You can safely
+ignore the warnings about proprietary attributes. Be sure to read the output
+of what HTML Tidy is doing or else you may find it removing important tags
+which it thinks are empty or invalid (e.g., inline elements outside of a
+block). One of the important options that should be set is output_xhtml
+(-asxhtml from the command-line). Here's an example of how to use it (see the
+documentation for complete details):
+
+ tidy -asxhtml original_file.html > new_file.html
+
+Once your document is well-formed, you can use xmllint to do day-to-day
+checking that it stays well-formed without having to wade through the warnings
+that HTML Tidy will generate about proprietary attributes. The following command will check that a document is well-formed:
+
+ xmllint --noout <filename>
+
+To prevent errors about undefined namespace prefixes, be sure to declare them
+in your template like so:
+
+ <html xmlns="http://www.w3.org/1999/xhtml"
+ xmlns:petal="http://purl.org/petal/1.0/"
+ xmlns:metal="http://purl.org/petal/1.0/">
+
+You may receive errors from xmllint about unknown entities such as nbsp. These
+can be safely ignored. If you find a way to suppress these warnings, please
+let us know. In the meantime, you can pass the output through grep to ignore
+these bogus warnings:
+
+ xmllint --noout tmpl/contact_info.tmpl 2>&1 | grep -v 'Entity'
+
+Now you have no excuse for not creating well-formed XHTML documents.
+=======
Petal is indifferent about the name of the template files. Personally, I like
to name my templates with the .tmpl extension to help myself and designers
distinguish templates from static html. Some GUI editors, though, will not
@@ -80,6 +131,7 @@
xmllint --noout tmpl/contact_info.tmpl >& grep -v 'Entity'
Now you have no excuse for not creating well-formed XHTML documents.
+>>>>>>> 1.6
=head2 Passing a hashreference to Petal::process
Index: lib/Petal/Canonicalizer/XML.pm
===================================================================
RCS file: /var/spool/cvs/Petal/lib/Petal/Canonicalizer/XML.pm,v
retrieving revision 1.35
diff -u -u -r1.35 XML.pm
--- lib/Petal/Canonicalizer/XML.pm 5 Jan 2005 15:33:45 -0000 1.35
+++ lib/Petal/Canonicalizer/XML.pm 21 Mar 2005 15:42:17 -0000
@@ -494,7 +494,8 @@
my @new = map {
$_ = $class->_encode_backslash_semicolon ($_);
"<?var name=\"$_\"?>";
- } split /(\s|\r|\n)*\;(\s|\r|\n)*/ms, $expr;
+ } $class->_split_expression ($expr);
push @Result, @new;
$NodeStack[$#NodeStack]->{replace} = 'true';
Index: t/041_Entities.t
===================================================================
RCS file: /var/spool/cvs/Petal/t/041_Entities.t,v
retrieving revision 1.6
diff -u -u -r1.6 041_Entities.t
--- t/041_Entities.t 10 Dec 2003 15:22:00 -0000 1.6
+++ t/041_Entities.t 21 Mar 2005 15:42:17 -0000
@@ -21,7 +21,12 @@
my $string = Petal->new ( 'entities.html' )->process();
my $copy = chr (169);
my $reg = chr (174);
+ my $nbsp = chr (160);
+ my $acirc = chr (194);
like ($string, qr/$copy/ => 'Copyright');
like ($string, qr/$reg/ => 'Registered');
+ like ($string, qr/$nbsp/ => 'Non-break space');
+ unlike ($string, qr/$acirc/ => 'A circumflex not present');
}
Index: t/data/entities.html
===================================================================
RCS file: /var/spool/cvs/Petal/t/data/entities.html,v
retrieving revision 1.1
diff -u -u -r1.1 entities.html
--- t/data/entities.html 19 Aug 2003 14:38:11 -0000 1.1
+++ t/data/entities.html 21 Mar 2005 15:42:17 -0000
@@ -2,4 +2,5 @@
<!-- this simple test checks for HTML entities expansion -->
Copyright: &copy;
Registered: &reg;
+ Non-break space: &nbsp;
</body>
Index: t/data/test_attributes2.xml
===================================================================
RCS file: /var/spool/cvs/Petal/t/data/test_attributes2.xml,v
retrieving revision 1.1
diff -u -u -r1.1 test_attributes2.xml
--- t/data/test_attributes2.xml 10 Feb 2003 16:20:47 -0000 1.1
+++ t/data/test_attributes2.xml 21 Mar 2005 15:42:17 -0000
@@ -7,4 +7,6 @@
petal:attributes="baz replace; buzz replace; quxx replace"
petal-meta:attributes
="petal:attributes string:baz ${baz_meta}\; buzz ${buzz_meta}\; quxx ${quxx_meta}" />
+ <bat petal-meta:content="string:Content\;"></bat>
+ <bat petal-meta:replace="string:Replace\;"></bat>
</foo>