Date: Thu, 20 Jan 2005 10:55:40 +0100
From: Anders Ardo <anders [...] it.lth.se>
To: bug-html-tidy [...] rt.cpan.org
CC: anders.ardo [...] it.lth.se
Subject: Loading of tidy config files / small patch
Hi Andy Lester,
I'm using your HTML::Tidy with success - thanks!
It's used to clean HTML files inside a focused Web-crawler. In this context
it would be extremely handy to be able to influence the output from Tidy
with some of its many configuration options.
So here is a small patch that implements that. Could you please have a look
at it and see if it merits inclusion in the distribution? Thanks.
The approach taken is to provide the configuration filename as a parameter
to the new() method and then use it in calls to the internal _tidy_clean
procedure. An alternative would of course be to add a separate method that
sets the config-file name explicitly.
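A minimal sketch of the approach described above: the constructor remembers the configuration filename and every clean call forwards it to the internal routine. The package name Sketch::Tidy, the positional constructor argument, and the stubbed _tidy_clean are hypothetical stand-ins for illustration only; the attached patch, not this sketch, defines the actual interface.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the patched module; NOT the real HTML::Tidy code.
package Sketch::Tidy;

sub new {
    my ($class, $config_file) = @_;
    # Remember the (optional) configuration filename for later calls.
    my $self = { config_file => $config_file };
    return bless $self, $class;
}

sub clean {
    my ($self, $html) = @_;
    # Forward the stored filename to the internal cleaning routine.
    return _tidy_clean($html, $self->{config_file});
}

# Stub standing in for the internal routine; it only reports what it received.
sub _tidy_clean {
    my ($html, $config_file) = @_;
    my $source = defined $config_file ? $config_file : 'built-in defaults';
    return "cleaned with $source: $html";
}

package main;

my $tidy = Sketch::Tidy->new('tidy.cfg');
print $tidy->clean('<p>hello'), "\n";
```

Passing the filename once at construction keeps the calling code unchanged; the separate-setter alternative would allow switching configurations on an existing object.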
The patch passes your tests and my requirements, although I haven't tested
it extensively or added a test to the 'make test' section.
The other small change I've made is to add a "\n" to the end of the HTML
string to be cleaned. It turned out that in a few cases tidy produced
incomplete output (which is disastrous in my application). If you clean the
included t.html it ends with a '<p>' instead of '</body></html>' as it
should. Adding "\n" to the end of the HTML string fixes that.
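The newline workaround can be sketched as a small helper. The function name with_trailing_newline is hypothetical, and this variant appends the newline only when one is missing, which is a slight variation on always appending it as described above.

```perl
use strict;
use warnings;

# Hypothetical helper: ensure the HTML string ends with a newline
# before it is handed to the cleaning routine.
sub with_trailing_newline {
    my ($html) = @_;
    # \z anchors at the very end of the string, so only a final newline counts.
    $html .= "\n" unless $html =~ /\n\z/;
    return $html;
}

print with_trailing_newline('<html><body><p>text');
```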
t.pl is a small test script, usage: ./t.pl < t.html
tidy.cfg is a Tidy configuration file used by t.pl
Please let me know if there is anything else I can do to get this patch into
the distribution.
Cheers
Anders
--
Anders Ardö
Department of Information Technology, Lund Institute of Technology
Tel: +46 46 2227522 ; URL: http://www.it.lth.se/anders/