

This queue is for tickets about the HTML-Tidy CPAN distribution.

Report information
The Basics
Id: 11120
Status: resolved
Priority: 0/
Queue: HTML-Tidy

People
Owner: Nobody in particular
Requestors: anders [...] it.lth.se
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Date: Thu, 20 Jan 2005 10:55:40 +0100
From: Anders Ardo <anders [...] it.lth.se>
To: bug-html-tidy [...] rt.cpan.org
CC: anders.ardo [...] it.lth.se
Subject: Loading of tidy config files / small patch
Hi Andy Lester,

I'm using your HTML::Tidy with success - thanks! It's used to clean HTML files inside a focused Web crawler. In this context it would be extremely handy to be able to influence Tidy's output with some of its many configuration options. So here is a small patch that implements that. Could you please have a look at it and see if it merits inclusion in the distribution? Thanks.

The approach taken is to pass the configuration filename as a parameter to the new() method and then use it in calls to the internal _tidy_clean procedure. An alternative would of course be to have a new method that sets the config-file name more explicitly. The patch passes your tests and meets my requirements, although I haven't tested it extensively or added a test to the 'make test' section.

The other small change I've made is to add a "\n" to the end of the HTML string to be cleaned. It turned out that in a few cases tidy produced incomplete output (which is disastrous in my application). If you clean the included t.html, the output ends with a '<p>' instead of '</body></html>' as it should. Adding "\n" to the end of the HTML string fixes that.

t.pl is a small test script; usage: ./t.pl < t.html
tidy.cfg is a Tidy configuration file used by t.pl

Please let me know if there is anything else I can do to get this patch into the distribution.

Cheers
Anders

--
Anders Ardö
Department of Information Technology, Lund Institute of Technology
Tel: +46 46 2227522 ; URL: http://www.it.lth.se/anders/
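
For readers without the attachment, the two changes described in the report can be sketched roughly as follows. This is not the attached patch or t.pl. The helper ensure_trailing_newline is a hypothetical name, and the constructor call assumes the hash-reference form with a config_file key documented in later HTML::Tidy releases, which may differ from what the patch itself used. The cleaning step only runs when both tidy.cfg and HTML::Tidy are present.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Tidy): make sure the markup ends
# with a newline before it is handed to tidy, as the report describes.
sub ensure_trailing_newline {
    my ($html) = @_;
    $html .= "\n" unless $html =~ /\n\z/;
    return $html;
}

my $config = 'tidy.cfg';    # configuration file name taken from the report
my $html   = ensure_trailing_newline('<html><body><p>text');

# Only attempt the cleaning step when the config file and module are available.
if ( -e $config && eval { require HTML::Tidy; 1 } ) {
    # Assumption: new() accepts { config_file => ... }, as in later releases.
    my $tidy = HTML::Tidy->new( { config_file => $config } );
    print $tidy->clean($html);
}
```

Doing the newline fix in the caller, as here, is a stopgap; the report proposes doing it inside the module so that every caller benefits.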
Download tidycfg.tgz
application/x-gtar 2.2k


From: rhesa
[anders@it.lth.se - Thu Jan 20 05:08:31 2005]: Thanks a million for this patch! It solves all of my issues with HTML::Tidy :-)
RT-Send-CC: rhesa [...] cpan.org
This is going into 1.05_02 that I'm releasing tonight. If nothing goes wrong in the few days that follow, I'll release it as 1.06. Thanks very much.