This queue is for tickets about the File-Extract CPAN distribution.

Report information
The Basics
Id: 16063
Status: new
Priority: 0/
Queue: File-Extract

People
Owner: Nobody in particular
Requestors: chris+rt [...] chrisdolan.net
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.06
Fixed in: (no value)



Subject: HTML output too tight
When extracting text from HTML, too much whitespace is discarded. For example, processing my page http://www.chrisdolan.net, I get text with the words butted up against each other:

Chris Dolanskip navigationChristopher J. DolanHomeAboutProjectsTalkChris Dolan is a software developer living in Madison, Wisconsin. With a PhD in Astronomy, he has a very strong math and science background. He started programming professionally as a teenager in the late 1980s. During free time, he is an active participant in several online software development communities and is an avid bicyclist. ? 2005 Chris Dolan | xhtml, css, gpg vcard

If I edit File::Extract::HTML and add the "tighten => 0" option to the HTML::TreeBuilder constructor, I get more useful output, but still with a little too much whitespace:

Chris Dolan skip navigation Christopher J. Dolan Home About Projects Talk Chris Dolan is a software developer living in Madison, Wisconsin. With a PhD in Astronomy, he has a very strong math and science background. He started programming professionally as a teenager in the late 1980s. During free time, he is an active participant in several online software development communities and is an avid bicyclist. ? 2005 Chris Dolan | xhtml, css, gpg vcard
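
To make the desired behaviour concrete, here is a minimal sketch of one way to get it: treat block-level tags as word boundaries, then collapse runs of whitespace into single spaces. It is written in Python with only the standard library. It is not File::Extract's code and does not use HTML::TreeBuilder. The names BLOCK_TAGS, TextExtractor and extract_text, the choice of tags, and the sample markup are all assumptions made for illustration; the real markup of the page is not shown in this ticket.

```python
import re
from html.parser import HTMLParser

# Assumption: tags whose boundaries should separate words in the output.
# This list is illustrative, not exhaustive.
BLOCK_TAGS = {
    "title", "p", "div", "br", "ul", "ol", "li", "table", "tr", "td", "th",
    "h1", "h2", "h3", "h4", "h5", "h6",
}
# Content of these elements is not visible text, so it is dropped.
SKIP_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collects text, emitting a space at every block-level tag boundary."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_text(html):
    """Return the visible text of `html` with single spaces between words."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse every whitespace run to one space so block boundaries do not
    # leave doubled spaces behind.
    return re.sub(r"\s+", " ", "".join(parser.chunks)).strip()


# Hypothetical markup loosely modelled on the page described above.
sample = (
    "<title>Chris Dolan</title>"
    "<div><a href='#main'>skip navigation</a></div>"
    "<h1>Christopher J. Dolan</h1>"
    "<ul><li>Home</li><li>About</li></ul>"
)
print(extract_text(sample))
```

Inline tags such as a or b add no space, so words inside a sentence are not split, while adjacent headings and list items no longer run together.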