
This queue is for tickets about the HTML-Format CPAN distribution.

Report information
The Basics
Id: 69426
Status: open
Priority: 0/
Queue: HTML-Format

People
Owner: Nobody in particular
Requestors: jik [...] kamens.brookline.ma.us
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 2.05
Fixed in: (no value)



Subject: &rsquo; in HTML input yields garbage character in PostScript output
Test script:
---cut here---
#!/usr/bin/perl
use HTML::TreeBuilder;
use HTML::FormatPS;
$html = "<html><body>it&rsquo;s an apostrophe</body></html>";
$tree = HTML::TreeBuilder->new_from_content($html);
$formatter = HTML::FormatPS->new();
$ps = $formatter->format($tree);
binmode STDOUT;
print $ps;
---cut here---
Redirect the output of the script to test.ps and then view test.ps and you'll see that there's a garbage character where the apostrophe is supposed to be.
This should be fixed in 2.08. See https://github.com/nigelm/html-format/commit/58fc839da0a0102d80c43acc1376347c7e56153e
Subject: Re: [rt.cpan.org #69426] &rsquo; in HTML input yields garbage character in PostScript output
Date: Wed, 13 Jul 2011 17:18:46 -0400
To: bug-HTML-Format [...] rt.cpan.org
From: Jonathan Kamens <jik [...] kamens.us>
You fixed &rsquo;, but it looks like you didn't fix &rdquo; or &ldquo;, and I don't know whether you fixed &rdquo;. Is it possible to do a more comprehensive fix that covers all the HTML entities that could cause problems? Thanks.
Download smime.p7s
application/pkcs7-signature 3.8k

Message body not shown because it is not plain text.

Subject: Re: [rt.cpan.org #69426] &rsquo; in HTML input yields garbage character in PostScript output
Date: Wed, 13 Jul 2011 17:19:33 -0400
To: bug-HTML-Format [...] rt.cpan.org
From: Jonathan Kamens <jik [...] kamens.us>
Sorry, I meant to say I don't know whether you fixed &lsquo;

On Wed Jul 13 17:19:43 2011, jik@kamens.us wrote:
> Sorry, I meant to say I don't know whether you fixed &lsquo;
&lsquo; is fixed in 2.08.

The double quote sets cannot be fixed without just mapping both open/close (right/left) quote sets to &quot;, which would have people screaming about that too. The PostScript output uses latin1 encoding. If you look at the latin1 character set - http://www.utoronto.ca/web/HTMLdocs/NewHTML/iso_table.html - you will see that there is only one double quote character. So to make this work correctly we would have to either:

- change the PostScript encoding (along with the embedded code font encoding vector)
- use a hacked latin1 encoding with 2 glyphs replaced with double quote chars
- special case the double quote chars so the string is rendered differently

Any of these is a bit of a hack (the best one is just making it handle Unicode throughout, but that's a ton of work and would mean a huge boilerplate encoding vector). Alternative solutions welcome, but I don't think there is a reasonable fix.
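
The latin1 limitation described above can be checked directly with the core Encode module. This is only a sketch: `in_latin1` is a hypothetical helper written for illustration, not part of HTML::FormatPS. It asks Encode to die rather than substitute when a character has no ISO-8859-1 mapping.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: true if the code point can be encoded as ISO-8859-1.
sub in_latin1 {
    my ($codepoint) = @_;
    my $char = chr($codepoint);
    # FB_CROAK makes encode() die on an unmappable character
    # instead of silently substituting one.
    return eval { encode('iso-8859-1', $char, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

# Plain double quote, then &ldquo; (U+201C) and &rdquo; (U+201D).
for my $cp (0x22, 0x201C, 0x201D) {
    printf "U+%04X %s\n", $cp,
        in_latin1($cp) ? 'is in latin1' : 'is not in latin1';
}
```

Run as written, this should report that only U+0022 is in latin1, which matches the point that the encoding has a single double quote character.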
Subject: Re: [rt.cpan.org #69426] &rsquo; in HTML input yields garbage character in PostScript output
Date: Thu, 14 Jul 2011 13:49:25 -0400
To: bug-HTML-Format [...] rt.cpan.org
From: Jonathan Kamens <jik [...] kamens.us>
Any of the options you listed is better than what happens now, which is that &ldquo; and &rdquo; show up as garbage characters.

On Thu Jul 14 13:49:47 2011, jik@kamens.us wrote:
> Any of the options you listed is better than what happens now, which is that &ldquo; and &rdquo; show up as garbage characters.
The unmappable characters should now be replaced by ? chars - the Encode to latin1 should do that. However, I have changed all the double quote code points to map to ", which is wrong, but it is the best that can be done without significant re-architecting. I would love someone to do the work of reimplementing the whole thing in Unicode throughout, but I took this on as a basic maintainer and do not intend to get into serious rewrite work. 2.09 has just been uploaded.
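
A rough sketch of the behaviour described above follows; it is not the actual code in 2.09. `to_latin1_for_ps` is a hypothetical name, and folding only U+201C and U+201D is an assumption about which code points count as "double quote code points".

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper, not the actual HTML::FormatPS implementation.
sub to_latin1_for_ps {
    my ($text) = @_;
    # Fold the typographic double quotes (assumed here to be U+201C and
    # U+201D) onto the plain ASCII double quote.
    $text =~ tr/\x{201C}\x{201D}/""/;
    # With no CHECK argument, encode() substitutes "?" for any
    # character that has no ISO-8859-1 mapping.
    return encode('iso-8859-1', $text);
}

# U+2603 (snowman) stands in for any character outside latin1.
print to_latin1_for_ps("\x{201C}quoted\x{201D} \x{2603}"), "\n";
```

With this input the sketch should print `"quoted" ?`: both curly quotes collapse to the straight quote and the unmappable character becomes a question mark.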