
This queue is for tickets about the Pod-Pandoc CPAN distribution.

Report information
The Basics
Id: 133684
Status: resolved
Priority: 0/
Queue: Pod-Pandoc

People
Owner: Nobody in particular
Requestors: jkeenan [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: pod2pandoc POD-to-HTML conversion ignores '=encoding' directive
The POD-to-HTML converter used by pod2pandoc fails to honor an '=encoding utf8' directive at the head of a file composed in .pod format. No "Content-Type" header is set in the output file. As a consequence, non-ASCII characters may be rendered incorrectly by certain web browser/server combinations.

Consider the following program:

#####
$ cat reading-list.pod
=encoding utf8

=head1 Reading List

Fanny Pigeaud and Ndongo Samba Sylla, I<L'Arme Invisible de la Françafrique: Une Histoire du Franc CFA>, La Decouverte, 2018. (In French)

=cut
#####

Run this program through 'pod2pandoc':

#####
pod2pandoc ./reading-list.pod -o ./reading-list.html
#####

Here is the output:

#####
$ cat reading-list.html
<h1 id="reading-list">Reading List</h1>
<p>Fanny Pigeaud and Ndongo Samba Sylla, <em>L'Arme Invisible de la Françafrique: Une Histoire du Franc CFA</em>, La Decouverte, 2018. (In French)</p>
#####

Note that the file contains no tag setting the 'charset'. Indeed, it sets no 'Content-Type' header at all. I would have expected something like this tag at the start of the file:

#####
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
#####

Now, on my laptop, notwithstanding the absence of a 'charset' attribute, the c-cedilla in 'Françafrique' renders correctly. However, once I 'scp' this .html file to a server to which I have access, the c-cedilla fails to render correctly. See: http://thenceforward.net/perl/misc/reading-list.html.

If I go to that server and manually insert the 'meta' tag listed above into the HTML (copying, then renaming the file), the c-cedilla renders correctly. See: http://thenceforward.net/perl/misc/reading-list-corrected.html.

I do not experience this problem with the POD-to-txt, POD-to-pdf or POD-to-odt converters used by 'pod2pandoc'.

The POD-to-HTML conversion process used by 'pod2pandoc' needs to be revised to take into account any '=encoding utf8' directive and include the corresponding 'charset' attribute in the .html output.

Thank you very much.
Jim Keenan
Thanks for your feedback and example. You wrote:
> Run this program through 'pod2pandoc':
>
> #####
> pod2pandoc ./reading-list.pod -o ./reading-list.html
> #####
>
> Here is the output:
>
> #####
> $ cat reading-list.html
> <h1 id="reading-list">Reading List</h1>
> <p>Fanny Pigeaud and Ndongo Samba Sylla, <em>L'Arme Invisible de la
> Françafrique: Une Histoire du Franc CFA</em>, La Decouverte, 2018. (In
> French)</p>
> #####
>
> Note that the file contains no tag setting the 'charset'. Indeed, it
> sets no 'Content-Type' header at all. I would have expected something
> like this tag at the start of the file:
>
> #####
> <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
> #####
An HTML fragment without explicit encoding is the default HTML output format of pandoc. To include a header, add the option --standalone / -s:

pod2pandoc ./reading-list.pod -o ./reading-list.html -s

See https://pandoc.org/MANUAL.html#options for additional options.
On Fri Nov 06 03:49:19 2020, jakob@nichtich.de wrote:
> An HTML fragment without explicit encoding is the default HTML output
> format of pandoc. To include a header, add the option --standalone / -s:
>
> pod2pandoc ./reading-list.pod -o ./reading-list.html -s
>
> See https://pandoc.org/MANUAL.html#options for additional options.
Thanks for your response; your suggestion works.

#####
$ pod2pandoc ./reading-list.pod -o my-reading-list.html --standalone
$ cat my-reading-list.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta name="generator" content="pandoc" />
  <title></title>
  <style type="text/css">code{white-space: pre;}</style>
</head>
<body>
<h1 id="reading-list">Reading List</h1>
<p>Fanny Pigeaud and Ndongo Samba Sylla, <em>L'Arme Invisible de la Françafrique: Une Histoire du Franc CFA</em>, La Decouverte, 2018. (In French)</p>
</body>
</html>
#####

Thank you very much.
Jim Keenan
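For HTML fragments that were already generated without --standalone, the manual fix described in the report (adding a charset declaration by hand) could also be scripted. The following is a minimal Python sketch and is not part of pod2pandoc or pandoc. It assumes the fragment on disk is UTF-8 encoded. The helper names `wrap_fragment` and `fix_file` are hypothetical, and using an HTML5 `<meta charset="utf-8">` declaration instead of the XHTML `http-equiv` form that pandoc emits is an illustrative choice. Regenerating the file with --standalone remains the simpler route.

```python
import html


def wrap_fragment(fragment: str, title: str = "") -> str:
    """Hypothetical helper: wrap a pandoc HTML fragment in a minimal
    HTML5 document that declares UTF-8 as its character encoding."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        # The charset declaration sits at the very start of <head>,
        # so browsers see it before any body text.
        '<meta charset="utf-8">\n'
        # Escape the title so characters like '&' or '<' stay literal text.
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        # Strip trailing whitespace/newlines from the fragment, then add one.
        f"{fragment.rstrip()}\n"
        "</body>\n"
        "</html>\n"
    )


def fix_file(src: str, dest: str, title: str = "") -> None:
    """Hypothetical helper: read a fragment file and write a wrapped copy."""
    # Assumption: the fragment was written as UTF-8.
    with open(src, encoding="utf-8") as f:
        fragment = f.read()
    # Write the result as UTF-8 too, so the bytes on disk match the
    # charset the document now declares.
    with open(dest, "w", encoding="utf-8") as f:
        f.write(wrap_fragment(fragment, title))
```

A call such as `fix_file("reading-list.html", "reading-list-corrected.html", title="Reading List")` would write a corrected copy alongside the original, mirroring the copy-then-edit workflow described in the report. The title is passed explicitly because a bare fragment carries no document title of its own.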