
Preferred bug tracker

Please visit the preferred bug tracker to report your issue.

This queue is for tickets about the Pod-Simple CPAN distribution.

Report information
The Basics
Id: 74390
Status: resolved
Priority: 0/
Queue: Pod-Simple

People
Owner: Nobody in particular
Requestors: pertusus [...] free.fr
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: 3.26



Subject: keep =encoding information
Date: Wed, 25 Jan 2012 21:19:18 +0100
To: bug-Pod-Simple [...] rt.cpan.org
From: Patrice Dumas <pertusus [...] free.fr>
This has a wishlist severity.

When Pod::Simple processes a pod fragment, it uses the information from =encoding, but does not pass that information on to converter subclasses. Although many converters do not care, some do; for example, a translator to Texinfo that I am writing could use this information.

Even better would be to also convey the fact that an encoding is in use, maybe through an accessor for the currently used encoding?
Subject: Re: [rt.cpan.org #74390] keep =encoding information
Date: Wed, 25 Jan 2012 12:26:15 -0800
To: bug-Pod-Simple [...] rt.cpan.org
From: "David E. Wheeler" <dwheeler [...] cpan.org>
On Jan 25, 2012, at 12:19 PM, Patrice Dumas via RT wrote:
> When Pod::Simple processes a pod fragment, it uses the information of
> =encoding, but does not give that information to converter subclasses. Although
> many converters do not care, some do, for example a translator to Texinfo
> I am writing could use this information.
>
> What would be even better would be to also convey the information that
> an encoding is used, maybe by having an accessor to the currently used
> encoding?
Why would that be important? If there is an =encoding, then the resulting text will be decoded to Perl’s internal form. You could then encode it for output however you like, without regard to the original encoding.

Best,
David
Subject: Re: [rt.cpan.org #74390] keep =encoding information
Date: Wed, 25 Jan 2012 21:31:40 +0100
To: David Wheeler via RT <bug-Pod-Simple [...] rt.cpan.org>
From: Dumas Patrice <pertusus [...] free.fr>
On Wed, Jan 25, 2012 at 03:26:26PM -0500, David Wheeler via RT wrote:
> Why would that be important? If there is an =encoding, then the resulting text will be decoded to Perl’s internal form. You could then encode it for output however you like, without regard to the original encoding.
Sure, but you may also want to encode for output in the original encoding, considering that it is a hint on the user preferred encoding for that manual. This is especially relevant for conversion to formats that are not themselves output formats but are documentation formats too (like Texinfo, LaTeX...). -- Pat
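
The round trip discussed in this exchange (decode the input by its declared encoding, then re-encode the output with that same encoding) can be sketched with the core Encode module. This is only an illustration, not Pod::Simple code; the `$declared` value and the sample bytes are assumptions chosen for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed for illustration: the name a document gave on its =encoding line.
my $declared = 'iso-8859-1';

# "cafe" with an e-acute, as ISO-8859-1 bytes (0xE9 is e-acute there).
my $input_bytes = "caf\xE9";

# Decode to Perl's internal character form.
my $text = decode($declared, $input_bytes);

# A converter may emit any encoding it likes, for example UTF-8 ...
my $utf8_bytes = encode('UTF-8', $text);

# ... or treat the declaration as a hint and reuse the original encoding.
my $same_bytes = encode($declared, $text);

printf "characters: %d, UTF-8 bytes: %d, original-encoding bytes: %d\n",
    length($text), length($utf8_bytes), length($same_bytes);
```

The second option is only possible if the converter can find out which encoding was declared, which is what this ticket asks for.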
Subject: Re: [rt.cpan.org #74390] keep =encoding information
Date: Wed, 25 Jan 2012 12:44:30 -0800
To: bug-Pod-Simple [...] rt.cpan.org
From: "David E. Wheeler" <david [...] justatheory.com>
On Jan 25, 2012, at 12:31 PM, Patrice Dumas via RT wrote:
> Sure, but you may also want to encode for output in the original
> encoding, considering that it is a hint on the user preferred
> encoding for that manual. This is especially relevant for conversion
> to formats that are not themselves output formats but are documentation
> formats too (like Texinfo, LaTeX...).
Okay, I could see that. Care to create a patch? Maybe just an encoding attribute on the Pod::Simple object. Would that do it?
Subject: Re: [rt.cpan.org #74390] keep =encoding information
Date: Wed, 25 Jan 2012 21:53:11 +0100
To: "david [...] justatheory.com via RT" <bug-Pod-Simple [...] rt.cpan.org>
From: Dumas Patrice <pertusus [...] free.fr>
On Wed, Jan 25, 2012 at 03:44:41PM -0500, david@justatheory.com via RT wrote:
> Okay, I could see that. Care to create a patch? Maybe just an encoding attribute on the Pod::Simple object. Would that do it?
I could have a try.

I think that an encoding attribute on the Pod::Simple object would not be enough, because there would be no way to know when it becomes available unless one knows that an =encoding was just processed.

Typically, this wouldn't be set when parsing starts, but the end of an =encoding would be the right moment to look at the encoding attribute.
Subject: Re: [rt.cpan.org #74390] keep =encoding information
Date: Wed, 25 Jan 2012 13:57:49 -0800
To: bug-Pod-Simple [...] rt.cpan.org
From: "David E. Wheeler" <dwheeler [...] cpan.org>
On Jan 25, 2012, at 12:53 PM, Patrice Dumas via RT wrote:
> I could have a try.
>
> I think that an encoding attribute on the Pod::Simple object would not
> be enough, because it would not be possible to know when it becomes
> available if one does not know that an =encoding was just processed.
>
> Typically, when starting the parsing, this wouldn't be set, but when
> the end of an =encoding happens, it would be the right moment to
> look at the encoding attribute.
I guess your formatter could just not output anything until it knows what encoding it’s dealing with. And it would reach that point once the =encoding was seen. In fact, you can already get that via $self->{encoding}. Note what `perlpod` says about =encoding:
> "=encoding encodingname"
> This command is used for declaring the encoding of a document.
> Most users won’t need this; but if your encoding isn’t US-ASCII or
> Latin-1, then put a "=encoding encodingname" command early in the
> document so that pod formatters will know how to decode the
> document. For encodingname, use a name recognized by the
> Encode::Supported module. Examples:
>
> =encoding utf8
>
> =encoding koi8-r
>
> =encoding ShiftJIS
>
> =encoding big5
>
> "=encoding" affects the whole document, and must occur only once.
So once you see $self->{encoding} has a value, you should be good to go.

Best,
David
Subject: Re: [rt.cpan.org #74390] keep =encoding information
Date: Wed, 25 Jan 2012 23:48:08 +0100
To: David Wheeler via RT <bug-Pod-Simple [...] rt.cpan.org>
From: Dumas Patrice <pertusus [...] free.fr>
On Wed, Jan 25, 2012 at 04:58:02PM -0500, David Wheeler via RT wrote:
> I guess your formatter could just not output anything until it knows what encoding it’s dealing with. And it would reach that point once the =encoding was seen. In fact, you can already get that via $self->{encoding}.
>
> So once you see $self->{encoding} has a value, you should be good to go.
Not necessarily, as it is set even if the encoding is not known. Also, I don't think that ASCII text appearing before =encoding should be ignored. This happens, for example, in the test file t/corpus/cp1256.txt.

So, in the patch I prepared, I add a $self->{'used_encoding'} member that is set only if the encoding was successfully set, along with a corresponding accessor. In addition, I also added a sub encoding function that returns 'encoding' (or sets it, but you can remove that code if you don't want it).
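
The distinction drawn here, between the encoding name as declared and an encoding that was actually recognized, can be sketched as follows. This is a hypothetical helper, not the code from the patch or from Pod::Simple: `note_encoding` and the plain hash used as state are assumptions for illustration, and `Encode::find_encoding` from the core Encode module stands in for whatever check the parser really performs.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: always record the declared name, but record a usable
# encoding only when Encode recognizes that name.
sub note_encoding {
    my ($state, $name) = @_;
    $state->{encoding} = $name;                  # set even if unknown
    my $found = Encode::find_encoding($name);    # false for unrecognized names
    $state->{used_encoding} = $found->name if $found;
    return $state;
}

my $known   = note_encoding({}, 'koi8-r');
my $unknown = note_encoding({}, 'no-such-encoding');

for my $state ($known, $unknown) {
    printf "declared=%s used=%s\n",
        $state->{encoding},
        defined $state->{used_encoding} ? $state->{used_encoding} : '(unset)';
}
```

A converter that checks only the declared name cannot tell these two cases apart, which is the reason for a second member.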
Subject: Re: [rt.cpan.org #74390] keep =encoding information
Date: Thu, 26 Jan 2012 00:17:12 +0100
To: David Wheeler via RT <bug-Pod-Simple [...] rt.cpan.org>
From: Dumas Patrice <pertusus [...] free.fr>
Here is a patch proposal. Some tests fail, but I checked that the difference is the addition of encoding in the tree. -- Pat

Message body is not shown because sender requested not to inline it.

On 2012-01-25 18:17:47, pertusus@free.fr wrote:
> Here is a patch proposal. Some tests fail, but I checked that
> the difference is the addition of encoding in the tree.
I'm thinking that, instead of `used_encoding()`, it should be called `detected_encoding()`, maybe?

Also, I don't understand why XML output now emits an <encoding> element:

--- t/corpus/2202jp.xml 2012-01-02 16:06:30.000000000 -0800
+++ t/corpus/2202jp.xml_out 2012-03-01 23:16:44.000000000 -0800
@@ -8,6 +8,9 @@
 <head1 start_line="7">
 DESCRIPTION
 </head1>
+ <encoding start_line="9">
+ iso-2022-jp
+ </encoding>
 <Para start_line="11">
 This is a test Pod document in ISO-2202-JP. Its content
 is some Japanese haiku by famous poets.

This seems to be the cause of the test failures, but I don't think that element should be there. Can you do something about that?

Thanks,
David
Subject: Re: [rt.cpan.org #74390] keep =encoding information
Date: Sun, 4 Mar 2012 20:49:24 +0100
To: David Wheeler via RT <bug-Pod-Simple [...] rt.cpan.org>
From: Dumas Patrice <pertusus [...] free.fr>
On Fri, Mar 02, 2012 at 02:19:48AM -0500, David Wheeler via RT wrote:
> I'm thinking that, instead of `used_encoding()`, it should be called `detected_encoding()`,
No problem, here is an updated patch.
> Also, I don't understand why XML output now emits an <encoding> element:
>
> --- t/corpus/2202jp.xml 2012-01-02 16:06:30.000000000 -0800
> +++ t/corpus/2202jp.xml_out 2012-03-01 23:16:44.000000000 -0800
> @@ -8,6 +8,9 @@
>  <head1 start_line="7">
>  DESCRIPTION
>  </head1>
> + <encoding start_line="9">
> + iso-2022-jp
> + </encoding>
>  <Para start_line="11">
>  This is a test Pod document in ISO-2202-JP. Its content
>  is some Japanese haiku by famous poets.
>
> This seems to be the cause of the test failures, but I don't think that element should be there.
Adding this encoding element to the tree is the objective of this patch (in addition to separating 'detected_encoding' and 'encoding').

So having those additional elements is a feature!

-- 
Pat

Message body is not shown because sender requested not to inline it.

Subject: Re: [rt.cpan.org #74390] keep =encoding information
Date: Mon, 5 Mar 2012 21:01:28 -0800
To: bug-Pod-Simple [...] rt.cpan.org
From: "David E. Wheeler" <david [...] kineticode.com>
On Mar 4, 2012, at 11:50 AM, Patrice Dumas via RT wrote:
>> I'm thinking that, instead of `used_encoding()`, it should be called `detected_encoding()`,
>
> No problem, here is an updated patch.
Thanks.
>> Also, I don't understand why XML output now emits an <encoding> element:
>>
>> --- t/corpus/2202jp.xml 2012-01-02 16:06:30.000000000 -0800
>> +++ t/corpus/2202jp.xml_out 2012-03-01 23:16:44.000000000 -0800
>> @@ -8,6 +8,9 @@
>>  <head1 start_line="7">
>>  DESCRIPTION
>>  </head1>
>> + <encoding start_line="9">
>> + iso-2022-jp
>> + </encoding>
>>  <Para start_line="11">
>>  This is a test Pod document in ISO-2202-JP. Its content
>>  is some Japanese haiku by famous poets.
>>
>> This seems to be the cause of the test failures, but I don't think that element should be there.
>
> Adding this encoding element to the tree is the objective of this patch
> (in addition to separating 'detected_encoding' and 'encoding').
>
> So having those additional elements is a feature!
Wait, what? Why? XML has its own encoding syntax that should go into the root-level element. An emitted document may have only one encoding.

Best,
David
Subject: Re: [rt.cpan.org #74390] keep =encoding information
Date: Wed, 7 Mar 2012 23:09:09 +0100
To: David Wheeler via RT <bug-Pod-Simple [...] rt.cpan.org>
From: Dumas Patrice <pertusus [...] free.fr>
On Tue, Mar 06, 2012 at 12:01:48AM -0500, David Wheeler via RT wrote:
> > Adding this encoding element to the tree is the objective of this patch
> > (in addition to separating 'detected_encoding' and 'encoding').
> >
> > So having those additional elements is a feature!
>
> Wait, what? Why? XML has its own encoding syntax that should go into the root-level element.
That's a really weird comment... What I understand is that the XML produced in the corpus.t test is simply a representation of the tree as produced by Pod::Simple::DumpAsXML. That's not 'real' XML, and as far as I can tell it never has an encoding. And if the =encoding is added to the tree, as proposed in my patch, it will appear in the XML output that dumps the tree (and also in any other output that corresponds to a dump of the tree).
> An emitted document may have only one encoding.
The number of encoding elements in the tree resulting from parsing a Pod file is not related to the XML encoding. It may be mandated that there is only one such element, but that is because there is only one such node in the Perl structure tree resulting from parsing the Pod file. That's completely unrelated to the XML output.

-- 
Pat
Subject: Re: [rt.cpan.org #74390] keep =encoding information
Date: Sat, 10 Mar 2012 12:53:14 -0800
To: bug-Pod-Simple [...] rt.cpan.org
From: "David E. Wheeler" <dwheeler [...] cpan.org>
On Mar 7, 2012, at 1:47 PM, Patrice Dumas via RT wrote:
> That's a really weird comment... What I understand is that the XML
> produced in the corpus.t test is simply a representation of the tree as
> produced by Pod::Simple::DumpAsXML. That's not a 'real' XML, and as far
> as I can tell it never has an encoding. And if the =encoding, as
> proposed in my patch is added to the tree, it will be in the XML output
> dumping the tree (but also in any other output that corresponds to a
> dumping of the tree).
Oh, I’m an idiot. Sorry.
>> An emitted document may have only one encoding.
>
> The number of encoding in the tree resulting from a Pod file parsing is
> not related to the XML encoding. It may be mandated that there is only
> one such element, but because there is one such node in the perl
> structure tree resulting from the Pod file parsing. That's completely
> unrelated to the XML output.
Okay. Looks okay then. Two more requests:

1. Please update the tests so that they pass.
2. Please document the new methods.

Then I will commit. :-)

Thanks,
David
Subject: Re: [rt.cpan.org #74390] keep =encoding information
Date: Sat, 17 Mar 2012 00:16:29 +0100
To: David Wheeler via RT <bug-Pod-Simple [...] rt.cpan.org>
From: Dumas Patrice <pertusus [...] free.fr>
On Sat, Mar 10, 2012 at 03:53:28PM -0500, David Wheeler via RT wrote:
> Okay. Looks okay then. Two more requests:
>
> 1. Please update the tests so that they pass.
> 2. Please document the new methods.
Here it is. Since setting the encoding with $parser->encoding('encoding') is not really well tested I didn't really document it.

-- 
Pat

Message body is not shown because sender requested not to inline it.

This functionality has been added in Pod::Simple 3.22, now on CPAN. Check it out!
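
For readers who want to try it, here is a minimal sketch of querying both accessors named in this ticket after a parse. It assumes a Pod::Simple release that includes the change; the sample document is made up for the example, and no particular return values are claimed, since they depend on the release and on whether the encoding was recognized.

```perl
use strict;
use warnings;
use Pod::Simple::DumpAsXML;

# Sample document, assumed for illustration.
my $pod = "=encoding iso-8859-1\n\n=head1 NAME\n\nexample\n";

my $parser = Pod::Simple::DumpAsXML->new;
$parser->output_string(\my $xml);    # collect the dump instead of printing it
$parser->parse_string_document($pod);

# encoding() reports the declared name; detected_encoding() is the accessor
# discussed above, intended to be set only when the encoding was handled.
for my $method (qw(encoding detected_encoding)) {
    my $value = $parser->$method;
    printf "%-18s %s\n", $method, defined $value ? $value : '(unset)';
}
```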