
This queue is for tickets about the podlators CPAN distribution.

Report information
The Basics
Id: 118240
Status: resolved
Priority: 0/
Queue: podlators

People
Owner: RRA [...] cpan.org
Requestors: h.huzen [...] belastingdienst.nl
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: 4.09



Subject: In EBCDIC context a '[' gets deleted
Date: Mon, 3 Oct 2016 12:53:24 +0200
To: bug-podlators [...] rt.cpan.org
From: h.huzen [...] belastingdienst.nl
Using podlators-4.08 on z/OS (IBM mainframe, perl 5.22, default codepage is EBCDIC 1047) a '[' gets deleted:

=== sample code ===
use Pod::Text;

my $textin = qq`
=pod

brackets: []

=cut
`;
print "IN : $textin\n";
my $textin_fh;
my $textout_fh;
my $tparser= Pod::Text->new();
open($textin_fh, '<', \$textin) || die;
open($textout_fh, '>', \$textout) || die;
$tparser->parse_from_file($textin_fh, $textout_fh);
print "OUT: $textout\n";

=== output ===
IN :
=pod

brackets: []

=cut

OUT: brackets: ]

============

Kind regards,
Harrie Huzen

------------------------------------------------------------------------
The Dutch Tax and Customs Administration does not accept filings, requests, appeals, complaints, notices of default or similar formal notices, sent by email. This message is solely intended for the addressee. It may contain information that is confidential and legally privileged. If you are not the intended recipient please delete this message and notify the sender.
Subject: Re: [rt.cpan.org #118240] In EBCDIC context a '[' gets deleted
Date: Mon, 03 Oct 2016 09:43:55 -0700
To: "h.huzen\ [...] belastingdienst.nl via RT" <bug-podlators [...] rt.cpan.org>
From: Russ Allbery <rra [...] cpan.org>
"h.huzen@belastingdienst.nl via RT" <bug-podlators@rt.cpan.org> writes:

> Using podlators-4.08 on z/OS (IBM mainframe, perl 5.22, default codepage is
> EBCDIC 1047) a '[' gets deleted:

Could you try this with Pod::Simple::Text and see if you get the same behavior? I'm trying to narrow this down to see if it's something that Pod::Text is doing or if it's in Pod::Simple, which does the codepage handling.

--
#!/usr/bin/perl -- Russ Allbery, Just Another Perl Hacker
$^=q;@!>~|{>krw>yn{u<$$<[~||<Juukn{=,<S~|}<Jwx}qn{<Yn{u<Qjltn{ > 0gFzD gD, 00Fz, 0,,( 0hF 0g)F/=, 0> "L$/GEIFewe{,$/ 0C$~> "@=,m,|,(e 0.), 01,pnn,y{ rw} >;,$0=q,$,,($_=$^)=~y,$/ C-~><@=\n\r,-~$:-u/ #y,d,s,(\$.),$1,gee,print
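The comparison requested above can be sketched roughly as follows. This is only an illustration, not code from the ticket: `render_with` is a hypothetical helper, and it assumes that both classes accept the `output_string` and `parse_string_document` methods inherited from Pod::Simple. On an ASCII platform both formatters should keep the bracket; the interesting case is an EBCDIC system.

```perl
use strict;
use warnings;
use Pod::Text;
use Pod::Simple::Text;

# The POD document from the report.
my $pod = "=pod\n\nbrackets: []\n\n=cut\n";

# Hypothetical helper: run one formatter class over a POD string and
# return the formatted text.
sub render_with {
    my ($class, $source) = @_;
    my $parser = $class->new;
    my $output = '';
    $parser->output_string(\$output);
    $parser->parse_string_document($source);
    return $output;
}

# Report for each formatter whether the opening bracket survived.
for my $class (qw(Pod::Text Pod::Simple::Text)) {
    my $text    = render_with($class, $pod);
    my $verdict = index($text, '[') >= 0 ? 'kept' : 'lost';
    print "$class: '[' $verdict\n";
}
```

If only Pod::Text loses the bracket, the problem sits in Pod::Text rather than in the shared Pod::Simple parser.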
Subject: Betr: Re: [rt.cpan.org #118240] In EBCDIC context a '[' gets deleted
Date: Tue, 4 Oct 2016 09:04:04 +0200
To: bug-podlators [...] rt.cpan.org
From: h.huzen [...] belastingdienst.nl
Hi Russ,

It is very confusing.

1) Pod::Simple::Text

I tried with Pod::Simple::Text, with the following result:

IN :
=pod

brackets: []

=cut

Use of uninitialized value $Pod::Simple::nbsp in regexp compilation at /u/a21g098/huzeh00/NB/EI/adk_root/os390/perl/lib/site_perl/5.22.0/Pod/Simple/Text.pm line 100, <$textin_fh> line 5.
Use of uninitialized value $Pod::Simple::shy in regexp compilation at /u/a21g098/huzeh00/NB/EI/adk_root/os390/perl/lib/site_perl/5.22.0/Pod/Simple/Text.pm line 101, <$textin_fh> line 5.
OUT: brackets: []

So apart from the uninitialized-value warnings the brackets come out ok.

2) tr command in Pod::Text

I found that in Pod::Text.pm at about line 280 there is a line

    $text =~ tr/\240\255/ /d;

I guess it is to remove characters 0xA0 and 0xFF from the $text string. but when I put in a hexdump just before and just after I see that the 0xAD character (which is the '[' in codepage 1047) is removed:

IN : 40 40 40 40 40 82 99 81 83 92 85 a3 a2 7a 40 ad bd 15 15
IN : brackets: []
OUT: 40 40 40 40 40 82 99 81 83 92 85 a3 a2 7a 40 bd 15 15
OUT: brackets: ]

I don't know the purpose of that line so I cannot say if it is correct. But it is tricky, since there are 2 'oldchars' specified and 1 'newchar' plus a delete command. If I understand the comment in http://stackoverflow.com/questions/30710164/need-help-in-understanding-perl-tr-command-with-d correctly, the \240 character will be replaced with the space and the \255 will be deleted. Furthermore this action is codepage unaware, so that is a risk in itself.

But all of this does not explain why the 0xAD character is removed. I find that it is the \255 character that matches my 0xAD, which I don't get. So (apart from the obscurity of the tr-line) this narrows it down to the question of why the \255 in the tr command matches my 0xAD.

3) buggy tr ?

# cat tt.pl
sub dumpData {
    my ($pkg,$l,$t)=@_;
    print "$t: ";
    my @o=unpack("C*",$l);
    for my $o (@o) {
        printf "%02x ",$o;
    }
    print "\n";
    print "$t: ";
    printf "%s", $l;
    print "\n";
}

my $text="abc[]";
dumpData(undef, $text, "IN " );
$text =~ tr/\255//d;
dumpData(undef, $text, "OUT" );

#HUZEH00@APMVST1 /u/a21g098/huzeh00
# perl tt.pl
IN : 81 82 83 ad bd
IN : abc[]
OUT: 81 82 83 bd
OUT: abc]

The circumstance is that I am now working on a newly installed perl 5.22 distribution, so I'm getting the feeling that tr might be buggy here.

I hope this information helps. What is your opinion?

Kind regards,

Harrie Huzen
Specialist Ontwikkelen (Development Specialist)
........................................................................
Belastingdienst
Centrum voor Applicatieontwikkeling en -onderhoud
Service Delivery – FAD
Ondersteuning Ontwikkel Services (OOS)
Competence Center Gen/GuardIEn
John F. Kennedylaan 8 | 7314 PS | Apeldoorn | G2 Flex
Postbus 9050 | 7300 GM | Apeldoorn
........................................................................
M 06 - 55 42 08 50
h.huzen@belastingdienst.nl
........................................................................
= may the source be with you =
Subject: Betr: Re: [rt.cpan.org #118240] In EBCDIC context a '[' gets deleted
Date: Tue, 4 Oct 2016 09:09:13 +0200
To: bug-podlators [...] rt.cpan.org
From: h.huzen [...] belastingdienst.nl
Hi Russ,

Foolish me. My brain is much more hex than octal. The \255 of course is NOT 0xFF but it is exactly 0xAD. So that explains why 0xAD is removed!

Kind regards,
Harrie Huzen
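The octal arithmetic and the tr/// behaviour discussed above can be checked on any platform with a short sketch. It is an illustration only: `hex_dump` and `apply_tr` are hypothetical helpers, and the input is the byte sequence 81 82 83 ad bd from the hexdump in the report (that is, "abc[]" in codepage 1047), written as raw bytes so that the script does not need an EBCDIC system.

```perl
use strict;
use warnings;

# Return the bytes of a string as a space-separated hex dump.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', $_ } unpack 'C*', $bytes;
}

# Hypothetical wrapper around the tr line quoted from Pod::Text.
# Two search characters, one replacement, plus /d: \240 is mapped to a
# space, \255 has no counterpart in the replacement list and is deleted.
sub apply_tr {
    my ($text) = @_;
    $text =~ tr/\240\255/ /d;
    return $text;
}

# Octal 255 is hex AD; hex FF would be octal 377.
printf "octal 255 = 0x%02x\n", oct '255';
printf "octal 377 = 0x%02x\n", oct '377';

# The bytes of "abc[]" in EBCDIC codepage 1047, taken from the report.
my $ebcdic = "\x81\x82\x83\xad\xbd";
print 'IN : ', hex_dump($ebcdic), "\n";
print 'OUT: ', hex_dump(apply_tr($ebcdic)), "\n";
```

The OUT line shows 81 82 83 bd, the same result as the tt.pl run: the byte 0xAD is deleted because \255 names exactly that value, whatever character the local codepage assigns to it.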
Subject: Re: Betr: Re: [rt.cpan.org #118240] In EBCDIC context a '[' gets deleted
Date: Tue, 04 Oct 2016 08:51:05 -0700
To: "h.huzen\ [...] belastingdienst.nl via RT" <bug-podlators [...] rt.cpan.org>
From: Russ Allbery <rra [...] cpan.org>
"h.huzen@belastingdienst.nl via RT" <bug-podlators@rt.cpan.org> writes:

> 2) tr command in Pod::Text

> I found that in Pod::Text.pm at about line 280 there is a line

> $text =~ tr/\240\255/ /d;

> I guess it is to remove characters 0xA0 and 0xFF from the $text string.
> but when I put in a hexdump just before and just after I see that the 0xAD
> character (which is the '[' in codepage 1047) is removed:

Aha! Thank you. Bitten by that again.

I'll work out some alternate approach there. That's always been a bit of an ugly hack.
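The ticket does not show the change that eventually went into the release, so the following is only a sketch of one codepage-aware alternative to the hard-coded octal escapes. It assumes the intent of the original line is to turn non-breaking spaces into ordinary spaces and to drop soft hyphens; `strip_format_chars` is a hypothetical helper, and `utf8::unicode_to_native` is the core function that maps a Unicode code point to the platform's native one.

```perl
use strict;
use warnings;

# Map the Unicode code points for NO-BREAK SPACE and SOFT HYPHEN to the
# platform's native code points. On ASCII platforms this is the identity
# mapping; on EBCDIC platforms the values differ.
my $NBSP = chr utf8::unicode_to_native(0xA0);
my $SHY  = chr utf8::unicode_to_native(0xAD);

# Hypothetical helper: replace non-breaking spaces with plain spaces and
# remove soft hyphens, leaving every other character untouched.
sub strip_format_chars {
    my ($text) = @_;
    $text =~ s/\Q$NBSP\E/ /g;
    $text =~ s/\Q$SHY\E//g;
    return $text;
}

my $sample = "non${NBSP}breaking soft${SHY}hyphen [brackets]";
print strip_format_chars($sample), "\n";
```

Because the characters are looked up by Unicode code point rather than by raw byte value, a native 0xAD that happens to be '[' in codepage 1047 should no longer be mistaken for a soft hyphen.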
Subject: Re: Betr: Re: [rt.cpan.org #118240] In EBCDIC context a '[' gets deleted
Date: Thu, 27 Oct 2016 22:30:23 -0700
To: "Russ Allbery via RT" <bug-podlators [...] rt.cpan.org>
From: Russ Allbery <rra [...] cpan.org>
"Russ Allbery via RT" <bug-podlators@rt.cpan.org> writes:
> "h.huzen@belastingdienst.nl via RT" <bug-podlators@rt.cpan.org> writes:

>> 2) tr command in Pod::Text

>> I found that in Pod::Text.pm at about line 280 there is a line

>> $text =~ tr/\240\255/ /d;

>> I guess it is to remove characters 0xA0 and 0xFF from the $text string.
>> but when I put in a hexdump just before and just after I see that the 0xAD
>> character (which is the '[' in codepage 1047) is removed:

> Aha! Thank you. Bitten by that again.

> I'll work out some alternate approach there. That's always been a bit of
> an ugly hack.

Okay, I think I've fixed this. Unfortunately, it's impossible to test an EBCDIC encoding with Pod::Simple without being on an EBCDIC system (so far as I know). Could you possibly check the current podlators at:

    https://github.com/rra/podlators

to see if it still has this problem? If you grab the current development source and run:

    perl Makefile.PL
    make

you can then run:

    perl -Iblib/lib scripts/pod2text foo.pod

to run the newly built module on some POD file named foo.pod without having to install it.

I think this will keep the open brackets from disappearing.
Subject: Betr: Re: Betr: Re: [rt.cpan.org #118240] In EBCDIC context a '[' gets deleted
Date: Fri, 28 Oct 2016 08:27:17 +0200
To: bug-podlators [...] rt.cpan.org
From: h.huzen [...] belastingdienst.nl
Hello Russ.

I tested the update on the IBM z/OS-USS EBCDIC environment. The POD handling is working fine now. No problems anymore.

Thank you & kind Regards.
Harrie Huzen
Subject: Re: Betr: Re: Betr: Re: [rt.cpan.org #118240] In EBCDIC context a '[' gets deleted
Date: Thu, 27 Oct 2016 23:31:27 -0700
To: "h.huzen\ [...] belastingdienst.nl via RT" <bug-podlators [...] rt.cpan.org>
From: Russ Allbery <rra [...] cpan.org>
"h.huzen@belastingdienst.nl via RT" <bug-podlators@rt.cpan.org> writes:

> I tested the update on the IBM z/OS-USS EBCDIC environment. The POD
> handling is working fine now. No problems anymore.

Awesome, thank you! I'll get a new release out, and that will hopefully make it into Perl proper without too much of a delay.