On Sat Apr 27 22:53:52 2019, uri@stemsystems.com wrote:
> On 4/20/19 10:48 AM, Chase Whitener via RT wrote:
> > Queue: File-Slurp
> > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=84918 >
> >
> > Well, I don't think that it was that. We have to remember that this
> > module was written well before Perl layers and unicode and ... It was
> > doing the right thing for a very long time.
>
> yep. i was aiming just to make it easy and fast to slurp in plain text
> files. in my long perl career, i have yet had to deal directly with
> unicode (much to my happiness! :).
> >
> > So, we're still at somewhat of an impasse. While a large part of me
> > sees it as kind of OK to break current practice in favor of doing the
> > right thing on Windows as well, the other part of me does not agree.
> >
> > Two options:
> >
> > 1) Keep the current functionality and document the bug on Windows.
> > This documentation would need to explain the problem and the
> > reasoning for not fixing it.
> >
> > 2) Break back-compat on Windows and let the Perl layers do the line
> > ending conversions for us on the various user-supplied layers via
> > binmode. This would break some assumptions about how the _module_
> > works on Windows, but would comply with most people's assumptions
> > about what the code _should_ be doing. This would also need a heaping
> > helping of documentation.
> >
> > I don't want to make a BDFL, fist-on-the-table declaration about what
> > to do here. This may be vote time. Currently, there are two
> > maintainers, myself and Uri. I am not sure how Uri feels about this
> > topic as we haven't yet discussed it. My work thus far has been
> > strongly focusing on _NOT_ breaking any backwards compatibility. I
> > would not want to do anything at all here without Uri's and maybe
> > some sort of other vote process in place. It's my opinion that Uri's
> > vote overrides whatever vote I may cast and even that of any type of
> > community vote.
> the problem is that i still don't get all the problems and issues here.
> i just coded up line endings and nothing else. i passed along binmode as
> it was for real binary files. i hadn't thought that unicode files were
> considered binary or whatever. i am showing my massive ignorance on this
> topic.
>
> i still don't see why we can't bypass all the layers and such (this is a
> general question, not specifically slurp related). my view on
> encoding/decoding (which i do have experience with) is to do them at the
> edge of the system and keep a standard internal format. when reading in
> a unicode file, i say treat it raw (no layers), then the user should
> deal with the encoding/decoding then. in fact the terms encode/decode
> make no sense for unicode. it is more a translate or something. the
> module Encode is very poorly named. i don't see why perl has to get
> involved in inferring what the coder really wants. it should be explicit
> what to do with the text from the file.
I'm inevitably going to word things poorly in my attempt at explaining a bit, so apologies in advance (see https://perldoc.pl/PerlIO for more official wording than anything I might produce here).
Perl has been Unicode-aware since 5.8. It tries to be helpful by handling decoding through I/O layers, and the same mechanism does line-ending translation on OSes where that makes sense. Yes, we could leave this up to the user to solve each time individually, but that usually leads to bad copy-pasta code that doesn't do the right thing.
If I'm opening and reading in a UTF-8 encoded text file, I'd typically `open my $fh, '<:encoding(UTF-8)', '/path/file.txt';` and then when I read anything in, it will already be decoded into the internal representation of characters rather than the raw UTF-8 bytes: `my $chars = <$fh>;`.
Alternatively, I could `open my $fh, '<:raw', '/path/file.txt';` and then any time I read a line in, it will be in raw bytes. In order to understand those bytes as text characters, I'd need to `my $bytes = <$fh>; my $chars = Encode::decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);`. That is more code for me to write and debug.
The first way would automatically handle line ending translation for me where applicable, whereas the second would not; I'd also have to handle that. Prior to Perl 5.8, this was not an issue as Perl was not yet unicode-aware and we didn't tend to worry about what encoding a particular file was in. Nowadays, we are very concerned about what it is that we're receiving. For the most part it's UTF-8, but even that can't be assumed.
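To make the two approaches concrete, here is a minimal sketch that reads the same UTF-8 file both ways and compares the results. The temporary file and the variable names are made up for illustration; only `open`, `Encode::decode`, and `File::Temp` are real.

```perl
use strict;
use warnings;
use Encode ();
use File::Temp qw(tempfile);

# Write a small UTF-8 file as raw bytes so both reads see identical input.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode $tmp_fh, ':raw';
print {$tmp_fh} "caf\xC3\xA9\n";    # "cafe" with an acute e, encoded as UTF-8
close $tmp_fh or die "close failed: $!";

# Approach 1: let the :encoding layer decode while reading.
open my $layer_fh, '<:encoding(UTF-8)', $path or die "open failed: $!";
my $chars_via_layer = <$layer_fh>;
close $layer_fh;

# Approach 2: read raw bytes, then decode by hand.
open my $raw_fh, '<:raw', $path or die "open failed: $!";
my $bytes = <$raw_fh>;
close $raw_fh;
my $chars_via_decode =
    Encode::decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);

printf "via layer:  %d characters\n", length $chars_via_layer;
printf "raw read:   %d bytes\n",      length $bytes;
printf "via decode: %d characters\n", length $chars_via_decode;
```

Both routes end up with the same character string; the raw read is one unit longer because the accented character takes two bytes in UTF-8. The sketch uses a file with plain LF line endings, so it does not exercise the CRLF difference on Windows.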
> > Uri, what are your opinions? There's a lot going on in this ticket
> > and much to digest. Also, just because I can only think of the two
> > options above doesn't mean that there isn't a third or nth. If you
> > have other options, I'd love to hear them.
> >
> >
> as you say there is a lot going on. one part of me hates redmond and
> says break it! use linux if you want proper support and slurping. the
> other say leave the damned bug, document the hell out of it. windows
> users won't read the docs anyhow and they get screwed either way. if i
> am not making sense to you, i am in agreement with it not making sense
> to me.
>
> so my gut says, document the bug and show some reasonable workaround.
> perl already has issues with windows (ever try to do terminal i/o with
> select on windows?? hell on earth).
I think I've come around to the point where I'd like to suggest we break it so that the module behaves as everyone expects Perl to behave. By default we act just as `open()` does, and if the user wants to add encodings/other layers, they can. And when they do, it behaves the same way across each OS rather than having the caveat of Windows.
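As a rough sketch of what I mean, and not the module's actual code: the helper below (the name `slurp_like_open` and its `binmode` option are hypothetical, chosen for this example) opens the file exactly as a bare `open()` would and then applies whatever layers the caller asked for via `binmode`. On Windows that means the default `:crlf` layer stays in place unless the caller explicitly asks for `:raw`.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper for illustration only; not File::Slurp's implementation.
sub slurp_like_open {
    my ($path, %opts) = @_;

    # Plain open(): the platform's default layers apply.
    open my $fh, '<', $path or die "Cannot open '$path': $!";

    # Caller-supplied layers are applied on top via binmode.
    if (defined $opts{binmode}) {
        binmode($fh, $opts{binmode})
            or die "binmode failed on '$path': $!";
    }

    local $/;    # no record separator: read the whole file in one go
    my $content = <$fh>;
    close $fh or die "Cannot close '$path': $!";
    return $content;
}

# Demo input: a UTF-8 encoded word with one two-byte character.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode $tmp_fh, ':raw';
print {$tmp_fh} "na\xC3\xAFve\n";
close $tmp_fh or die "close failed: $!";

my $text  = slurp_like_open($path, binmode => ':encoding(UTF-8)');
my $bytes = slurp_like_open($path, binmode => ':raw');

printf "decoded: %d characters, raw: %d bytes\n", length $text, length $bytes;
```

The point of the design is that the caller's layers decide the behavior, so the same call gives the same result on every OS.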
>
> hope this helped. i doubt it did. :/
>
> thanx,
>
> uri
I'm fairly convinced that such a change wouldn't be seen as a negative, but rather as a fix to ensure each OS behaves as expected.
I could work up the fix and run through yet another large smoke test session on Windows to see if we've broken anything on CPAN. Thoughts?
Thanks,
Chase