This queue is for tickets about the Encode CPAN distribution.

Report information
The Basics
Id: 67065
Status: open
Priority: 0/
Queue: Encode

People
Owner: Nobody in particular
Requestors: leonerd-cpan [...] leonerd.org.uk
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 2.35
Fixed in: (no value)



Subject: FB_QUIET is useless for error detection while nonblocking
Setting CHECK to FB_DEFAULT, FB_CROAK or FB_WARN gives you error-handling semantics that expect complete UTF-8 bytestrings. Setting CHECK to FB_RETURN makes it immediately return on any failure. None of these distinguish failure because an invalid byte was found from failure due to running out of bytes while processing a hitherto-valid encoding. This makes it impossible to actually apply error-handling semantics to nonblocking or fixed-size buffer reading, as your example otherwise would suggest.

I would recommend adding a new control bit, RETURN_ON_EOS, which has the behaviour of returning, rather than signalling an error, but *only* if the failure was running out of bytes. This could be applied in addition to the other control modes.

For example, the following would croak on actually-invalid input, but still correctly handle partial UTF-8 encodings split across multiple reads:

    my $buffer = '';
    my $string = '';
    while(read $fh, $buffer, 256, length($buffer)) {
        $string .= decode($encoding, $buffer, Encode::FB_CROAK|Encode::RETURN_ON_EOS);
        # $buffer now contains the unprocessed partial character
    }

-- 
Paul Evans
On Tue Mar 29 15:21:28 2011, PEVANS wrote:
> Setting CHECK to FB_RETURN makes it immediately return on any failure.
Oops. I of course mean FB_QUIET here. -- Paul Evans
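The ambiguity described in the report can be illustrated with a minimal sketch. It assumes the Encode name 'UTF-8' and two made-up inputs: one cut off in the middle of a two-byte character, and one containing a lead byte followed by a non-continuation byte. Exact leftovers may differ between Encode versions, so no output is claimed here.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Case 1: input truncated in the middle of a two-byte character (U+00E9).
my $truncated = "caf\xC3";
my $out1 = decode('UTF-8', $truncated, Encode::FB_QUIET());

# Case 2: a lead byte (0xC3) followed by "(", which is not a continuation
# byte, so the sequence is invalid no matter what arrives later.
my $invalid = "caf\xC3(xyz";
my $out2 = decode('UTF-8', $invalid, Encode::FB_QUIET());

# Both calls return the valid prefix and leave unprocessed bytes behind in
# the source variable; the return value does not say which kind of failure
# stopped the decoder.
printf "truncated: got '%s', %d byte(s) left\n", $out1, length $truncated;
printf "invalid:   got '%s', %d byte(s) left\n", $out2, length $invalid;
```

In both cases the caller sees a decoded prefix plus leftover bytes, so a nonblocking reader cannot tell "wait for more input" apart from "this stream is corrupt".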
Subject: Please document RETURN_ON_EOS as a full API feature
On Tue Mar 29 15:21:28 2011, PEVANS wrote:
> I would recommend adding a new control bit, RETURN_ON_EOS, which has
> the behaviour of returning, rather than signalling an error, but
> *only* if the failure was running out of bytes. This could be
> applied in addition to the other control modes.
Actually, I have just been informed this does exist:

    14:47 <+chansen> LeoNerd: take a look at the undocumented Encode::STOP_AT_PARTIAL flag =)

Having tested it out, it seems to work as expected. I'm now given to wondering why it isn't documented. I'm going to have a look at the implementation, and if it looks like it ought to be, I'll send a documentation patch so this can be relied upon.

I'd like to use it in IO::Async::Stream to support applying encodings to a bytestream - I don't want to rely on undocumented features as they could change at any point.

I've therefore changed the bug subject.

-- 
Paul Evans
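As a rough sketch of the usage discussed above, the following combines Encode::FB_CROAK with Encode::STOP_AT_PARTIAL to decode input that arrives in pieces. The helper name decode_chunks and the sample chunks are hypothetical, and because the flag is undocumented at the time of this ticket, the behaviour shown is an assumption based on the testing reported above rather than a guaranteed API.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode byte chunks as they "arrive", carrying any
# incomplete trailing character over to the next chunk.
sub decode_chunks {
    my ($encoding, @chunks) = @_;
    my $check  = Encode::FB_CROAK() | Encode::STOP_AT_PARTIAL();
    my $buffer = '';
    my $string = '';
    for my $chunk (@chunks) {
        $buffer .= $chunk;
        # decode() removes the bytes it converts from $buffer. A partial
        # character at the end is expected to stay in $buffer instead of
        # raising an error; input that is actually invalid still croaks.
        $string .= decode($encoding, $buffer, $check);
    }
    return ($string, $buffer);
}

# "caf" plus U+00E9 in UTF-8, split in the middle of the two-byte character.
my ($string, $rest) = decode_chunks('UTF-8', "caf\xC3", "\xA9");
printf "decoded %d character(s), %d byte(s) pending\n",
    length($string), length($rest);
```

Wrapping a call in eval lets the caller treat a croak as a corrupt stream, while leftover bytes in the returned buffer simply mean that more input is needed.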
I see you've marked this as resolved, but looking at the latest version of Encode on CPAN

    http://search.cpan.org/~dankogai/Encode-2.43/Encode.pm

I'm afraid I still don't see this flag documented at all.

I'm not asking a great deal; just a line or two of documentation to make this feature officially documented, so it can be relied upon by other code. Or else a reason why it should not be documented and should instead remain internal and private.

I'm even happy to write that documentation for you, if you'd prefer. Then you can merge it.

Right now I've had to release some code that specifically documents a feature as relying on undocumented behaviour from Encode, because this flag is currently not documented. See the "IMPORTANT NOTE" in the "encoding" parameter here:

    http://search.cpan.org/~pevans/IO-Async-0.41/lib/IO/Async/Stream.pm#PARAMETERS

-- 
Paul Evans