Mark Overmeer wrote via RT:
> Hi Sébastien,
Hello Mark,
> There are two different problems:
> 1 which character-set is used in the log-files.
> 2 protecting the dangerous escapes
>
> ad1)
> Traditionally, logfiles are probably latin1 (I think that is the
> best name for UNIX without any understanding of >ASCII)
> What I found out in a little research is that many (free) UNIXes
> have switched to utf-8 for the logfiles.
Recent systems maybe, but older systems haven't. You'll say that
sysadmins of old systems don't upgrade their Perl modules, and I'd
agree, except that sometimes they do. So I can't really assume
anything WRT encoding.
> The least Sys::Syslog must do is convert strings from an internal
> latin1/fake-utf8 mixture into a consistent charset. Probably by
> default utf8. syslog($level, "%s", encode('UTF-8', $string))
I'd say this is the responsibility of the caller, not of Sys::Syslog.
People can't throw random text at Sys::Syslog and expect it to do the
Right Thing by guessing the correct encoding. I can agree on encoding
the messages sent by Sys::Syslog, but by default I'd say it's better
to do nothing and only convert to a specific encoding when asked to
do so.
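
To illustrate the "do nothing unless asked" behaviour, here is a minimal
sketch. The helper name prepare_message() is hypothetical and not part of
Sys::Syslog. It also assumes the Encode module, which only ships with Perl
5.8 and later, so it could not be used as-is in a module that must run on
Perl 5.005.

```perl
use strict;
use Encode qw(encode);

# Hypothetical helper: encode the message only when an encoding was
# explicitly requested, otherwise hand it back untouched.
sub prepare_message {
    my ($message, $encoding) = @_;
    return $message unless defined $encoding;   # default: no transformation
    return encode($encoding, $message);         # characters -> bytes
}

my $raw   = prepare_message("caf\x{e9}");            # unchanged
my $bytes = prepare_message("caf\x{e9}", 'UTF-8');   # "caf\xC3\xA9"
```

With this split, a caller who knows the encoding of the log files asks for it
explicitly, and everybody else keeps the existing behaviour.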
> ad2)
> Replace non-printables with their ASCII-table name (see 'man ascii')
> [...]
>
> for instance "\x1B" -> "<esc>"
> For security reasons, this rewrite should be the default.
I fully agree on this. I can make a release with code to do this
fairly quickly, as I think nobody will object to such a change.
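
As a rough idea of what such a rewrite could look like in plain Perl, with no
non-core dependency, here is a minimal sketch. The helper name
escape_controls() is hypothetical, and escaping tab and newline along with
the other control characters is an assumption, not a decision taken in this
ticket.

```perl
use strict;

# Names of the C0 control characters 0x00-0x1F, as listed in 'man ascii'.
my @C0_NAMES = qw(
    nul soh stx etx eot enq ack bel bs  ht  lf  vt  ff  cr  so  si
    dle dc1 dc2 dc3 dc4 nak syn etb can em  sub esc fs  gs  rs  us
);

# Hypothetical helper: replace each control character with its name
# in angle brackets, so it cannot act as a terminal escape.
sub escape_controls {
    my ($text) = @_;
    $text =~ s/([\x00-\x1F])/"<" . $C0_NAMES[ord($1)] . ">"/ge;
    $text =~ s/\x7F/<del>/g;    # DEL sits outside the C0 range
    return $text;
}

print escape_controls("before\x1B[31mafter"), "\n";   # before<esc>[31mafter
```

Bytes above 0x7F are deliberately left alone here, since what to do with
them depends on the encoding question discussed above.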
> I was looking for a module which does this, but couldn't find it
> yet.
Even if there were one, I couldn't use it in Sys::Syslog as it is a
core module, and therefore I can only use modules from the core.
> Probably, it should be an addition to the Encode suite.
I'm unsure of this but you should ask Dan Kogai.
> This should not be an integral part of Sys::Syslog itself.
It must be, for the reasons given above. And given that
Sys::Syslog is currently compatible with Perl 5.005, and I'd like to
keep it that way, I can only use core modules from this version of Perl.
> Preferably implemented in C for performance.
Yes. But I'll probably first code it in Perl to check how it
performs :-)
> The cleanest way IMO is to add options to openlog. Either as 4th
> argument or inside $logopt:
>
> openlog 'myprog', 'pid,charset=latin1,unsafe', 'local0';
> # charset=none or raw to switch back to old behavior?
>
> or maybe "encoding(latin1)" as in PerlIO
Clearly not "charset" as it is semantically incorrect and Juerd will
beat me :-)
Passing the encoding this way is quite ugly, but I guess I don't
really have a choice: passing it as a 4th parameter, or inside a
hashref given as a 4th parameter, would mean two different ways to
pass options. Confusing.
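
For the sake of discussion, here is a minimal sketch of how both proposed
spellings could be pulled out of the $logopt string. parse_logopt() is a
hypothetical helper, not how Sys::Syslog currently handles its options, and
"unsafe" and "encoding(...)" are only the names floated in this thread.

```perl
use strict;

# Hypothetical helper: split a comma-separated option string into a hash.
# Plain flags map to 1; "name(value)" and "name=value" map to the value.
sub parse_logopt {
    my ($logopt) = @_;
    my %opt;
    foreach my $item (split /\s*,\s*/, $logopt) {
        if    ($item =~ /^(\w+)\((.*)\)$/) { $opt{$1} = $2 }   # encoding(latin1)
        elsif ($item =~ /^(\w+)=(.*)$/)    { $opt{$1} = $2 }   # charset=latin1
        elsif (length $item)               { $opt{$item} = 1 } # pid, unsafe
    }
    return \%opt;
}

my $opt = parse_logopt('pid,encoding(latin1),unsafe');
print join(", ", map { "$_=$opt->{$_}" } sort keys %$opt), "\n";
# encoding=latin1, pid=1, unsafe=1
```

Existing flags such as 'pid' keep working unchanged, which is the main
argument for extending $logopt over adding a 4th parameter.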
> Yes, it will break some people's applications... but right now by far
> most Perl programs are unsafe, especially in debug mode where all
> incoming packages are being logged.
I am not allowed to break applications, because that means breaking
things like SpamAssassin. Which means all the BOFH from the Perl
community over my shoulder /o\
> What do you think about it?
I imagine you opened this ticket because you want better I18N
support in Log::Report.
As I said, I agree WRT replacing non-printable characters, and I
mostly agree WRT the encoding, but for both changes I'll also seek
advice from P5P.
--
Sébastien Aperghis-Tramoni
Close the world, txEn eht nepO.