
This queue is for tickets about the Sys-Syslog CPAN distribution.

Report information
The Basics
Id: 41174
Status: open
Priority: 0/
Queue: Sys-Syslog

People
Owner: Nobody in particular
Requestors: MARKOV [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 0.27
Fixed in: (no value)



Subject: escape codes
Syslog is sloppy with character-sets. Before logging a message, it should get translated into latin1 or real utf-8 explicitly. Besides, escape codes must get filtered, as the following script will demonstrate:

    #!/usr/bin/perl
    use Sys::Syslog;

    openlog 'test', '', 'local0';
    syslog err => "Test\a\033[2J\033[2;5m\033[1;31mHACKER~ ATTACK\033[2;25m\033[22;30m\033[3q";

(At least under Linux this has nice effects in the log-file.)

So: we need an option to explicitly specify in which character-set to write the syslog. Before translation, all non-printable characters need to be replaced by something else.

Log-files are usually read by root, and it is simple to create an escape sequence which inserts characters into the terminal input stream, to be executed with super-user rights.
Subject: Re: [rt.cpan.org #41174] escape codes
Date: Tue, 25 Nov 2008 09:11:18 +0100
To: bug-Sys-Syslog [...] rt.cpan.org
From: Sébastien Aperghis-Tramoni <saper [...] cpan.org>
Hello Mark,

Mark Overmeer wrote:

> Syslog is sloppy with character-sets. Before logging a message, it
> should get translated into latin1 or real utf-8 explicitly. Besides,
> escape codes must get filtered, as the following script will
> demonstrate
>
> #!/usr/bin/perl
> use Sys::Syslog;
>
> openlog 'test', '', 'local0';
> syslog err => "Test\a\033[2J\033[2;5m\033[1;31mHACKER~
> ATTACK\033[2;25m\033[22;30m\033[3q";
>
> (At least under Linux this has nice effects in the log-file)

I guess this will set the terminal title to "HACKER~ATTACK". Or is it doing something more evil? Anyway, your point is valid.

> So: we need an option to explicitly specify in which character-set
> to write the syslog. Before translation, all non-printable need to
> be replaced by something else.

Agreed. However, there is the same problem as with any module that must process random text: how can we know the encoding of the incoming text? Unconditionally transforming everything to Latin-1 or UTF-8 will surely produce junk. Possible solutions:

- transliterate each character outside ASCII into an ASCII equivalent with Text::Unidecode if available
- transform each character outside ASCII into its hexadecimal value ("é" -> \x{e9})
- expect parameters so the user provides the encoding of the message and the encoding in which Sys::Syslog should send the text

This is possible but will make the code more complex, and the current API isn't really extensible. Also, having different behaviours depending on the installed modules may not please administrators, even if it's documented.

> Log-files are usually read by root, and it is simple to create an
> escape sequence which insert characters in the terminal input stream;
> be executed with super-user rights.

I see two solutions: either strip the escape sequences (which means being able to recognise them), or protect the backslashes so the administrator has a chance to safely see the attack. I would favour the latter because it's simpler and faster to implement, and preserves more information.

-- 
Sébastien Aperghis-Tramoni
Close the world, txEn eht nepO.
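As a rough illustration of the second solution, here is a minimal sketch that makes control bytes visible instead of stripping them. It assumes "protecting" means rendering each control byte as a literal `\xNN` and doubling existing backslashes so the result stays unambiguous. The helper name `escape_control` is hypothetical and is not part of Sys::Syslog.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Sys::Syslog function): render control bytes
# as visible \xNN text so a terminal never interprets them.
sub escape_control {
    my ($msg) = @_;
    $msg =~ s/\\/\\\\/g;    # double literal backslashes first, to keep output unambiguous
    $msg =~ s/([\x00-\x1F\x7F])/sprintf('\\x%02X', ord $1)/ge;    # then expose control bytes
    return $msg;
}

print escape_control("Test\a\033[2JHACKER"), "\n";    # prints: Test\x07\x1B[2JHACKER
```

Nothing is discarded, so an administrator reading the log can still see exactly which bytes were sent.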
Subject: Re: [rt.cpan.org #41174] escape codes
Date: Tue, 25 Nov 2008 09:38:50 +0100
To: Sébastien Aperghis-Tramoni via RT <bug-Sys-Syslog [...] rt.cpan.org>
From: Mark Overmeer <mark [...] overmeer.net>
Hi Sébastien,

* Sébastien Aperghis-Tramoni via RT (bug-Sys-Syslog@rt.cpan.org) [081125 08:11]:
> <URL: http://rt.cpan.org/Ticket/Display.html?id=41174 >
>
> > Syslog is sloppy with character-sets.
> >
> > openlog 'test', '', 'local0';
>
> Agreed. However there is the same problem as with any module that
> must process random text: how can we know the encoding of the
> incoming text? Unconditionally transforming everything to Latin-1 or
> UTF-8 will surely produce junk.

There are two different problems:
  1 which character-set is used in the log-files.
  2 protecting the dangerous escapes

ad1)
Traditionally, logfiles are probably latin1 (I think that is the best name for UNIX without any understanding of >ASCII). What I found out in a little research is that many (free) UNIXes have switched to utf-8 for the logfiles.

The least that Sys::Syslog must do is convert strings from an internal latin1/fake-utf8 mixture into a consistent charset. Probably by default utf8.

    syslog($level, "%s", encode('UTF-8', $string))

ad2)
Replace non-printables with their ASCII-table name (see 'man ascii')

    Oct   Dec   Hex   Char
    000   0     00    NUL '\0'
    001   1     01    SOH
    002   2     02    STX
    003   3     03    ETX
    004   4     04    EOT
    005   5     05    ENQ
    006   6     06    ACK
    007   7     07    BEL '\a'
    010   8     08    BS  '\b'
    011   9     09    HT  '\t'
    012   10    0A    LF  '\n'
    013   11    0B    VT  '\v'
    014   12    0C    FF  '\f'
    015   13    0D    CR  '\r'
    016   14    0E    SO
    017   15    0F    SI
    020   16    10    DLE
    021   17    11    DC1
    022   18    12    DC2
    023   19    13    DC3
    024   20    14    DC4
    025   21    15    NAK
    026   22    16    SYN
    027   23    17    ETB
    030   24    18    CAN
    031   25    19    EM
    032   26    1A    SUB
    033   27    1B    ESC
    034   28    1C    FS
    134   92    5C    \   '\\'
    035   29    1D    GS
    036   30    1E    RS
    037   31    1F    US
    177   127   7F    DEL

for instance "\x1B" -> "<esc>"
For security reasons, this rewrite should be the default.

I was looking for a module which does this, but couldn't find it yet. Probably, it should be an addition to the Encode suite. This should not be an integral part of Sys::Syslog itself. Preferably implemented in C for performance.

The cleanest way IMO is to add options to openlog. Either as 4th argument or inside $logopt:

    openlog 'myprog', 'pid,charset=latin1,unsafe', 'local0';
    # charset=none or raw to switch back to old behavior?

Yes, it will break some people's applications... but right now by far most Perl programs are unsafe, especially in debug mode where all incoming packages are being logged.

What do you think about it?

-- 
Regards, MarkOv
------------------------------------------------------------------------
Mark Overmeer MSc                                MARKOV Solutions
Mark@Overmeer.net                          solutions@overmeer.net
http://Mark.Overmeer.net                   http://solutions.overmeer.net
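The "ad2" rewrite proposed above ("\x1B" -> "<esc>") could look roughly like the pure-Perl sketch below. The helper name `name_controls` is hypothetical; the thread only proposes the behaviour, not an implementation. Literal backslashes are left alone here, since they are printable.

```perl
use strict;
use warnings;

# Names of the 32 C0 control characters, in code-point order (see 'man ascii').
my @C0_NAMES = qw(
    NUL SOH STX ETX EOT ENQ ACK BEL BS  HT  LF  VT  FF  CR  SO  SI
    DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM  SUB ESC FS  GS  RS  US
);

# Hypothetical helper: replace each non-printable ASCII character
# by its lower-cased table name in angle brackets.
sub name_controls {
    my ($msg) = @_;
    $msg =~ s/([\x00-\x1F])/'<' . lc($C0_NAMES[ord $1]) . '>'/ge;
    $msg =~ s/\x7F/<del>/g;    # DEL sits outside the C0 range
    return $msg;
}

print name_controls("Test\a\033[2J"), "\n";    # prints: Test<bel><esc>[2J
```

One limitation of this form: a message that already contains the literal text "<esc>" cannot be told apart from a rewritten escape character.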
Subject: Re: [rt.cpan.org #41174] escape codes
Date: Tue, 25 Nov 2008 09:41:32 +0100
To: Sébastien Aperghis-Tramoni via RT <bug-Sys-Syslog [...] rt.cpan.org>
From: Mark Overmeer <mark [...] overmeer.net>
* Mark Overmeer (mark@overmeer.net) [081125 09:38]:
>
>   openlog 'myprog', 'pid,charset=latin1,unsafe', 'local0';
>   # charset=none or raw to switch back to old behavior?

or maybe "encoding(latin1)" as in PerlIO

-- 
Regards, MarkOv
------------------------------------------------------------------------
Mark Overmeer MSc                                MARKOV Solutions
Mark@Overmeer.net                          solutions@overmeer.net
http://Mark.Overmeer.net                   http://solutions.overmeer.net
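To make the proposal concrete, here is a sketch of how such a comma-separated option string might be split into ordinary flags and the two new items. Both `encoding(...)` and `unsafe` are only suggestions from this thread and are assumed here for illustration; they are not known Sys::Syslog options. `parse_logopt` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical parser for the option string proposed in this ticket.
# "encoding(...)" and "unsafe" are assumptions, not existing openlog() options.
sub parse_logopt {
    my ($logopt) = @_;
    my %parsed = (encoding => undef, unsafe => 0, flags => []);
    for my $item (grep { length $_ } split /\s*,\s*/, $logopt) {
        if    ($item =~ /^encoding\((.+)\)$/) { $parsed{encoding} = $1 }
        elsif ($item eq 'unsafe')             { $parsed{unsafe}   = 1  }
        else                                  { push @{ $parsed{flags} }, $item }
    }
    return \%parsed;
}

my $opt = parse_logopt('pid,encoding(latin1),unsafe');
print "encoding=$opt->{encoding} unsafe=$opt->{unsafe} flags=@{ $opt->{flags} }\n";
# prints: encoding=latin1 unsafe=1 flags=pid
```

Keeping everything inside the existing second argument avoids a second way of passing options, at the cost of a small parser.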
Subject: Re: [rt.cpan.org #41174] escape codes
Date: Wed, 26 Nov 2008 01:48:15 +0100
To: bug-Sys-Syslog [...] rt.cpan.org
From: Sébastien Aperghis-Tramoni <saper [...] cpan.org>
Mark Overmeer wrote via RT:

> Hi Sébastien,

Hello Mark,

> There are two different problems:
>   1 which character-set is used in the log-files.
>   2 protecting the dangerous escapes
>
> ad1)
> Traditionally, logfiles are probably latin1 (I think that is the
> best name for UNIX without any understanding of >ASCII)
> What I found-out in a little research, is that many (free) UNIXes
> have switched to utf-8 for the logfiles.

Maybe recent systems, but older systems didn't. You'll say that sysadmins of old systems don't upgrade their Perl modules, and I'd agree, except sometimes, they do. So I can't really assume anything WRT encoding.

> The least what Sys::Syslog must do, is convert strings from an
> internal latin1/fake-utf8 mixture into a consistent charset.
> Probably by default utf8.
>     syslog($level, "%s", encode('UTF-8', $string))

I'd say this is not the responsibility of Sys::Syslog but instead of the caller. People can't throw random text at Sys::Syslog and expect it to do the Right Thing by guessing the correct encoding. I can concur regarding the encoding of the messages sent by Sys::Syslog, but by default, I'd say it'd be better to do nothing and only transform to a specific encoding when asked to do so.

> ad2)
> Replace non-printables with their ASCII-table name (see 'man ascii')
> [...]
>
> for instance "\x1B" -> "<esc>"
> For security reasons, this rewrite should be the default.

I fully agree on this. I can make a release with code to do this quite fast, as I think nobody will oppose such a change.

> I was looking for a module which does this, but couldn't find it
> yet.

Even if there was one, I couldn't use it in Sys::Syslog as it is a core module, and therefore I can only use modules from the core.

> Probably, it should be an addition to the Encode suite.

I'm unsure of this but you should ask Dan Kogai.

> This should not be an integral part of Sys::Syslog itself.

It must, for the reasons previously explained. And given that Sys::Syslog is currently compatible with Perl 5.005, and I'd like to keep this, I can only use core modules from this version of Perl.

> Preferably implemented in C for performance.

Yes. But I'll probably first code it in Perl to check how it performs :-)

> The cleanest way IMO is to add options to openlog. Either as 4th
> argument or inside $logopt:
>
>   openlog 'myprog', 'pid,charset=latin1,unsafe', 'local0';
>   # charset=none or raw to switch back to old behavior?
>
> or maybe "encoding(latin1)" as in PerlIO

Clearly not "charset" as it is semantically incorrect and Juerd will beat me :-)

Passing the encoding this way is quite ugly, but I guess I don't really have the choice: passing it as a 4th parameter, or inside a hashref given as 4th parameter, would mean two different ways to pass options. Confusing.

> Yes, it will break some peoples applications... but now farmost Perl
> programs are unsafe, especially in debug mode where all incoming
> packages are being logged.

I am not allowed to break applications, because that means breaking things like SpamAssassin. Which means all the BOFH from the Perl community over my shoulder /o\

> What do you think about it?

I imagine you opened this ticket in order to have better I18N support in Log::Report.

As I said, I agree WRT replacing non-printable characters, and I mostly agree WRT the encoding, but for both changes I'll also seek advice from P5P.

-- 
Sébastien Aperghis-Tramoni
Close the world, txEn eht nepO.
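The position taken in this message (do nothing by default, recode only when asked) can be sketched as follows. `prepare_message` is a hypothetical helper written for illustration; it is not how Sys::Syslog is known to behave. It assumes a Perl where Encode is available (5.8 or later).

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: leave the message untouched unless the caller
# explicitly names an output encoding.
sub prepare_message {
    my ($msg, $encoding) = @_;
    return $msg unless defined $encoding;    # default: current behaviour, no recoding
    return encode($encoding, $msg);          # opt-in: the caller asked for this encoding
}

my $raw    = prepare_message("caf\x{e9}");              # unchanged: 4 characters
my $utf8   = prepare_message("caf\x{e9}", 'UTF-8');     # 5 bytes, "é" becomes C3 A9
my $latin1 = prepare_message("caf\x{e9}", 'latin1');    # 4 bytes, "é" becomes E9
printf "%d %d %d\n", length($raw), length($utf8), length($latin1);    # prints: 4 5 4
```

With this shape, existing callers that already pass encoded bytes see no change, which is the compatibility concern raised above.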
Subject: Re: [rt.cpan.org #41174] escape codes
Date: Wed, 26 Nov 2008 09:25:13 +0100
To: Sébastien Aperghis-Tramoni via RT <bug-Sys-Syslog [...] rt.cpan.org>
From: NLnet webmaster <webmaster [...] nlnet.nl>
* Sébastien Aperghis-Tramoni via RT (bug-Sys-Syslog@rt.cpan.org) [081126 00:48]:
> <URL: http://rt.cpan.org/Ticket/Display.html?id=41174 >
>
> > ad1)
> > Traditionally, logfiles are probably latin1 (I think that is the
> > best name for UNIX without any understanding of >ASCII)
> > What I found-out in a little research, is that many (free) UNIXes
> > have switched to utf-8 for the logfiles.
>
> Maybe recent systems, but older systems didn't. You'll say that
> sysadmins of old systems don't upgrade their Perl modules, and I'd
> agree, except sometimes, they do. So I can't really assume anything
> WRT encoding.

Well, luckily ASCII and UTF-8 overlap. Character encoding at the UNIX system-wide level is a mess. Windows made a good decision to move to UTF-16 everywhere.

> > The least what Sys::Syslog must do, is convert strings from an
> > internal latin1/fake-utf8 mixture into a consistent charset.
> > Probably by default utf8.
> >     syslog($level, "%s", encode('UTF-8', $string))
>
> I'd say this is not the responsibility of Sys::Syslog but instead of
> the caller. People can't throw random text at Sys::Syslog and expect
> it to do the Right Thing by guessing the correct encoding. I can
> concur regarding the encoding of the messages sent by Sys::Syslog,
> but by default, I'd say it'd be better to do nothing and only
> transform to a specific encoding when asked to do so.

I do not know how to interpret this: do you suggest to make "default is raw" or not have Sys::Syslog::syslog() encode at all?

> > ad2)
> > Replace non-printables with their ASCII-table name (see 'man ascii')
> > [...]
> >
> > for instance "\x1B" -> "<esc>"
> > For security reasons, this rewrite should be the default.
>
> I fully agree on this. I can make a release with code to do this
> quite fast as I think nobody will oppose to such a change.

That would be nice!

> > I was looking for a module which does this, but couldn't find it
> > yet. Probably, it should be an addition to the Encode suite.
>
> Even if there was one, I couldn't use it in Sys::Syslog as it is a
> core module, and therefore I can only use module from the core.
>
> I'm unsure of this but you should ask Dan Kogai.

Encode is also in Core ;-) I have asked him via RT. Waiting for a response.

> > or maybe "encoding(latin1)" as in PerlIO
>
> Clearly not "charset" as it is semantically incorrect and Juerd will
> beat me :-)

"encoding" is a very bad name as well: too general. But it is used everywhere in Perl, so... having openlog() follow the binmode syntax of open() as far as possible cannot be a bad choice.

> I am not allowed to break applications, because that means breaking
> things like SpamAssassin. Which means all the BOFH from the Perl
> community over my shoulder /o\

I know... although they are not core modules, I have modules which are often used as well. On the other hand, Schwern is breaking the interface of Test::More quite often :(((

> I imagine you opened this ticket because to have a better I18N
> support in Log::Report.
> As I said, I agree WRT replacing non-printable characters, and I
> mostly agree WRT the encoding, but for both changes I'll also seek
> advice from P5P.

Ok. It will follow soon as two separate items.

-- 
Regards, MarkOv
------------------------------------------------------------------------
Mark Overmeer MSc                                MARKOV Solutions
Mark@Overmeer.net                          solutions@overmeer.net
http://Mark.Overmeer.net                   http://solutions.overmeer.net
Subject: Re: [rt.cpan.org #41174] escape codes
Date: Wed, 26 Nov 2008 12:50:56 +0100
To: Sébastien Aperghis-Tramoni via RT <bug-Sys-Syslog [...] rt.cpan.org>
From: Mark Overmeer <mark [...] overmeer.net>
* Sébastien Aperghis-Tramoni via RT (bug-Sys-Syslog@rt.cpan.org) [081126 00:48]:
> I imagine you opened this ticket because to have a better I18N
> support in Log::Report.

Well, Log::Report is very careful with encodings, and I would like all back-ends to support it as well. I was investigating it, and then realized how dangerous my own programs are wrt syslog...

> As I said, I agree WRT replacing non-printable characters, and I
> mostly agree WRT the encoding, but for both changes I'll also seek
> advice from P5P.

I have created two separate tickets for this. I could publish these like that on P5P if you wish. Or will they get published automatically?

-- 
MarkOv
Subject: Re: [rt.cpan.org #41174] escape codes
Date: Wed, 26 Nov 2008 16:30:50 +0100
To: bug-Sys-Syslog [...] rt.cpan.org
From: Sébastien Aperghis-Tramoni <saper [...] cpan.org>
Mark Overmeer wrote via RT:

> Sebastien Aperghis-Tramoni wrote via RT:
>
> > > The least what Sys::Syslog must do, is convert strings from an
> > > internal latin1/fake-utf8 mixture into a consistent charset.
> > > Probably by default utf8.
> > >     syslog($level, "%s", encode('UTF-8', $string))
> >
> > I'd say this is not the responsibility of Sys::Syslog but instead of
> > the caller. People can't throw random text at Sys::Syslog and expect
> > it to do the Right Thing by guessing the correct encoding. I can
> > concur regarding the encoding of the messages sent by Sys::Syslog,
> > but by default, I'd say it'd be better to do nothing and only
> > transform to a specific encoding when asked to do so.
>
> I do not know how to interpret this: do you suggest to make "default
> is raw" or not have the Sys::Syslog::syslog() encode at all?

In a sense, yes. I want to avoid causing the problem that libnet did in version 1.20, when Graham suddenly decided to utf8::encode() everything passing through Net::Cmd. Suddenly, all the mails generated by my programs were double-encoded.
» http://rt.cpan.org/Public/Bug/Display.html?id=24835

Therefore I prefer to keep the current behaviour as the default, and be smart and encode when asked to do so.

> > > I was looking for a module which does this, but couldn't find it
> > > yet. Probably, it should be an addition to the Encode suite.
> >
> > Even if there was one, I couldn't use it in Sys::Syslog as it is a
> > core module, and therefore I can only use module from the core.
> >
> > I'm unsure of this but you should ask Dan Kogai.
>
> Encode is also in Core ;-) I have asked him via RT. Waiting for a
> response.

Encode is core since 5.8. Sys::Syslog is core since before 5.000 and the current CPAN version is compatible with Perl 5.005
» http://bbbike.radzeit.de/~slaven/cpantestersmatrix.cgi?dist=Sys-Syslog

So Sys::Syslog can use Encode when it is available, but will have to work without it when it isn't.

> > > or maybe "encoding(latin1)" as in PerlIO
> >
> > Clearly not "charset" as it is semantically incorrect and Juerd will
> > beat me :-)
>
> "encoding" is a very bad name as well: too general. But used
> everywhere in Perl, so... openlog() following the binmode syntax
> of open() as far as possible cannot be a bad choice.

Agreed. Even if it's not the best term, let's be consistent :)

> > I am not allowed to break applications, because that means breaking
> > things like SpamAssassin. Which means all the BOFH from the Perl
> > community over my shoulder /o\
>
> I know... although not core modules, I have modules which are often
> used as well. On the other hand, Schwern is breaking the interface
> of Test::More quite often :(((

To be honest I was only bitten once or twice by Schwern breaking Test::More. Also, while it's an important module for developers, it is not as important (for the user or the sysadmin) as modules that are used during the actual execution of programs, like libnet or Sys::Syslog.

-- 
Sébastien Aperghis-Tramoni
Close the world, txEn eht nepO.
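The double-encoding problem mentioned in this message is easy to reproduce. The sketch below is a standalone demonstration using Encode (so it assumes Perl 5.8 or later) and does not involve libnet or Sys::Syslog. It shows what happens when a library encodes text that the caller had already encoded.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $chars = "caf\x{e9}";                 # character string: c a f é
my $once  = encode('UTF-8', $chars);     # what a careful caller passes in
my $twice = encode('UTF-8', $once);      # what results if the library encodes again

# Render a byte string as space-separated hex, for display only.
sub hex_bytes { join ' ', map { sprintf('%02X', ord($_)) } split //, $_[0] }

print hex_bytes($once),  "\n";    # prints: 63 61 66 C3 A9
print hex_bytes($twice), "\n";    # prints: 63 61 66 C3 83 C2 A9
```

The second line is what a UTF-8 reader displays as "cafÃ©": each byte of the already-encoded "é" was treated as a Latin-1 character and encoded again.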
Subject: Re: [rt.cpan.org #41174] escape codes
Date: Wed, 26 Nov 2008 16:42:05 +0100
To: Sébastien Aperghis-Tramoni via RT <bug-Sys-Syslog [...] rt.cpan.org>
From: Mark Overmeer <solutions [...] overmeer.net>
* Sébastien Aperghis-Tramoni via RT (bug-Sys-Syslog@rt.cpan.org) [081126 15:31]:
> <URL: http://rt.cpan.org/Ticket/Display.html?id=41174 >
>
> > I do not know how to interpret this: do you suggest to make "default
> > is raw" or not have the Sys::Syslog::syslog() encode at all?
>
> In a sense, yes. I want to avoid causing the problem that libnet did in
> version 1.20, when Graham suddenly decided to utf8::encode() everything
> passing through Net::Cmd. Suddenly, all the mails generated by my programs
> were double-encoded.
> » http://rt.cpan.org/Public/Bug/Display.html?id=24835

Net::Cmd is much too general-purpose a module to do character encodings. Sys::Syslog is end-user material.

> Therefore I prefer to keep the current behaviour as the default, and be
> smart and encode when asked to do so.

Then you suggest that there are people who use syslog correctly w.r.t. character recoding? Did you ever get complaints on the subject? Then probably no-one realizes it yet.

> > Encode is also in Core ;-) I have asked him via RT. Waiting for a
> > response.
>
> Encode is core since 5.8. Sys::Syslog is core since before 5.000 and the
> current CPAN version is compatible with Perl 5.005
> » http://bbbike.radzeit.de/~slaven/cpantestersmatrix.cgi?dist=Sys-Syslog
>
> So Sys::Syslog can use Encode when it is available, but will have to work
> without when it isn't.

In older versions, Encode doesn't work at all... it is easy to avoid using it.

    BEGIN {
        # $] holds the Perl version ($[ is the array base, not the version)
        if( $] >= 5.008 ) {
            require Encode;
            Encode->import('encode');
        }
        else {
            # fallback for encode($encoding, $string): return the string unchanged
            *encode = sub { $_[1] };
        }
    }

-- 
MarkOv
------------------------------------------------------------------------
drs Mark A.C.J. Overmeer                         MARKOV Solutions
Mark@Overmeer.net                          solutions@overmeer.net
http://Mark.Overmeer.net                   http://solutions.overmeer.net
Subject: Re: [rt.cpan.org #41174] escape codes
Date: Wed, 26 Nov 2008 18:04:46 +0100
To: bug-Sys-Syslog [...] rt.cpan.org
From: Sébastien Aperghis-Tramoni <maddingue [...] free.fr>
Mark Overmeer wrote via RT:

> Sebastien Aperghis-Tramoni wrote via RT:
>
> > > I do not know how to interpret this: do you suggest to make "default
> > > is raw" or not have the Sys::Syslog::syslog() encode at all?
> >
> > In a sense, yes. I want to avoid causing the problem that libnet did in
> > version 1.20, when Graham suddenly decided to utf8::encode() everything
> > passing through Net::Cmd. Suddenly, all the mails generated by my programs
> > were double-encoded.
> > » http://rt.cpan.org/Public/Bug/Display.html?id=24835
>
> Net::Cmd is a much to general-purpose module to do character encodings.
> Sys::Syslog is end-user material.

I agree, Net::Cmd isn't for the end-user, but it is only one layer below end-user modules (Net::SMTP, Net::FTP).

> > Therefore I prefer to keep the current behaviour as the default, and be
> > smart and encode when asked to do so.
>
> Then you suggest that there are people who use syslog correctly w.r.t
> character recoding? Did you ever get complaints on the subject? Then
> probably no-one does realize it yet.

Most tickets can be summed up with "it doesn't work". No one complained about an encoding problem. I can't tell whether it just works or if nobody realised there is a problem.

-- 
Sébastien Aperghis-Tramoni
Close the world, txEn eht nepO.