Skip Menu |

This queue is for tickets about the Mail-Message CPAN distribution.

Report information
The Basics
Id: 125350
Status: resolved
Priority: 0/
Queue: Mail-Message

People
Owner: Nobody in particular
Requestors: gary [...] intrepid.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: dispositionFilename() gratuitously removes first non-word character in returned file name
Date: Sat, 19 May 2018 13:31:06 -0700
To: bug-Mail-Message [...] rt.cpan.org
From: Gary Funck <gary [...] intrepid.com>
package Mail::Message::Body VERSION: 3.006 This code, 251 my $dir = shift; 252 my $filename = ''; 253 if(defined $base) 254 { $filename = basename $base; 255 $filename =~ s/[^\w.-]//; 256 } The purpose of the string substitution is unclear. It looks like it may want to remove all non-word, or "." and "-" characters, but is missing a s///g. Instead, it just removes the first match. That said, it isn't clear to me that dispositionFilename() should modify the filename in this way. The documentation doesn't seem to indicate that it does this. In my case, I'd like the original filename, unaltered.
CC: ;
Subject: Re: [rt.cpan.org #125350] dispositionFilename() gratuitously removes first non-word character in returned file name
Date: Sun, 20 May 2018 09:34:50 +0200
To: Gary Funck via RT <bug-Mail-Message [...] rt.cpan.org>
From: Mark Overmeer <mark [...] nluug.nl>
* Gary Funck via RT (bug-Mail-Message@rt.cpan.org) [180519 20:31]: Show quoted text
> Sat May 19 16:31:23 2018: Request 125350 was acted upon. > Transaction: Ticket created by gary@intrepid.com > Queue: Mail-Message > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=125350 > > > package Mail::Message::Body > VERSION: 3.006 > > 255 $filename =~ s/[^\w.-]//; > > The purpose of the string substitution is unclear. It looks like it may > want to remove all non-word, or "." and "-" characters, but is missing a > s///g. Instead, it just removes the first match.
Yes, it should have a 'g' Show quoted text
> That said, it isn't clear to me that dispositionFilename() should modify > the filename in this way. The documentation doesn't seem to indicate that > it does this. In my case, I'd like the original filename, unaltered.
Read https://tools.ietf.org/html/rfc6266#section-4.3 MailBox is usually for automatic processing, so there is no-one to check whether there resulting filename is harmless... I choose to be very careful. I would consider accepting blanks: for($filename) { s/\s+/ /g; s/ $//; s/^ //; s/[^\w .-]//g; } Are there other characters you like to see passed, safe on any OS? -- greetz, MarkOv ------------------------------------------------------------------------ Mark Overmeer MSc MARKOV Solutions Mark@Overmeer.net solutions@overmeer.net http://Mark.Overmeer.net http://solutions.overmeer.net
Subject: Re: [rt.cpan.org #125350] dispositionFilename() gratuitously removes first non-word character in returned file name
Date: Sun, 20 May 2018 08:42:30 -0700
To: bug-Mail-Message [...] rt.cpan.org
From: Gary Funck <gary [...] intrepid.com>
On Sun, May 20, 2018 at 12:42 AM, Mark Overmeer via RT < bug-Mail-Message@rt.cpan.org> wrote: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=125350 > > > * Gary Funck via RT (bug-Mail-Message@rt.cpan.org) [180519 20:31]:
> > Sat May 19 16:31:23 2018: Request 125350 was acted upon. > > > > 255 $filename =~ s/[^\w.-]//; > > > > The purpose of the string substitution is unclear. It looks like it may > > want to remove all non-word, or "." and "-" characters, but is missing a > > s///g. Instead, it just removes the first match.
> > Yes, it should have a 'g' >
> > That said, it isn't clear to me that dispositionFilename() should modify > > the filename in this way. The documentation doesn't seem to indicate
> that
> > it does this. In my case, I'd like the original filename, unaltered.
> > Read https://tools.ietf.org/html/rfc6266#section-4.3 > > MailBox is usually for automatic processing, so there is no-one to > check whether there resulting filename is harmless... I choose to be > very careful. > > I would consider accepting blanks: > > for($filename) { > s/\s+/ /g; s/ $//; s/^ //; > s/[^\w .-]//g; > } > > Are there other characters you like to see passed, safe on any OS? >
Greetz, I hope the last decade/two since we last corresponded have treated you well. :) Thanks for the RFC reference. I see how/why caution is advised when writing the attachment. However, in my case, [- .] are innocuous and I'd like them back; I want the raw filename, but wouldn't mind some documentation that tells me how/why I should be careful, or even having the filtering as default. For reference, here is the current documentation. $obj->dispositionFilename( [$directory] ) Returns the name which can be used as filename to store the information in the indicated $directory. To get a filename, various fields are searched for "filename" and "name" attributes. Without $directory, the name found will be returned. Only the basename of the found name will be used, for security reasons: otherwise, it may be possible to access other directories than the one indicated. If no name was found, or the name is already in use, then an unique name is generated. Can we have dispositionFilenameRaw(), or dispositionFilename(raw => 1), or a cleanup function callout? Barring that - break out the cleanup into its own sub that can be overridden by the user (via the symbol table)? Regarding your suggested change: for($filename) { s/\s+/ /g; s/ $//; s/^ //; s/[^\w .-]//g; } s/\s+/ /g; OK s/ $//; maybe s/ +$//; instead s/^ //; maybe s/^ +//; instead Regarding s/[^\w .-]//g; Won't that " " (space) in there undo the previous rewrites into space characters? If that is what you're after, then this will (I think) be an equivalent pattern, covering all of them. s/[\W\s.-]+//g; BTW, what prevents the current code and the suggested new code from wiping out the final "." which delimits the file's extension? I don't recall the current implementation doing that (I have not noticed issues with extensions being glommed into the filename), but don't see why it doesn't do that? Reading the current implementation, it looks like I might already have my 'raw' filename capability. It seems that this filtering only kicks in *if* the (optional) directory argument is supplied. At the moment, I supply that parameter, but it would be easy enough for me to prepend it upon return. If that capability can be left in (and documented) that would meet my needs. Regarding the RFC, it seems that filtering extensions would be in keeping with the theme? For example, https://www.howtogeek.com/137270/50-file-extensions-that-are-potentially-dangerous-on-windows/ has a list of dangerous extensions (for Windows/DOS users).

Message body is not shown because it is too large.

CC: ;
Subject: Re: [rt.cpan.org #125350] dispositionFilename() gratuitously removes first non-word character in returned file name
Date: Sun, 20 May 2018 20:00:42 +0200
To: Gary Funck via RT <bug-Mail-Message [...] rt.cpan.org>
From: Mark Overmeer <mark [...] overmeer.net>
* Gary Funck via RT (bug-Mail-Message@rt.cpan.org) [180520 15:43]: Show quoted text
> Queue: Mail-Message > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=125350 > > > Greetz, I hope the last decade/two since we last corresponded have treated > you well. :)
In all dimensions ;-) Show quoted text
> Can we have dispositionFilenameRaw(), or dispositionFilename(raw => 1), or > a cleanup function callout? Barring that - break out the cleanup into its > own sub that can be overridden by the user (via the symbol table)?
When you do not provide a directory, you get the unprocessed name back. Show quoted text
> Regarding your suggested change: > > for($filename) { > s/\s+/ /g; s/ $//; s/^ //; > s/[^\w .-]//g; > } > > s/\s+/ /g; OK > s/ $//; maybe s/ +$//; instead > s/^ //; maybe s/^ +//; instead
After the first change, there can only be one blank at the beginning and the end. Show quoted text
> Regarding s/[^\w .-]//g; > > Won't that " " (space) in there undo the previous rewrites into space > characters? If that is what you're after, then this will (I think) be an > equivalent pattern, covering all of them. > > s/[\W\s.-]+//g;
This is very different. My expression says: do *not* throw away word-chars, nor blanks, nor dots, nor dashes. That's what you want. Show quoted text
> BTW, what prevents the current code and the suggested new code from wiping > out the final "." which delimits the file's extension?
It does not removee dots. Show quoted text
> Reading the current implementation, it looks like I might already have my > 'raw' filename capability. [...] > If that capability can be left in (and documented) that would meet > my needs.
The docs are unclear, I agree: Without $directory, the name found will be returned. I will restructure the docs: Various fields are searched for C<filename> and C<name> attributes. Without $directory, the name found will be returned unmodified. When a $directory is given, a filename is composed. For security reasons, only the basename of the found name gets used and many potentially dangerous characters removed. If no name was found, or when the found name is already in use, then an unique name is generated. Don't forget to read RFC6266 section 4.3 for the security aspects in your email application. Show quoted text
> Regarding the RFC, it seems that filtering extensions would be in keeping > with the theme? For example, > https://www.howtogeek.com/137270/50-file-extensions-that-are-potentially-dangerous-on-windows/ > has a list of dangerous extensions (for Windows/DOS users).
I think that handling the contents of the attachment is out-of-scope for MailBox. Let's hope the author of the client know what he is doing ;-) -- Regards, MarkOv ------------------------------------------------------------------------ Mark Overmeer MSc MARKOV Solutions Mark@Overmeer.net solutions@overmeer.net http://Mark.Overmeer.net http://solutions.overmeer.net
Subject: Re: [rt.cpan.org #125350] dispositionFilename() gratuitously removes first non-word character in returned file name
Date: Sun, 20 May 2018 11:32:38 -0700
To: bug-Mail-Message [...] rt.cpan.org
From: Gary Funck <gary [...] intrepid.com>
On Sun, May 20, 2018 at 11:20 AM, Mark Overmeer via RT < bug-Mail-Message@rt.cpan.org> wrote: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=125350 > > > * Gary Funck via RT (bug-Mail-Message@rt.cpan.org) [180520 15:43]: > >
> > Can we have dispositionFilenameRaw(), or dispositionFilename(raw => 1),
> or
> > a cleanup function callout? Barring that - break out the cleanup into
> its
> > own sub that can be overridden by the user (via the symbol table)?
> > When you do not provide a directory, you get the unprocessed name back. >
> > Regarding your suggested change: > > > > for($filename) { > > s/\s+/ /g; s/ $//; s/^ //; > > s/[^\w .-]//g; > > } > > > > s/\s+/ /g; OK > > s/ $//; maybe s/ +$//; instead > > s/^ //; maybe s/^ +//; instead
> > After the first change, there can only be one blank at the beginning and > the end. >
> > Regarding s/[^\w .-]//g; > > > > Won't that " " (space) in there undo the previous rewrites into space > > characters? If that is what you're after, then this will (I think) be an > > equivalent pattern, covering all of them. > > > > s/[\W\s.-]+//g;
> > This is very different. My expression says: do *not* throw away > word-chars, > nor blanks, nor dots, nor dashes. That's what you want. >
Yeah, I didn't read that carefully. Obviously (now), the initial ^ applied to all the characters in that set. A small comment might help there, for those of us not paying attention in class. Show quoted text
>
> > BTW, what prevents the current code and the suggested new code from
> wiping
> > out the final "." which delimits the file's extension?
> > It does not remove dots. >
Got it now, thanks. Show quoted text
>
> > Reading the current implementation, it looks like I might already have my > > 'raw' filename capability. [...] > > If that capability can be left in (and documented) that would meet > > my needs.
> > The docs are unclear, I agree: > > Without $directory, the name found will be returned. > > I will restructure the docs: > > Various fields are searched for C<filename> and C<name> attributes. > Without > $directory, the name found will be returned unmodified. > > When a $directory is given, a filename is composed. For security > reasons, > only the basename of the found name gets used and many potentially > dangerous characters removed. If no name was found, or when the found > name is already in use, then an unique name is generated. > > Don't forget to read RFC6266 section 4.3 for the security aspects in > your > email application. >
Sounds good, with a hyperlink to the RFC, perhaps. Show quoted text
>
> > Regarding the RFC, it seems that filtering extensions would be in keeping > > with the theme? For example, > > https://www.howtogeek.com/137270/50-file-extensions-
> that-are-potentially-dangerous-on-windows/
> > has a list of dangerous extensions (for Windows/DOS users).
> > I think that handling the contents of the attachment is out-of-scope for > MailBox. Let's hope the author of the client know what he is doing ;-) >
Yes, let's hope. :)
Subject: Re: [rt.cpan.org #125350] dispositionFilename() gratuitously removes first non-word character in returned file name
Date: Sun, 20 May 2018 12:25:01 -0700
To: bug-Mail-Message [...] rt.cpan.org
From: Gary Funck <gary [...] intrepid.com>
On Sun, May 20, 2018 at 11:20 AM, Mark Overmeer via RT < bug-Mail-Message@rt.cpan.org> wrote: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=125350 > > > * Gary Funck via RT (bug-Mail-Message@rt.cpan.org) [180520 15:43]:
> > Queue: Mail-Message > > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=125350 >
>
> > Can we have dispositionFilenameRaw(), or dispositionFilename(raw => 1),
> or
> > a cleanup function callout? Barring that - break out the cleanup into
> its
> > own sub that can be overridden by the user (via the symbol table)?
> > When you do not provide a directory, you get the unprocessed name back. > >
Yes, except, we also don't get a name generated for us if there isn't one found in the attachment's disposition description. This code is not executed. unless(length $filename) { my $ext = ($self->mimeType->extensions)[0] || 'raw'; my $unique; for($unique = 'part-0'; 1; $unique++) { my $out = File::Spec->catfile($dir, "$unique.$ext"); open IN, '<', $out or last; # does not exist: can use it close IN; } $filename = "$unique.$ext"; } I guess that's OK, just some logic that has to be moved out to the client. BTW, that "$unique++" above appears to be problematic when applied to 'part-0'?
CC: ;
Subject: Re: [rt.cpan.org #125350] dispositionFilename() gratuitously removes first non-word character in returned file name
Date: Mon, 21 May 2018 00:40:38 +0200
To: Gary Funck via RT <bug-Mail-Message [...] rt.cpan.org>
From: Mark Overmeer <mark [...] overmeer.net>
* Gary Funck via RT (bug-Mail-Message@rt.cpan.org) [180520 19:25]: Show quoted text
> Queue: Mail-Message > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=125350 > > > Yes, except, we also don't get a name generated for us if there isn't one > found in the attachment's disposition description.
True, either you want the pure value or the intelligence ;-) Show quoted text
> for($unique = 'part-0'; 1; $unique++) > BTW, that "$unique++" above appears to be problematic when applied to > 'part-0'?
Ai, the '-' makes it a non-bareword. Compare: perl -we '$a = "part-0"; print ++$a, "\n"' perl -we '$a = "part0"; print ++$a, "\n"' I'll upload the changes to Github -- Regards, MarkOv ------------------------------------------------------------------------ Mark Overmeer MSc MARKOV Solutions Mark@Overmeer.net solutions@overmeer.net http://Mark.Overmeer.net http://solutions.overmeer.net
Subject: Re: [rt.cpan.org #125350] dispositionFilename() gratuitously removes first non-word character in returned file name
Date: Sun, 20 May 2018 15:50:06 -0700
To: bug-Mail-Message [...] rt.cpan.org
From: Gary Funck <gary [...] intrepid.com>
On Sun, May 20, 2018 at 3:40 PM, Mark Overmeer via RT < bug-Mail-Message@rt.cpan.org> wrote: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=125350 > > > * Gary Funck via RT (bug-Mail-Message@rt.cpan.org) [180520 19:25]:
> > Queue: Mail-Message > > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=125350 > > > > > Yes, except, we also don't get a name generated for us if there isn't one > > found in the attachment's disposition description.
> > True, either you want the pure value or the intelligence ;-) >
I'm slowly coming around to your new suggested fix. :) BTW, if gilding the lily, the code could check the OS variable and use safe characters for Windows (or cygwin?) or Linux. If neither of those, then revert to your current approach. Show quoted text
>
> > for($unique = 'part-0'; 1; $unique++) > > BTW, that "$unique++" above appears to be problematic when applied to > > 'part-0'?
> > Ai, the '-' makes it a non-bareword. Compare: > perl -we '$a = "part-0"; print ++$a, "\n"' > perl -we '$a = "part0"; print ++$a, "\n"' > > >
Yup. I didn't analyze it. Just tried it out like your first version ('part-0') above and it burped.
Released fixes in 3.007