On Sun Aug 02 18:02:33 2015, RSCHUPP wrote:
> On 2015-08-02 00:48:42, SLAFFAN wrote:
> > Instrumenting _glob_in_inc to print to stdout whenever unico[rd]e is
> > passed as the subdir argument has no effect, so I assume the utf8.pm
> > preload sub is not being run for the above preload rules.
>
> Thanks for investigating. I tried to figure out at what point utf8_heavy.pl
> comes into play. For that I prepended this to your sample script:
>
> BEGIN
> {
>     # insert spy CODE into require's module lookup
>     unshift @INC, sub
>     {
>         my ($self, $pm) = @_;
>         print STDERR "# require $pm\n";
>         my ($package, $filename, $line) = caller;
>         print STDERR "# from $package ($filename:$line)\n";
>         return;    # i.e. take a pass
>     };
> }
>
> This intercepts any (explicit or implicit) "require", prints out what is
> required and from where, and then resumes "normal" processing. Here's the output:
>
> # require PDL.pm
> # from main (/home/roderich/todo/PAR/Module-ScanDeps/shawn.pl:15)
> # require PDL/Core.pm
> # from main ((eval 1):6)
> # require PDL/Types.pm
> # from PDL::Core (/usr/lib/x86_64-linux-gnu/perl5/5.22/PDL/Core.pm:223)
> # require Carp.pm
> # from PDL::Types (/usr/lib/x86_64-linux-gnu/perl5/5.22/PDL/Types.pm:6)
> # require strict.pm
> # from Carp (/usr/share/perl/5.22/Carp.pm:4)
> # require warnings.pm
> # from Carp (/usr/share/perl/5.22/Carp.pm:5)
> # require Exporter.pm
> # from Carp (/usr/share/perl/5.22/Carp.pm:99)
> # require overload.pm
> # from PDL::Type (/usr/lib/x86_64-linux-gnu/perl5/5.22/PDL/Types.pm:428)
> # require overloading.pm
> # from overload (/usr/share/perl/5.22/overload.pm:83)
> # require warnings/register.pm
> # from overload (/usr/share/perl/5.22/overload.pm:144)
> # require Exporter/Heavy.pm
> # from Exporter (/usr/share/perl/5.22/Exporter.pm:16)
> # require PDL/Exporter.pm
> # from PDL::Core (Basic/Core/Core.pm.PL (i.e. PDL::Core.pm):314)
> # require DynaLoader.pm
> # from PDL::Core (Basic/Core/Core.pm.PL (i.e. PDL::Core.pm):315)
> # require Config.pm
> # from DynaLoader (/usr/lib/x86_64-linux-gnu/perl/5.22/DynaLoader.pm:21)
> # require vars.pm
> # from Config (/usr/lib/x86_64-linux-gnu/perl/5.22/Config.pm:11)
> # require Scalar/Util.pm
> # from PDL::Core (Basic/Core/Core.pm.PL (i.e. PDL::Core.pm):1000)
> # require List/Util.pm
> # from Scalar::Util (/usr/lib/x86_64-linux-gnu/perl/5.22/Scalar/Util.pm:11)
> # require XSLoader.pm
> # from List::Util (/usr/lib/x86_64-linux-gnu/perl/5.22/List/Util.pm:21)
> # require utf8.pm
> # from PDL::Core (Basic/Core/Core.pm.PL (i.e. PDL::Core.pm):1028)
> # require utf8_heavy.pl
> # from utf8 (/usr/share/perl/5.22/utf8.pm:16)
> # require re.pm
> # from utf8 (/usr/share/perl/5.22/utf8_heavy.pl:4)
> # require unicore/Heavy.pl
> # from utf8 (/usr/share/perl/5.22/utf8_heavy.pl:185)
> # require unicore/lib/Alpha/Y.pl
> # require PDL/Options.pm
> # from PDL::Core (Basic/Core/Core.pm.PL (i.e. PDL::Core.pm):3288)
> # require Fcntl.pm
> # from PDL::Core (Basic/Core/Core.pm.PL (i.e. PDL::Core.pm):4167)
> ...
>
> utf8.pm and utf8_heavy.pl are actually loaded from PDL::Core.pm.
> The funny "Basic/Core/Core.pm.PL (i.e. PDL::Core.pm)" is caused by the fact
> that PDL/Core.pm is a generated file with some
>
> # line 123 "Basic/Core/Core.pm.PL (i.e. PDL::Core.pm)"
>
> lines in it. And the offending line is
>
> if $value =~ /e\p{IsAlpha}/ or $value =~ /\p{IsAlpha}e/;
>
> There's no explicit mention of utf8.pm here - the code uses a Unicode property
> in a regular expression. utf8.pm (at least in Perl 5.22) doesn't do anything
> except set up an AUTOLOAD sub that will require utf8_heavy.pl when run.
> (If you check $utf8::AUTOLOAD when our @INC spy is called, its value
> is "utf8::SWASHNEW".)
>
> So the whole utf8_heavy.pl + unico[dr]e shebang is triggered on demand whenever
> some Unicode feature of Perl is requested, e.g. a Unicode property in a regex,
> and probably in lots of other cases.
>
> I don't think it's feasible to try to detect this by static analysis.
> Should we just add this stuff (at least 4 MB spread over more than 400 files)
> to _every_ packed executable?
>
> Cheers, Roderich
Thanks Roderich,
The size issue rears its head once more...
It would also be a Herculean task to get static scanning to detect all such cases (although maybe PPI could be leveraged if someone ever has the tuits:
https://metacpan.org/pod/PPI::Token::Regexp ).
Perhaps another flag could be added to pp for the cases where the code does not explicitly call for unicode, but it is needed for a packed executable to work. pp --unicode?
I also now think that this is the root cause of an issue I've been working around for a while using the code below. I use the pp -x flag when building, and set an environment variable in my script before calling pp.
if ($ENV{BDV_PP_BUILDING}) {
    use 5.016;
    use feature 'unicode_strings';
    my $string = "sp_self_only() and \N{WHITE SMILING FACE}";
    $string =~ /\bsp_self_only\b/;
}
Given that, it should be possible to statically scan for the various permutations of /use feature 'unicode_/ to detect unicode_strings and unicode_eval. If someone is using those features in their code then they need the extra libraries.
https://metacpan.org/pod/feature#The-unicode_strings-feature
Such scanning would not detect statements split across multiple lines, as per the documentation caveats. A "pp --unicode" style flag would still be needed in such cases.
https://metacpan.org/pod/Module::ScanDeps#CAVEATS
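To make the scanning idea concrete, here is a rough sketch of a line-by-line check. The sub name unicode_features_in is hypothetical and not part of Module::ScanDeps; it only looks at single-line "use feature" statements, so it has the multi-line limitation mentioned above. It also ignores feature bundles such as "use 5.016" or ":5.12", which enable unicode_strings implicitly.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Module::ScanDeps): return the sorted
# list of unicode_* features named in single-line "use feature" statements.
sub unicode_features_in {
    my ($source) = @_;
    my %seen;
    for my $line (split /\n/, $source) {
        next if $line =~ /^\s*#/;    # skip comment lines
        # Only statements that start and end on the same line are seen.
        next unless $line =~ /^\s*use\s+feature\b(.*?);/;
        my $args = $1;
        $seen{$_}++ for $args =~ /\b(unicode_\w+)\b/g;
    }
    return sort keys %seen;
}

my $source = <<'END';
use feature 'unicode_strings';
use feature qw(say unicode_eval);
# use feature 'unicode_ignored';
END

print join(',', unicode_features_in($source)), "\n";
```

A non-empty result would be the signal to add the unicore files to the packed executable; anything the scan misses would still need the manual flag.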
WRT the pp flag, maybe a more general approach would be something that parallels the feature pragma, e.g.
pp --feature=unicode_strings,unicode_eval
pp --feature=":5.12"
Regards,
Shawn.