
This queue is for tickets about the String-BOM CPAN distribution.

Report information
The Basics
Id: 101175
Status: open
Priority: 0
Queue: String-BOM

People
Owner: Nobody in particular
Requestors: arfreitas [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 0.3
Fixed in: (no value)



Subject: strip_bom_from_string not working with \x{feff}
Greetings Daniel,

My name is Alceu and I think I found a bug in the strip_bom_from_string function from your distro.

Although string_has_bom can correctly identify the BOM character in a UTF-8 file, strip_bom_from_string cannot remove it.

Here is an example of a working workaround that I applied:

    if ( string_has_bom($_) ) {
        my $header = strip_bom_from_string($_);
        $header =~ s/^\x{feff}//;
    }

I also attached the file with the character for testing.

Here are the details of my testing environment:

    -bash-3.2$ perl -V
    Summary of my perl5 (revision 5 version 20 subversion 1) configuration:

      Platform:
        osname=linux, osvers=2.6.39-300.26.1.el5uek, archname=i686-linux
        uname='linux localhost.localdomain 2.6.39-300.26.1.el5uek #1 smp thu jan 3 18:33:10 pst 2013 i686 i686 i386 gnulinux '
        config_args='-de -Dprefix=/ood_repository/siebel_log_mon/perl5/perls/perl-5.20.1 -Aeval:scriptdir=/ood_repository/siebel_log_mon/perl5/perls/perl-5.20.1/bin'
        hint=recommended, useposix=true, d_sigaction=define
        useithreads=undef, usemultiplicity=undef
        use64bitint=undef, use64bitall=undef, uselongdouble=undef
        usemymalloc=n, bincompat5005=undef
      Compiler:
        cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
        optimize='-O2',
        cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
        ccversion='', gccversion='4.1.2 20080704 (Red Hat 4.1.2-54)', gccosandvers=''
        intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
        d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
        ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
        alignbytes=4, prototype=define
      Linker and Libraries:
        ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
        libpth=/usr/local/lib /usr/lib /lib
        libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
        perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
        libc=libc-2.5.so, so=so, useshrplib=false, libperl=libperl.a
        gnulibc_version='2.5'
      Dynamic Linking:
        dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
        cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'

    Characteristics of this binary (from libperl):
      Compile-time options: HAS_TIMES PERLIO_LAYERS PERL_DONT_CREATE_GVSV PERL_HASH_FUNC_ONE_AT_A_TIME_HARD PERL_MALLOC_WRAP PERL_NEW_COPY_ON_WRITE PERL_PRESERVE_IVUV USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_PERLIO USE_PERL_ATOF
      Built under linux
      Compiled at Dec 22 2014 21:07:37

Please let me know if you need more information.

Thank you in advance,
Alceu
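
For anyone who wants to reproduce the symptom without the attachment, the sketch below may help. It rests on an assumption the report does not state: that the file was read through a decoding layer such as :encoding(UTF-8), so the three BOM bytes reached the program as the single character U+FEFF. It uses only the core Encode module and does not call String::BOM; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes as they sit on disk: the 3-byte BOM followed by some text.
my $bytes = "\xEF\xBB\xBFheader";

# Assumption: the file was read through a decoding layer such as
# :encoding(UTF-8), which is equivalent to decoding the bytes like this.
my $chars = decode('UTF-8', $bytes);

# After decoding, the 3 BOM bytes have become the single character U+FEFF.
printf "byte length: %d, character length: %d\n",
    length $bytes, length $chars;

# A byte-oriented pattern is applied to the decoded string ...
(my $byte_strip = $chars) =~ s/\A\xEF\xBB\xBF//;

# ... and so is the character-oriented pattern from the workaround.
(my $char_strip = $chars) =~ s/\A\x{FEFF}//;

print "byte pattern stripped BOM: ",
    ( $byte_strip eq 'header' ? "yes" : "no" ), "\n";
print "char pattern stripped BOM: ",
    ( $char_strip eq 'header' ? "yes" : "no" ), "\n";
```

If the assumption holds, this would be consistent with the s/^\x{feff}// workaround succeeding where a byte-level strip does not.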
Subject: WorkMon.log
Download WorkMon.log
application/octet-stream 415b


Subject: strip_bom_from_string() does not strip Unicode/character strings
Thanks. It looks like the string in question is a Unicode (character) string and not a byte string.

strip_bom_from_string() should probably handle it either way, but warn (maybe even die**) when given a Unicode string, since it doesn't make sense to do a byte operation on a character string.

** Compare a byte operation on a byte string vs. a Unicode string via Base64:

    perl -MMIME::Base64 -e 'print MIME::Base64::encode_base64("\xfe\xff")'
    perl -MMIME::Base64 -e 'print MIME::Base64::encode_base64("\x{feff}")'
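
One way the "handle it either way" idea could look is sketched below. This is not the module's implementation and not a proposed patch: strip_any_bom is a hypothetical helper name, it covers only the UTF-8 and UTF-16 BOMs, and it does not warn or die on character strings, which would be a separate design decision.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of String::BOM: strips a leading BOM from
# either a decoded (character) string or a raw byte string.
sub strip_any_bom {
    my ($str) = @_;
    return $str unless defined $str;

    # Decoded string: the BOM is the single code point U+FEFF.
    # A raw byte string can never start with this character.
    return $str if $str =~ s/\A\x{FEFF}//;

    # Byte string: compare against the encoded forms of the BOM.
    # Only UTF-8 and UTF-16 (big and little endian) are covered here.
    for my $bom ( "\xEF\xBB\xBF", "\xFE\xFF", "\xFF\xFE" ) {
        return substr( $str, length $bom ) if index( $str, $bom ) == 0;
    }

    # No BOM found: return the input unchanged.
    return $str;
}

# Usage: both forms of the same header come back without the BOM.
my $from_chars = strip_any_bom("\x{FEFF}header");
my $from_bytes = strip_any_bom("\xEF\xBB\xBFheader");
print "character string: $from_chars\n";
print "byte string:      $from_bytes\n";
```

Checking for U+FEFF first avoids relying on Perl's internal UTF-8 flag to guess whether the caller passed bytes or characters.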