
This queue is for tickets about the Text-Ngrams CPAN distribution.

Report information
The Basics
Id: 11395
Status: new
Priority: 0/
Queue: Text-Ngrams

People
Owner: Nobody in particular
Requestors: yona [...] cs.technion.ac.il
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: problems processing Hebrew texts
Hello,

I would like to report a possible bug when processing UTF-8 encoded Hebrew text. Please see the following example (you might need Hebrew fonts to read the Hebrew text):

    use utf8;
    use Text::Ngrams;
    my $ng = Text::Ngrams->new( type => 'utf8' );
    $ng->process_text('שלום עולם!');
    print $ng->to_string;

Unfortunately, the output suggests that the input contained no text at all (or only an empty string):

    BEGIN OUTPUT BY Text::Ngrams version 1.7

    1-GRAMS (total count: 0)
    FIRST N-GRAM:
    LAST N-GRAM:
    ------------------------
    2-GRAMS (total count: 0)
    FIRST N-GRAM:
    LAST N-GRAM:
    ------------------------
    3-GRAMS (total count: 0)
    FIRST N-GRAM:
    LAST N-GRAM:
    ------------------------
    END OUTPUT BY Text::Ngrams

I'm using Perl 5.8.6; here is the output of perl -V:

    Summary of my perl5 (revision 5 version 8 subversion 6) configuration:
      Platform:
        osname=linux, osvers=2.2.17, archname=i686-linux-thread-multi
        uname='linux gimlet 2.2.17 #1 sun jun 25 09:24:41 est 2000 i686 unknown '
        config_args='-ders -Dcc=gcc -Accflags=-DNO_HASH_SEED -Dusethreads -Duseithreads -Ud_sigsetjmp -Uinstallusrbinperl -Ulocincpth= -Uloclibpth= -Duselargefiles -Uusemallocwrap -Dinc_version_list=5.8.5/$archname 5.8.5 5.8.4/$archname 5.8.4 5.8.3/$archname 5.8.3 5.8.2/$archname 5.8.2 5.8.1/$archname 5.8.1 5.8.0/$archname 5.8.0 -Duseshrplib -Dprefix=/usr/local/ActivePerl-5.8 -Dcf_by=ActiveState -Dcf_email=support@ActiveState.com'
        hint=recommended, useposix=true, d_sigaction=define
        usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
        useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
        use64bitint=undef use64bitall=undef uselongdouble=undef
        usemymalloc=n, bincompat5005=undef
      Compiler:
        cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DNO_HASH_SEED -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
        optimize='-O2',
        cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DNO_HASH_SEED -fno-strict-aliasing -pipe'
        ccversion='', gccversion='2.95.2 20000220 (Debian GNU/Linux)', gccosandvers=''
        intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
        d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
        ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
        alignbytes=4, prototype=define
      Linker and Libraries:
        ld='gcc', ldflags =''
        libpth=/lib /usr/lib /usr/local/lib
        libs=-lnsl -lndbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lposix
        perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc -lposix
        libc=/lib/libc-2.1.3.so, so=so, useshrplib=true, libperl=libperl.so
        gnulibc_version='2.1.3'
      Dynamic Linking:
        dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/local/ActivePerl-5.8/lib/5.8.6/i686-linux-thread-multi/CORE'
        cccdlflags='-fpic', lddlflags='-shared'

    Characteristics of this binary (from libperl):
      Compile-time options: MULTIPLICITY USE_ITHREADS USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
      Locally applied patches:
        ActivePerl Build 811
        21540 Fix backward-compatibility issues in if.pm
        23565 Wrong MANIFEST.SKIP
      Built under linux
      Compiled at Dec 5 2004 07:09:45
      @INC:
        /usr/local/ActivePerl-5.8/lib/5.8.6/i686-linux-thread-multi
        /usr/local/ActivePerl-5.8/lib/5.8.6
        /usr/local/ActivePerl-5.8/lib/site_perl/5.8.6/i686-linux-thread-multi
        /usr/local/ActivePerl-5.8/lib/site_perl/5.8.6
        /usr/local/ActivePerl-5.8/lib/site_perl
        .
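As a sanity check that does not depend on Text::Ngrams, the same string can be split into character n-grams with plain Perl. This is a minimal sketch, assuming each Unicode character counts as one token; the helper `char_ngrams` is hypothetical and not part of the Text::Ngrams API, and unlike Text::Ngrams it applies no normalization to spaces or punctuation. With `use utf8` in effect, the 10-character input yields non-zero totals for every n, which is consistent with the report's suspicion that the text is being dropped inside the module rather than being empty.

```perl
use strict;
use warnings;
use utf8;    # treat the string literal below as Unicode characters, not bytes

# Hypothetical helper (not part of Text::Ngrams): count the character
# n-grams of length $n in $text, one Unicode character per token.
sub char_ngrams {
    my ($text, $n) = @_;
    my %count;
    my @chars = split //, $text;
    # Slide a window of $n characters across the string; the range is
    # empty when the text is shorter than $n.
    for my $i (0 .. @chars - $n) {
        $count{ join '', @chars[$i .. $i + $n - 1] }++;
    }
    return \%count;
}

my $text = 'שלום עולם!';
for my $n (1 .. 3) {
    my $grams = char_ngrams($text, $n);
    my $total = 0;
    $total += $_ for values %$grams;
    print "$n-GRAMS (total count: $total)\n";
}
```

Run as-is, this prints total counts of 10, 9 and 8 for the 1-, 2- and 3-grams respectively (space and "!" included).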