Bug #84419 for CAM-PDF: Can get unicode text?

Thu Apr 04 03:08:19 2013 ZDK [...] cpan.org - Ticket created

Subject:

Can get unicode text?

Hi Chris,

I have an issue for you :) if you have some time to consider it.

Basically, I'm trying to extract Thai text from a PDF file. So far I have only tried the 'getPageText' method, but I don't get any relevant Thai text from the PDF. (I'm not sure whether other languages such as Chinese or Japanese have the same problem, or whether it is just a font problem on my side; I'm dumb about PDF.)

Anyway, I have created and attached a bug_unicode.t test file along with sample PDF files for you to check out. Could you see if anything is wrong with the test?

⮀ CAM-PDF-1.59 prove -vwl t/bug_unicode.t
t/bug_unicode.t ..
1..1
not ok 1 - Should get expected text
#   Failed test 'Should get expected text'
#   at t/bug_unicode.t line 11.
Wide character in print at /Users/zdk/perl5/perlbrew/perls/perl-5.14.2/lib/5.14.4/Test/Builder.pm line 1759.
#          got: '!"#$%&'(')!*
# '
#     expected: 'ภาษาไทย'
# Looks like you failed 1 test of 1.
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/1 subtests

Test Summary Report
-------------------
t/bug_unicode.t (Wstat: 256 Tests: 1 Failed: 1)
  Failed test:  1
  Non-zero exit status: 1
Files=1, Tests=1,  0 wallclock secs ( 0.02 usr  0.00 sys +  0.07 cusr  0.00 csys =  0.09 CPU)
Result: FAIL

Thanks
Cheers,
zdk
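To make the mismatch in the output above easier to inspect, here is a minimal diagnostic sketch that prints the code points of the "got" and "expected" strings. The helpers `codepoints` and `has_thai` are hypothetical names introduced here and are not part of CAM::PDF. That the returned characters might be undecoded font codes rather than Unicode text is only a guess; nothing in this ticket confirms it.

```perl
use strict;
use warnings;
use utf8;    # this source file contains a Thai string literal

# Hypothetical helper: render every character of a string as U+XXXX.
sub codepoints {
    my ($text) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $text;
}

# Hypothetical helper: true if the string contains any Thai-script character.
sub has_thai {
    my ($text) = @_;
    return $text =~ /\p{Thai}/ ? 1 : 0;
}

# The two strings from the failing test output, copied by hand.
my $got      = q{!"#$%&'(')!*} . "\n";
my $expected = 'ภาษาไทย';

binmode STDOUT, ':encoding(UTF-8)';
print 'got:      ', codepoints($got),      "\n";
print 'expected: ', codepoints($expected), "\n";
print 'got contains Thai characters: ', ( has_thai($got) ? 'yes' : 'no' ), "\n";
```

If the "got" line shows only ASCII-range code points, the text returned by getPageText contains no Thai characters at all, which would rule out a simple output-encoding problem on the terminal side.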

Subject:

CH3.pdf

Download CH3.pdf
application/pdf 140.9k

Message body not shown because it is not plain text.

Thu Apr 04 03:18:13 2013 ZDK [...] cpan.org - Correspondence added

Subject:

test_th.pdf

Download test_th.pdf
application/pdf 5.7k

Message body not shown because it is not plain text.

Thu Apr 04 03:18:14 2013 ZDK [...] cpan.org - Status changed from 'new' to 'open'

Wed Apr 10 09:32:14 2013 ZDK [...] cpan.org - Correspondence added

Add missing file

Subject:

bug_unicode.t

use utf8;
use warnings;
use strict;
use Test::More tests => 1;
use CAM::PDF;

{
    my $pdf = CAM::PDF->new('t/test_th.pdf') || die $CAM::PDF::errstr;
    is( $pdf->getPageText(1), 'ภาษาไทย', "Should get expected text" );
}
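A side note on the "Wide character in print" warning in the prove output: it comes from Test::Builder printing the Thai diagnostic to a handle that has no encoding layer. The sketch below shows one common way to silence it by putting a UTF-8 layer on the Test::Builder output handles. It addresses only the warning and is not expected to change what getPageText returns, so the test in this ticket would still fail.

```perl
use utf8;
use strict;
use warnings;
use Test::More tests => 1;

# Test::More shares a single Test::Builder object; its three output
# handles need an encoding layer before non-ASCII diagnostics are printed.
my $builder = Test::More->builder;
for my $method (qw(output failure_output todo_output)) {
    binmode $builder->$method, ':encoding(UTF-8)';
}

# Placeholder assertion (not the CAM::PDF test) so the sketch runs on its own.
is( 'ภาษาไทย', 'ภาษาไทย', 'Thai literal compares equal to itself' );
```

In bug_unicode.t the same `binmode` loop could go right after the `use` lines, before the call to `is`.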