
This queue is for tickets about the Poppler CPAN distribution.

Report information
The Basics
Id: 130280
Status: resolved
Priority: 0/
Queue: Poppler

People
Owner: Nobody in particular
Requestors: NOCTUX [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 1.0101
Fixed in: (no value)



Subject: Double free when handling quadrilaterals
There is a double free when using quadrilaterals returned from highlight annotations, which sometimes manifests as glibc assertions being triggered ("double free" or "free of invalid size"). A minimal example program and an example PDF are attached to this bug report. Executing the program under valgrind shows the double free:
> valgrind perl ./pdfextract_bug.pl ./test_annot.pdf
> ==20006== Memcheck, a memory error detector
> ==20006== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> ==20006== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
> ==20006== Command: perl ./pdfextract_bug.pl /tmp/test_annot.pdf
> ==20006==
> $VAR1 = bless( {}, 'Poppler::AnnotTextMarkup' );
> Poppler::Quadrilateral
> ==20006== Invalid free() / delete / delete[] / realloc()
> ==20006==    at 0x48399AB: free (vg_replace_malloc.c:530)
> ==20006==    by 0x6496ECA: g_boxed_free (gboxed.c:401)
> ==20006==    by 0x640091D: boxed_wrapper_destroy (GBoxed.xs:391)
> ==20006==    by 0x640091D: default_boxed_destroy (GBoxed.xs:454)
> ==20006==    by 0x64019F5: XS_Glib__Boxed_DESTROY (GBoxed.xs:907)
> ==20006==    by 0x49BF49E: Perl_pp_entersub (pp_hot.c:5237)
> ==20006==    by 0x48C8298: Perl_call_sv (perl.c:3043)
> ==20006==    by 0x49C81EE: S_curse (sv.c:6992)
> ==20006==    by 0x49C8CDC: Perl_sv_clear (sv.c:6586)
> ==20006==    by 0x49C9D47: Perl_sv_free2 (sv.c:7093)
> ==20006==    by 0x4A2E494: Perl_leave_scope (scope.c:1191)
> ==20006==    by 0x4A47BF7: Perl_pp_leave (pp_ctl.c:2136)
> ==20006==    by 0x4976E79: Perl_runops_debug (dump.c:2537)
> ==20006==  Address 0x90c1cb0 is 0 bytes inside a block of size 64 free'd
> ==20006==    at 0x48399AB: free (vg_replace_malloc.c:530)
> ==20006==    by 0x654A0A2: array_free (garray.c:372)
> ==20006==    by 0x66647CC: _free_array (gperl-i11n-marshal-array.c:48)
> ==20006==    by 0x66647CC: array_to_sv (gperl-i11n-marshal-array.c:181)
> ==20006==    by 0x66647CC: arg_to_sv (gperl-i11n-marshal-arg.c:230)
> ==20006==    by 0x6665E37: invoke_c_code.isra.0 (gperl-i11n-invoke-c.c:236)
> ==20006==    by 0x666BCAE: XS_Glib__Object__Introspection_invoke (GObjectIntrospection.xs:992)
> ==20006==    by 0x49BF49E: Perl_pp_entersub (pp_hot.c:5237)
> ==20006==    by 0x4976E79: Perl_runops_debug (dump.c:2537)
> ==20006==    by 0x48D3E0D: S_run_body (perl.c:2716)
> ==20006==    by 0x48D3E0D: perl_run (perl.c:2639)
> ==20006==    by 0x1091B5: main (perlmain.c:127)
> ==20006==  Block was alloc'd at
> ==20006==    at 0x48386AF: malloc (vg_replace_malloc.c:298)
> ==20006==    by 0x483ADE7: realloc (vg_replace_malloc.c:826)
> ==20006==    by 0x651A0C8: g_realloc (gmem.c:164)
> ==20006==    by 0x6549AFB: g_array_maybe_expand (garray.c:820)
> ==20006==    by 0x654B139: g_array_sized_new (garray.c:208)
> ==20006==    by 0x700526E: create_poppler_quads_from_annot_quads (poppler-annot.cc:293)
> ==20006==    by 0x700526E: poppler_annot_text_markup_get_quadrilaterals (poppler-annot.cc:1648)
> ==20006==    by 0x65E06CF: ffi_call_unix64 (in /usr/lib/libffi.so.6.0.4)
> ==20006==    by 0x65E009F: ffi_call (in /usr/lib/libffi.so.6.0.4)
> ==20006==    by 0x666592A: invoke_c_code.isra.0 (gperl-i11n-invoke-c.c:202)
> ==20006==    by 0x666BCAE: XS_Glib__Object__Introspection_invoke (GObjectIntrospection.xs:992)
> ==20006==    by 0x49BF49E: Perl_pp_entersub (pp_hot.c:5237)
> ==20006==    by 0x4976E79: Perl_runops_debug (dump.c:2537)
From debugging with GDB, I found that 0x90c1cb0 in the output above is the address of the g_array.data of the GLib array of Poppler quadrilaterals, i.e. the address of the first element of the quads. From GDB I further saw that GLib's array_free is later called with 'flags=FREE_SEGMENT', so when the whole array is deallocated, all quad elements are freed as well when cleaning up the local variables in Perl's stack frame. Yet the individual quads contained in the array are destroyed once more by the Perl bindings as boxed GLib values (XS_Glib__Boxed_DESTROY), causing the actual double free.

Unfortunately, I am not familiar enough with GLib and its ownership model to fix the issue myself. I have, however, written an equivalent C implementation that exhibits exactly the same error pattern (attached as pdfextract_bug.c) with some comments, which hopefully help to better understand the specific issue. Compile and run with:
> gcc -o /tmp/pdfextract ./pdfextract_bug.c `pkg-config --cflags --libs poppler-glib` \
>     && valgrind /tmp/pdfextract file://$(readlink -f /tmp/test_annot.pdf)
Thank you for your project and for reading this bugreport :) I hope the information is more or less complete, but I'd be happy to assist you in any way in further troubleshooting the issue.

Best regards,
Simon

Relevant library versions:
- CPAN Module Poppler: 1.0101
- Distro: Archlinux
- Perl Version 5.30.0-3
- Glib2 version: 2.60.6-1
- glib-perl/gtk2-perl version: 1.329-2
- Glib::Object::Introspection version 0.047-3
- gcc version: 9.1.0
- glibc version: 2.29-4
Subject: pdfannots_bug.pl
#!/usr/bin/env perl
use strict;
use warnings;

BEGIN {
    push @INC, qw(
        /home/noctux/perl5/lib/perl5/x86_64-linux-thread-multi
        /home/noctux/perl5/lib/perl5
        /usr/lib/perl5/5.28/site_perl
        /usr/share/perl5/site_perl
        /usr/lib/perl5/5.28/vendor_perl
        /usr/share/perl5/vendor_perl
        /usr/lib/perl5/5.28/core_perl
        /usr/share/perl5/core_perl);
}

use Poppler;
use Search::Xapian;
use Data::Dumper;

my $pdf = Poppler::Document->new_from_file($ARGV[0]);
my $n_pages = $pdf->get_n_pages;

for my $pagenr (2) {
    my $page = $pdf->get_page($pagenr);
    my $mappings = $page->get_annot_mapping();

    for my $mapping ($mappings) {
        next unless $mapping;
        my $annot = $mapping->annot;
        my $type = $annot->get_annot_type();

        if ($type eq "highlight") {
            study 1;
            my @quads = $annot->get_quadrilaterals();
            #for my $quad (@$quads) {
                ##print " " . ref($quad) . "\n";
            #}
        } else {
            # TODO: DEBUG
            print "Unsupported Annotation type: $type\n";
        }
    }
}
Subject: pdfextract_bug.c
#include <poppler.h>
#include <stdio.h>

int main(int argc, const char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <document>", argv[0]);
        return -1;
    }

    GError *err = NULL;
    PopplerDocument *doc = poppler_document_new_from_file(argv[1], NULL, &err);
    if (!doc) {
        fprintf(stderr, "Error creating poppler document: %s\n", err->message);
        g_error_free(err);
        return -2;
    }

    int npages = poppler_document_get_n_pages(doc);
    for (int i = 0; i < npages; i++) {
        PopplerPage *page = poppler_document_get_page(doc, i);
        if (!page)
            return -3;

        GList *mapping = poppler_page_get_annot_mapping(page);
        if (!mapping) {
            g_object_unref(page);
            continue;
        }

        for(GList *i=mapping; i; i = i->next) {
            PopplerAnnot* annot = ((PopplerAnnotMapping *)i->data)->annot;
            PopplerAnnotType type = poppler_annot_get_annot_type(annot);
            if (type == POPPLER_ANNOT_HIGHLIGHT) {
                GArray* quads = poppler_annot_text_markup_get_quadrilaterals((PopplerAnnotTextMarkup*) annot);
                for (guint i = 0; i < quads->len; i++) {
                    PopplerQuadrilateral *quad = &g_array_index(quads, PopplerQuadrilateral, i);
                    printf("%p\n", quad);
                    // This free is invalid, as we will free quad twice, once here and once below
                    g_free(quad);
                }
                // this is where the double free will then happen, because free_segment is set to 1, so we try to double free quad
                g_array_free(quads, 1);
            }
        }

        poppler_page_free_annot_mapping(mapping);
        g_object_unref(page);
    }

    g_object_unref(doc);
    return 0;
}
Subject: test_annot.pdf
Download test_annot.pdf
application/pdf 45.7k


Thanks for the detailed bug report. The Poppler module doesn't actually include any XS anymore -- it utilizes Glib::Object::Introspection to auto-generate the bindings. It might be possible to fix this by overriding the behavior of the appropriate class, but I suspect this will be beyond my capabilities -- at least to solve quickly. I have contacted the gtk-perl mailing list to seek advice on whether this is an issue with the Poppler bindings or with Glib::Object::Introspection.
On Mon Aug 26 12:45:06 2019, VOLKENING wrote:
> Thanks for the detailed bug report. The Poppler module doesn't
> actually include any XS anymore -- it utilizes
> Glib::Object::Introspection to auto-generate the bindings. It might be
> possible to fix this by overriding the behavior of the appropriate
> class, but I suspect this will be beyond my capabilities -- at least
> to solve quickly.
No worries, I know how that feels, got a bit lost in G::O::I's source code at least once as well.
> I have contacted the gtk-perl mailing list to seek advice on whether
> this is an issue with the Poppler bindings or with
> Glib::Object::Introspection.
Thank you for forwarding this bug to gtk-perl! Let's see how this develops.

While rereading this bug, I noticed that I made a small mistake in my original upload and pasted an old version of pdfannots_bug.pl with the wrong page number (the original faulty PDF was copyright-protected and had a bug on page 2, but the uploaded PDF exhibits this issue on page 0). I'm reuploading the fixed version with this message. Diff:
> 18c18
> < for my $pagenr (2) {
> ---
> > for my $pagenr (0) {
Thank you again for the prompt response and for relaying this issue. Best regards, Simon
Subject: pdfannots_bug.pl
#!/usr/bin/env perl
use strict;
use warnings;

BEGIN {
    push @INC, qw(
        /home/noctux/perl5/lib/perl5/x86_64-linux-thread-multi
        /home/noctux/perl5/lib/perl5
        /usr/lib/perl5/5.28/site_perl
        /usr/share/perl5/site_perl
        /usr/lib/perl5/5.28/vendor_perl
        /usr/share/perl5/vendor_perl
        /usr/lib/perl5/5.28/core_perl
        /usr/share/perl5/core_perl);
}

use Poppler;
use Search::Xapian;
use Data::Dumper;

my $pdf = Poppler::Document->new_from_file($ARGV[0]);
my $n_pages = $pdf->get_n_pages;

for my $pagenr (0) {
    my $page = $pdf->get_page($pagenr);
    my $mappings = $page->get_annot_mapping();

    for my $mapping ($mappings) {
        next unless $mapping;
        my $annot = $mapping->annot;
        my $type = $annot->get_annot_type();

        if ($type eq "highlight") {
            study 1;
            my @quads = $annot->get_quadrilaterals();
            #for my $quad (@$quads) {
                ##print " " . ref($quad) . "\n";
            #}
        } else {
            # TODO: DEBUG
            print "Unsupported Annotation type: $type\n";
        }
    }
}
As it seems, there has been some activity on gtk-perl regarding the issue that you posted on G::O::I:
https://mail.gnome.org/archives/gtk-perl-list/2019-September/msg00004.html

I have tested the code with the proposed changes by building G::O::I from master, which includes
https://gitlab.gnome.org/GNOME/perl-glib-object-introspection/commit/42cdec8f455bd855c3f4af056d82f6acd10ab36a

This commit does seem to resolve the original issue: the segfaults are gone in the test code and valgrind is happy as well. I did not respond to the mailing list directly, as I could not figure out which Message-ID to reply to without breaking threading. Please pass along my thanks to Torsten Schönfeld if you reply to the list.

However, there is something fishy about the objects returned. When acting upon them, transitive objects (in this case the points within the quadrilaterals) seem to be freed too early, while they are still in use from the Perl side, causing invalid reads. I've attached a slightly revised example of the test code (pdfextract_invalid_read.pl) to this message, which exhibits the issue with the test PDF (test_annot.pdf) using current git master of G::O::I (revision 42cdec8f455bd855c3f4af056d82f6acd10ab36a). Valgrind output:
> $ valgrind perl -I/tmp/poppler-test/local/lib/perl5 -I/home/noctux/perl5/lib/perl5/x86_64-linux-thread-multi -I/home/noctux/perl5/lib/perl5 -I/usr/lib/perl5/5.28/site_perl -I/usr/share/perl5/site_perl -I/usr/lib/perl5/5.28/vendor_perl -I/usr/share/perl5/vendor_perl -I/usr/lib/perl5/5.28/core_perl -I/usr/share/perl5/core_perl ./pdfextract_invalid_read.pl ./test_annot.pdf
> ==25785== Memcheck, a memory error detector
> ==25785== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> ==25785== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
> ==25785== Command: perl -I/tmp/poppler-test/local/lib/perl5 -I/home/noctux/perl5/lib/perl5/x86_64-linux-thread-multi -I/home/noctux/perl5/lib/perl5 -I/usr/lib/perl5/5.28/site_perl -I/usr/share/perl5/site_perl -I/usr/lib/perl5/5.28/vendor_perl -I/usr/share/perl5/vendor_perl -I/usr/lib/perl5/5.28/core_perl -I/usr/share/perl5/core_perl ./pdfextract_invalid_read.pl ./test_annot.pdf
> ==25785==
> ==25785== Invalid read of size 8
> ==25785==    at 0x650CF50: g_field_info_get_field (in /usr/lib/libgirepository-1.0.so.1.0.0)
> ==25785==    by 0x64EECBF: get_field.constprop.0 (in /tmp/poppler-test/local/lib/perl5/x86_64-linux-thread-multi/auto/Glib/Object/Introspection/Introspection.so)
> ==25785==    by 0x64EF02C: XS_Glib__Object__Introspection__get_field (in /tmp/poppler-test/local/lib/perl5/x86_64-linux-thread-multi/auto/Glib/Object/Introspection/Introspection.so)
> ==25785==    by 0x4953220: Perl_pp_entersub (in /usr/lib/perl5/5.30/core_perl/CORE/libperl.so)
> ==25785==    by 0x4949785: Perl_runops_standard (in /usr/lib/perl5/5.30/core_perl/CORE/libperl.so)
> ==25785==    by 0x48BE2A5: perl_run (in /usr/lib/perl5/5.30/core_perl/CORE/libperl.so)
> ==25785==    by 0x1091A6: main (in /usr/bin/perl)
> ==25785==  Address 0x8cbf920 is 0 bytes inside a block of size 64 free'd
> ==25785==    at 0x48399AB: free (vg_replace_malloc.c:530)
> ==25785==    by 0x63DD112: ??? (in /usr/lib/libglib-2.0.so.0.6200.0)
> ==25785==    by 0x64EFE2C: arg_to_sv (in /tmp/poppler-test/local/lib/perl5/x86_64-linux-thread-multi/auto/Glib/Object/Introspection/Introspection.so)
> ==25785==    by 0x64F1327: invoke_c_code.isra.0 (in /tmp/poppler-test/local/lib/perl5/x86_64-linux-thread-multi/auto/Glib/Object/Introspection/Introspection.so)
> ==25785==    by 0x64F1836: XS_Glib__Object__Introspection_invoke (in /tmp/poppler-test/local/lib/perl5/x86_64-linux-thread-multi/auto/Glib/Object/Introspection/Introspection.so)
> ==25785==    by 0x4953220: Perl_pp_entersub (in /usr/lib/perl5/5.30/core_perl/CORE/libperl.so)
> ==25785==    by 0x4949785: Perl_runops_standard (in /usr/lib/perl5/5.30/core_perl/CORE/libperl.so)
> ==25785==    by 0x48BE2A5: perl_run (in /usr/lib/perl5/5.30/core_perl/CORE/libperl.so)
> ==25785==    by 0x1091A6: main (in /usr/bin/perl)
> ==25785==  Block was alloc'd at
> ==25785==    at 0x48386AF: malloc (vg_replace_malloc.c:298)
> ==25785==    by 0x483ADE7: realloc (vg_replace_malloc.c:826)
> ==25785==    by 0x63AE728: g_realloc (in /usr/lib/libglib-2.0.so.0.6200.0)
> ==25785==    by 0x63E346B: ??? (in /usr/lib/libglib-2.0.so.0.6200.0)
> ==25785==    by 0x63E3929: g_array_sized_new (in /usr/lib/libglib-2.0.so.0.6200.0)
> ==25785==    by 0x6E9266E: poppler_annot_text_markup_get_quadrilaterals (in /usr/lib/libpoppler-glib.so.8.14.0)
> ==25785==    by 0x4BE26CF: ffi_call_unix64 (in /usr/lib/libffi.so.6.0.4)
> ==25785==    by 0x4BE209F: ffi_call (in /usr/lib/libffi.so.6.0.4)
> ==25785==    by 0x64F0E4A: invoke_c_code.isra.0 (in /tmp/poppler-test/local/lib/perl5/x86_64-linux-thread-multi/auto/Glib/Object/Introspection/Introspection.so)
> ==25785==    by 0x64F1836: XS_Glib__Object__Introspection_invoke (in /tmp/poppler-test/local/lib/perl5/x86_64-linux-thread-multi/auto/Glib/Object/Introspection/Introspection.so)
> ==25785==    by 0x4953220: Perl_pp_entersub (in /usr/lib/perl5/5.30/core_perl/CORE/libperl.so)
> ==25785==    by 0x4949785: Perl_runops_standard (in /usr/lib/perl5/5.30/core_perl/CORE/libperl.so)
> ==25785==
> 216
> ==25785==
> ==25785== HEAP SUMMARY:
> ==25785==     in use at exit: 6,877,766 bytes in 33,066 blocks
> ==25785==   total heap usage: 93,703 allocs, 60,637 frees, 17,065,111 bytes allocated
> ==25785==
> ==25785== LEAK SUMMARY:
> ==25785==    definitely lost: 21,117 bytes in 31 blocks
> ==25785==    indirectly lost: 65,200 bytes in 29 blocks
> ==25785==      possibly lost: 6,113,354 bytes in 26,529 blocks
> ==25785==    still reachable: 674,159 bytes in 6,450 blocks
> ==25785==                       of which reachable via heuristic:
> ==25785==                         newarray           : 20,720 bytes in 607 blocks
> ==25785==         suppressed: 0 bytes in 0 blocks
> ==25785== Rerun with --leak-check=full to see details of leaked memory
> ==25785==
> ==25785== For counts of detected and suppressed errors, rerun with: -v
> ==25785== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Thank you again for taking interest in this bug-report :) ~ Simon
Subject: pdfextract_invalid_read.pl
#!/usr/bin/env perl
use strict;
use warnings;

use Poppler;
use Data::Dumper;

my $pdf = Poppler::Document->new_from_file($ARGV[0]);
my $n_pages = $pdf->get_n_pages;
my $title = $pdf->get_title;
my $author = $pdf->get_author;
my $keywords = $pdf->get_keywords;

my $i = 0;
for my $pagenr (0 .. $n_pages-1) {
    my $page = $pdf->get_page($pagenr);

    for my $mapping ($page->get_annot_mapping()) {
        my $annot = $mapping->annot;
        my $type = $annot->get_annot_type();

        if ($type eq "highlight") {
            my $quads = $annot->get_quadrilaterals();
            for my $quad (@$quads) {
                my $p1 = $quad->p1;
                print $p1->x . "\n";
            }
        }
    }
}
Subject: test_annot.pdf
Download test_annot.pdf
application/pdf 45.7k


Ah, what I've forgotten above: removing the "print $p1->x" line removes the invalid read, so it is the culprit of the issue.
Thanks for testing Torsten's patches and reporting back.
> However, there is something fishy about the objects returned. When
> acting upon those objects returned. Transitive objects (in this case
> the points within the quadrilaterals) themselves seem to be freed too
> early while they are still in use from the perl side, causing invalid
> reads. I've attached an slightly revised example of the test code
> (pdfextract_invalid_read.pl) to this message, which exhibits the issue
> with the test pdf (test_annot.pdf) using current git-master of G::O::I
> (revision 42cdec8f455bd855c3f4af056d82f6acd10ab36a).
Do you want to open a new thread on the gtk-perl mailing list regarding this? I think that will be the place to get help, as the new issue is probably still related to G::O::I. In the meantime, I will reply to the original request and let them know the status.
On Mon Sep 16 11:38:12 2019, VOLKENING wrote:
> Do you want to open a new thread on the gtk-perl mailing list
> regarding this? I think that will be the place to get help, as the new
> issue is probably still related to G:I:O. In the meantime, I will
> reply to the original request and let them know the status.
Thank you for your efforts and for relaying that message. Yep, you are right, this bug is most likely not related to Poppler itself, but to the Perl GLib bindings. I have now reported the issue to G::O::I's bug tracker:
https://gitlab.gnome.org/GNOME/perl-glib-object-introspection/issues/1

So I'm closing this bug for now. Thank you once more for your help.

~Simon