Subject: | Double free when handling quadrilaterals |
There is a double free when using quadrilaterals returned from
highlight annotations. It sometimes manifests as glibc assertions being
triggered ("double free" or "free of invalid size").
A minimal example program and an example PDF are attached to this bug report.
Executing the program under valgrind shows the double free:
> valgrind perl ./pdfextract_bug.pl ./test_annot.pdf
> ==20006== Memcheck, a memory error detector
> ==20006== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> ==20006== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
> ==20006== Command: perl ./pdfextract_bug.pl /tmp/test_annot.pdf
> ==20006==
> $VAR1 = bless( {}, 'Poppler::AnnotTextMarkup' );
> Poppler::Quadrilateral
> ==20006== Invalid free() / delete / delete[] / realloc()
> ==20006== at 0x48399AB: free (vg_replace_malloc.c:530)
> ==20006== by 0x6496ECA: g_boxed_free (gboxed.c:401)
> ==20006== by 0x640091D: boxed_wrapper_destroy (GBoxed.xs:391)
> ==20006== by 0x640091D: default_boxed_destroy (GBoxed.xs:454)
> ==20006== by 0x64019F5: XS_Glib__Boxed_DESTROY (GBoxed.xs:907)
> ==20006== by 0x49BF49E: Perl_pp_entersub (pp_hot.c:5237)
> ==20006== by 0x48C8298: Perl_call_sv (perl.c:3043)
> ==20006== by 0x49C81EE: S_curse (sv.c:6992)
> ==20006== by 0x49C8CDC: Perl_sv_clear (sv.c:6586)
> ==20006== by 0x49C9D47: Perl_sv_free2 (sv.c:7093)
> ==20006== by 0x4A2E494: Perl_leave_scope (scope.c:1191)
> ==20006== by 0x4A47BF7: Perl_pp_leave (pp_ctl.c:2136)
> ==20006== by 0x4976E79: Perl_runops_debug (dump.c:2537)
> ==20006== Address 0x90c1cb0 is 0 bytes inside a block of size 64 free'd
> ==20006== at 0x48399AB: free (vg_replace_malloc.c:530)
> ==20006== by 0x654A0A2: array_free (garray.c:372)
> ==20006== by 0x66647CC: _free_array (gperl-i11n-marshal-array.c:48)
> ==20006== by 0x66647CC: array_to_sv (gperl-i11n-marshal-array.c:181)
> ==20006== by 0x66647CC: arg_to_sv (gperl-i11n-marshal-arg.c:230)
> ==20006== by 0x6665E37: invoke_c_code.isra.0 (gperl-i11n-invoke-c.c:236)
> ==20006== by 0x666BCAE: XS_Glib__Object__Introspection_invoke (GObjectIntrospection.xs:992)
> ==20006== by 0x49BF49E: Perl_pp_entersub (pp_hot.c:5237)
> ==20006== by 0x4976E79: Perl_runops_debug (dump.c:2537)
> ==20006== by 0x48D3E0D: S_run_body (perl.c:2716)
> ==20006== by 0x48D3E0D: perl_run (perl.c:2639)
> ==20006== by 0x1091B5: main (perlmain.c:127)
> ==20006== Block was alloc'd at
> ==20006== at 0x48386AF: malloc (vg_replace_malloc.c:298)
> ==20006== by 0x483ADE7: realloc (vg_replace_malloc.c:826)
> ==20006== by 0x651A0C8: g_realloc (gmem.c:164)
> ==20006== by 0x6549AFB: g_array_maybe_expand (garray.c:820)
> ==20006== by 0x654B139: g_array_sized_new (garray.c:208)
> ==20006== by 0x700526E: create_poppler_quads_from_annot_quads (poppler-annot.cc:293)
> ==20006== by 0x700526E: poppler_annot_text_markup_get_quadrilaterals (poppler-annot.cc:1648)
> ==20006== by 0x65E06CF: ffi_call_unix64 (in /usr/lib/libffi.so.6.0.4)
> ==20006== by 0x65E009F: ffi_call (in /usr/lib/libffi.so.6.0.4)
> ==20006== by 0x666592A: invoke_c_code.isra.0 (gperl-i11n-invoke-c.c:202)
> ==20006== by 0x666BCAE: XS_Glib__Object__Introspection_invoke (GObjectIntrospection.xs:992)
> ==20006== by 0x49BF49E: Perl_pp_entersub (pp_hot.c:5237)
> ==20006== by 0x4976E79: Perl_runops_debug (dump.c:2537)
Debugging with GDB showed that 0x90c1cb0 in the output above is the address
of the data member of the glib array of Poppler quadrilaterals, i.e. the
address of the first quad in the array.
GDB further showed that glib's array_free is then called with
'flags=FREE_SEGMENT', so deallocating the array also frees the storage of all
quad elements in one go.
Yet the individual quads contained in the array are destroyed once more by
the Perl bindings as boxed glib values (XS_Glib__Boxed_DESTROY) when the local
variables in Perl's stack frame are cleaned up, causing the actual double free.
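The ownership problem can be modelled in plain C without glib or poppler. The following is only a minimal sketch under stated assumptions: Quad, QuadArray, BoxedWrapper and all function names are hypothetical stand-ins for PopplerQuadrilateral, GArray and the bindings' boxed wrapper, not the real API. It shows one way a wrapper could stay valid after the array is gone, namely by copying each element into its own allocation and recording that it owns the copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PopplerQuadrilateral: four corner points. */
typedef struct { double x, y; } Point;
typedef struct { Point p1, p2, p3, p4; } Quad;

/* Hypothetical stand-in for a GArray: all elements live in ONE block. */
typedef struct {
    Quad  *data;
    size_t len;
} QuadArray;

/* Hypothetical stand-in for a binding's boxed wrapper. */
typedef struct {
    Quad *boxed;
    int   owned;   /* nonzero: destroying the wrapper must free `boxed` */
} BoxedWrapper;

QuadArray *quad_array_new(size_t len)
{
    QuadArray *arr = malloc(sizeof *arr);
    if (!arr) return NULL;
    arr->data = calloc(len ? len : 1, sizeof *arr->data);
    if (!arr->data) { free(arr); return NULL; }
    arr->len = len;
    return arr;
}

/* Releases the container and the element block together, comparable to
 * freeing a GArray including its segment. Afterwards every pointer of
 * the form &arr->data[i] dangles and must never be passed to free(). */
void quad_array_free(QuadArray *arr)
{
    if (!arr) return;
    free(arr->data);
    free(arr);
}

/* Wraps element i by copying it. The wrapper owns the copy, so its
 * lifetime no longer depends on the array. */
BoxedWrapper wrap_element_copy(const QuadArray *arr, size_t i)
{
    BoxedWrapper w = { NULL, 0 };
    assert(i < arr->len);
    Quad *copy = malloc(sizeof *copy);
    if (copy) {
        memcpy(copy, &arr->data[i], sizeof *copy);
        w.boxed = copy;
        w.owned = 1;
    }
    return w;
}

/* Frees only memory the wrapper owns. Returns 1 if memory was released. */
int boxed_wrapper_destroy(BoxedWrapper *w)
{
    int freed = 0;
    if (w->owned && w->boxed) {
        free(w->boxed);
        freed = 1;
    }
    w->boxed = NULL;
    w->owned = 0;
    return freed;
}
```

In this model the failing pattern corresponds to a wrapper whose boxed pointer is &arr->data[i] while owned is set: destroying such a wrapper after quad_array_free() would free memory that has already been released, or an address that was never returned by malloc for i > 0.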
Unfortunately, I am not familiar enough with glib and its ownership model to
fix the issue myself.
I have however written an equivalent C-implementation that exhibits exactly the
same error-pattern (attached as pdfextract_bug.c) with some comments, which
hopefully help to better understand the specific issue. Compile and run with:
> gcc -o /tmp/pdfextract ./pdfextract_bug.c `pkg-config --cflags --libs poppler-glib` \
> && valgrind /tmp/pdfextract file://$(readlink -f /tmp/test_annot.pdf)
Thank you for your project and for reading this bugreport :)
I hope the information is more or less complete, but I'd be happy to assist you
in any way in further troubleshooting the issue.
Best regards,
Simon
Relevant library versions:
- CPAN Module Poppler: 1.0101
- Distro: Archlinux
- Perl Version 5.30.0-3
- Glib2 version: 2.60.6-1
- glib-perl/gtk2-perl version: 1.329-2
- Glib::Object::Introspection version 0.047-3
- gcc version: 9.1.0
- glibc version: 2.29-4
Subject: | pdfannots_bug.pl |
#!/usr/bin/env perl
use strict;
use warnings;
BEGIN {
    push @INC, qw( /home/noctux/perl5/lib/perl5/x86_64-linux-thread-multi /home/noctux/perl5/lib/perl5 /usr/lib/perl5/5.28/site_perl /usr/share/perl5/site_perl /usr/lib/perl5/5.28/vendor_perl /usr/share/perl5/vendor_perl /usr/lib/perl5/5.28/core_perl /usr/share/perl5/core_perl);
}
use Poppler;
use Search::Xapian;
use Data::Dumper;
my $pdf = Poppler::Document->new_from_file($ARGV[0]);
my $n_pages = $pdf->get_n_pages;
for my $pagenr (2) {
    my $page = $pdf->get_page($pagenr);
    my $mappings = $page->get_annot_mapping();
    for my $mapping ($mappings) {
        next unless $mapping;
        my $annot = $mapping->annot;
        my $type = $annot->get_annot_type();
        if ($type eq "highlight") {
            study 1;
            my @quads = $annot->get_quadrilaterals();
            #for my $quad (@$quads) {
                ##print " " . ref($quad) . "\n";
            #}
        } else {
            # TODO: DEBUG
            print "Unsupported Annotation type: $type\n";
        }
    }
}
Subject: | pdfextract_bug.c |
#include <poppler.h>
#include <stdio.h>

int main(int argc, const char *argv[])
{
    if (argc < 2) {
        printf("Usage: %s <document>\n", argv[0]);
        return -1;
    }
    GError *err = NULL;
    PopplerDocument *doc = poppler_document_new_from_file(argv[1], NULL, &err);
    if (!doc) {
        fprintf(stderr, "Error creating poppler document: %s\n", err->message);
        g_error_free(err);
        return -2;
    }
    int npages = poppler_document_get_n_pages(doc);
    for (int i = 0; i < npages; i++) {
        PopplerPage *page = poppler_document_get_page(doc, i);
        if (!page) return -3;
        GList *mapping = poppler_page_get_annot_mapping(page);
        if (!mapping) {
            g_object_unref(page);
            continue;
        }
        for (GList *l = mapping; l; l = l->next) {
            PopplerAnnot *annot = ((PopplerAnnotMapping *)l->data)->annot;
            PopplerAnnotType type = poppler_annot_get_annot_type(annot);
            if (type == POPPLER_ANNOT_HIGHLIGHT) {
                GArray *quads = poppler_annot_text_markup_get_quadrilaterals((PopplerAnnotTextMarkup *) annot);
                for (guint j = 0; j < quads->len; j++) {
                    // quad points into the array's data block, it is not a separate allocation
                    PopplerQuadrilateral *quad = &g_array_index(quads, PopplerQuadrilateral, j);
                    printf("%p\n", quad);
                    // Deliberately invalid, to mirror what the bindings do: quad is freed
                    // here and again below together with the rest of the array
                    g_free(quad);
                }
                // This is where the double free then happens: free_segment is set to 1,
                // so the data block containing the quads is freed a second time
                g_array_free(quads, 1);
            }
        }
        poppler_page_free_annot_mapping(mapping);
        g_object_unref(page);
    }
    g_object_unref(doc);
    return 0;
}
Subject: | test_annot.pdf |
Message body not shown because it is not plain text.