Subject: | new color counting code |
Date: | Fri, 13 Jul 2007 11:15:06 +1000 |
To: | bug-Imager [...] rt.cpan.org |
From: | tonyc [...] cpan.org |
----- Forwarded message from Gabriel Vasseur <gvasseur@messagelabs.com> -----
X-Spam-Checker-Version: SpamAssassin 3.1.7-deb (2006-10-05) on
develop-help.com
X-Spam-Level:
X-VirusChecked: Checked
X-Env-Sender: gvasseur@messagelabs.com
X-Msg-Ref: server-19.tower-48.messagelabs.com!1184257965!3107438!1
X-StarScan-Version: 5.5.12.11; banners=messagelabs.com,-,-
X-Originating-IP: [62.231.131.6]
Date: Thu, 12 Jul 2007 17:32:38 +0100
From: Gabriel Vasseur <gvasseur@messagelabs.com>
To: tony@imager.perl.org
Subject: Re: Imager memory corruption bug
X-OriginalArrivalTime: 12 Jul 2007 16:32:42.0801 (UTC) FILETIME=[45322A10:01C7C4A2]
Hi Tony,
First to answer your last email: (I know I'm a bit late ;-)
So are you saying that you've patched the code to refuse to look up a
colour in the palette if the colour index is outside the range of the
palette, right?
What does it do if it finds something? It would be useful for me to have
access to the information, something like "N pixels had a colour index
lying outside of the range of the palette".
Second thing is, no I don't have svn. I checked Imager-0.59, which was
released recently (I think after I contacted you?), but didn't see any
change relative to the aforementioned problem, compared to version 0.58.
In case I'm confused, I would be grateful if you can enlighten me.
Otherwise, can you send to me the actual code?
Third thing, I'm proposing a modest contribution to Imager, that you'll
find attached.
In two words, the changes are:
- a faster getcolorcount function (at least it is on my system)
- a new getcolorusagehash function (entirely in perl, using already
existing Imager tools, because that didn't happen to be much faster in
C/XS anyway, I think because of the high volume of data to transfer from
C to Perl).
- a new getcolorusage function which basically returns a sorted version
of the values of the hash returned by getcolorusagehash, but it actually
do the job in C, this time.
I tried to shape the code in a form that would be useful to you, but
couldn't spend too much time on it either... I wrote a very short test
script. Attached is also the output of a diff -r against Imager-0.59
(that contains comments that might be useful to figure out what does what).
It does require some work. You'll probably want to tidy the code, change
the names (I didn't try to figure out the naming convention), add safety
checks, etc. I tried to use 'color' and not 'colour' but I'm afraid I've
been a bit lazy once or twice... ;-)
I hope you'll find that contribution more useful than bothering, though!
Any feedback is welcome.
Sorry again if that's not in a format convenient for you... :-(
Cheers,
Gabriel.
tony@imager.perl.org wrote:
Show quoted text
>On Thu, Jun 21, 2007 at 02:24:21PM +0100, Gabriel Vasseur wrote:
>
Show quoted text>>Cool. Thanks for that.
>>
>>So the fact that it returned a "random" colour explains why the result
>>would depend on the presence or absence of a line of code... I guess.
>>Any idea why this instability would only happen in threadless perl?
>>Maybe it's wasn't using the same part of the memory... Strange because
>>it was quite reproducible. Anyway...
>>
>
>It was happening, though with less randomness, on both, since you were
>getting 21 colours instead of 20.
>
>As to the variation between the two, they depend on how any previous
>processing has modified the memory, which has presumably been freed
>and then remalloced for use as the palette of the new image.
>
>I suspect if I'd run it under valgrind, I would have seen it complain
>about using uninitialized values.
>
>If you have subversion, you can checkout the fixed version with:
>
> svn co http://imager.perl.org/svn/trunk/Imager/
>
>Tony
>
>______________________________________________________________________
>This email has been scanned by the MessageLabs Email Security System.
>For more information please visit http://www.messagelabs.com/email
>______________________________________________________________________
>
______________________________________________________________________
This email has been scanned by the MessageLabs Email Security System.
For more information please visit http://www.messagelabs.com/email
______________________________________________________________________
diff -r Imager-0.59/datatypes.c Imager-0.60/datatypes.c
219c219,220
< int ci,idx[8];
---
> int ci;
> /* int idx[8]; */ /* Useless because not used */
224c225
< ct->cnt++;
---
> /* ct->cnt++; */ /* I think this was useless */
231,232c232,233
< c->cnt++;
< idx[i]=ci;
---
> /* c->cnt++; */ /* I think this was useless */
> /* idx[i]=ci; */ /* Useless because not used */
233a235
> c->cnt++; /* New. The only thing really needed (I think) */
269a272,290
>
> /* This whole function is new */
> /* walk through the tree and for each colour, store its seen count in the
> space pointed by *col_usage_it_adr */
> void
> octt_histo(struct octt *ct, unsigned int **col_usage_it_adr) {
> int i,c;
> c = 0;
> for(i = 0; i < 8; i++)
> if (ct->t[i] != NULL) {
> octt_histo(ct->t[i], col_usage_it_adr);
> c++;
> }
> if (!c) {
> *(*col_usage_it_adr)++ = ct->cnt;
> }
> }
>
>
diff -r Imager-0.59/image.c Imager-0.60/image.c
1223a1224,1225
> /* This function has been changed and is now faster. It's using
> * i_gsamp instead of i_gpix */
1228,1229d1229
< int xsize,ysize;
< i_color val;
1230a1231,1232
> int maxc = 10000000;
> i_sample_t * samp;
1232,1242c1234,1251
< mm_log((1,"i_count_colors(im 0x%08X,maxc %d)\n"));
<
< xsize=im->xsize;
< ysize=im->ysize;
< ct=octt_new();
<
< colorcnt=0;
< for(y=0;y<ysize;y++) for(x=0;x<xsize;x++) {
< i_gpix(im,x,y,&val);
< colorcnt+=octt_add(ct,val.rgb.r,val.rgb.g,val.rgb.b);
< if (colorcnt > maxc) { octt_delete(ct); return -1; }
---
> int xsize = im->xsize;
> int ysize = im->ysize;
> int samp_cnt = 3 * xsize;
> ct = octt_new();
>
> samp = (i_sample_t *) mymalloc( xsize * 3 * sizeof(i_sample_t));
>
> colorcnt = 0;
> for(y = 0; y < ysize; ) {
> i_gsamp(im, 0, xsize, y++, samp, NULL, 3);
> for(x = 0; x < samp_cnt; ) {
> colorcnt += octt_add(ct, samp[x], samp[x+1], samp[x+2]);
> x += 3;
> if (colorcnt > maxc) {
> octt_delete(ct);
> return -1;
> }
> }
1243a1253
> myfree(samp);
1247a1258,1343
> /* sorts the array ra[0..n-1] into increasing order using heapsort algorithm
> * (adapted from the Numerical Recipes)
> */
> /* Needed by get_anonymous_color_histo */
> void hpsort(unsigned int n, int *ra) {
> unsigned int i,
> ir,
> j,
> l,
> rra;
>
> if (n < 2) return;
> l = n >> 1;
> ir = n - 1;
> for(;;) {
> if (l > 0) {
> rra = ra[--l];
> }
> else {
> rra = ra[ir];
> ra[ir] = ra[0];
> if (--ir == 0) {
> ra[0] = rra;
> break;
> }
> }
> i = l;
> j = 2 * l + 1;
> while (j <= ir) {
> if (j < ir && ra[j] < ra[j+1]) j++;
> if (rra < ra[j]) {
> ra[i] = ra[j];
> i = j;
> j++; j <<= 1; j--;
> }
> else break;
> }
> ra[i] = rra;
> }
> }
>
> /* This function constructs an ordered list which represents how much the
> * different colors are used. So for instance (100, 100, 500) means that one
> * color is used for 500 pixels, another for 100 pixels and another for 100
> * pixels. It's tuned for performance. You might not like the way I've hardcoded
> * the maxc ;-) and you might want to change the name... */
> /* Uses octt_histo */
> int
> get_anonymous_color_histo(i_img *im, unsigned int **col_usage) {
> struct octt *ct;
> int x,y,i;
> int colorcnt;
> int maxc = 10000000;
> unsigned int *col_usage_it;
> i_sample_t * samp;
>
> int xsize = im->xsize;
> int ysize = im->ysize;
> int samp_cnt = 3 * xsize;
> ct = octt_new();
>
> samp = (i_sample_t *) mymalloc( xsize * 3 * sizeof(i_sample_t));
>
> colorcnt = 0;
> for(y = 0; y < ysize; ) {
> i_gsamp(im, 0, xsize, y++, samp, NULL, 3);
> for(x = 0; x < samp_cnt; ) {
> colorcnt += octt_add(ct, samp[x], samp[x+1], samp[x+2]);
> x += 3;
> if (colorcnt > maxc) {
> octt_delete(ct);
> return -1;
> }
> }
> }
> myfree(samp);
> /* Now that we know the number of colours... */
> col_usage_it = *col_usage = (unsigned int *) mymalloc(colorcnt * 2 * sizeof(unsigned int));
> octt_histo(ct, &col_usage_it);
> hpsort(colorcnt, *col_usage);
> octt_delete(ct);
> return colorcnt;
> }
>
>
>
diff -r Imager-0.59/Imager.pm Imager-0.60/Imager.pm
3281a3282,3308
> # Returns a reference to a hash. The keys are colour named (packed) and the
> # values are the number of pixels in this colour.
> sub getcolorusagehash {
> my $self = shift;
> my $channels= $self->getchannels;
> # We don't want to look at the alpha channel, because some gifs using it
> # doesn't define it for every colour (but only for some)
> $channels -= 1 if $channels == 2 or $channels == 4;
> my %colour_use;
> my $height = $self->getheight;
> for my $y (0 .. $height - 1) {
> my $colours = $self->getsamples(y => $y, channels => [ 0 .. $channels - 1 ]);
> while (length $colours) {
> $colour_use{ substr($colours, 0, $channels, '') }++;
> }
> }
> return \%colour_use;
> }
>
> # This will return a ordered array of the colour usage. Kind of the sorted
> # version of the values of the hash returned by getcolorusagehash.
> # You might want to add safety checks and change the names, etc...
> sub getcolorusage {
> my $self=shift;
> return get_anonymous_colour_usage ($self);
> }
>
diff -r Imager-0.59/Imager.xs Imager-0.60/Imager.xs
2993a2994,3009
> void
> get_anonymous_colour_usage(Imager::ImgRaw im)
> PPCODE:
> int i;
> unsigned int ** col_usage = (unsigned int **) mymalloc(sizeof(unsigned int *));
> unsigned int * p;
> int col_cnt = get_anonymous_color_histo(im, col_usage);
> EXTEND(SP, col_cnt);
> p = *col_usage;
> for (i = 0; i < col_cnt; ) {
> PUSHs(sv_2mortal(newSViv( p[i++])));
> }
> myfree(p);
> myfree(col_usage);
> XSRETURN(col_cnt);
>
Only in Imager-0.60/t: t999new.t
----- End forwarded message -----