Bug #50820 for Geo-ReadGRIB: Fails on 64bit systems when uselongdouble is defined

Mon Oct 26 01:39:25 2009 ANDK [...] cpan.org - Ticket created

Subject:

Fails on 64bit systems when uselongdouble is defined

As the subject says. I first observed it on cpantesters. Systems that fail have an nvsize of 12 or 16.

Test Summary Report
-------------------
t/8-Geo-ReadGRIB-Dateline-CMS.t (Wstat: 256 Tests: 305 Failed: 1)
  Failed test: 6
  Non-zero exit status: 1
Files=1, Tests=305, 1 wallclock secs ( 0.03 usr 0.01 sys + 0.24 cusr 0.05 csys = 0.33 CPU)
Result: FAIL

Regards, HTH,

Mon Oct 26 14:20:38 2009 frank.l.cox [...] gmail.com - Correspondence added

Interesting bug. The failing test runs on a GRIB file of a newly supported variant that is shipped in the package. It's a Canadian weather office file covering longs -180 to 180 and lats 90 to -90. Projected onto a lat/long grid, there are two 180 degree lines, on the west and east edges. There are two separate data points for these for each latitude in the file. The test asserts that the two separate 180 degree data points should have the same value for each latitude in the file. (This includes +-90, which are points on the globe but in the projection have as many data points as any other lat.)

Mon Oct 26 14:20:40 2009 The RT System itself - Status changed from 'new' to 'open'

Mon Oct 26 14:35:46 2009 frank.l.cox [...] gmail.com - Correspondence added

I don't have easy access to a 64bit longdouble system except through CPAN Testers. Also, the list of possible suspects is long... I'll upload v1_1 later today with some more tests to help zero in on the problem.

Tue Oct 27 19:49:20 2009 frank.l.cox [...] gmail.com - Correspondence added

I just uploaded v1.0_2, which has some tests that reveal the smoking gun, or at least a little whiff of gun smoke.

All fail reports I've seen are for a single failed test that checks whether data values for -180 and 180 degrees longitude are the same for all latitudes in the file, from pole to pole inclusive. That's a .6 degree interval for 301 points total. The specific point that fails is 89.4 degrees. These files scan south to north, so this is the last latitude except for 90, the pole. Since the pole is a point, all longs should have the same value. It may "fail" for 90 north too and still pass this test.

The data in the file is stored as a continuous sequence of 32 bit values starting at a given point in the file. The data for this type of GRIB file scans a row of longs west to east, -180 to 180 degrees, and lats from south to north, -90 to 90. The first data point is (-90,-180) and the last is (90,180). Data point (-89.4,-180) is 601 further into the file than (-90,-180).

I have a method called lalo2offset() that takes a lat and long and returns an offset into this data. I've demonstrated that it returns a different value for both (89.4, -180) and (89.4, 180) on 64bit uselongdouble systems than on any other CPAN Testers system. It's off by one. The code is this:

    # shift long east until Lo1 = 0 and make sure any long
    # > 360 degrees is moved back into the range of 0 - 360
    $thislong = $long + $self->lo_shift;
    if ( $thislong > 360 ) {
        $thislong -= 360;
    }

    $out = ( ( ( $lat - $self->La1 ) / $self->LaInc ) * $self->Ni )
         + ( ($thislong ) / $self->LoInc );

    return sprintf "%d", $out;

-------------------------------------------------------------------

lo_shift = 180 so for -180 and 180 $thislong is 1 and 360
$lat = 89.4
La1 = -90
LaInc = LoInc = .6
Ni = 601

Using my HP calculator:

for $long = -180, (((89.4 - -90)/.6) * 601) + (0 /.6) = 179699
for $long = 180, (((89.4 - -90)/.6) * 601) + (360 /.6) = 180299

On 64bit uselongdouble systems lalo2offset() returns 179698 and 180298. Hmmm!

I'm still investigating. I'm preparing a new test that will explore this bug in more detail. At this point I won't be surprised to find something wrong with my assumptions...
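For anyone who wants to reproduce the arithmetic outside the module, here is a minimal standalone sketch of the same calculation. It assumes the grid values quoted in this message; the %grid hash and the raw_offset() name are hypothetical stand-ins for the object's accessors and are not part of Geo::ReadGRIB. It prints the unrounded value with extra digits so any fractional error is visible next to what "%d" makes of it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the object's accessors, using the values quoted above.
my %grid = ( La1 => -90, LaInc => 0.6, LoInc => 0.6, Ni => 601, lo_shift => 180 );

# Returns the unrounded offset so that its fractional part can be inspected.
sub raw_offset {
    my ( $lat, $long ) = @_;

    # shift long east and wrap anything above 360 back into range
    my $thislong = $long + $grid{lo_shift};
    $thislong -= 360 if $thislong > 360;

    my $out = ( ( ( $lat - $grid{La1} ) / $grid{LaInc} ) * $grid{Ni} )
            + ( $thislong / $grid{LoInc} );
    return $out;
}

for my $long ( -180, 180 ) {
    my $raw = raw_offset( 89.4, $long );
    printf "long %4d: raw = %.20g, %%d gives %s\n",
        $long, $raw, sprintf( "%d", $raw );
}
```

The printed digits will differ between Perl builds, which is the point of the exercise: a raw value that lands slightly below the expected integer would explain an off-by-one result.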

Tue Oct 27 19:54:17 2009 frank.l.cox [...] gmail.com - Correspondence added

TYPO HERE. It should be: $thislong is 0 and 360. I get it right in the calculations.

> lo_shift = 180 so for -180 and 180 $thislong is 1 and 360
> $lat = 89.4
> La1 = -90
> LaInc = LoInc = .6
> Ni = 601

Wed Oct 28 20:20:28 2009 frank.l.cox [...] gmail.com - Correspondence added

It looks like these fails happen when a different result is returned by lalo2offset() for some inputs when run on a 64bit uselongdouble Perl. The results on all other tested platforms agree with my HP calculator or hand calculation of the algorithm. Specifically, this test fails only on 64bit uselongdouble and on no other known Perl variant:

    my $calc = (((63 - -90)/.6) * 601) + (360 /.6);
    ok( $calc == 153855 ) or
        diag ("raw ((63 - -90)/.6) * 601) + (360 /.6) = 153855 not $calc ");

I'm not sure what the right thing to do is here. I can detect the platform, but I don't yet know how to modify my algorithm to work correctly. I'm going to try one more test to zero in a bit more on the problem...

Sat Oct 31 02:29:19 2009 frank.l.cox [...] gmail.com - Correspondence added

> my $calc = (((63 - -90)/.6) * 601) + (360 /.6);
> ok( $calc == 153855 ) or
> diag ("raw ((63 - -90)/.6) * 601) + (360 /.6) = 153855 not $calc ");

This test failed on uselongdouble platforms but it didn't turn out to be very diagnostic of the real problem, partly because of the problems of using == with floating point numbers.
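As a rough illustration of the == problem, a test can compare against a small tolerance instead of demanding exact equality. This is only a sketch: nearly_equal() and the 1e-6 default tolerance are hypothetical choices, not something taken from the module or its test suite.

```perl
use strict;
use warnings;

# Hypothetical helper: true when two numbers differ by less than $eps.
sub nearly_equal {
    my ( $got, $want, $eps ) = @_;
    $eps = 1e-6 unless defined $eps;
    return abs( $got - $want ) < $eps;
}

my $calc = ( ( ( 63 - -90 ) / .6 ) * 601 ) + ( 360 / .6 );
print nearly_equal( $calc, 153855 ) ? "ok\n" : "not ok: got $calc\n";
```

A check like this passes whether the computed value sits a hair above or below 153855, so it cannot by itself show which side of the integer a given platform lands on.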

Sat Oct 31 02:41:00 2009 frank.l.cox [...] gmail.com - Correspondence added

The problem turned out to be the use of sprintf "%d" on the return value of lalo2offset(). This truncates the floating-point result rather than rounding it as expected. It still gave the correct result on all platforms I tested on, until this new test uncovered subtle differences on uselongdouble Perl. Using the "%.0f" format instead gives the same results on all platforms.
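The difference between the two formats can be seen with a value that sits just below an integer. The number used here is an illustrative stand-in for an offset carrying a small floating-point error; it is not a value taken from the failing reports.

```perl
use strict;
use warnings;

# Illustrative value just below an integer (an assumption, not from the reports).
my $offset = 179698.999999;

my $truncated = sprintf "%d",   $offset;    # drops the fractional part
my $rounded   = sprintf "%.0f", $offset;    # rounds to the nearest integer

print "truncated: $truncated\n";    # 179698
print "rounded:   $rounded\n";      # 179699
```

With "%d" any error that pulls the value below the integer costs a whole unit, while "%.0f" absorbs errors smaller than one half.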

Sat Oct 31 02:41:01 2009 frank.l.cox [...] gmail.com - Status changed from 'open' to 'resolved'