Bug #91452 for App-RecordStream: Tests fail due to hash randomisation

Mon Dec 16 15:39:32 2013 tsibley [...] cpan.org - Ticket created

Subject:

Tests fail due to hash randomisation

The new hash randomisation in Perl 5.18 (and backported to older versions, I believe) causes test failures in multiple test files. The actual failures vary from run to run, as expected, but attached is an example run. Without digging into the failures themselves, it's unclear to me whether the assumptions about hash key ordering are limited to the tests themselves or extend into the actual tools. I'm testing against git master as of 3.7.3-28-gfd6412e.

Attachment: build-fail.log (application/octet-stream, 33.1k)

Mon Dec 16 19:35:54 2013 https://www.google.com/accounts/o8/id?id=AItOawmUqDfWICIMz_fwUIshaTn8ORdDj523HxU - Correspondence added

From:

eli [...] siliconsprawl.com

Repro'd on 5.18.1 as well. The test failures I've seen are all different orderings of the columns within a record (but no changes in record order), which should only affect the tests. Possible exception is fromxml, which looks to have some odd ordering going on with its --element arguments.
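
One way to see why a change in column order within a record need only affect the tests: if expected and actual output are compared as parsed records rather than as raw strings, key order drops out while record order is still checked. A minimal sketch in Python (not RecordStream's Perl test code); `records_equal` is a hypothetical helper, and it assumes one JSON record per line:

```python
import json

def records_equal(expected_text, actual_text):
    """Compare two record streams line by line, ignoring key order within a record."""
    expected = [json.loads(line) for line in expected_text.splitlines() if line.strip()]
    actual = [json.loads(line) for line in actual_text.splitlines() if line.strip()]
    # Dict equality ignores key order; list equality still enforces record order.
    return expected == actual

expected = '{"a": 1, "b": 2}\n{"a": 3, "b": 4}\n'
actual = '{"b": 2, "a": 1}\n{"b": 4, "a": 3}\n'

same_records = records_equal(expected, actual)
swapped_records = records_equal(expected, '{"a": 3, "b": 4}\n{"a": 1, "b": 2}\n')
```

Here `same_records` is true even though the raw strings differ, while `swapped_records` is false because the records themselves arrive in a different order.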

Mon Dec 16 19:35:54 2013 The RT System itself - Status changed from 'new' to 'open'

Tue Dec 17 02:17:54 2013 https://www.google.com/accounts/o8/id?id=AItOawmUqDfWICIMz_fwUIshaTn8ORdDj523HxU - Correspondence added

From:

eli [...] siliconsprawl.com

Poked around a bit more. All of the test failures I see are either recs-to* scripts or recs-fromxml.

The output scripts don't have any canonical key ordering. One potential fix is to just sort the initial set of column keys (e.g. https://github.com/benbernard/RecordStream/blob/master/lib/App/RecordStream/Operation/tocsv.pm#L44). Shouldn't be too much overhead since it's one sort over one row, and there's (typically) just one recs-to* at a time. It also seems nicer for the user so columns don't flip around between runs.

recs-fromxml looks a bit trickier. It's relying on XML::Simple, which explicitly doesn't provide ordering guarantees. With multiple element arguments, we end up seeing the records themselves in different orders. We can't canonicalize the stream order without buffering all the records and backing up the pipeline; two options I can think of are switching to an ordered parser (XML::SAX?) or just solving it on the test side by sorting the results (maybe in OperationHelper?).
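
The two ideas above (sorting the column keys once for output, and sorting results on the test side) could look roughly like the following. This is a Python sketch for illustration only, not the Perl in tocsv.pm or OperationHelper; `to_csv` and `canonical_order` are hypothetical names, and writing a header row is an assumption of the sketch:

```python
import csv
import io
import json

def to_csv(records):
    """Render records as CSV, with columns sorted once from the first record's keys."""
    if not records:
        return ""
    columns = sorted(records[0].keys())  # one sort over one row
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)  # header row (an assumption of this sketch)
    for record in records:
        writer.writerow([record.get(col, "") for col in columns])
    return out.getvalue()

def canonical_order(records):
    """Test-side helper: sort records by their key-sorted JSON serialisation."""
    return sorted(records, key=lambda r: json.dumps(r, sort_keys=True))

csv_text = to_csv([{"b": 2, "a": 1}, {"a": 3, "b": 4}])

# Two runs that emit the same records in different stream orders
run_one = [{"name": "x"}, {"name": "y"}]
run_two = [{"name": "y"}, {"name": "x"}]
runs_match = canonical_order(run_one) == canonical_order(run_two)
```

Sorting in the test helper buffers the whole result, which is fine for small test fixtures but is exactly what the pipeline itself cannot afford to do.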

Wed Dec 18 17:43:22 2013 tsibley [...] cpan.org - Correspondence added

On Mon Dec 16 23:17:54 2013, https://www.google.com/accounts/o8/id?id=AItOawmUqDfWICIMz_fwUIshaTn8ORdDj523HxU wrote:

> The output scripts don't have any canonical key ordering. One
> potential fix is to just do a sort over the initial column of keys
> (eg.
> https://github.com/benbernard/RecordStream/blob/master/lib/App/RecordStream/Operation/tocsv.pm#L44).
> Shouldn't be too much overhead since it's one sort over one row, and
> there's (typically) just one recs-to* at a time. It also seems nicer
> for the user so columns don't flip around between runs.

It'd be nice to optionally preserve the original source key ordering as much as possible. For example, I've found myself working with csv data recently and doing this trick a lot to inspect output:

    recs-fromcsv --header foo.csv | ... | recs-totable -k `head -n1 foo.csv`

Perhaps a flag to preserve ordering data in fromcsv (as part of the record) and use that in recs-to* if available, falling back to alphanumeric otherwise (i.e. for new fields).
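
The fallback rule proposed here (source order where known, alphanumeric for anything new) can be sketched as below. This is illustrative Python, not RecordStream code: no such flag exists in the thread, `output_columns` is a hypothetical name, and `source_order` stands in for however the ordering data would be carried with the record:

```python
def output_columns(record, source_order=None):
    """Known source columns first, in source order; remaining keys sorted."""
    kept = []
    for key in source_order or []:
        # Skip source columns dropped upstream, and ignore duplicates.
        if key in record and key not in kept:
            kept.append(key)
    extra = sorted(key for key in record if key not in kept)
    return kept + extra

record = {"total": 30, "name": "x", "id": 7, "avg": 10}

with_order = output_columns(record, source_order=["id", "name", "qty"])
without_order = output_columns(record)
```

With the source header `id,name,qty`, `with_order` is `["id", "name", "avg", "total"]`: `qty` is skipped because the record no longer has it, and the new fields `avg` and `total` are appended alphabetically. Without ordering data, `without_order` is simply the sorted key list.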

Thu Jan 16 18:02:45 2014 tsibley [...] cpan.org - Correspondence added

RT-Send-CC:

eli [...] siliconsprawl.com

All tests, including optional ones, now pass on 5.18. Thanks Eli!

Thu Jan 16 18:02:46 2014 tsibley [...] cpan.org - Status changed from 'open' to 'resolved'