This queue is for tickets about the App-RecordStream CPAN distribution.

Report information
The Basics
Id: 91452
Status: resolved
Priority: 0/
Queue: App-RecordStream

People
Owner: Nobody in particular
Requestors: tsibley [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: (no value)
Fixed in: (no value)



Subject: Tests fail due to hash randomisation
The hash randomisation introduced in Perl 5.18 (and, I believe, backported to older versions) causes test failures in multiple test files. The actual failures vary from run to run, as expected, but attached is an example run. Without digging into the failures themselves, it's unclear to me whether the assumptions about hash key ordering are limited to the tests or extend into the actual tools. I'm testing against git master as of 3.7.3-28-gfd6412e.
Subject: build-fail.log
application/octet-stream 33.1k


From: eli [...] siliconsprawl.com
Repro'd on 5.18.1 as well. The test failures I've seen are all different orderings of the columns within a record (but no changes in record order), which should only affect the tests. Possible exception is fromxml, which looks to have some odd ordering going on with its --element arguments.
From: eli [...] siliconsprawl.com
Poked around a bit more. All of the test failures I see are either recs-to* scripts or recs-fromxml. The output scripts don't have any canonical key ordering. One potential fix is to just do a sort over the initial column of keys (eg. https://github.com/benbernard/RecordStream/blob/master/lib/App/RecordStream/Operation/tocsv.pm#L44). Shouldn't be too much overhead since it's one sort over one row, and there's (typically) just one recs-to* at a time. It also seems nicer for the user so columns don't flip around between runs.

recs-fromxml looks a bit trickier. It's relying on XML::Simple, which explicitly doesn't provide ordering guarantees. With multiple element arguments, we end up seeing the records themselves in different orders. We can't canonicalize the stream order without buffering all the records and backing up the pipeline; two options I can think of are switching to an ordered parser (XML::SAX?) or just solving it on the test side by sorting the results (maybe in OperationHelper?).
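To make the sort-the-keys idea for the recs-to* scripts concrete, here is a minimal sketch. It is written in Python rather than Perl only to keep it self-contained, and it is not the RecordStream code: the function name `to_csv` and the choice to emit a header row are assumptions for illustration. The point is that the column list is derived once, by sorting the keys of the first record, and then reused for every later row.

```python
import csv
import io


def to_csv(records):
    """Render an iterable of dicts as CSV text with a stable column order.

    Hypothetical helper: the column order is fixed by sorting the keys of
    the first record, so output does not depend on hash/dict ordering.
    """
    records = iter(records)
    try:
        first = next(records)
    except StopIteration:
        return ""  # no records, no output

    columns = sorted(first)  # one sort over one row

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    writer.writerow([first.get(c, "") for c in columns])
    for rec in records:
        # Later records reuse the column list; missing fields become "".
        writer.writerow([rec.get(c, "") for c in columns])
    return out.getvalue()
```

Two inputs that differ only in key order then produce identical text, which is the property the failing tests rely on.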
On Mon Dec 16 23:17:54 2013, https://www.google.com/accounts/o8/id?id=AItOawmUqDfWICIMz_fwUIshaTn8ORdDj523HxU wrote:
> The output scripts don't have any canonical key ordering. One
> potential fix is to just do a sort over the initial column of keys
> (eg.
> https://github.com/benbernard/RecordStream/blob/master/lib/App/RecordStream/Operation/tocsv.pm#L44).
> Shouldn't be too much overhead since it's one sort over one row, and
> there's (typically) just one recs-to* at a time. It also seems nicer
> for the user so columns don't flip around between runs.
It'd be nice to optionally preserve the original source key ordering as much as possible. For example, I've found myself working with CSV data recently and doing this trick a lot to inspect output:

    recs-fromcsv --header foo.csv | ... | recs-totable -k `head -n1 foo.csv`

Perhaps a flag to preserve ordering data in fromcsv (as part of the record) and use that in recs-to* if available, falling back to alphanumeric otherwise (i.e. for new fields).
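The fallback rule proposed here can be sketched as follows. Again this is Python for illustration only, not RecordStream code; `output_columns` and the `source_order` argument are hypothetical names, and how the ordering data would actually travel with a record is left open. Fields known from the source keep their source order, and any other fields are appended in sorted order.

```python
def output_columns(record, source_order=None):
    """Pick an output column order for one record.

    Hypothetical helper: `source_order` is the key order remembered from
    the input (for example a CSV header line). Keys found there keep that
    order; keys that are new to the record are appended alphanumerically.
    """
    source_order = source_order or []
    known = []
    seen = set()
    for key in source_order:
        # Skip source columns that were dropped upstream, and duplicates.
        if key in record and key not in seen:
            known.append(key)
            seen.add(key)
    extra = sorted(k for k in record if k not in seen)
    return known + extra
```

With no ordering data at all, this reduces to plain sorted keys, so it is compatible with the sort-based fix discussed earlier in the ticket.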
RT-Send-CC: eli [...] siliconsprawl.com
All tests, including optional ones, now pass on 5.18. Thanks Eli!