Subject: | When post to database fails, archiver does not run, leading to data loss |
As I have reported on irc.perl.org #smoke and on the perl5-porters mailing list, perl5.test-smoke.org has been offline since sometime on Sep 04 2020. This is a severe setback to our QA efforts for the Perl core distribution.
In this bug report I will cite one significant side effect of this event: The failure of smokecurrent.sh to create log files in the logs/smokecurrent/ subdirectory when a POST to the DB fails.
Here's what I normally expect my logs/smokecurrent/ subdirectory to look like:
#####
[perlmonger: p5smoke] $ ls -ltr logs/smokecurrent/ | tail -n 8
-rw-rw-rw- 1 jkeenan jkeenan 1296 Sep 3 04:03 rpt73bc4847e4fc9b19ddc490fb6e70d7f83c9f8bb9.rpt.gz
-rw-rw-rw- 1 jkeenan jkeenan 5527 Sep 3 04:03 out73bc4847e4fc9b19ddc490fb6e70d7f83c9f8bb9.out.gz
-rw-rw-rw- 1 jkeenan jkeenan 4315 Sep 3 04:03 jsn73bc4847e4fc9b19ddc490fb6e70d7f83c9f8bb9.jsn.gz
-rw-rw-rw- 1 jkeenan jkeenan 4825 Sep 3 04:03 log73bc4847e4fc9b19ddc490fb6e70d7f83c9f8bb9.log.gz
-rw-rw-rw- 1 jkeenan jkeenan 1292 Sep 4 05:15 rpt1ffb4498bfa2fb91d355283a2808e6786ff8819f.rpt.gz
-rw-rw-rw- 1 jkeenan jkeenan 5516 Sep 4 05:15 out1ffb4498bfa2fb91d355283a2808e6786ff8819f.out.gz
-rw-rw-rw- 1 jkeenan jkeenan 4270 Sep 4 05:15 jsn1ffb4498bfa2fb91d355283a2808e6786ff8819f.jsn.gz
-rw-rw-rw- 1 jkeenan jkeenan 4882 Sep 4 05:15 log1ffb4498bfa2fb91d355283a2808e6786ff8819f.log.gz
#####
(I gzip files in this directory at the start of a subsequent test-smoke run.)
And, normally, I expect the tail of the logs/smokecurrent/log*.log file to look like this:
#####
...
[2020-09-04 05:15:27+0000] Gathering CoreSmokeDB information...
[2020-09-04 05:15:27+0000] Locally applied patches from '/home/jkeenan/p5smoke/perl-current/patchlevel.h'
[2020-09-04 05:15:27+0000] Patches: 'uncommitted-changes;SMOKE1ffb4498bfa2fb91d355283a2808e6786ff8819f'
[2020-09-04 05:15:27+0000] Write to '/home/jkeenan/p5smoke/perl-current/mktest.jsn': ok
[2020-09-04 05:15:27+0000] ==> Starting sendreport
[2020-09-04 05:15:27+0000] ==> Skipping mailer
[2020-09-04 05:15:27+0000] ==> Starting poster
[2020-09-04 05:15:27+0000] Posting to http://perl5.test-smoke.org/report via HTTP::Tiny.
[2020-09-04 05:15:28+0000] http://perl5.test-smoke.org/report/118090
[2020-09-04 05:15:28+0000] ==> Starting archiver
[2020-09-04 05:15:28+0000] Copy(/home/jkeenan/p5smoke/perl-current/mktest.rpt, /home/jkeenan/p5smoke/logs/smokecurrent/rpt1ffb4498bfa2fb91d355283a2808e6786ff8819f.rpt): ok
[2020-09-04 05:15:28+0000] Copy(/home/jkeenan/p5smoke/perl-current/mktest.out, /home/jkeenan/p5smoke/logs/smokecurrent/out1ffb4498bfa2fb91d355283a2808e6786ff8819f.out): ok
[2020-09-04 05:15:28+0000] Copy(/home/jkeenan/p5smoke/perl-current/mktest.jsn, /home/jkeenan/p5smoke/logs/smokecurrent/jsn1ffb4498bfa2fb91d355283a2808e6786ff8819f.jsn): ok
#####
Note that there are no entries after 05:15 UTC on Sep 4 2020. On this host I have run smokecurrent.sh several times since then, but the program dies when sending the report to the database fails. Here's what smokecurrent.log looks like now:
#####
[2020-09-19 12:08:33+0000] Locally applied patches from '/home/jkeenan/p5smoke/perl-current/patchlevel.h'
[2020-09-19 12:08:33+0000] Patches: 'uncommitted-changes;SMOKEde437ac54dd4224bc78642a4cb1d0d7d5263d1b6'
[2020-09-19 12:08:33+0000] Write to '/home/jkeenan/p5smoke/perl-current/mktest.jsn': ok
[2020-09-19 12:08:33+0000] ==> Starting sendreport
[2020-09-19 12:08:33+0000] ==> Skipping mailer
[2020-09-19 12:08:33+0000] ==> Starting poster
[2020-09-19 12:08:33+0000] Posting to http://perl5.test-smoke.org/report via HTTP::Tiny.
[2020-09-19 12:08:49+0000] POST failed: 599 Internal Exception
Use of uninitialized value $response_body in concatenation (.) or string at /usr/home/jkeenan/p5smoke/lib/Test/Smoke/Poster/Base.pm line 147.
[]: malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "(end of string)") at /usr/home/jkeenan/p5smoke/lib/Test/Smoke/Poster/Base.pm line 146.
at /usr/home/jkeenan/p5smoke/lib/Test/Smoke/Poster/Base.pm line 147.
Test::Smoke::Poster::Base::post(Test::Smoke::Poster::HTTP_Tiny=HASH(0x804f65a20)) called at /usr/home/jkeenan/p5smoke/lib/Test/Smoke/App/SendReport.pm line 57
Test::Smoke::App::SendReport::run(Test::Smoke::App::SendReport=HASH(0x80525b888)) called at /usr/home/jkeenan/p5smoke/lib/Test/Smoke/App/SmokePerl.pm line 102
Test::Smoke::App::SmokePerl::run(Test::Smoke::App::SmokePerl=HASH(0x8050ea3a8)) called at ./tssmokeperl.pl
#####
Here is the relevant section of Test::Smoke::Poster::Base:
#####
126 =head2 $poster->post()
127
128 Post the JSON report to CoreSmokeDB.
129
130 =head3 Arguments
131
132 None.
133
134 =head3 Returns
135
136 The id of the CoreSmokeDB report on success.
137
138 =head3 Exceptions
139
140 HTTP or Test::Smoke::Gateway-application errors.
141
142 =cut
143
144 sub post {
145 my $self = shift;
146 my $response_body = eval {decode_json($self->_post_data())};
147 confess("[$response_body]: " . $@) if $@;
148 $self->_process_post_result($response_body);
149 }
#####
The 599 exception probably sets $response_body to undef at line 146. That generates the uninitialized value warning at line 147. But what is most unfortunate is the fact that the 'confess' in line 147 means that in the event of an unsuccessful POST -- $@ populated -- the entire smokecurrent.sh process dies.
That means that the 'archiver' stage of the process never runs -- which means that, once I overwrite smokecurrent.log in the top-level directory during a subsequent smoke-test run, I have no data from a given run. In the case of my FreeBSD-11 smoker, that means I've lost data that took 5-1/2 hours to generate.
Assuming that we can resurrect perl5.test-smoke.org, Test::Smoke::Poster::Base::post() should be rewritten to not completely die if a report cannot be sent via POST to the database. Perhaps this will be as simple as replacing 'confess' with 'Carp::cluck'. The archiver should then be able to run to completion.
Thank you very much.
Jim Keenan