Subject: Optimize runs wild, swamping stderr
To reproduce, call xmlshell.pl < strange-error.xml
When you do this, it adds a document to the index, then optimizes and closes the index. During optimization, Plucene runs wild, swamping stderr with two different character sequences (maybe UTF-8). See the image in the attached files. The apparent spaces are either 0x82 or 0x83; the terminal is in Latin-9 mode.
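To pin down the actual bytes rather than guessing from how the terminal renders them, stderr could be captured to a file and dumped as hex. This is only a sketch: the helper name dump_head and the file name stderr.log are made up for illustration, and it assumes a POSIX od and tr are available.

```shell
# Hypothetical helper: print the first 64 bytes of a file as hex on one line.
dump_head() {
    od -An -tx1 -N 64 "$1" | tr -s ' \n' ' '
    echo
}

# Assumed usage (stderr.log is an illustrative file name):
#   perl xmlshell.pl < strange-error.xml 2> stderr.log
#   dump_head stderr.log
```

If the dump shows pairs such as c2 82 or c3 83, the output is UTF-8 encoded; bare 82 or 83 bytes would point to a single-byte encoding instead.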
Some information from the Perl debugger:
ÂÃÂÃÂÃÂÃÂÃÂÃÂenabstand (content lt content) at /usr/share/perl5/Plucene/Index/SegmentMerger.pm line 151
Plucene::Index::TermInfosWriter::add('Plucene::Index::TermInfosWriter=HASH(0x8bce4c8)', 'Plucene::Index::Term=HASH(0x8bcd720)', 'Plucene::Index::TermInfo=HASH(0x8bd3490)') called at /usr/share/perl5/Plucene/Index/SegmentMerger.pm line 151
Plucene::Index::SegmentMerger::_merge_term_info('Plucene::Index::SegmentMerger=HASH(0x8b6cc58)', 'Plucene::Index::SegmentMergeInfo=HASH(0x8bd15f0)') called at /usr/share/perl5/Plucene/Index/SegmentMerger.pm line 138
Plucene::Index::SegmentMerger::_merge_term_infos('Plucene::Index::SegmentMerger=HASH(0x8b6cc58)') called at /usr/share/perl5/Plucene/Index/SegmentMerger.pm line 109
Plucene::Index::SegmentMerger::_merge_terms('Plucene::Index::SegmentMerger=HASH(0x8b6cc58)') called at /usr/share/perl5/Plucene/Index/SegmentMerger.pm line 78
Plucene::Index::SegmentMerger::merge('Plucene::Index::SegmentMerger=HASH(0x8b6cc58)') called at /usr/share/perl5/Plucene/Index/Writer.pm line 280
Plucene::Index::Writer::_merge_segments('Plucene::Index::Writer=HASH(0x8b6ebd8)', 0) called at /usr/share/perl5/Plucene/Index/Writer.pm line 206
Plucene::Index::Writer::optimize('Plucene::Index::Writer=HASH(0x8b6ebd8)') called at Midcom/Plucene/RequestProcessor.pm line 121
Midcom::Plucene::RequestProcessor::close('Midcom::Plucene::RequestProcessor=HASH(0x8b315fc)') called at xmlshell.pl line 19
Plucene::Index::TermInfosWriter::add(/usr/share/perl5/Plucene/Index/TermInfosWriter.pm:93):
93: carp "Frequency pointer out of order"
94: if $ti->freq_pointer < $self->{last_ti}->freq_pointer;
The size of this output mess grows exponentially(!) with the number of documents in the index, so for now I have had to disable the optimization step just to be able to *test* the system.
Interestingly, I do not know where this corrupt information comes from. It is definitely not part of the document I want to store in Plucene; you can easily verify this by enabling the $request->dump line in XMLComm.pl::_ParseIndex.
If you need any further information, please ask. You can also have shell access to the box where this script is currently being developed, in case there are version inconsistencies.