Subject: | extract_text does not work for some queries |
For some reason, Babelfish usually includes the translated text as the
value of an <input type=hidden name="q"> tag, which WWW::Babelfish
currently relies on. For some texts, however, it reproducibly does not, e.g.:
perl -MWWW::Babelfish -le '$b = new WWW::Babelfish;
  print $b->translate(source => "German", destination => "English",
                      text => "Neuhaus am Rennweg, Stadt")'
You'll find more examples to reproduce this problem in the attachment.
To make this work, I suggest changing the extract_text routine for
Babelfish to the following:
# Extract the text from the html we get back from babelfish
# and return it
extract_text => sub {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html);
    while ( my $_tag = $p->get_tag('div') ) {
        my ( $tag, $attr, $attrseq ) = @$_tag;
        # Only consider a <div> whose sole attribute is style="padding:10px;"
        next unless @$attrseq == 1
            && $attrseq->[-1] eq 'style'
            && $attr->{style} eq 'padding:10px;';
        # Scalar assignment, so that "or return" fires at end of input
        my $token = $p->get_token or return;
        my ( $type, $text, $is_data ) = @$token;
        next if $type ne 'T';
        return decode( utf8 => $text );    # decode() comes from Encode
    }
    return;
}
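To try the div-based extraction without contacting Babelfish, something like the following standalone sketch could be used. The helper name extract_from_div and the sample HTML are made up for illustration; the real result page may well be structured differently, so treat this only as a way to exercise the parsing logic.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::TokeParser;

# Hypothetical standalone version of the proposed routine.
sub extract_from_div {
    my ($html) = @_;
    my $p = HTML::TokeParser->new( \$html ) or return;
    while ( my $tag = $p->get_tag('div') ) {
        my ( undef, $attr, $attrseq ) = @$tag;

        # Skip every <div> whose only attribute is not style="padding:10px;"
        next unless @$attrseq == 1
            && $attrseq->[0] eq 'style'
            && $attr->{style} eq 'padding:10px;';

        # The token right after the matching <div> should be the text
        my $token = $p->get_token or return;
        my ( $type, $text ) = @$token;
        next if $type ne 'T';
        return decode( 'UTF-8', $text );
    }
    return;    # no matching <div> found
}

# Made-up sample page (an assumption, not captured Babelfish output)
my $html = <<'HTML';
<html><body>
<div class="header">Babel Fish</div>
<div style="padding:10px;">New house at the running way, city</div>
</body></html>
HTML

my $result = extract_from_div($html);
print defined $result ? $result : '(no match)', "\n";
```

The first <div> is skipped because its single attribute is class rather than style, so only the text of the second one is returned.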
Regards,
fany
Subject: | textlist |
Message body not shown because it is not plain text.