Subject: | Problem decoding HTTP content |
Date: | Fri, 22 Oct 2010 04:28:59 -0700 (PDT) |
To: | bug-Net-Analysis [...] rt.cpan.org |
From: | marios [...] cs.ucr.edu |
Hi,
I first want to congratulate you on this excellent library! I will go
directly to the problem: I use Net-Analysis-0.41 with perl v5.8.8 on Linux
version 2.6.18-194.11.4.el5.
My goal is to extract hyperlinks from a pcap file. I use the following code:
----------------------------------------
use strict;
use warnings;
require Exporter;
use Data::Dumper;
use Net::Analysis::Dispatcher;
use Net::Analysis::EventLoop;
use Net::Analysis::Listener::TCP;
use Net::Analysis::Listener::HTTP;
my ($d) = Net::Analysis::Dispatcher->new();
my ($el) = Net::Analysis::EventLoop->new (dispatcher => $d);
my $mon_obj_tcp = Net::Analysis::Listener::TCP->new(dispatcher => $d);
my $mon_obj_http = Net::Analysis::Listener::HTTP->new(dispatcher => $d);
my $mon_obj_base = HTTPCollector->new(dispatcher => $d);
my $target = shift;
die "could not read file '$target'\n" if (! -r $target);
$el->loop_file (filename => $target);
----------------------------------------
The HTTPCollector object inherits from Net::Analysis::Listener::Base and
implements the http_transaction event handler in order to parse the HTTP
requests. Inside the http_transaction function I noticed that sometimes
the $resp->decoded_content() function does not decode the content; it
returns undef.
I tried to find out why and I noticed that sometimes the payload itself is
not a valid gzip-compressed object. This only happens when the HTTP response
has the Transfer-Encoding header set to chunked.
if ( defined $resp->header("Transfer-Encoding") ) {
    if ( $resp->header("Transfer-Encoding") =~ /chunked/ ) {
        # the content still contains the chunk framing in this case
    }
}
In this case the payload (content) has this particular format:
<Offset in hex 1> (this is an 8-digit hex number denoting the size of the
first gzipped chunk)
Gzipped chunk
<Offset in hex 2>
Gzipped chunk
It is not hard to go through the content and remove the <Offset in hex *> and
any newline characters that might exist there.
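A minimal sketch of that stripping step follows. It is not the code mentioned in this report. The helper names dechunk_body and decode_chunked_gzip are hypothetical, and the sketch assumes the content holds the complete chunked body as raw bytes. It follows the HTTP/1.1 chunked framing, where each chunk is a hex size line of variable length, a line break, and that many bytes of data, and a zero-size chunk ends the body. It also assumes IO::Uncompress::Gunzip is available (core in newer perls, from CPAN on perl 5.8.8).

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: strip HTTP/1.1 chunk framing from a raw body.
# Returns the reassembled bytes, or undef if the framing is malformed.
sub dechunk_body {
    my ($raw) = @_;
    my $body = '';
    pos($raw) = 0;

    # Each size line: hex digits, optional extension, then a line break.
    while ($raw =~ /\G([0-9A-Fa-f]+)[^\r\n]*\r?\n/gc) {
        my $size = hex $1;
        return $body if $size == 0;        # zero-size chunk ends the body

        my $start = pos($raw);
        return if $start + $size > length($raw);   # truncated capture

        $body .= substr($raw, $start, $size);
        pos($raw) = $start + $size;
        $raw =~ /\G\r?\n/gc;               # skip the line break after the data
    }

    # No terminating chunk seen: accept only if all input was consumed.
    return pos($raw) == length($raw) ? $body : undef;
}

# Hypothetical helper: dechunk first, then gunzip the reassembled bytes.
sub decode_chunked_gzip {
    my ($raw) = @_;
    my $gz = dechunk_body($raw);
    return unless defined $gz;

    my $plain;
    gunzip(\$gz => \$plain) or return;     # reason is in $GunzipError
    return $plain;
}
```

Inside the http_transaction handler this could be called as decode_chunked_gzip($resp->content) whenever decoded_content() returns undef and the Transfer-Encoding header matches /chunked/.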
The problem happens frequently. For example, Bing and Facebook use the
Transfer-Encoding option when they use a keep-alive TCP connection for
their HTTP requests (which they do quite frequently).
I currently have code that strips these offset markers from the payload
and solves the problem. I can send you the code if you like! :)
Thank you so much for putting this together!!!!
Best regards
Marios