
This queue is for tickets about the WWW-Curl CPAN distribution.

Report information
The Basics
Id: 61569
Status: resolved
Priority: 0/
Queue: WWW-Curl

People
Owner: Nobody in particular
Requestors: andy.jenkinson [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: redirect handling
Date: Wed, 22 Sep 2010 18:08:34 +0100
To: bug-WWW-Curl [...] rt.cpan.org
From: Andy Jenkinson <andy.jenkinson [...] gmail.com>
Hi,

Not being familiar with the inner workings, I'm not sure if this can/should be resolved in WWW::Curl, but here goes:

When setting CURLOPT_FOLLOWLOCATION to 1, curl will follow redirects and fetch the new response appropriately. However, the filehandles specified in CURLOPT_WRITEHEADER (and I assume CURLOPT_WRITEDATA but I have not tested this) are written to multiple times - once per server response. This means that in a typical 301/302 redirect situation the header filehandle will contain two headers once finished, ending up looking like:

HTTP/1.1 301 Moved Permanently
Location: http://uri.of.redirect
... etc

HTTP/1.1 200 OK
... etc

If not addressable efficiently in WWW::Curl, I suggest listing as a limitation just to make others aware?

Cheers,
Andy
Hey,

I think it's default behaviour for libcurl to output all header information if a redirect happened and header data was requested.

The reasoning is a bit deep, but I think it's because either the application is not interested in headers, only content - in the case that the body gets processed, or the application is interested in the headers (for example, to build an HTTP::Response object from it). In the latter case, one of the reasons you want to hang on to the 301/302's header data is that it might contain cookies (think of the "POST /foo/login -> 301/302 -> GET /" case).

So basically the application writer has three choices to resolve this:

1. Don't bother with headers at all. Most information can be extracted from getinfo[1] easily.
2. Disable CURLOPT_FOLLOWLOCATION and do redirect handling in the application code. The URL to redirect to is available from getinfo with the CURLINFO_REDIRECT_URL constant. One thing to watch for is the POST->redirect->GET behaviour that's common in browsers.
3. Implement the method suggested in the CURLOPT_HEADERFUNCTION[2] documentation, that is, delimit HTTP responses based on the HTTP status line.

I will be updating the documentation to link to this bug report along with a short explanation in 4.14.

[1] http://curl.haxx.se/libcurl/c/curl_easy_getinfo.html
[2] http://curl.haxx.se/libcurl/c/curl_easy_setopt.html#CURLOPTHEADERFUNCTION
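
The third choice (delimiting responses on the HTTP status line) can be sketched roughly as follows. This is a language-neutral illustration written in Python using only the standard library, not WWW::Curl code; the same logic could presumably be applied to the text collected through CURLOPT_WRITEHEADER or inside a CURLOPT_HEADERFUNCTION callback. The names `split_header_blocks`, `cookies_from_all` and `SAMPLE` are hypothetical, and the `Set-Cookie` and `Content-Type` values in the sample are made up for the example.

```python
import re

# Matches an HTTP status line such as "HTTP/1.1 301 Moved Permanently".
STATUS_LINE = re.compile(r"^HTTP/\d+(?:\.\d+)?\s+(\d{3})")


def split_header_blocks(raw):
    """Split accumulated header text into one (status, headers) pair per response.

    `raw` is everything written to the header sink during one transfer,
    which holds several header blocks when redirects were followed.
    """
    blocks = []
    for line in raw.splitlines():
        line = line.strip()
        match = STATUS_LINE.match(line)
        if match:
            # A status line marks the start of a new response.
            blocks.append((int(match.group(1)), []))
        elif line and blocks and ":" in line:
            # Ordinary "Name: value" header belonging to the current response.
            name, _, value = line.partition(":")
            blocks[-1][1].append((name.strip(), value.strip()))
    return blocks


def cookies_from_all(blocks):
    """Collect Set-Cookie values from every response, including the redirects."""
    return [
        value
        for _status, headers in blocks
        for name, value in headers
        if name.lower() == "set-cookie"
    ]


# Hypothetical header stream shaped like the one in the original report.
SAMPLE = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: http://uri.of.redirect\r\n"
    "Set-Cookie: session=abc\r\n"
    "\r\n"
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
)

blocks = split_header_blocks(SAMPLE)
final_status, final_headers = blocks[-1]  # headers of the last response only
```

Taking `blocks[-1]` gives the headers of the final response, while `cookies_from_all(blocks)` still sees cookies set by the intermediate 301/302, which is the case the reply above warns about.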
Subject: Re: [rt.cpan.org #61569] redirect handling
Date: Sun, 17 Oct 2010 09:26:45 +0100
To: "bug-WWW-Curl [...] rt.cpan.org" <bug-WWW-Curl [...] rt.cpan.org>
From: Andy Jenkinson <andy.jenkinson [...] gmail.com>
Thank you, very helpful!

(In reply to the message from "Balint Szilakszi via RT" <bug-WWW-Curl@rt.cpan.org> of 16 Oct 2010, 21:53.)
Release 4.14 is out, thus I'm resolving this ticket.