
This queue is for tickets about the WWW-Github-Files CPAN distribution.

Report information
The Basics
Id: 111232
Status: open
Priority: 0
Queue: WWW-Github-Files

People
Owner: Nobody in particular
Requestors: SREZIC [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: 0.13
Fixed in: (no value)



Subject: Use raw.githubusercontent.com instead of api.github.com?
Github files are also available using github's "raw" URLs, i.e. https://github.com/$user/$repo/raw/$branch/$path (which is redirected to a raw.githubusercontent.com URL). Accessing this URL has some advantages over the api.github.com URL:

* probably there's no rate limiting --- currently api.github.com has a rate limit of 60 requests per hour per IP, which can easily be exceeded (in fact, my smoker machines produced fail reports against your module because of this limit)

* fewer dependencies --- JSON and MIME::Base64 are not needed, as the returned content is already raw and unencoded

* faster and less traffic --- because the content is raw and not base64-encoded, less data goes over the wire, and no extra CPU time is spent on decoding. I also suspect that the raw.githubusercontent.com URL is faster than the github API anyway.

So it could be beneficial to rewrite the ...::File module to use raw URLs. The ...::Dir module would still need to use the github API.
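A minimal sketch of the raw-URL approach, assuming HTTP::Tiny (any HTTP client that follows redirects would do); get_raw_file() is a hypothetical helper for illustration, not part of the module:

    use strict;
    use warnings;
    use HTTP::Tiny;

    # Hypothetical helper: fetch a file through github's raw URL.
    # HTTP::Tiny follows the redirect to raw.githubusercontent.com by
    # default, and the body arrives unencoded, so neither JSON nor
    # MIME::Base64 is needed.
    # $path starts with a slash, as in the module's get_file() calls.
    sub get_raw_file {
        my ($author, $repo, $branch, $path) = @_;
        my $url = "https://github.com/$author/$repo/raw/$branch$path";
        my $res = HTTP::Tiny->new->get($url);
        die "Failed to read $url from github: $res->{status} $res->{reason}\n"
            unless $res->{success};
        return $res->{content};
    }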
On 2016-01-16 08:33:21, SREZIC wrote:
Another advantage: API access is restricted to files up to 1 MB in size. This is what one gets when trying to fetch larger files:

    $ perl5.22.1 -MData::Dumper -MWWW::Github::Files -E 'say Dumper(WWW::Github::Files->new(author => "GNOME", resp => "libgweather", branch => "master")->get_file("/data/Locations.xml.in"))' | less
    Failed to read https://api.github.com/repos/GNOME/libgweather/contents/data/Locations.xml.in?ref=40a81b44ca7532a65f529a0de59bb2fa1c55dd00 from github: Forbidden, {"message":"This API returns blobs up to 1 MB in size. The requested blob is too large to fetch via the API, but you can use the Git Data API to request blobs up to 100 MB in size.","errors":[{"resource":"Blob","field":"data","code":"too_large"}],"documentation_url":"https://developer.github.com/v3/repos/contents/#get-contents"} at /usr/perl5.22.1sp/lib/site_perl/5.22.1/WWW/Github/Files.pm line 103.
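For comparison, the raw endpoint is not subject to the 1 MB blob limit, so the same file can be fetched in full there. A quick check (raw URL derived from the pattern above) should report a 200 status and the full file size:

    $ perl -MHTTP::Tiny -E 'my $r = HTTP::Tiny->new->get("https://raw.githubusercontent.com/GNOME/libgweather/master/data/Locations.xml.in"); say "$r->{status} ", length $r->{content}'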
Well, you are probably right. The module could grow a feature that detects whether a file can be fetched using the raw service, and uses it when possible. (There are limitations, so it is not always true.) However, I'm not developing in Perl currently. Patches welcome.
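One way that detection could look, as a sketch: try the raw service first and fall back to the API when the raw request fails. get_file_via_api() is a hypothetical stand-in for the module's existing API-based code path:

    use strict;
    use warnings;
    use HTTP::Tiny;

    sub get_file_smart {
        my ($author, $repo, $branch, $path) = @_;

        # Try the raw service first: no base64 decoding, no 1 MB cap.
        my $raw_url = "https://raw.githubusercontent.com/$author/$repo/$branch$path";
        my $res = HTTP::Tiny->new->get($raw_url);
        return $res->{content} if $res->{success};

        # Fall back to the api.github.com fetch (hypothetical stand-in
        # for the module's current code) when the raw URL is unavailable.
        return get_file_via_api($author, $repo, $branch, $path);
    }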