
This queue is for tickets about the XML-Atom CPAN distribution.

Report information
The Basics
Id: 69180
Status: open
Priority: 0/
Queue: XML-Atom

People
Owner: Nobody in particular
Requestors: len.budney [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.39
Fixed in: (no value)



Subject: XML::Atom chokes on doctypes--from Google Calendar, for example
Google Calendar returns atoms that begin with "<!doctype html>". I know that's utterly pointless, and it may be a violation of the Atom spec--I haven't checked that--but it results in a fatal error using XML::Atom to parse Google's feeds. XML::Atom needs to handle general SGML directives like doctype gracefully. If it can't, then it should at least accommodate some pretty important feeds by stripping the doctype statement before attempting to parse.
Subject: Re: [rt.cpan.org #69180] XML::Atom chokes on doctypes--from Google Calendar, for example
Date: Wed, 29 Jun 2011 09:45:19 -0400
To: bug-XML-Atom [...] rt.cpan.org
From: Tatsuhiko Miyagawa <miyagawa [...] gmail.com>
On Wed, Jun 29, 2011 at 9:20 AM, Leonard R Budney via RT <bug-XML-Atom@rt.cpan.org> wrote:
> Google Calendar returns atoms that begin with "<!doctype html>". I know that's utterly
> pointless, and it may be a violation of the Atom spec--I haven't checked that--but it results in a
> fatal error using XML::Atom to parse Google's feeds.
>
> XML::Atom needs to handle general SGML directives like doctype gracefully. If it can't, then it
> should at least accommodate some pretty important feeds by stripping the doctype statement
> before attempting to parse.
I disagree. Determining which feeds are "important" is totally arbitrary, and for general-purpose modules like XML::Atom it's a dangerous thing to do and can even be prone to security holes. That said, you should take a look at XML::Liberal first, and see if it can "sanitize" broken XML like that before passing it to XML::Atom.
-- Tatsuhiko Miyagawa
In this case, it's ALWAYS valid for XML to include a doctype; failing to handle that is a bug--despite my agreement that doctypes are rather superfluous in XML and that spuriously adding them is a mistake. When a standard (like Atom) says "Foo is an XML document which...." then you're on the hook for supporting valid XML which...
Subject: Re: [rt.cpan.org #69180] XML::Atom chokes on doctypes--from Google Calendar, for example
Date: Wed, 29 Jun 2011 10:19:34 -0400
To: bug-XML-Atom [...] rt.cpan.org
From: Tatsuhiko Miyagawa <miyagawa [...] gmail.com>
You said you haven't checked whether it validates on the feed validator. Why not do it now? If it doesn't validate, there's no reason for XML::Atom to support it.

On Wed, Jun 29, 2011 at 10:14 AM, Leonard R Budney via RT <bug-XML-Atom@rt.cpan.org> wrote:
>       Queue: XML-Atom
>  Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=69180 >
>
> In this case, it's ALWAYS valid for XML to include a doctype; failing to handle that is a bug--
> despite my agreement that doctypes are rather superfluous in XML and that spuriously adding
> them is a mistake.
>
> When a standard (like Atom) says "Foo is an XML document which...." then you're on the hook for
> supporting valid XML which...
>
-- Tatsuhiko Miyagawa
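
For anyone who needs a workaround while this ticket is open, the "strip the doctype before parsing" idea from the original report can be done on the caller's side, so the generic parser stays strict. The following is only a rough sketch of that idea, written in Python with the standard library rather than Perl. It is not XML::Atom's or XML::Liberal's behavior, the names strip_leading_doctype and parse_feed_title are hypothetical, and it assumes the doctype has no internal subset.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the declaration has no internal subset ("[ ... ]").
# Matching is case-insensitive so the lowercase "<!doctype html>" form
# quoted in the report is caught as well as "<!DOCTYPE ...>".
_DOCTYPE_RE = re.compile(r"<!doctype\b[^>\[]*>", re.IGNORECASE)

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def strip_leading_doctype(xml_text: str) -> str:
    """Remove the first doctype declaration if it precedes the root element."""
    match = _DOCTYPE_RE.search(xml_text)
    if match is None:
        return xml_text
    prefix = xml_text[:match.start()]
    # If an element start tag already appeared, the match is inside the
    # document body, so leave the input untouched.
    if re.search(r"<[A-Za-z_]", prefix):
        return xml_text
    return prefix + xml_text[match.end():]


def parse_feed_title(xml_text: str):
    """Parse an Atom feed after the opt-in cleanup and return its title."""
    root = ET.fromstring(strip_leading_doctype(xml_text))
    return root.findtext(ATOM_NS + "title")
```

Because the cleanup is a separate, explicit step, each caller decides whether to apply it to a given feed, and the parser itself does not have to guess which feeds deserve lenient treatment.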