[Adium-devl] O'Reilly XML blog article: Parsing XML… backwards?

Graham Booker adium at cod3r.com
Fri Mar 16 15:18:19 UTC 2007


On Mar 16, 2007, at 9:54 AM, Augie Fackler wrote:

>
> On Mar 16, 2007, at 7:17 AM, Colin Barrett wrote:
>
>> On Mar 14, 2007, at 12:09 PM, Peter Hosey wrote:
>>
>>> On Mar 14, 2007, at 00:37:49, Colin Barrett wrote:
>>>> "</hello-->" is actually pretty easily solved. Don't generate the
>>>> "end of comment" until you hit a space or <!--.
>>>
>>> Bad solution. <!--hello--> is a legal comment that must be treated
>>> as such.
>>
>> Right. Which is why you delay the firing of the events.
>>
>> "-->" encountered. Possibly ambiguous. Parser reads more.
>> "hello" encountered. Still not sure what this is...
>> "<!--" encountered. Ah, we know it's a comment. Fire off the events
>> for a comment.
>>
>> Or, if it's actually a retardedly named element:
>>
>> "-->" encountered. Possibly ambiguous. Parser reads more.
>> "hello" encountered. Still not sure what this is...
>> "<" encountered. Ah, we know it's a retardedly named element. Fire off
>> the events for an element, hunt down the author of the document, and
>> shoot them in the face.
>
> So < is not legal in comments? I thought it was, and in that case
> this comment borks you:
> <!-- < hello -->
>
> Augie
>
>>
>> -Colin
>>

I think the only way to know for sure is to scan backwards for "--"
(which is illegal inside comments, according to w3.org). If that "--"
is part of a "<!--", then it is a comment; otherwise it is a
retardedly named element. I don't think any other approach will catch
all the cases, such as:

<!-- <name--> (all comment)
<!-- lada--> <name-->  (comment and tag)
<name-->  </name-->  (The second tag is definitely not a comment: if it
were, the comment would also contain the preceding "--", which is
illegal. The first tag can't be classified without looking at more of
the preceding characters.)
value--</name-->  (Neither part is a comment: if the tag were, the
comment would also contain the "--", which is illegal (echo in here?).
The characters before the first "--" are not in a comment either,
because the characters after it are not, and "--" alone cannot
terminate a comment.)
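A minimal Python sketch of the backwards rule above, assuming well-formed input. The function name closes_comment and its end parameter (the index just past a "-->") are hypothetical. The check ignores "--" occurring inside attribute values, CDATA sections, or processing instructions, which a real parser would have to rule out first. It also treats "<!---" (a comment whose content starts with a single "-") and the empty comment "<!---->" as comment openers, since both are legal.

```python
def closes_comment(text: str, end: int) -> bool:
    """Decide whether the "-->" ending at index `end` closes an XML comment.

    Hypothetical helper sketching the backwards rule from this thread:
    find the nearest "--" before the closing one; the "-->" ends a comment
    only if that "--" belongs to a "<!--" opener. Assumes well-formed XML
    and ignores "--" inside attribute values, CDATA, or PIs.
    """
    if end < 3 or text[end - 3:end] != "-->":
        raise ValueError("text[end-3:end] must be '-->'")
    # Search text[:end-2] so a "--" may start as late as end-4; a dash run
    # overlapping the closing one (as in "<name--->") is found too.
    d = text.rfind("--", 0, end - 2)
    if d == -1:
        return False  # no "<!--" anywhere earlier, so not a comment
    # The "--" is part of an opener either as "<!--" itself, or as the last
    # two dashes of "<!---" (comment content may begin with a single "-",
    # and the empty comment "<!---->" looks like this too).
    return (d >= 2 and text.startswith("<!--", d - 2)) or (
        d >= 3 and text.startswith("<!--", d - 3)
    )


if __name__ == "__main__":
    # Classify the last "-->" in each example from this thread.
    for sample in ["<!-- <name-->",
                   "<!-- lada--> <name-->",
                   "<name-->  </name-->",
                   "value--</name-->",
                   "<!-- < hello -->"]:
        kind = "comment" if closes_comment(sample, len(sample)) else "tag"
        print(f"{sample!r}: {kind}")
```

Run as a script, it reports the last "-->" of these samples as comment, tag, tag, tag, and comment (the last being Augie's "<!-- < hello -->"). Prefixing "<name-->  </name-->" with "<!-- " makes its first "-->" a comment end, which is why that first tag can't be decided without more preceding text.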

- Graham
