[Adium-devl] O'Reilly XML blog article: Parsing XML… backwards?
Graham Booker
adium at cod3r.com
Fri Mar 16 15:18:19 UTC 2007
On Mar 16, 2007, at 9:54 AM, Augie Fackler wrote:
>
> On Mar 16, 2007, at 7:17 AM, Colin Barrett wrote:
>
>> On Mar 14, 2007, at 12:09 PM, Peter Hosey wrote:
>>
>>> On Mar 14, 2007, at 00:37:49, Colin Barrett wrote:
>>>> "</hello-->" is actually pretty easily solved. Don't generate the
>>>> "end of comment" until you hit a space or <!--.
>>>
>>> Bad solution. <!--hello--> is a legal comment that must be treated
>>> as such.
>>
>> Right. Which is why you delay the firing of the events.
>>
>> "-->" encountered. Possibly ambiguous. Parser reads more.
>> "hello" encountered. Still not sure what this is...
>> "<!--" encountered. Ah, we know it's a comment. Fire off the events
>> for a comment.
>>
>> Or, if it's actually a retardedly named element:
>>
>> "-->" encountered. Possibly ambiguous. Parser reads more.
>> "hello" encountered. Still not sure what this is...
>> "<" encountered. Ah, we know it's a retardedly named element. Fire
>> off
>> the events for an element, hunt down the author of the document, and
>> shoot them in the fae.
>
> So < is not legal in comments? I thought it was, and in that case
> this comment borks you:
> <!-- < hello -->
>
> Augie
>
>>
>> -Colin
>>
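A minimal Python sketch of the delayed-firing rule quoted above. It assumes the input is everything up to and including a "-->" and that the parser reads backwards from that terminator, deciding at the first "<" it meets. `naive_classify` is a hypothetical helper for illustration, not Adium code.

```python
def naive_classify(text: str) -> str:
    """Hypothetical sketch of the delayed-firing rule.

    Reads backwards from the final "-->" and decides at the first "<"
    it meets: "<!--" means a comment, any other "<" means an element tag.
    """
    if not text.endswith("-->"):
        raise ValueError("text must end with '-->'")
    # Search backwards, excluding the final "-->" itself.
    i = text.rfind("<", 0, len(text) - 3)
    if i == -1:
        return "unknown"  # nothing decisive yet: need more preceding input
    return "comment" if text.startswith("<!--", i) else "tag"
```

It handles both of Colin's cases ("<!--hello-->" gives "comment", "</hello-->" gives "tag"), but because "<" is legal inside a comment, Augie's "<!-- < hello -->" comes back as "tag", which is wrong.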
I think the only way to know for sure is to scan backwards for
"--" (which the W3C XML specification forbids inside comments). If it
is part of a "<!--", then it is a comment; otherwise it is a retardedly
named element. I don't think there is any other way that will catch
all the cases, such as:
<!-- <name--> (all comment)
<!-- lada--> <name--> (comment and tag)
<name--> </name--> (the second tag is definitely not a comment, because
if it were, the comment would also contain the "--" from the first tag,
which is illegal. The first tag cannot be classified without looking at
more preceding characters).
value--</name--> (neither is a comment, since if "</name-->" were, the
comment would also contain the "--" after "value", which is illegal
(echo in here?). The characters before that "--" are not in a comment
either, because the characters after it are not, and "--" alone cannot
terminate a comment).
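The backward scan for "--" can be sketched in Python roughly as follows. `classify_close` is a hypothetical helper, not from Adium or any XML library. It assumes `text` holds well-formed XML up to and including the "-->" in question, and it ignores CDATA sections and processing instructions, where a literal "<!--" could appear without opening a comment.

```python
def classify_close(text: str) -> str:
    """Hypothetical sketch: does the final "-->" close a comment or a tag?

    Assumes well-formed XML up to this point; CDATA sections and
    processing instructions are not handled.
    """
    if not text.endswith("-->"):
        raise ValueError("text must end with '-->'")
    end = len(text) - 3  # index where the final "-->" starts
    # Scan backwards for the nearest "--" before the terminator. XML forbids
    # "--" inside a comment body, so if the final "-->" closes a comment,
    # this "--" can only belong to the "<!--" that opened it.
    idx = text.rfind("--", 0, end)
    if idx == -1:
        return "tag"  # no "<!--" anywhere before, so it ends a name like "name--"
    # Back up over a run of dashes so a legal "<!---" opener is recognised.
    while idx > 0 and text[idx - 1] == "-":
        idx -= 1
    if text[max(idx - 2, 0):idx] == "<!":
        return "comment"
    return "tag"
```

On the examples in this message it returns "comment" for "<!-- <name-->" and "tag" for "<!-- lada--> <name-->", "<name--> </name-->" and "value--</name-->". The dash-run loop exists because a comment whose text starts with "-" (e.g. "<!--- x -->") is legal, and without it the nearest "--" would not line up with "<!".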
- Graham