[Adium-devl] O'Reilly XML blog article: Parsing XML… backwards?
Colin Barrett
timber at lava.net
Wed Mar 14 07:37:49 UTC 2007
On Mar 13, 2007, at 11:48 PM, Peter Hosey wrote:
> Found this in my referer logs:
>
> http://www.oreillynet.com/xml/blog/2007/03/parsing_xml_backwards.html
>
> It's an article about LMX and the various ways it's a Bad Idea. Some
> are better than others, but anyway, the article is definitely worth
> a read. Also, I have a comment in there.
A lot of his objections assume a pretty naive parser. There's no
reason you *can't* look at the doctype and such first. In fact, doing
so may be necessary.
"</hello-->" is actually pretty easily solved. Don't generate the "end
of comment" until you hit a space or <!--.
These edge cases are important.
> He makes a good suggestion:
>
>> You can write multiple well-formed XML documents to a single file,
>> following each one by a binary trailer that gives the size of the
>> last chunk of XML. Then it is trivial for code to jump backwards
>> through the file, grabbing a little document each time and passing
>> it to a real XML parser.
>
> This is an interesting idea. It would, essentially, be an archive of
> mini-XML-documents (which I suppose would be a bit like Colloquy's
> envelope element), which we could easily seek in reverse.
>
> The downside is that it wouldn't work well with most existing XML
> tools—we couldn't simply slurp a log file and pass it to
> NSXMLParser, WebKit, or anything else, without preprocessing it to
> remove those size markers. OTOH, it wouldn't be terribly hard to
> write such a preprocessor. XSLT could do the job.
Working with existing XML tools is the main advantage of an XML log
format, IMHO.
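For reference, the trailer scheme quoted above could look roughly like the Python sketch below. The article doesn't specify a trailer layout, so the 4-byte big-endian length is an assumption, and the function names (append_document, iter_documents_reversed, strip_trailers) are made up for illustration. The last function is the "preprocessor" case: it drops the size markers and hands back the plain documents in their original order.

```python
import io
import struct

# Assumption: each trailer is a 4-byte big-endian unsigned length.
TRAILER = struct.Struct(">I")


def append_document(f, xml_bytes):
    """Write one XML document followed by a trailer holding its size."""
    f.write(xml_bytes)
    f.write(TRAILER.pack(len(xml_bytes)))


def iter_documents_reversed(f):
    """Yield the stored documents newest-first by walking the trailers."""
    f.seek(0, io.SEEK_END)
    pos = f.tell()
    while pos > 0:
        # Step back over the trailer and read the document size from it.
        pos -= TRAILER.size
        if pos < 0:
            raise ValueError("truncated trailer")
        f.seek(pos)
        (size,) = TRAILER.unpack(f.read(TRAILER.size))
        # Step back over the document itself and read it.
        pos -= size
        if pos < 0:
            raise ValueError("trailer size exceeds file contents")
        f.seek(pos)
        yield f.read(size)


def strip_trailers(f):
    """Return the documents in original order, without size markers."""
    docs = list(iter_documents_reversed(f))
    docs.reverse()
    return docs
```

Each chunk that comes back is a complete document, so it can go straight to a real XML parser. Note that the chunks concatenated together still wouldn't be a single well-formed document without some wrapping element.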
> The other downside is that we already have ULF and LMX; this would
> be yet another log format, whose main reason for existence would be
> the fact that LMX won't work 100% of the time with XML from the sort
> of people who name their elements “hello--”.
Creating markup that isn't well-formed isn't terribly useful, IMO. The
nice part about ULF is that other XML tools can process our log
files. If we're going to write out XML that isn't well-formed, we
might as well use a tab-delimited format or something.
> I'm inclined to stay with ULF, but I wanted to bounce it off you guys.
ULF needs a standards process, but otherwise it's pretty good. It
also needs to be better supported in the log viewer :\
-Colin