[Adium-devl] O'Reilly XML blog article: Parsing XML… backwards?

Graham Booker adium at cod3r.com
Wed Mar 14 19:16:31 UTC 2007


On Mar 14, 2007, at 1:32 PM, Peter Hosey wrote:

> On Mar 14, 2007, at 06:31:08, Graham Booker wrote:
>> Putting binary at the end of an XML file?  What is the advantage  
>> of this?  You have a constant number of characters at the end of  
>> the file to read in order to know how many to read to get the last  
>> element?  I can't find any other.
>
> His suggestion, as I described it, is a basic archive format.
>
> Each thingy (for lack of a better word) in the archive is a  
> complete XML document (not simply an element—a complete document)  
> followed by a binary integer size, which is of the immediately- 
> preceding document. There would be such a pair for each message in  
> the transcript (that is, there'd be one or more). So, for example:
>
> 	<XML document, 42 characters long>
> 	42
> 	<XML document, 300 characters long>
> 	300
> 	<XML document, 6 characters long>
> 	6
>
> Thus, if those were the three messages in a chatlog, we would:
>
> 	1. Seek to -4
> 	2. Read 4
> 	3. Interpret as integer; seek backward by that amount + 4 (to skip  
> over the last size) + 4 (to get the next size) (6 + 4 + 4 = 14)
> 	4. Read 4
> 	5. Interpret as integer; seek backward by that amount + 4 + 4 (300  
> + 4 + 4 = 308)
> 	6. Read 4
> 	7. Interpret as integer; seek backward by that amount + 4 + 4 (42  
> + 4 + 4 = 50)
> 	8. Hit start-of-file—we're done
>
> And unlike ULF, each message would be a complete XML document.  
> These sizes aren't inserted into a monolithic XML document to  
> navigate to random elements within it; that would be stupid. As I  
> said, it's basically an archive format.
>
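
For reference, the backward walk described above can be sketched in Python roughly as follows. This is only an illustration, not anything from the article: the 4-byte big-endian unsigned size, the byte-based lengths, and the names pack_archive and read_backward are all assumptions made for the example.

```python
import io
import struct

# Assumption: each size trailer is a 4-byte big-endian unsigned integer
# holding the byte length of the document immediately before it.
SIZE = struct.Struct(">I")

def pack_archive(documents):
    """Concatenate each document (bytes) followed by its size trailer."""
    out = bytearray()
    for doc in documents:
        out += doc
        out += SIZE.pack(len(doc))
    return bytes(out)

def read_backward(stream):
    """Yield documents from last to first by following the size trailers."""
    stream.seek(0, io.SEEK_END)
    pos = stream.tell()              # end of the not-yet-read region
    while pos > 0:
        if pos < SIZE.size:
            raise ValueError("truncated size trailer")
        stream.seek(pos - SIZE.size)
        (length,) = SIZE.unpack(stream.read(SIZE.size))
        start = pos - SIZE.size - length
        if start < 0:
            raise ValueError("size trailer points before start of file")
        stream.seek(start)
        yield stream.read(length)
        pos = start                  # pos == 0 means start-of-file: done
```

Note that what comes out is a sequence of separate byte strings; each one still has to be handed to an XML parser individually.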

OK, this is even worse than I thought.  If the format can't be parsed  
by a run-of-the-mill XML parser, what is the point of using XML in  
the first place?  In that case, why not just serialize objects in a  
proprietary, undocumented format and achieve the same interoperability  
with other programs?  With only a single binary value in  
there, there was at least the hope that an XML parser would barf upon  
hitting it but still parse the whole document correctly.  Now,  
parsing this gets much more complex.  Either you have to feed in a  
fake root element in order to concatenate the XML documents (and skip  
each document's declaration), or you have to parse several XML  
documents and concatenate their root elements in the internal data  
structure.
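
To make the first workaround concrete, here is a minimal Python sketch of the fake-root approach. It assumes the individual documents have already been split out of the archive and decoded to strings; the function name parse_with_fake_root and the tag "fakeroot" are made up for the example, and it only strips a leading XML declaration (a DOCTYPE would need extra handling).

```python
import re
import xml.etree.ElementTree as ET

# Matches a leading XML declaration such as <?xml version="1.0"?>
XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>\s*")

def parse_with_fake_root(documents, root_tag="fakeroot"):
    """Strip each document's declaration, wrap everything in one fake
    root element, and parse the result as a single XML document."""
    body = "".join(XML_DECL.sub("", doc) for doc in documents)
    return ET.fromstring(f"<{root_tag}>{body}</{root_tag}>")
```

Every consumer of the file would need this kind of glue (plus the code to skip the binary sizes) before a stock parser could touch the data.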

On the whole, this defeats the purpose of having a ULF in the  
first place.  I see his solution as an attempt to merge a binary  
format and an open text format that achieves the worst of both worlds.

>> He suggests using this to know the size of the last element or  
>> elements.
>
> Not so. The binary trailer would follow *each* message document in  
> the file. Thus, reading the file would be a bit like following a  
> linked list, as depicted above.

Yeah, this makes more sense, but my solution still gets the  
same benefit without killing XML parsers.

> ___________________________________
> \ Peter Hosey / boredzo at adiumx.com
> PGP public key ID: C6550423 (since 2007-01-01)
>


- Graham





