[Adium-devl] O'Reilly XML blog article: Parsing XML… backwards?

Peter Hosey boredzo at gmail.com
Wed Mar 14 18:32:20 UTC 2007


On Mar 14, 2007, at 06:31:08, Graham Booker wrote:
> Putting binary at the end of an XML file?  What is the advantage of  
> this?  You have a constant number of characters at the end of the  
> file to read in order to know how many to read to get the last  
> element?  I can't find any other.

His suggestion, as I described it, is a basic archive format.

Each thingy (for lack of a better word) in the archive is a complete
XML document (not simply an element, but a complete document) followed
by a binary integer giving the size of the immediately preceding
document. There would be one such pair for each message in the
transcript (that is, there'd be one or more). So, for example:

	<XML document, 42 characters long>
	42
	<XML document, 300 characters long>
	300
	<XML document, 6 characters long>
	6

Thus, if those were the three messages in a chatlog, we would:

	1. Seek to 4 bytes before end-of-file
	2. Read 4 bytes
	3. Interpret as integer; seek backward by that amount + 4 (to skip
over the size just read) + 4 (to reach the previous size) (6 + 4 + 4 = 14)
	4. Read 4 bytes
	5. Interpret as integer; seek backward by that amount + 4 + 4 (300 +
4 + 4 = 308)
	6. Read 4 bytes
	7. Interpret as integer; seek backward by that amount + 4 + 4 (42 +
4 + 4 = 50)
	8. That seek would pass start-of-file, so we're done
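
To make the walk concrete, here is a rough sketch in Python. It is only an illustration, not anything from the article or from Adium: the function name iter_documents_backward is made up, and it assumes each size is a 4-byte unsigned big-endian integer (the byte order isn't specified anywhere above). It also tracks absolute offsets rather than doing relative seeks, which comes to the same thing, and it reads each document along the way instead of only hopping between sizes.

```python
import io
import struct

SIZE_LEN = 4  # the steps above read 4 bytes per size


def iter_documents_backward(f):
    """Yield each document in the archive as bytes, newest first."""
    f.seek(0, io.SEEK_END)
    pos = f.tell()  # offset just past the size trailer we read next
    while pos > 0:
        if pos < SIZE_LEN:
            raise ValueError("truncated size trailer")
        f.seek(pos - SIZE_LEN)
        # Assumption: unsigned 32-bit, big-endian.
        (size,) = struct.unpack(">I", f.read(SIZE_LEN))
        start = pos - SIZE_LEN - size
        if start < 0:
            raise ValueError("size trailer points before start of file")
        f.seek(start)
        yield f.read(size)
        pos = start  # the previous document's trailer ends here


if __name__ == "__main__":
    # Hypothetical three-message archive, built in memory.
    docs = [b"<a/>", b"<message>hi</message>", b"<b/>"]
    archive = io.BytesIO(
        b"".join(d + struct.pack(">I", len(d)) for d in docs)
    )
    for doc in iter_documents_backward(archive):
        print(doc.decode())
```

Since it's a generator, a caller who only wants the last few messages can stop early without touching the rest of the file.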

And unlike ULF, each message would be a complete XML document. These  
sizes aren't inserted into a monolithic XML document to navigate to  
random elements within it; that would be stupid. As I said, it's  
basically an archive format.
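
For what it's worth, writing to such an archive could be as simple as the sketch below. Again, this is an illustration under assumptions, not a spec: append_document is a made-up name, and the size is assumed to be a 4-byte unsigned big-endian integer counting bytes.

```python
import io
import struct


def append_document(f, xml_bytes):
    """Append one complete XML document, then its size as a 4-byte trailer."""
    f.seek(0, io.SEEK_END)
    f.write(xml_bytes)
    # Assumption: unsigned 32-bit, big-endian, measured in bytes.
    f.write(struct.pack(">I", len(xml_bytes)))


if __name__ == "__main__":
    archive = io.BytesIO()
    append_document(archive, b"<message>first</message>")
    append_document(archive, b"<message>second</message>")
    print(len(archive.getvalue()))
```

Nothing already in the file has to be rewritten to add a message; each append just tacks a new document-and-size pair onto the end.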

> He suggests using this to know the size of the last element or  
> elements.

Not so. The binary trailer would follow *each* message document in  
the file. Thus, reading the file would be a bit like following a  
linked list, as depicted above.
___________________________________
\ Peter Hosey / boredzo at adiumx.com
PGP public key ID: C6550423 (since 2007-01-01)

