[Adium-devl] O'Reilly XML blog article: Parsing XML… backwards?
Peter Hosey
boredzo at gmail.com
Wed Mar 14 18:32:20 UTC 2007
On Mar 14, 2007, at 06:31:08, Graham Booker wrote:
> Putting binary at the end of an XML file? What is the advantage of
> this? You have a constant number of characters at the end of the
> file to read in order to know how many to read to get the last
> element? I can't find any other.
His suggestion, as I described it, is a basic archive format.
Each thingy (for lack of a better word) in the archive is a complete
XML document (not simply an element; a complete document) followed by
a 4-byte binary integer giving the size of the immediately preceding
document. There would be one such pair for each message in the
transcript (so one or more pairs per file). So, for example:
<XML document, 42 characters long>
42
<XML document, 300 characters long>
300
<XML document, 6 characters long>
6
Thus, if those were the three messages in a chatlog, we would:
1. Seek to -4 (that is, 4 bytes before end-of-file)
2. Read 4
3. Interpret as integer; seek backward by that amount + 4 (to skip
back over the size just read) + 4 (to land on the previous document's
size) (6 + 4 + 4 = 14)
4. Read 4
5. Interpret as integer; seek backward by that amount + 4 + 4 (300 +
4 + 4 = 308)
6. Read 4
7. Interpret as integer; seek backward by that amount + 4 + 4 (42 +
4 + 4 = 50)
8. Hit start-of-file: that seek would land 4 bytes before it, because
the first document has no size in front of it. We're done.
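Here is a minimal Python sketch of steps 1-8, run against the same 42/300/6-byte example. The post says only "binary integer," so the 4-byte big-endian unsigned layout and the choice to count UTF-8 bytes are assumptions. `walk_trailers` and `build_archive` are hypothetical names, and the `<msg>` documents are placeholder data padded to the example sizes.

```python
import io
import struct

SIZE_LEN = 4  # assumption: 4-byte big-endian unsigned size trailer

def walk_trailers(f):
    """Follow the size trailers backward, as in steps 1-8.

    Returns (size, seek_back) pairs, where seek_back = size + 4 + 4.
    """
    steps = []
    f.seek(-SIZE_LEN, io.SEEK_END)                        # step 1: seek to -4
    while True:
        (size,) = struct.unpack(">I", f.read(SIZE_LEN))  # read 4, interpret as integer
        # skip back over this size (4) and its document (size), then 4 more
        # to land on the previous document's size
        seek_back = size + SIZE_LEN + SIZE_LEN
        steps.append((size, seek_back))
        target = f.tell() - seek_back
        if target < 0:  # first document has no size before it: start-of-file
            return steps
        f.seek(target)

def build_archive(docs):
    """Hypothetical test helper: each document followed by its 4-byte size."""
    out = bytearray()
    for doc in docs:
        data = doc.encode("utf-8")
        out += data + struct.pack(">I", len(data))
    return bytes(out)

archive = build_archive(["<msg>" + "a" * 31 + "</msg>",   # 42 bytes
                         "<msg>" + "b" * 289 + "</msg>",  # 300 bytes
                         "<msg/>"])                       # 6 bytes
print(walk_trailers(io.BytesIO(archive)))  # [(6, 14), (300, 308), (42, 50)]
```

The seek distances 14, 308 and 50 match the arithmetic in steps 3, 5 and 7. The loop ends on the same "would land before start-of-file" condition as step 8.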
And unlike ULF, each message would be a complete XML document. These
sizes aren't inserted into a monolithic XML document to navigate to
random elements within it; that would be stupid. As I said, it's
basically an archive format.
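Writing such an archive is the simple half. Each message is appended as a complete document, and its size follows immediately after. Below is a sketch under the same assumptions (4-byte big-endian unsigned size, counted in UTF-8 bytes). `append_record` is a hypothetical name, and the `<msg>` strings are placeholders padded to the 42/300/6-byte example.

```python
import struct

SIZE_FORMAT = ">I"  # assumption: 4-byte big-endian unsigned size trailer

def append_record(out, xml_doc):
    """Append one complete XML document, then the size of that document."""
    data = xml_doc.encode("utf-8")  # assumption: sizes count UTF-8 bytes
    out.extend(data)
    out.extend(struct.pack(SIZE_FORMAT, len(data)))

# Stand-ins for the three messages in the example: 42, 300 and 6 bytes long.
archive = bytearray()
for doc in ["<msg>" + "a" * 31 + "</msg>",
            "<msg>" + "b" * 289 + "</msg>",
            "<msg/>"]:
    append_record(archive, doc)

print(len(archive))                                 # 360
print(struct.unpack(SIZE_FORMAT, archive[-4:])[0])  # 6
```

Adding a message never touches the bytes already in the file, which suits a transcript that only grows.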
> He suggests using this to know the size of the last element or
> elements.
Not so. The binary trailer would follow *each* message document in
the file. Thus, reading the file would be a bit like following a
linked list, as depicted above.
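The linked-list view can be sketched as a generator in which each trailer acts as a back-pointer to the start of its own document. The generator yields the documents newest-first. Because every record is a complete XML document, each one can go straight to an ordinary XML parser. The same assumptions apply (4-byte big-endian sizes counting UTF-8 bytes). `iter_documents_backward` and `build_archive` are hypothetical names, and the `<message>` documents are made-up sample data.

```python
import io
import struct
import xml.etree.ElementTree as ET

SIZE_LEN = 4  # assumption: 4-byte big-endian unsigned size trailer

def iter_documents_backward(f):
    """Yield each record's XML document (as bytes), last record first."""
    f.seek(0, io.SEEK_END)
    end = f.tell()  # end of the current record's size trailer
    while end > 0:
        if end < SIZE_LEN:
            raise ValueError("corrupt archive: truncated size trailer")
        f.seek(end - SIZE_LEN)
        (size,) = struct.unpack(">I", f.read(SIZE_LEN))
        start = end - SIZE_LEN - size  # where this record's document begins
        if start < 0:
            raise ValueError("corrupt archive: size runs past start-of-file")
        f.seek(start)
        yield f.read(size)
        end = start  # the previous record's trailer ends where this one begins

def build_archive(docs):
    """Hypothetical test helper: each document followed by its 4-byte size."""
    out = bytearray()
    for doc in docs:
        data = doc.encode("utf-8")
        out += data + struct.pack(">I", len(data))
    return bytes(out)

archive = build_archive(["<message sender='a'>hello</message>",
                         "<message sender='b'>hi there</message>",
                         "<message sender='a'>bye</message>"])
texts = [ET.fromstring(doc).text
         for doc in iter_documents_backward(io.BytesIO(archive))]
print(texts)  # ['bye', 'hi there', 'hello']
```

Reading the most recent messages first only requires stopping the generator early. Nothing before those records is ever read.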
___________________________________
\ Peter Hosey / boredzo at adiumx.com
PGP public key ID: C6550423 (since 2007-01-01)