[Adium-devl] Unicode support in AIHTMLDecoder

Ryan Govostes rgovostes at gmail.com
Tue Jul 15 11:39:17 UTC 2008


I've noticed that pasted text with line breaks, especially when
copied from Safari or Word, sometimes shows up in the message view
stripped of all breaks. More interestingly, though the problem shows
up in the message view, it does not appear in the logs or in Growl
notifications.

Some playing around shows that the following HTML, rendered by Safari  
and pasted into Adium, triggers the issue:

<p>
This is on the first line.<br />
This is on the second line.
</p>

However, remove those paragraph tags and the resulting paste doesn't  
show the issue. And render it in Firefox and the paste will work fine  
too.

As it turns out, the paste is of type NSRTFPboardType, and the problem  
arises when Safari copies the line breaks as Unicode (U+2028, line  
separator) instead of ASCII (0x0A, line feed). It ends up bypassing  
AIHTMLDecoder's substitution routines, which are only set up to  
recognize \n and \r:

- When sending the message to AIM's server, thingsToInclude.nonASCII =  
false, so it does a very rudimentary find/replace of \r\n, \r, and \n.

- When sending the message to the message view,
thingsToInclude.nonASCII = true, so the character ends up being
escaped as &#x2028; (around line 620).
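
The two paths above can be illustrated with a rough sketch. It is written
in Python for brevity and is not the actual AIHTMLDecoder code; the
function names are hypothetical, and I'm assuming the nonASCII = true path
also replaces \r and \n before escaping:

```python
def replace_ascii_breaks(text):
    # Replace \r\n first so it does not turn into two <br> tags.
    return text.replace("\r\n", "<br>").replace("\r", "<br>").replace("\n", "<br>")

def encode_for_server(text):
    # Hypothetical stand-in for the nonASCII = false path:
    # only ASCII line breaks are substituted, everything else passes through.
    return replace_ascii_breaks(text)

def encode_for_view(text):
    # Hypothetical stand-in for the nonASCII = true path:
    # ASCII breaks are substituted, then any non-ASCII character
    # is escaped as a hexadecimal entity.
    out = []
    for ch in replace_ascii_breaks(text):
        out.append(ch if ord(ch) < 128 else "&#x%X;" % ord(ch))
    return "".join(out)

pasted = "This is on the first line.\u2028This is on the second line."
print(encode_for_view(pasted))
```

In this sketch U+2028 never matches the \r or \n checks, so the message
view receives an entity instead of a <br>.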

Ideally this code would be updated to replace all Unicode line
breaks with <br>. Wikipedia has the exhaustive list, taken from the
Unicode Standard 4.0 guidelines:

http://en.wikipedia.org/wiki/Newline#Unicode

Unfortunately, Apple's character sets and Unicode utilities don't seem
to include all of those (strangely enough).
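
The replacement suggested above could look something like the following.
Again this is only a Python sketch of the idea, not a patch; the names
UNICODE_BREAKS and breaks_to_br are made up for illustration, and the
character list is the one from the Wikipedia article (LF, VT, FF, CR,
CR+LF, NEL, LS, PS):

```python
import re

# Line-break sequences from the list linked above.
# \r\n comes first in the alternation so it counts as a single break.
UNICODE_BREAKS = re.compile("\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]")

def breaks_to_br(text):
    # Replace every recognized line break with an HTML <br>.
    return UNICODE_BREAKS.sub("<br>", text)

pasted = "This is on the first line.\u2028This is on the second line."
print(breaks_to_br(pasted))
```

Running the substitution before any non-ASCII escaping would keep U+2028
from ever reaching the entity-escaping step.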

Regards,
Ryan Govostes



