[Adium-devl] Improving BiDi support
Ofri
ofri.wolfus at gmail.com
Wed Sep 3 23:03:42 UTC 2008
Hi Everyone.
After thinking about http://trac.adiumx.com/ticket/10870 for a while,
I came up with some suggestions about how to improve BiDi support, but
I'd like to hear everyone else's opinions as it's rather complicated.
As you probably know, Adium already has quite good BiDi support, but
it's not perfect. In fact, BiDi is so hard to get right that the ONLY
app that gets it right is the latest MS Word on Windows. Even the
Unicode standard itself doesn't get it right.
The fundamental issue with text that mixes more than one writing
direction is figuring out the user's intention about:
1. the direction of each paragraph (in our case, the message), and
2. the direction of weak characters (weak characters are characters
like punctuation marks and numbers that are identical in both LTR and
RTL languages, and by themselves have no direction).
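To make the strong/weak distinction concrete, here is a minimal Python sketch that classifies single characters using the bidi class from the Unicode Character Database (`unicodedata.bidirectional`). Python only stands in for Adium's Objective-C here, and `char_strength` is a hypothetical helper name. Strictly speaking, UAX #9 calls digits "weak" and most punctuation "neutral"; the sketch lumps both together, as the post does, because neither has an inherent direction.

```python
import unicodedata

# Unicode bidi classes that count as strongly directional (UAX #9).
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}  # AL = Arabic letter


def char_strength(ch: str) -> str:
    """Classify one character as 'ltr', 'rtl', or 'weak' (hypothetical helper).

    UAX #9 separates weak types (digits: EN, AN) from neutral types
    (spaces, most punctuation: WS, ON, ...). Neither has an inherent
    direction, so both are reported as 'weak' here.
    """
    bidi = unicodedata.bidirectional(ch)
    if bidi in STRONG_LTR:
        return "ltr"
    if bidi in STRONG_RTL:
        return "rtl"
    return "weak"


if __name__ == "__main__":
    # Latin a, Hebrew alef, Arabic alef, a digit, punctuation, a space.
    for ch in ["a", "\u05d0", "\u0627", "1", "!", " "]:
        print(repr(ch), unicodedata.bidirectional(ch), char_strength(ch))
```

Hebrew letters report class `R`, Arabic letters `AL`, ASCII digits `EN`, and `!` is `ON`, which is why a message like "123!" carries no direction of its own.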
The current implementation is rather simple, and works better in some
use cases than in others. It loops through the characters of a message
until it finds a strong character, and uses that character's direction
for the entire message. As I explained in the ticket, there's no way
(at least none that I'm aware of) to reliably determine the base
direction of mixed text, so this algorithm is good enough in terms of
both speed and correctness.
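A minimal Python sketch of this first-strong-character heuristic follows. It roughly corresponds to rules P2 and P3 of the Unicode Bidirectional Algorithm (UAX #9), minus the handling of isolates. It is not the actual NSString category in the FriBiDi framework; `base_direction` is a hypothetical name, and `None` plays the role of NSWritingDirectionNatural.

```python
import unicodedata
from typing import Optional


def base_direction(text: str) -> Optional[str]:
    """Direction of the first strong character in text.

    Returns 'ltr' or 'rtl', or None when the text contains no strong
    character at all (the case that falls back to
    NSWritingDirectionNatural in Adium).
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):
            return "rtl"
    return None


if __name__ == "__main__":
    print(base_direction("\u05e9\u05dc\u05d5\u05dd world"))  # rtl: Hebrew comes first
    print(base_direction("hello \u05e9\u05dc\u05d5\u05dd"))  # ltr: Latin comes first
    print(base_direction("12:30 !"))                          # None: only weak/neutral
```

The scan stops at the first strong character, so its cost is proportional to the length of the leading run of weak characters, which is why it is cheap enough to run on every message.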
The problem begins when sending/receiving messages made up of only
weak characters and/or emoticons. If the message contains only weak
characters, the algorithm uses NSWritingDirectionNatural as the
direction, which aligns the message however the message style defines
it (left, unless someone uses an RTL style). The problem with
emoticons is that the direction algorithm scans the message as a
plain string without knowing anything about them, so if, for example,
I send the message ":-P", it will be aligned left because "P" is a
strong LTR character.
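The ":-P" case, and what removing emoticons before the scan would change, can be sketched in Python as follows. The emoticon list is hypothetical (Adium's real set comes from the active emoticon pack), and the stripping is deliberately naive: plain substring removal with no word-boundary or context rules, which Adium's actual emoticon parser may apply.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical emoticon list; Adium's real set comes from the active emoticon pack.
EMOTICONS = [":-P", ":P", ":-)", ":)", ";-)", ":D", ":-("]

# Try longer emoticons first so ":-P" is removed as a whole.
EMOTICON_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True))
)


def strip_emoticons(text: str) -> str:
    """Remove every emoticon occurrence (naively, no word-boundary rules)."""
    return EMOTICON_RE.sub("", text)


def base_direction(text: str) -> Optional[str]:
    """'ltr'/'rtl' from the first strong character, or None if there is none."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):
            return "rtl"
    return None


if __name__ == "__main__":
    print(base_direction(":-P"))                   # ltr, because of the "P"
    print(base_direction(strip_emoticons(":-P")))  # None, nothing directional is left
```

Stripping does not pick a direction by itself; it only turns a wrong "ltr" into "unknown", which is why the proposal needs a fallback for messages that end up with no direction.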
AFAIK, the only protocol aware of directionality is MSN, which has an
RTL flag that marks messages as RTL (Adium currently sends RTL
messages with this flag but ignores it for incoming messages). The
official client simply sets this flag based on the direction of the
input field, without even looking at the content of the message.
Obviously this is oversimplified, and it fails for conversations in
more than one language (usually technical conversations where you
explain in your local RTL language and write terms in English). As
simplified as this algorithm is, it does have one advantage: it
correctly displays messages made up of only weak characters and/or
emoticons in a single-language conversation.
Based on the above, my improved BiDi algorithm goes like this:
1. Remember the direction of the last sent and the last received
messages.
2. When a new message is written/received, scan it for emoticons and
remove them. Keep the stripped string.
3. Try to determine the direction of the stripped message, using the
algorithm described above (implemented as an NSString category in the
FriBiDi framework).
3.1 If the string has a direction, remember it and display the
complete message with that direction.
3.2 If the string doesn't have a direction, assume it's a
continuation of the previous message and use the stored direction.
3.3 If the string doesn't have a direction and there's no previous
direction:
3.3.1 If the message is outgoing, use the input field's direction.
3.3.2 If the message is incoming, use the previous outgoing
direction if available; otherwise, leave it up to the message style.
This assumes that if I initiated a conversation in one language, the
other side will usually answer in the same language.
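The steps above can be sketched in Python roughly as follows. This is a hedged illustration, not Adium code: `DirectionTracker`, `direction_for`, and the emoticon list are hypothetical names, and `None` stands for "leave it up to the message style". Two readings are assumptions of this sketch: in 3.2, "the previous message" is taken to mean the previous message from the same side (sent vs. received), since step 1 stores those separately; and only directions actually detected in 3.1 are remembered.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical emoticon list; longest first so ":-P" is removed whole.
EMOTICONS = [":-P", ":P", ":-)", ":)", ";-)", ":D", ":-("]
EMOTICON_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True))
)


def base_direction(text: str) -> Optional[str]:
    """'ltr'/'rtl' from the first strong character, or None if there is none."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):
            return "rtl"
    return None


class DirectionTracker:
    """Per-conversation state for the proposed algorithm (hypothetical)."""

    def __init__(self) -> None:
        # Step 1: remember the last sent and last received directions.
        self.last_sent: Optional[str] = None
        self.last_received: Optional[str] = None

    def direction_for(self, message: str, outgoing: bool,
                      input_field_direction: Optional[str] = None) -> Optional[str]:
        # Steps 2 and 3: detect the direction on the emoticon-free string.
        detected = base_direction(EMOTICON_RE.sub("", message))
        if detected is not None:
            # 3.1: remember it and use it for the whole message.
            if outgoing:
                self.last_sent = detected
            else:
                self.last_received = detected
            return detected
        # 3.2: treat it as a continuation of the previous message on this side.
        previous = self.last_sent if outgoing else self.last_received
        if previous is not None:
            return previous
        # 3.3.1: outgoing with no history, so follow the input field.
        if outgoing:
            return input_field_direction
        # 3.3.2: incoming with no history, so use the last outgoing direction;
        # None means the message style decides.
        return self.last_sent


if __name__ == "__main__":
    t = DirectionTracker()
    print(t.direction_for("\u05e9\u05dc\u05d5\u05dd", outgoing=True))  # rtl (3.1)
    print(t.direction_for(":-P", outgoing=False))  # rtl (3.3.2, from last sent)
    print(t.direction_for("123", outgoing=True))   # rtl (3.2)
```

In the sample run, the incoming ":-P" is shown RTL because the conversation was started in Hebrew, which is exactly the assumption stated above about the other side answering in the same language.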
In addition, I also suggest a behavior change to the writing direction
contextual menu, which will disable it when we know the direction and
enable it when we don't. This will allow the user to override our
algorithm for some edge cases. This will also make the input of the
user consistent with the way it's displayed in the message view.
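The proposed menu behavior boils down to a small decision, sketched here in Python under stated assumptions: `direction_menu_state` is a hypothetical helper, `user_choice` stands for whatever the user last picked from the writing direction menu (`None` meaning "natural"), and the emoticon pattern is a hypothetical set. Running the same emoticon stripping and detection on the input field as on the message view is what keeps the two consistent.

```python
import re
import unicodedata
from typing import Optional, Tuple

# Hypothetical emoticon pattern, tried left to right (":-P" before ":P").
EMOTICON_RE = re.compile(r":-P|:P|:-\)|:\)|;-\)|:D")


def base_direction(text: str) -> Optional[str]:
    """'ltr'/'rtl' from the first strong character, or None if there is none."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):
            return "rtl"
    return None


def direction_menu_state(input_text: str,
                         user_choice: Optional[str]) -> Tuple[bool, Optional[str]]:
    """Return (menu_enabled, effective_direction) for the input field.

    If the emoticon-free text has a detectable direction, the menu is
    disabled and the detected direction wins. Otherwise the menu is
    enabled and the user's manual choice applies.
    """
    detected = base_direction(EMOTICON_RE.sub("", input_text))
    if detected is not None:
        return False, detected
    return True, user_choice
```

The effective direction returned for an undetectable message is also what step 3.3.1 would use as "the input field's direction".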
This solution seems rather complex for something that isn't critical
for the client to work, and that most users won't even notice (RTL
users are a minority AFAIK. BTW, why not include this info with
Sparkle?). In addition, I don't know how easy it will be to strip
emoticons from a string (I've never touched that code), or whether
it'll slow things down. Finally, I doubt I'll have the time to
implement what I described above anytime soon.
What do other people think? Sure, it'd be nice to be able to say that
Adium is the only IM client on any platform that properly supports
BiDi conversations, but is it worth the effort?
Ofri