[Adium-devl] Status on Parental Controls & Voice/Video

Graham Booker gbooker at cod3r.com
Sat Sep 1 15:00:02 UTC 2007


On Aug 31, 2007, at 11:35 AM, Alan Humpherys wrote:

>
> Parental Controls
> ==============
> The first area I am working on is a profanity filter.  I have finally
> started the effort to port the work done in Fire over to the Adium
> source tree.  As I was doing so, I decided to take a look at whether
> there is a more efficient way to do the comparison than what I am
> doing now...
>
> There are a number of time vs. memory trade-offs I can take in doing
> the comparison, and the current implementation errs on the side of
> taking more time and using less memory.  I want to look at seeing if
> I can find a middle of the road approach to doing the filtering so we
> don't delay the display of messages too much.  (Not that the current
> implementation is much of a delay,  but it can add 50 ms to the
> processing path if you are doing strict filtering and checking each
> word against the 500 phrase dictionary)

I can suggest a number of algorithms here.  I will use the following  
variable in discussing runtimes:
n = number of filters, m = length of message, l = max length of filter.
We have a full range from O(n•m•l) to O(m).

Naive (bet you don't want this):
O(n•m•l)
Use rangeOfString on each filter to find matches

Rabin Karp:
O(m•l•log n) (that is worst, but the average with a good hash  
function is O(m•log n)).
Very memory efficient:
This assumes you have a method to lookup a string from a hash in log  
n time (NSDictionary should do this)
Essentially you compute a rolling hash, when the hash matches the  
hash of a filter, you compare the filter to the substring of the  
message at that location
Computing the hash of the next substring from the previous substring  
is an O(1) operation.

Aho-Corasick
O(m)
Uses a fair amount of memory
You create a state machine.  Each character in the message  
corresponds to a state transition.  If you hit a match state, then  
the previous characters correspond to a complete match.
The emoticon matching within Fire is a partial implementation of  
this, but on a failed match, instead of transitioning to the correct  
state, it resets and starts again on the next character.  That means  
it falls short of the O(m) and instead becomes O(m•l), but on average  
it is O(m•k) where k is the average length of the substring that was  
matching a filter before it failed to match.  For emoticons this  
worked quite well since they tend to be short strings.

Unless you want to implement a full Aho-Corasick, you may be best off  
taking the emoticon parser and use it.  It has some optimizations in  
it to keep the memory size of the next pointers down while still  
being quite fast.  It is also speed independent of the number of  
filters.
If you are really concerned about memory usage, then the Rabin-Karp  
(or some variant) may be the best.

>
> - Alan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1603 bytes
Desc: not available
URL: <http://adium.im/pipermail/devel_adium.im/attachments/20070901/f1c7fd66/attachment.p7s>


More information about the devel mailing list