[Adium-devl] Status on Parental Controls & Voice/Video
Graham Booker
gbooker at cod3r.com
Sat Sep 1 15:00:02 UTC 2007
On Aug 31, 2007, at 11:35 AM, Alan Humpherys wrote:
>
> Parental Controls
> ==============
> The first area I am working on is a profanity filter. I have finally
> started the effort to port the work done in Fire over to the Adium
> source tree. As I was doing so, I decided to take a look at whether
> there is a more efficient way to do the comparison than what I am
> doing now...
>
> There are a number of time vs. memory trade-offs I can take in doing
> the comparison, and the current implementation errs on the side of
> taking more time and using less memory. I want to look at seeing if
> I can find a middle of the road approach to doing the filtering so we
> don't delay the display of messages too much. (Not that the current
> implementation is much of a delay, but it can add 50 ms to the
> processing path if you are doing strict filtering and checking each
> word against the 500 phrase dictionary)
I can suggest a number of algorithms here. I will use the following
variable in discussing runtimes:
n = number of filters, m = length of message, l = max length of filter.
We have a full range from O(n•m•l) to O(m).
Naive (bet you don't want this):
O(n•m•l)
Use rangeOfString on each filter to find matches
Rabin Karp:
O(m•l•log n) (that is worst, but the average with a good hash
function is O(m•log n)).
Very memory efficient:
This assumes you have a method to lookup a string from a hash in log
n time (NSDictionary should do this)
Essentially you compute a rolling hash, when the hash matches the
hash of a filter, you compare the filter to the substring of the
message at that location
Computing the hash of the next substring from the previous substring
is an O(1) operation.
Aho-Corasick
O(m)
Uses a fair amount of memory
You create a state machine. Each character in the message
corresponds to a state transition. If you hit a match state, then
the previous characters correspond to a complete match.
The emoticon matching within Fire is a partial implementation of
this, but on a failed match, instead of transitioning to the correct
state, it resets and starts again on the next character. That means
it falls short of the O(m) and instead becomes O(m•l), but on average
it is O(m•k) where k is the average length of the substring that was
matching a filter before it failed to match. For emoticons this
worked quite well since they tend to be short strings.
Unless you want to implement a full Aho-Corasick, you may be best off
taking the emoticon parser and use it. It has some optimizations in
it to keep the memory size of the next pointers down while still
being quite fast. It is also speed independent of the number of
filters.
If you are really concerned about memory usage, then the Rabin-Karp
(or some variant) may be the best.
>
> - Alan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1603 bytes
Desc: not available
URL: <http://adium.im/pipermail/devel_adium.im/attachments/20070901/f1c7fd66/attachment.p7s>
More information about the devel
mailing list