adium 3160:0148126ffd68: Correctly identify when the last charac...
commits at adium.im
Mon Mar 22 15:49:30 UTC 2010
details: http://hg.adium.im/adium/rev/0148126ffd68
revision: 3160:0148126ffd68
author: Stephen Holt <sholt at adium.im>
date: Mon Mar 22 11:00:00 2010 -0400
Correctly identify when the last character of a potential link is punctuation, and adjust our scan range accordingly.
Fixes the case where (<some enclosed link)[:punc:] has the trailing enclosure character incorrectly included in the link.
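
The back-off described above can be illustrated with a small Python sketch. This is not the framework's Objective-C code: the function name and the contents of PUNC_SET are assumptions made for illustration (the real puncSet is not shown in this message), and the length threshold of 4 mirrors the check in the AHHyperlinkScanner.m diff in this message.

```python
# Hypothetical stand-in for the scanner's puncSet; the real set is not shown here.
PUNC_SET = set(".,:;!?")


def back_off_trailing_punctuation(text, location, length):
    """Shrink the scanned range by one character if it ends in punctuation.

    Mirrors the check in the diff: only ranges longer than 4 characters
    are shortened, and at most one trailing character is dropped.
    """
    if length > 4 and text[location + length - 1] in PUNC_SET:
        length -= 1
    return location, length


# The candidate "http://adium.im)." starts at index 5 and is 17 characters long.
text = "(see http://adium.im)."
location, length = back_off_trailing_punctuation(text, 5, 17)
candidate = text[location:location + length]  # "http://adium.im)"
```

With the trailing period dropped, the closing parenthesis is once again the last character of the range, so the enclosure check that follows in the scanning loop can presumably recognize and exclude it.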
Subject: adium 3161:782f2894fdf5: Be a little bit less liberal for links without a specifier and require a TLD or a ccTLD with a second-level domain.
details: http://hg.adium.im/adium/rev/782f2894fdf5
revision: 3161:782f2894fdf5
author: Stephen Holt <sholt at adium.im>
date: Mon Mar 22 11:49:18 2010 -0400
Be a little bit less liberal for links without a specifier and require a TLD or a ccTLD with a second-level domain.
ex: B.Sc will not link, but B.co.sc would, and http://b.sc would.
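
The rule for links without a specifier can be approximated with a Python regular expression. This is a sketch, not the flex lexer itself: CC_TLD is a hypothetical three-entry subset (the full ccTLD definition is not shown in this message), SINGLE_DOMAIN simplifies the lexer's byte-oriented character class, case-insensitive matching is assumed from the B.Sc example, and the function name is invented for illustration.

```python
import re

# Hypothetical subset; the lexer's full ccTLD list is not shown in this message.
CC_TLD = r"(?:sc|uk|au)"
# The remaining lists are copied from the AHLinkLexer.l definitions in the diff.
S_TLD = r"(?:com|edu|gov|int|mil|net|org|biz|info|name|pro)"
U_TLD = r"(?:aero|coop|museum|mobi|cat|jobs|travel)"
SLDS = (r"(?:ac|co|gov|ltd|me|mod|net|nhs|nic|org|parliament"
        r"|plc|police|sch|edu|asn|id)")
# A bare ccTLD is no longer enough: it must follow a known second-level domain.
U_TLDS = rf"(?:{SLDS}\.{CC_TLD}|{S_TLD}|{U_TLD}|arpa|local)"
# Approximation of singleDomain; the lexer matches raw UTF-8 bytes instead.
SINGLE_DOMAIN = r"[\w-]+"

URL_SPEC = re.compile(
    rf"(?:{SINGLE_DOMAIN}\.)+{U_TLDS}(?::[0-9]+)?(?:/\S*)?",
    re.IGNORECASE,  # assumption: matching is case-insensitive
)


def links_without_specifier(candidate):
    """Return True if the whole candidate matches the stricter urlSpec sketch."""
    return URL_SPEC.fullmatch(candidate) is not None


links_without_specifier("B.Sc")         # False: bare ccTLD, no second-level domain
links_without_specifier("B.co.sc")      # True: "co" is a listed second-level domain
links_without_specifier("example.com")  # True: generic TLDs still link on their own
```

Links that carry a specifier, such as http://b.sc, go through a separate rule (urlCSpec in the lexer) that this sketch does not model.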
diffs (49 lines):
diff -r ed57eaa7f328 -r 782f2894fdf5 Frameworks/AutoHyperlinks Framework/Source/AHHyperlinkScanner.m
--- a/Frameworks/AutoHyperlinks Framework/Source/AHHyperlinkScanner.m Sun Mar 21 00:06:56 2010 -0400
+++ b/Frameworks/AutoHyperlinks Framework/Source/AHHyperlinkScanner.m Mon Mar 22 11:49:18 2010 -0400
@@ -263,6 +263,13 @@
// main scanning loop
while([self _scanString:m_scanString upToCharactersFromSet:skipSet intoRange:&scannedRange fromIndex:&scannedLocation]) {
+
+ // back off if the last character is puncuation.
+ unichar s_char = [m_scanString characterAtIndex:(scannedRange.location + scannedRange.length - 1)];
+ if(4 < scannedRange.length && [puncSet characterIsMember:s_char]) {
+ scannedRange.length--;
+ }
+
BOOL foundUnpairedEnclosureCharacter = NO;
// Check for and filter enclosures. We can't add (, [, etc. to the skipSet as they may be in a URI
diff -r ed57eaa7f328 -r 782f2894fdf5 Frameworks/AutoHyperlinks Framework/Source/AHLinkLexer.l
--- a/Frameworks/AutoHyperlinks Framework/Source/AHLinkLexer.l Sun Mar 21 00:06:56 2010 -0400
+++ b/Frameworks/AutoHyperlinks Framework/Source/AHLinkLexer.l Mon Mar 22 11:49:18 2010 -0400
@@ -49,7 +49,10 @@
sTLD (com|edu|gov|int|mil|net|org|biz|info|name|pro)
uTLD (aero|coop|museum|mobi|cat|jobs|travel)
+SLDs (ac|co|gov|ltd|me|mod|net|nhs|nic|org|parliament|plc|police|sch|edu|asn|id)
TLDs ({ccTLD}|{sTLD}|{uTLD}|arpa|local)
+uTLDs (({SLDs}\.{ccTLD})|{sTLD}|{uTLD}|arpa|local)
+
%{
/*The Unicode standard, version 4.1, table 3-6, says that the highest byte that will occur in a valid UTF-8 sequence is 0xF4.*/
%}
@@ -57,7 +60,7 @@
singleDomain [_[:alnum:]\x80-\xf4-]+
urlPath \/[^[:space:]]*
-urlSpec ({singleDomain}\.)+{TLDs}(:[0-9]+)?{urlPath}?
+urlSpec ({singleDomain}\.)+{uTLDs}(:[0-9]+)?{urlPath}?
urlCSpec {singleDomain}(\.{singleDomain})*(:[0-9]+)?{urlPath}?
ipv4address ([0-9]{1,3}\.){3}([0-9]{1,3})
@@ -70,7 +73,7 @@
ipv6URL \[{ipv6Address}](:[0-9]+)?{urlPath}?
userAtDomain [^:@\/[:space:]]+\@{singleDomain}(\.{singleDomain})*
-mailSpec {userAtDomain}\.{TLDs}
+mailSpec {userAtDomain}\.({ccTLD}|{sTLD}|{uTLD}|arpa|local)
jabberSpec xmpp:{mailSpec}{urlPath}?(\?[^[:space:]]+)?
aolIMSpec aim:goim\?screenname=[^\ \t\n&]+(&message=.+)?
aolChatSpec aim:gochat\?roomname=[^\ \t\n&]+