adium 3160:0148126ffd68: Correctly identify when the last charac...

commits at adium.im commits at adium.im
Mon Mar 22 15:49:30 UTC 2010


details:	http://hg.adium.im/adium/rev/0148126ffd68
revision:	3160:0148126ffd68
author:		Stephen Holt <sholt at adium.im>
date:		Mon Mar 22 11:00:00 2010 -0400

Correctly identify when the last character of a potential link is punctuation, and adjust our scan range accordingly.

Fixes the case where (<some enclosed link)[:punc:] has the trailing enclosure character incorrectly included in the link.
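
The back-off step can be pictured with a small C sketch (Objective-C is a superset of C, so the logic carries over). `Range`, `back_off_punctuation`, and the punctuation set `.,;:!?` are hypothetical stand-ins for `NSRange`, the scanner loop, and `puncSet`; the real contents of `puncSet` are not shown in this diff. Unlike the patch, the sketch tests the length before reading the last character, so an empty range is never indexed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for NSRange. */
typedef struct {
    size_t location;
    size_t length;
} Range;

/* ASSUMPTION: the real puncSet in AHHyperlinkScanner.m is not shown in the
   diff; this set is only illustrative. */
static const char PUNCT_SET[] = ".,;:!?";

/* Mirrors the patch: when a scanned token is longer than 4 characters and
   ends in punctuation, drop that one character so the enclosure check that
   follows sees the real last character, e.g. ')' in "(adium.im).". */
static Range back_off_punctuation(const char *s, Range r) {
    assert(s != NULL);
    if (r.length > 4) {                      /* same guard as 4 < scannedRange.length */
        char last = s[r.location + r.length - 1];
        if (last != '\0' && strchr(PUNCT_SET, last) != NULL) {
            r.length--;                      /* back off a single character */
        }
    }
    return r;
}
```

For the text `see (adium.im). thanks`, the scanned range `{4, 11}` covers `(adium.im).`; after the back-off it becomes `{4, 10}`, i.e. `(adium.im)`, so the closing parenthesis is now the last character the enclosure filter inspects.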

Subject: adium 3161:782f2894fdf5: Be a little bit less liberal for links without a specifier and require a TLD or a ccTLD with a second-level domain.

details:	http://hg.adium.im/adium/rev/782f2894fdf5
revision:	3161:782f2894fdf5
author:		Stephen Holt <sholt at adium.im>
date:		Mon Mar 22 11:49:18 2010 -0400

Be a little bit less liberal for links without a scheme specifier (such as http://) and require a generic TLD, or a ccTLD preceded by a second-level domain.

ex: B.Sc will not link, but B.co.sc will, as will http://b.sc.
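
The rule in the example above can be expressed as a C sketch that checks a host written without a scheme. `schemeless_host_links` and its helpers are hypothetical names; the generic TLD and SLD lists are copied from the lexer diff, but the `{ccTLD}` list is not shown there, so any two ASCII letters stand in for it, and case-insensitive matching is an assumption. Hosts given with a scheme, such as `http://b.sc`, are handled by a different lexer rule (presumably `urlCSpec`) and are outside this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copied from the sTLD and uTLD definitions, plus arpa and local. */
static const char *const GENERIC_TLDS[] = {
    "com", "edu", "gov", "int", "mil", "net", "org", "biz", "info", "name", "pro",
    "aero", "coop", "museum", "mobi", "cat", "jobs", "travel", "arpa", "local", NULL
};

/* Copied from the new SLDs definition. */
static const char *const SECOND_LEVEL[] = {
    "ac", "co", "gov", "ltd", "me", "mod", "net", "nhs", "nic", "org",
    "parliament", "plc", "police", "sch", "edu", "asn", "id", NULL
};

/* Compares a non-terminated label with a lowercase word, ignoring case.
   ASSUMPTION: the lexer matches TLDs case-insensitively. */
static int label_equals(const char *label, size_t len, const char *word) {
    if (strlen(word) != len) return 0;
    for (size_t i = 0; i < len; i++) {
        if (tolower((unsigned char)label[i]) != word[i]) return 0;
    }
    return 1;
}

static int in_list(const char *label, size_t len, const char *const *list) {
    for (size_t i = 0; list[i] != NULL; i++) {
        if (label_equals(label, len, list[i])) return 1;
    }
    return 0;
}

/* ASSUMPTION: any two ASCII letters stand in for the real {ccTLD} list. */
static int looks_like_cctld(const char *label, size_t len) {
    return len == 2 && isalpha((unsigned char)label[0])
                    && isalpha((unsigned char)label[1]);
}

/* Hypothetical helper: returns 1 if a scheme-less host ends in a generic
   TLD, or in SLD.ccTLD, with at least one label in front. */
int schemeless_host_links(const char *host) {
    assert(host != NULL);
    const char *last_dot = strrchr(host, '.');
    if (last_dot == NULL || last_dot == host) return 0;  /* need "label." before the TLD */

    const char *tld = last_dot + 1;
    size_t tld_len = strlen(tld);
    if (in_list(tld, tld_len, GENERIC_TLDS)) return 1;
    if (!looks_like_cctld(tld, tld_len)) return 0;

    /* A bare ccTLD is no longer enough: find the label just before it. */
    const char *sld_end = last_dot;
    const char *sld_start = sld_end;
    while (sld_start > host && sld_start[-1] != '.') sld_start--;
    if (sld_start == host || sld_start - 1 == host) return 0;  /* nothing in front of the SLD */
    return in_list(sld_start, (size_t)(sld_end - sld_start), SECOND_LEVEL);
}
```

With this sketch, `schemeless_host_links("B.Sc")` returns 0 and `schemeless_host_links("B.co.sc")` returns 1, matching the commit message; `"co.sc"` also returns 0, because `({singleDomain}\.)+` in `urlSpec` requires at least one label in front of the second-level domain.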

diffs (49 lines):

diff -r ed57eaa7f328 -r 782f2894fdf5 Frameworks/AutoHyperlinks Framework/Source/AHHyperlinkScanner.m
--- a/Frameworks/AutoHyperlinks Framework/Source/AHHyperlinkScanner.m	Sun Mar 21 00:06:56 2010 -0400
+++ b/Frameworks/AutoHyperlinks Framework/Source/AHHyperlinkScanner.m	Mon Mar 22 11:49:18 2010 -0400
@@ -263,6 +263,13 @@
 	
 	// main scanning loop
 	while([self _scanString:m_scanString upToCharactersFromSet:skipSet intoRange:&scannedRange fromIndex:&scannedLocation]) {
+		
+		// back off if the last character is puncuation.
+		unichar s_char = [m_scanString characterAtIndex:(scannedRange.location + scannedRange.length - 1)];
+		if(4 < scannedRange.length && [puncSet characterIsMember:s_char]) {
+			scannedRange.length--;
+		}
+		
 		BOOL foundUnpairedEnclosureCharacter = NO;
 		
 		// Check for and filter enclosures.  We can't add (, [, etc. to the skipSet as they may be in a URI
diff -r ed57eaa7f328 -r 782f2894fdf5 Frameworks/AutoHyperlinks Framework/Source/AHLinkLexer.l
--- a/Frameworks/AutoHyperlinks Framework/Source/AHLinkLexer.l	Sun Mar 21 00:06:56 2010 -0400
+++ b/Frameworks/AutoHyperlinks Framework/Source/AHLinkLexer.l	Mon Mar 22 11:49:18 2010 -0400
@@ -49,7 +49,10 @@
 sTLD            (com|edu|gov|int|mil|net|org|biz|info|name|pro)
 uTLD            (aero|coop|museum|mobi|cat|jobs|travel)
 
+SLDs            (ac|co|gov|ltd|me|mod|net|nhs|nic|org|parliament|plc|police|sch|edu|asn|id)
 TLDs            ({ccTLD}|{sTLD}|{uTLD}|arpa|local)
+uTLDs           (({SLDs}\.{ccTLD})|{sTLD}|{uTLD}|arpa|local)
+
 %{
 /*The Unicode standard, version 4.1, table 3-6, says that the highest byte that will occur in a valid UTF-8 sequence is 0xF4.*/
 %}
@@ -57,7 +60,7 @@
 singleDomain    [_[:alnum:]\x80-\xf4-]+
 
 urlPath         \/[^[:space:]]*
-urlSpec         ({singleDomain}\.)+{TLDs}(:[0-9]+)?{urlPath}?
+urlSpec         ({singleDomain}\.)+{uTLDs}(:[0-9]+)?{urlPath}?
 urlCSpec        {singleDomain}(\.{singleDomain})*(:[0-9]+)?{urlPath}?
 
 ipv4address     ([0-9]{1,3}\.){3}([0-9]{1,3})
@@ -70,7 +73,7 @@
 ipv6URL         \[{ipv6Address}](:[0-9]+)?{urlPath}?
 
 userAtDomain	[^:@\/[:space:]]+\@{singleDomain}(\.{singleDomain})*
-mailSpec        {userAtDomain}\.{TLDs}
+mailSpec        {userAtDomain}\.({ccTLD}|{sTLD}|{uTLD}|arpa|local)
 jabberSpec      xmpp:{mailSpec}{urlPath}?(\?[^[:space:]]+)?
 aolIMSpec       aim:goim\?screenname=[^\ \t\n&]+(&message=.+)?
 aolChatSpec     aim:gochat\?roomname=[^\ \t\n&]+
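
The two changed lexer definitions can be approximated as POSIX extended regular expressions to see how the change separates bare hosts from e-mail addresses. This is a simplified C sketch, not the lexer itself: `full_match`, `URL_SPEC`, and `MAIL_SPEC` are hypothetical names; the `{ccTLD}` list (not shown in the diff) is approximated by any two letters, the `\x80-\xf4` bytes of `{singleDomain}` are dropped, and case-insensitive matching is assumed. The SLD, sTLD, and uTLD lists are taken from the diff.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

#define SINGLE_DOMAIN "[_[:alnum:]-]+"
#define CCTLD "[a-z]{2}"  /* ASSUMPTION: approximates the real {ccTLD} list */
#define GENERIC "(com|edu|gov|int|mil|net|org|biz|info|name|pro|" \
                "aero|coop|museum|mobi|cat|jobs|travel|arpa|local)"
#define SLDS "(ac|co|gov|ltd|me|mod|net|nhs|nic|org|parliament|plc|police|sch|edu|asn|id)"

/* urlSpec: ({singleDomain}\.)+{uTLDs}(:[0-9]+)?{urlPath}? */
static const char *URL_SPEC =
    "^(" SINGLE_DOMAIN "\\.)+(" SLDS "\\." CCTLD "|" GENERIC ")"
    "(:[0-9]+)?(/[^[:space:]]*)?$";

/* mailSpec keeps the looser set: {userAtDomain}\.({ccTLD}|{sTLD}|{uTLD}|arpa|local) */
static const char *MAIL_SPEC =
    "^[^:@/[:space:]]+@" SINGLE_DOMAIN "(\\." SINGLE_DOMAIN ")*"
    "\\.(" CCTLD "|" GENERIC ")$";

/* Returns 1 on a full match, 0 on no match, -1 if the pattern fails to compile. */
static int full_match(const char *pattern, const char *text) {
    assert(pattern != NULL && text != NULL);
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With these patterns, `full_match(URL_SPEC, "b.sc")` returns 0 while `full_match(MAIL_SPEC, "someone@b.sc")` returns 1: because `mailSpec` still lists `{ccTLD}` directly, an address at a bare ccTLD keeps matching even though the same host no longer satisfies `urlSpec`.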



