Anti-Phishing Update -- New data feed

Tue Jun 16 08:42:18 IST 2009

On 15/06/2009 21:35, Steve Freegard wrote:
> Julian Field wrote:
>    
>>
>> On 15/06/2009 21:02, Steve Freegard wrote:
>>      
>>> Alex Broens wrote:
>>>
>>>        
>>>>> I need to apply the rules to the entire message body and headers, as
>>>>> they frequently put the email address just in the body of the message
>>>>> inside some link or other. So how would creating separate header and
>>>>> body rules be any better?
>>>>>
>>>>>            
>>>> I'm not savvy enough in Perl & SA to give you the scientific reason, but
>>>> it's been common practice to avoid 'full' rules if possible.
>>>>
>>>> You'd have to ask one of the core SA devs...  maybe Matt Kettler can
>>>> jump in and tell me I'm totally off and that my understanding is wrong.
>>>>
>>>>          
>>> 'full' rules are simply inefficient: IIRC the regexps have to be run
>>> multiple times across each block of text (SA splits it into
>>> paragraph-style chunks, again IIRC) to prevent excessive memory use.
>>> They also evaluate all other MIME structures, e.g. attachments, images,
>>> etc., as per the docs.
>>>
>>>        
>    
>> I don't think they include binary attachments; I had to add that
>> specifically for the MCP stuff with a patch to the SA code.
>>      
> From 'man Mail::SpamAssassin::Conf':
>
>         full SYMBOLIC_TEST_NAME /pattern/modifiers
>             Define a full message pattern test.  "pattern" is a Perl regular
>             expression.  Note: as per the header tests, "#" must be escaped
>             ("\#") or else it is considered the beginning of a comment.
>
>             The full message is the pristine message headers plus the
>             pristine message body, including all MIME data such as images,
>             other attachments, MIME boundaries, etc.
>
> The reason it wouldn't work for MCP is that a 'full' rule is not going
> to decode base64/QP parts before evaluating the regexp (I think!).
>
>    
>>> If you are simply looking to get any e-mail addresses out of the message
>>> body, then a 'uri' rule is far more appropriate, e.g.:
>>>
>>> uri BLAH  /^mailto:email\@domain\.com$/
>>>
>>> (SA converts all e-mail URIs into mailto: URIs, even bare addresses with
>>> no scheme.)
>>>
>>>        
>> But surely that wouldn't work when email addresses just appear in the
>> text of text/plain bodies, would it?
>>      
> Sure does:
>
> [root@mail ~]# cat test.eml
> Return-path:<testfrom@example.com>
> To: test<test@example.com>
> From: test<testfrom@example.com>
> Subject: test
> Content-type: text/plain
>
> Test body
>
> bodytest@example.com this is a test bodytest2@example.com
>
> [root@mail ~]# /mnt/jungledisk/smf/scripts/uri-extractor.pl test.eml
> URI-Domain:example.com
> URI:mailto:bodytest2@example.com
> URI:mailto:bodytest@example.com
>
> (uri-extractor.pl uses SA to extract URIs in the same way the eval()
> rules do; I use this for testing amongst other things).
>    
Thanks for that lot, I stand corrected!

So I want to do:
header PHISH_1H ALL =~ /huge|regexp|here/i
uri PHISH_1B /mailto:(huge|regexp|here)/i
and then use a meta rule to join them all together.
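A minimal sketch of such a meta rule, assuming the two sub-rules are renamed with a "__" prefix so that SpamAssassin evaluates them without scoring or reporting them individually. The rule name PHISH_1, the description text and the score are hypothetical placeholders, "huge|regexp|here" stands in for the real address list, and the uri pattern is anchored the same way as Steve's example above:

```
# Sub-rules: the __ prefix means they are evaluated but not scored on their own
header   __PHISH_1H  ALL =~ /huge|regexp|here/i
uri      __PHISH_1B  /^mailto:(?:huge|regexp|here)$/i

# Hypothetical combined rule: fires if either sub-rule matched
meta     PHISH_1     (__PHISH_1H || __PHISH_1B)
describe PHISH_1     Address listed in the phishing address feed
score    PHISH_1     5.0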

Does that sound better to you?

Jules

-- 
Julian Field MEng CITP CEng
www.MailScanner.info
Buy the MailScanner book at www.MailScanner.info/store
Follow me at twitter.com/JulesFM

MailScanner customisation, or any advanced system administration help?
Contact me at Jules@Jules.FM

PGP footprint: EE81 D763 3DB0 0BFD E1DC 7222 11F6 5947 1415 B654
PGP public key: http://www.jules.fm/julesfm.asc
