Anti-Phishing Update -- New data feed

Tue Jun 16 09:40:45 IST 2009

Julian Field wrote:
> 
> 
> On 15/06/2009 21:35, Steve Freegard wrote:
>> Julian Field wrote:
>>   
>>>
>>> On 15/06/2009 21:02, Steve Freegard wrote:
>>>     
>>>> Alex Broens wrote:
>>>>
>>>>       
>>>>>> I need to apply the rules to the entire message body and headers, as
>>>>>> they frequently put the email address just in the body of the message
>>>>>> inside some link or other. So how would creating separate header and
>>>>>> body rules be any better?
>>>>>>
>>>>>>            
>>>>> I'm not savvy enough in Perl & SA to give you the scientific reason, but
>>>>> it's been common practice to avoid 'full' rules if possible.
>>>>>
>>>>> You'd have to ask one of the core SA devs...  maybe Matt Kettler can
>>>>> jump in and tell me I'm totally off and that my understanding is
>>>>> wrong.
>>>>>
>>>>>          
>>>> 'full' rules are simply inefficient: IIRC the regexps have to be run
>>>> multiple times across each block of text (SA splits the message into
>>>> paragraph-style chunks) to prevent excessive memory use.  They also
>>>> evaluate all other MIME structures, e.g. attachments, images etc., as
>>>> per the docs.
>>>>
>>>>        
>>   
>>> I don't think they include binary attachments, I had to add that
>>> specifically for the MCP stuff with a patch to the SA code.
>>>      
>> From 'man Mail::SpamAssassin::Conf':
>>
>>         full SYMBOLIC_TEST_NAME /pattern/modifiers
>>             Define a full message pattern test.  "pattern" is a Perl regular
>>             expression.  Note: as per the header tests, "#" must be escaped
>>             ("\#") or else it is considered the beginning of a comment.
>>
>>             The full message is the pristine message headers plus the pristine
>>             message body, including all MIME data such as images, other
>>             attachments, MIME boundaries, etc.
>>
>> The reason it wouldn't work for MCP is that a 'full' rule is not going
>> to decode base64/QP parts before evaluating the regexp (I think!).
>>
>>   
>>>> If you are simply looking to get any e-mail addresses out of the
>>>> message body, then a 'uri' rule is far more appropriate, e.g.
>>>>
>>>> uri BLAH  /^mailto:email\@domain\.com$/
>>>>
>>>> (SA converts all e-mail URIs into mailto: types, even those with no
>>>> scheme.)
>>>>
>>>>        
>>> But surely that wouldn't work when email addresses just appear in the
>>> text of text/plain bodies, would it?
>>>      
>> Sure does:
>>
>> [root@mail ~]# cat test.eml
>> Return-path: <testfrom@example.com>
>> To: test <test@example.com>
>> From: test <testfrom@example.com>
>> Subject: test
>> Content-type: text/plain
>>
>> Test body
>>
>> bodytest@example.com this is a test bodytest2@example.com
>>
>> [root@mail ~]# /mnt/jungledisk/smf/scripts/uri-extractor.pl test.eml
>> URI-Domain:example.com
>> URI:mailto:bodytest2@example.com
>> URI:mailto:bodytest@example.com
>>
>> (uri-extractor.pl uses SA to extract URIs in the same way the eval()
>> rules do; I use this for testing amongst other things).
>>    
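
The conversion of bare addresses into mailto: URIs that Steve demonstrates can be illustrated with a minimal Python sketch. It is not SpamAssassin's actual URI parser: ADDRESS_RE, extract_mailto_uris and uri_rule_hits are hypothetical names, and extraction is deliberately simplified to bare user@domain tokens. It only shows why an anchored uri pattern can still match an address that appears as plain text in a text/plain body.

```python
import re

# Hypothetical, simplified stand-in for SpamAssassin's URI extraction: it only
# finds bare user@domain tokens and turns each into a mailto: URI. SA's real
# parser handles many more cases (HTML links, schemeless URLs, obfuscation).
ADDRESS_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_mailto_uris(body: str) -> list[str]:
    """Return a mailto: URI for every bare e-mail address in a plain-text body."""
    return ["mailto:" + addr for addr in ADDRESS_RE.findall(body)]


# Rough equivalent of: uri BLAH /^mailto:email\@domain\.com$/
# (Perl's "\@" is just a literal "@", so Python can use "@" directly.)
BLAH = re.compile(r"^mailto:email@domain\.com$")


def uri_rule_hits(rule: re.Pattern, uris: list[str]) -> bool:
    """A uri rule fires if its pattern matches any one extracted URI."""
    return any(rule.search(uri) for uri in uris)


if __name__ == "__main__":
    body = "Test body\n\nbodytest@example.com this is a test bodytest2@example.com\n"
    uris = extract_mailto_uris(body)
    print(uris)
    print(uri_rule_hits(BLAH, uris))
```

Because each URI is tested on its own, the ^ and $ anchors apply to a single address rather than to the whole body, which is one reason a uri rule is cheaper and more precise than a 'full' rule for this job.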
> Thanks for that lot, I stand corrected!
> 
> So I want to do
> header PHISH_1H ALL =~ /huge|regexp|here/i
> uri PHISH_1B /mailto:(huge|regexp|here)/i
> And then do the meta rule to join them all together.
> 
> Does that sound better to you?
> 
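
Written out in full, the header/uri/meta plan above might look like the following SpamAssassin rule set. This is a sketch: the "__" prefix (which keeps sub-rules from scoring or being reported on their own), the combined rule name PHISH_1, its description and its score are illustrative choices rather than values from this thread, and the ^ anchor on the uri pattern is optional.

```
# Sub-rules whose names start with "__" are not scored on their own;
# they only feed the meta rule below.
header   __PHISH_1H  ALL =~ /huge|regexp|here/i
uri      __PHISH_1B  /^mailto:(?:huge|regexp|here)/i

# Fire if a listed address is seen in any header OR as a URI in the body.
meta     PHISH_1     (__PHISH_1H || __PHISH_1B)
describe PHISH_1     Listed phishing address in headers or body
score    PHISH_1     1.0
```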

Yup; sounds fine for now.  As the data volume grows, a plug-in that uses
an SDBM database would be far better.
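
As a rough illustration of why a database-backed plug-in scales better, here is a minimal Python sketch of the lookup side. It assumes a hypothetical plug-in that extracts addresses from the message and checks each one as a key in an on-disk hash instead of running one huge alternation regexp. Python's standard dbm.dumb module stands in for Perl's SDBM_File, and build_db, listed_addresses and ADDRESS_RE are hypothetical names.

```python
import dbm.dumb
import os
import re
import tempfile

# Hypothetical sketch: store every listed address as a key in an on-disk
# hash, then do one lookup per address found in the message text.
# dbm.dumb stands in for Perl's SDBM_File here.
ADDRESS_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def build_db(path: str, addresses: list[str]) -> None:
    """Write the (lower-cased) address list into a fresh dbm file."""
    with dbm.dumb.open(path, "n") as db:
        for addr in addresses:
            db[addr.lower()] = "1"


def listed_addresses(path: str, text: str) -> list[str]:
    """Return the addresses in text whose lower-cased form is a key in the dbm file."""
    with dbm.dumb.open(path, "r") as db:
        return [a for a in ADDRESS_RE.findall(text) if a.lower() in db]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "phish")
        build_db(db_path, ["Bad@Example.com", "worse@example.org"])
        print(listed_addresses(db_path, "reply to bad@example.com or ok@example.net"))
```

Each address found costs one hash lookup however many entries are listed, whereas a single alternation regexp gets larger, and harder to compile and maintain, with every new address.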

Cheers,
Steve.