Anti-Phishing Update -- New data feed

Julian Field MailScanner at
Wed Jun 17 08:41:55 IST 2009

On 17/06/2009 06:05, David Lee wrote:
> Julian Field wrote:
>> On 16/06/2009 08:42, Julian Field wrote:
>>> On 15/06/2009 21:35, Steve Freegard wrote:
>>>> Julian Field wrote:
>>>>> On 15/06/2009 21:02, Steve Freegard wrote:
>>>>>> Alex Broens wrote:
>>>>>>>> I need to apply the rules to the entire message body and headers, as
>>>>>>>> they frequently put the email address just in the body of the message
>>>>>>>> inside some link or other. So how would creating separate header and
>>>>>>>> body rules be any better?
>>>>>>> I'm not savvy enough in Perl & SA to give you the scientific reason, but
>>>>>>> it's been common practice to avoid full rules if possible.
>>>>>>> You'd have to ask one of the core SA devs... maybe Matt Kettler can
>>>>>>> jump in and tell me I'm totally off and that my understanding is wrong.
>>>>>> 'full' rules are simply inefficient as IIRC the regexps have to be run
>>>>>> multiple times across each block of text (IIRC: SA splits into paragraph
>>>>>> style chunks) to prevent excessive memory use. They also evaluate all
>>>>>> other MIME structures e.g. attachments, images etc. as per the docs.
>>>>> I don't think they include binary attachments, I had to add that
>>>>> specifically for the MCP stuff with a patch to the SA code.
>>>> From 'man Mail::SpamAssassin::Conf':
>>>>         full SYMBOLIC_TEST_NAME /pattern/modifiers
>>>>             Define a full message pattern test.  "pattern" is a Perl regular
>>>>             expression.  Note: as per the header tests, "#" must be escaped
>>>>             ("\#") or else it is considered the beginning of a comment.
>>>>             The full message is the pristine message headers plus the pristine
>>>>             message body, including all MIME data such as images, other
>>>>             attachments, MIME boundaries, etc.
>>>> The reason it wouldn't work for MCP is that a 'full' rule is not going
>>>> to decode base64/QP parts before evaluating the regexp (I think!).
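For comparison, a minimal sketch of a 'full' rule next to a 'rawbody' rule. It assumes the 'rawbody' behaviour documented in the same man page (textual parts are decoded from base64/quoted-printable, but HTML tags and line breaks remain). Since only textual parts are covered, this would not help with binary attachments. The rule names are hypothetical and the address is the placeholder from the 'uri' example quoted in this thread.

```
# Sketch only: PHISH_FULL_EX and PHISH_RAW_EX are hypothetical rule names.
# 'full' sees the pristine message, so base64/QP-encoded text is NOT decoded:
full     PHISH_FULL_EX  /email\@domain\.com/i
# 'rawbody' sees textual parts after base64/QP decoding
# (HTML tags and line breaks are still present):
rawbody  PHISH_RAW_EX   /email\@domain\.com/i
```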
>>>>>> If you are simply looking to get any e-mail addresses out of the message
>>>>>> body, then a 'uri' rule is far more appropriate, e.g.
>>>>>> uri BLAH  /^mailto:email\@domain\.com$/
>>>>>> (SA converts all e-mail URIs into mailto: types, even those with no
>>>>>> scheme).
>>>>> But surely that wouldn't work when email addresses just appear in the
>>>>> text in text/plain bodies, would it?
>>>> Sure does:
>>>> [root at mail ~]# cat test.eml
>>>> Return-path:<testfrom at>
>>>> To: test<test at>
>>>> From: test<testfrom at>
>>>> Subject: test
>>>> Content-type: text/plain
>>>> Test body
>>>> bodytest at this is a test bodytest2 at
>>>> [root at mail ~]# /mnt/jungledisk/smf/scripts/ test.eml
>>>> URI:mailto:bodytest2 at
>>>> URI:mailto:bodytest at
>>>> ( uses SA to extract URIs in the same way the eval()
>>>> rules do; I use this for testing amongst other things).
>>> Thanks for that lot, I stand corrected!
>>> So I want to do
>>> header PHISH_1H ALL =~ /huge|regexp|here/i
>>> uri PHISH_1B /mailto:(huge|regexp|here)/i
>>> And then do the meta rule to join them all together.
>>> Does that sound better to you?
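A minimal sketch of the header + uri + meta combination described above. It assumes SpamAssassin's convention that rules whose names begin with "__" are sub-rules that do not score on their own, so only the meta rule adds to the message score. The PHISH_1 name, the describe text and the score value are placeholders, not the rules the script actually generates.

```
# Sketch only: names, description and score below are placeholders.
# Sub-rules prefixed with "__" score nothing on their own.
header   __PHISH_1H  ALL =~ /huge|regexp|here/i
uri      __PHISH_1B  /mailto:(huge|regexp|here)/i
# Fire if a listed address appears in any header OR as a (mailto:) URI in the body.
meta     PHISH_1     (__PHISH_1H || __PHISH_1B)
describe PHISH_1     Address from phishing data feed seen in headers or body
score    PHISH_1     1.0
```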
>> I have published an improved, much faster version 2.01, which is
>> available from
>> You might well want to upgrade...
>> Jules
> I assume the SpamAssassin rules generated by your improved script are
> different to those obtained via the '' channel using sa-update.
Indeed, I don't know of anyone else who has the same data feed I do.



Julian Field MEng CITP CEng
Buy the MailScanner book at
Follow me at

MailScanner customisation, or any advanced system administration help?
Contact me at Jules at Jules.FM

PGP footprint: EE81 D763 3DB0 0BFD E1DC 7222 11F6 5947 1415 B654
PGP public key:
