Anti-Phishing Update -- New data feed
David Lee
david at bass.net.au
Wed Jun 17 06:05:13 IST 2009
Julian Field wrote:
>
>
> On 16/06/2009 08:42, Julian Field wrote:
>>
>>
>> On 15/06/2009 21:35, Steve Freegard wrote:
>>> Julian Field wrote:
>>>>
>>>> On 15/06/2009 21:02, Steve Freegard wrote:
>>>>> Alex Broens wrote:
>>>>>
>>>>>>> I need to apply the rules to the entire message body and headers, as
>>>>>>> they frequently put the email address just in the body of the message
>>>>>>> inside some link or other. So how would creating separate header and
>>>>>>> body rules be any better?
>>>>>>>
>>>>>> I'm not savvy enough in Perl & SA to give you the scientific reason,
>>>>>> but it's been common practice to avoid full rules if possible.
>>>>>>
>>>>>> You'd have to ask one of the core SA devs... maybe Matt Kettler can
>>>>>> jump in and tell me I'm totally off and that my understanding is
>>>>>> wrong.
>>>>>>
>>>>> 'full' rules are simply inefficient, as IIRC the regexps have to be
>>>>> run multiple times, once across each block of text (SA splits the
>>>>> message into paragraph-style chunks to prevent excessive memory use).
>>>>> As per the docs, they also evaluate all other MIME structures, e.g.
>>>>> attachments, images etc.
>>>>>
>>>> I don't think they include binary attachments, I had to add that
>>>> specifically for the MCP stuff with a patch to the SA code.
>>> From 'man Mail::SpamAssassin::Conf':
>>>
>>>    full SYMBOLIC_TEST_NAME /pattern/modifiers
>>>        Define a full message pattern test. "pattern" is a Perl regular
>>>        expression. Note: as per the header tests, "#" must be escaped
>>>        ("\#") or else it is considered the beginning of a comment.
>>>
>>>        The full message is the pristine message headers plus the pristine
>>>        message body, including all MIME data such as images, other
>>>        attachments, MIME boundaries, etc.
>>>
>>> The reason it wouldn't work for MCP is that a 'full' rule is not going
>>> to decode base64/QP parts before evaluating the regexp (I think!).
>>>
>>>>> If you are simply looking to get any e-mail addresses out of the
>>>>> message body, then a 'uri' rule is far more appropriate, e.g.
>>>>>
>>>>> uri BLAH /^mailto:email\@domain\.com$/
>>>>>
>>>>> (SA converts all e-mail addresses into mailto: URIs, even those with
>>>>> no scheme.)
>>>>>
>>>> But surely that wouldn't work when email addresses just appear in the
>>>> text of text/plain bodies, would it?
>>> Sure does:
>>>
>>> [root at mail ~]# cat test.eml
>>> Return-path: <testfrom at example.com>
>>> To: test <test at example.com>
>>> From: test <testfrom at example.com>
>>> Subject: test
>>> Content-type: text/plain
>>>
>>> Test body
>>>
>>> bodytest at example.com this is a test bodytest2 at example.com
>>>
>>> [root at mail ~]# /mnt/jungledisk/smf/scripts/uri-extractor.pl test.eml
>>> URI-Domain:example.com
>>> URI:mailto:bodytest2 at example.com
>>> URI:mailto:bodytest at example.com
>>>
>>> (uri-extractor.pl uses SA to extract URIs in the same way the eval()
>>> rules do; I use this for testing amongst other things).
>> Thanks for that lot, I stand corrected!
>>
>> So I want to do
>> header PHISH_1H ALL =~ /huge|regexp|here/i
>> uri PHISH_1B /mailto:(huge|regexp|here)/i
>> And then do the meta rule to join them altogether.
>>
>> Does that sound better to you?
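A minimal sketch of the header/uri/meta combination described above, in SpamAssassin's own rule syntax. It assumes the two sub-rules are renamed with a leading "__", which SpamAssassin treats as unscored sub-rules intended for use in meta rules, so only the meta rule carries a score. The PHISH_1 name, the describe text and the 5.0 score are hypothetical placeholders, and "huge|regexp|here" stands in for the real generated pattern:

# Sub-rules: the "__" prefix keeps them from being scored on their own
header   __PHISH_1H  ALL =~ /huge|regexp|here/i
uri      __PHISH_1B  /mailto:(huge|regexp|here)/i

# Meta rule: fires if a listed address appears in the headers OR the body
meta     PHISH_1     (__PHISH_1H || __PHISH_1B)
describe PHISH_1     Known phishing address in headers or body
# Hypothetical score for illustration only; tune locally
score    PHISH_1     5.0

Because the meta rule is true if either sub-rule matches, a message that contains a listed address in both the headers and the body is still only scored once.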
> I have published an improved, much faster version 2.01, which is
> available from
>
> http://www.jules.fm/Logbook/files/anti-phishing-v2.html
>
> You might well want to upgrade...
>
> Jules
>
I assume the SpamAssassin rules generated by your improved script are
different to those obtained via the 'spear.bastionmail.com' channel
using sa-update.
David
--
-----------------------------------------------------------------------
David Lee
Systems Administrator Tel: +61-8-8205-2467
BASS South Australia Fax: +61-8-8205-0550
GPO Box 1269, Adelaide 5000 http://www.bass.net.au/
-----------------------------------------------------------------------