Anti-Phishing Update -- New data feed
Julian Field
MailScanner at ecs.soton.ac.uk
Tue Jun 16 10:32:45 IST 2009
On 16/06/2009 08:42, Julian Field wrote:
>
>
> On 15/06/2009 21:35, Steve Freegard wrote:
>> Julian Field wrote:
>>>
>>> On 15/06/2009 21:02, Steve Freegard wrote:
>>>> Alex Broens wrote:
>>>>
>>>>>> I need to apply the rules to the entire message body and headers, as
>>>>>> they frequently put the email address just in the body of the message
>>>>>> inside some link or other. So how would creating separate header and
>>>>>> body rules be any better?
>>>>>>
>>>>> I'm not savvy enough in Perl & SA to give you the scientific reason,
>>>>> but it's been common practice to avoid full rules if possible.
>>>>>
>>>>> You'd have to ask one of the core SA devs... maybe Matt Kettler can
>>>>> jump in and tell me I'm totally off and that my understanding is wrong.
>>>>>
>>>> 'full' rules are simply inefficient as IIRC the regexps have to be run
>>>> multiple times across each block of text (IIRC: SA splits into paragraph
>>>> style chunks) to prevent excessive memory use. They also evaluate all
>>>> other MIME structures e.g. attachments, images etc. as per the docs.
>>>>
>>> I don't think they include binary attachments, I had to add that
>>> specifically for the MCP stuff with a patch to the SA code.
>> From 'man Mail::SpamAssassin::Conf':
>>
>> full SYMBOLIC_TEST_NAME /pattern/modifiers
>>     Define a full message pattern test. "pattern" is a Perl regular
>>     expression. Note: as per the header tests, "#" must be escaped
>>     ("\#") or else it is considered the beginning of a comment.
>>
>>     The full message is the pristine message headers plus the pristine
>>     message body, including all MIME data such as images, other
>>     attachments, MIME boundaries, etc.
>>
>> The reason it wouldn't work for MCP is that a 'full' rule is not going
>> to decode base64/QP parts before evaluating the regexp (I think!).
>>
>>>> If you are simply looking to get any e-mail addresses out of the message
>>>> body, then a 'uri' rule is far more appropriate, e.g.
>>>>
>>>> uri BLAH /^mailto:email\@domain\.com$/
>>>>
>>>> (SA converts all e-mail URIs into mailto: types even those with no
>>>> scheme).
>>>>
>>> But surely that wouldn't work when email addresses just appear in the
>>> text in text/plain bodies, would it?
>> Sure does:
>>
>> [root@mail ~]# cat test.eml
>> Return-path: <testfrom@example.com>
>> To: test <test@example.com>
>> From: test <testfrom@example.com>
>> Subject: test
>> Content-type: text/plain
>>
>> Test body
>>
>> bodytest@example.com this is a test bodytest2@example.com
>>
>> [root@mail ~]# /mnt/jungledisk/smf/scripts/uri-extractor.pl test.eml
>> URI-Domain:example.com
>> URI:mailto:bodytest2@example.com
>> URI:mailto:bodytest@example.com
>>
>> (uri-extractor.pl uses SA to extract URIs in the same way the eval()
>> rules do; I use this for testing amongst other things).
> Thanks for that lot, I stand corrected!
>
> So I want to do
> header PHISH_1H ALL =~ /huge|regexp|here/i
> uri PHISH_1B /mailto:(huge|regexp|here)/i
> And then do the meta rule to join them all together.
>
> Does that sound better to you?
I have published an improved, much faster version 2.01, which is
available from
http://www.jules.fm/Logbook/files/anti-phishing-v2.html
You might well want to upgrade...
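
For anyone following the thread, the header + uri + meta layout quoted
above might look roughly like the sketch below. This is only an
illustration of the idea, not necessarily what version 2.01 ships: the
addresses, the PHISH_1 meta name, the description and the score are all
placeholders made up for the example.

```
# Sketch only: addresses, PHISH_1, description and score are placeholders.

# Match a known phishing address anywhere in the headers.
header   PHISH_1H  ALL =~ /\b(?:user1\@example\.com|user2\@example\.net)\b/i

# Match the same addresses in the body, via the mailto: URIs that SA
# extracts (including bare addresses in text/plain parts).
uri      PHISH_1B  /^mailto:(?:user1\@example\.com|user2\@example\.net)$/i

# Join the two sub-tests into a single scored rule.
meta     PHISH_1   (PHISH_1H || PHISH_1B)
describe PHISH_1   Message mentions a known phishing reply address
score    PHISH_1   4.0
```

Note that PHISH_1H and PHISH_1B will also score on their own by default
unless they are renamed with a leading double underscore (e.g.
__PHISH_1H), which makes them unscored sub-rules. Setting their score to
0 is not a substitute, as that disables them and the meta never fires.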
Jules
--
Julian Field MEng CITP CEng
www.MailScanner.info