Anti-Phishing Update -- New data feed
Steve Freegard
steve.freegard at fsl.com
Tue Jun 16 09:40:45 IST 2009
Julian Field wrote:
>
>
> On 15/06/2009 21:35, Steve Freegard wrote:
>> Julian Field wrote:
>>
>>>
>>> On 15/06/2009 21:02, Steve Freegard wrote:
>>>
>>>> Alex Broens wrote:
>>>>
>>>>
>>>>>> I need to apply the rules to the entire message body and headers, as
>>>>>> they frequently put the email address just in the body of the message
>>>>>> inside some link or other. So how would creating separate header and
>>>>>> body rules be any better?
>>>>>>
>>>>>>
>>>>> I'm not savvy enough in Perl & SA to give you the scientific reason,
>>>>> but it's been common practice to avoid full rules if possible.
>>>>>
>>>>> You'd have to ask one of the core SA devs... maybe Matt Kettler can
>>>>> jump in and tell me I'm totally off and that my understanding is
>>>>> wrong.
>>>>>
>>>>>
>>>> 'full' rules are simply inefficient, as IIRC the regexps have to be run
>>>> multiple times across each block of text (again IIRC, SA splits the
>>>> message into paragraph-style chunks) to prevent excessive memory use.
>>>> They also evaluate all other MIME structures, e.g. attachments, images,
>>>> etc., as per the docs.
>>>>
>>>>
>>
>>> I don't think they include binary attachments, I had to add that
>>> specifically for the MCP stuff with a patch to the SA code.
>>>
>> From 'man Mail::SpamAssassin::Conf':
>>
>> full SYMBOLIC_TEST_NAME /pattern/modifiers
>> Define a full message pattern test. "pattern" is a Perl regular
>> expression. Note: as per the header tests, "#" must be escaped
>> ("\#") or else it is considered the beginning of a comment.
>>
>> The full message is the pristine message headers plus the pristine
>> message body, including all MIME data such as images, other
>> attachments, MIME boundaries, etc.
>>
>> The reason it wouldn't work for MCP is that a 'full' rule is not going
>> to decode base64/QP parts before evaluating the regexp (I think!).
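A minimal Python sketch of why that matters, assuming (as suspected above) that a 'full' rule sees the undecoded MIME text: the address below exists only inside a base64-encoded body, so a regexp over the raw message cannot see it, while the same regexp over the decoded text part does. It uses only the standard `email` package; the address, pattern and helper names are made up for illustration, and this is not SpamAssassin's own code.

```python
import email
import email.policy
import re
from email.message import EmailMessage

# Hypothetical pattern that a 'full'-style rule might look for.
PATTERN = re.compile(r"phisher@example\.com", re.IGNORECASE)


def build_message() -> str:
    """Build a test message whose only mention of the address is in a base64 body."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"
    msg["To"] = "victim@example.net"
    msg["Subject"] = "Account update"
    msg.set_content("Please reply to phisher@example.com today.\n", cte="base64")
    return msg.as_string()


def raw_match(raw: str) -> bool:
    """Search the undecoded message text (headers + encoded body)."""
    return PATTERN.search(raw) is not None


def decoded_match(raw: str) -> bool:
    """Search every text part after its transfer encoding has been decoded."""
    msg = email.message_from_string(raw, policy=email.policy.default)
    for part in msg.walk():
        if part.get_content_maintype() == "text" and PATTERN.search(part.get_content()):
            return True
    return False


if __name__ == "__main__":
    raw = build_message()
    print(raw_match(raw))      # False: base64 text never contains "@"
    print(decoded_match(raw))  # True
```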
>>
>>
>>>> If you are simply looking to get any e-mail addresses out of the
>>>> message body, then a 'uri' rule is far more appropriate, e.g.
>>>>
>>>> uri BLAH /^mailto:email\@domain\.com$/
>>>>
>>>> (SA converts all e-mail addresses into mailto: URIs, even bare
>>>> addresses with no scheme.)
>>>>
>>>>
>>> But surely that wouldn't work when email addresses just appear in the
>>> text of text/plain bodies, would it?
>>>
>> Sure does:
>>
>> [root@mail ~]# cat test.eml
>> Return-path: <testfrom@example.com>
>> To: test <test@example.com>
>> From: test <testfrom@example.com>
>> Subject: test
>> Content-type: text/plain
>>
>> Test body
>>
>> bodytest@example.com this is a test bodytest2@example.com
>>
>> [root@mail ~]# /mnt/jungledisk/smf/scripts/uri-extractor.pl test.eml
>> URI-Domain:example.com
>> URI:mailto:bodytest2@example.com
>> URI:mailto:bodytest@example.com
>>
>> (uri-extractor.pl uses SA to extract URIs in the same way the eval()
>> rules do; I use this for testing amongst other things).
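The same idea as a short Python sketch: find bare addresses in a text/plain body, normalise them to mailto: URIs, and test a 'uri'-style regexp against them. The address regexp and the helper names are hypothetical and much simpler than SpamAssassin's real URI extraction; the sketch only shows why a uri rule can catch addresses that never appeared as links.

```python
import re
from typing import Iterable, List

# Deliberately simple address pattern (an assumption for illustration);
# SpamAssassin's own address/URI detection is considerably more thorough.
ADDR_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_mailto_uris(body: str) -> List[str]:
    """Turn every bare address found in plain text into a lower-cased mailto: URI."""
    found = {"mailto:" + m.group(0).lower() for m in ADDR_RE.finditer(body)}
    return sorted(found)


def uri_rule_hits(uris: Iterable[str], pattern: str) -> List[str]:
    """Return the URIs that a 'uri'-style rule with this regexp would match."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [u for u in uris if rx.search(u)]


if __name__ == "__main__":
    body = "Test body\n\nbodytest@example.com this is a test bodytest2@example.com\n"
    uris = extract_mailto_uris(body)
    print(uris)  # ['mailto:bodytest2@example.com', 'mailto:bodytest@example.com']
    print(uri_rule_hits(uris, r"^mailto:bodytest\@example\.com$"))  # ['mailto:bodytest@example.com']
```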
>>
> Thanks for that lot, I stand corrected!
>
> So I want to do
> header PHISH_1H ALL =~ /huge|regexp|here/i
> uri PHISH_1B /mailto:(huge|regexp|here)/i
> And then use a meta rule to join them all together.
>
> Does that sound better to you?
>
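A hedged Python model of that proposal, assuming the meta rule is a plain OR of the two sub-rules (in SpamAssassin that would typically be written as `meta PHISH_1 (PHISH_1H || PHISH_1B)`, where the name `PHISH_1` is hypothetical). `huge|regexp|here` is the placeholder from the proposal, and the header text and URI list are assumed to have been extracted already.

```python
import re
from typing import Dict, Iterable

# Placeholder alternation from the proposal; the real feed would supply the addresses.
PHISH_ALTERNATION = r"huge|regexp|here"

# PHISH_1H: header ALL =~ /huge|regexp|here/i
HEADER_RE = re.compile(PHISH_ALTERNATION, re.IGNORECASE)
# PHISH_1B: uri /mailto:(huge|regexp|here)/i
URI_RE = re.compile(r"mailto:(?:" + PHISH_ALTERNATION + r")", re.IGNORECASE)


def evaluate(all_headers: str, uris: Iterable[str]) -> Dict[str, bool]:
    """Evaluate both sub-rules and the hypothetical OR-ed meta rule PHISH_1."""
    phish_1h = HEADER_RE.search(all_headers) is not None
    phish_1b = any(URI_RE.search(u) for u in uris)
    return {"PHISH_1H": phish_1h, "PHISH_1B": phish_1b, "PHISH_1": phish_1h or phish_1b}


if __name__ == "__main__":
    print(evaluate("From: a@example.com\nSubject: hello\n", ["mailto:regexp@example.com"]))
```

Because the header regexp is unanchored, a placeholder such as `here` would also match inside words like "there", so the real patterns are worth delimiting.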
Yup; sounds fine for now. As the data volume grows, a plug-in that uses
an SDBM database would be far better.
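A rough Python sketch of that plug-in idea: keep the feed's addresses as keys in an on-disk hash database and do one keyed lookup per extracted mailto: address, instead of running an ever-growing alternation regexp. Python's standard library has no SDBM binding, so `dbm.dumb` stands in for Perl's `SDBM_File` here; the function and file names are hypothetical, and a real SpamAssassin plug-in would be written in Perl.

```python
import dbm.dumb
import os
import tempfile
from typing import Iterable, List


def build_feed_db(path: str, addresses: Iterable[str]) -> None:
    """Store one key per phishing reply address (lower-cased); the value is unused."""
    with dbm.dumb.open(path, "n") as db:
        for addr in addresses:
            db[addr.strip().lower()] = b"1"


def phish_hits(path: str, mailto_uris: Iterable[str]) -> List[str]:
    """Return the addresses from mailto: URIs that are present in the feed database."""
    hits = []
    with dbm.dumb.open(path, "r") as db:
        for uri in mailto_uris:
            if not uri.lower().startswith("mailto:"):
                continue
            # Drop any ?subject=... query part before the lookup.
            addr = uri[len("mailto:"):].split("?", 1)[0].lower()
            if addr in db:  # one keyed lookup instead of one huge regexp
                hits.append(addr)
    return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        feed = os.path.join(tmp, "phish_feed")  # hypothetical file name
        build_feed_db(feed, ["bodytest@example.com", "phisher@example.net"])
        print(phish_hits(feed, ["mailto:BodyTest@example.com", "mailto:other@example.com"]))
```

Lookup cost stays roughly constant as the feed grows, whereas a single alternation regexp gets slower to compile and match with every address added.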
Cheers,
Steve.