Anti-Phishing Update -- New data feed
Julian Field
MailScanner at ecs.soton.ac.uk
Wed Jun 17 08:41:55 IST 2009
On 17/06/2009 06:05, David Lee wrote:
> Julian Field wrote:
>>
>>
>> On 16/06/2009 08:42, Julian Field wrote:
>>>
>>>
>>> On 15/06/2009 21:35, Steve Freegard wrote:
>>>> Julian Field wrote:
>>>>>
>>>>> On 15/06/2009 21:02, Steve Freegard wrote:
>>>>>> Alex Broens wrote:
>>>>>>
>>>>>>>> I need to apply the rules to the entire message body and headers, as
>>>>>>>> they frequently put the email address just in the body of the message
>>>>>>>> inside some link or other. So how would creating separate header and
>>>>>>>> body rules be any better?
>>>>>>>>
>>>>>>> I'm not savvy enough in Perl & SA to give you the scientific reason,
>>>>>>> but it's been common practice to avoid full rules if possible.
>>>>>>>
>>>>>>> You'd have to ask one of the core SA devs... maybe Matt Kettler can
>>>>>>> jump in and tell me I'm totally off and that my understanding is wrong.
>>>>>>>
>>>>>> 'full' rules are simply inefficient as IIRC the regexps have to be run
>>>>>> multiple times across each block of text (IIRC: SA splits into paragraph
>>>>>> style chunks) to prevent excessive memory use. They also evaluate all
>>>>>> other MIME structures e.g. attachments, images etc. as per the docs.
>>>>>>
>>>>> I don't think they include binary attachments, I had to add that
>>>>> specifically for the MCP stuff with a patch to the SA code.
>>>> From 'man Mail::SpamAssassin::Conf':
>>>>
>>>> full SYMBOLIC_TEST_NAME /pattern/modifiers
>>>>     Define a full message pattern test. "pattern" is a Perl regular
>>>>     expression. Note: as per the header tests, "#" must be escaped
>>>>     ("\#") or else it is considered the beginning of a comment.
>>>>
>>>>     The full message is the pristine message headers plus the pristine
>>>>     message body, including all MIME data such as images, other
>>>>     attachments, MIME boundaries, etc.
>>>>
>>>> The reason it wouldn't work for MCP is that a 'full' rule is not going
>>>> to decode base64/QP parts before evaluating the regexp (I think!).
>>>>
>>>>>> If you are simply looking to get any e-mail addresses out of the message
>>>>>> body; then a 'uri' rule is far more appropriate e.g.
>>>>>>
>>>>>> uri BLAH /^mailto:email\@domain\.com$/
>>>>>>
>>>>>> (SA converts all e-mail URIs into mailto: types even those with no
>>>>>> scheme).
>>>>>>
>>>>> But surely that wouldn't work when email addresses just appear in the
>>>>> text in text/plain bodies, would it?
>>>> Sure does:
>>>>
>>>> [root at mail ~]# cat test.eml
>>>> Return-path:<testfrom at example.com>
>>>> To: test<test at example.com>
>>>> From: test<testfrom at example.com>
>>>> Subject: test
>>>> Content-type: text/plain
>>>>
>>>> Test body
>>>>
>>>> bodytest at example.com this is a test bodytest2 at example.com
>>>>
>>>> [root at mail ~]# /mnt/jungledisk/smf/scripts/uri-extractor.pl test.eml
>>>> URI-Domain:example.com
>>>> URI:mailto:bodytest2 at example.com
>>>> URI:mailto:bodytest at example.com
>>>>
>>>> (uri-extractor.pl uses SA to extract URIs in the same way the eval()
>>>> rules do; I use this for testing amongst other things).
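For anyone without SpamAssassin to hand, the behaviour shown in that test can be roughly approximated in a few lines of Python. This is only a sketch: the regular expression and the function name `extract_mailto_uris` are assumptions made for illustration, not SpamAssassin's actual URI parser, which handles many more cases. It just shows the idea that a bare address in a text/plain body ends up as a `mailto:` URI that a 'uri' rule can match.

```python
import re

# Deliberately simple address pattern. This is an assumption for
# illustration only, not the parser SpamAssassin itself uses.
ADDRESS_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_mailto_uris(body):
    """Return sorted, de-duplicated mailto: URIs for bare addresses in body."""
    found = {"mailto:" + address.lower() for address in ADDRESS_RE.findall(body)}
    return sorted(found)


if __name__ == "__main__":
    sample = "bodytest@example.com this is a test bodytest2@example.com"
    for uri in extract_mailto_uris(sample):
        print("URI:" + uri)
```

The addresses come back with a `mailto:` prefix even though the body text never contained one, which is why a 'uri' rule anchored on `mailto:` can still fire on plain-text bodies.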
>>> Thanks for that lot, I stand corrected!
>>>
>>> So I want to do
>>> header PHISH_1H ALL =~ /huge|regexp|here/i
>>> uri PHISH_1B /mailto:(huge|regexp|here)/i
>>> And then do the meta rule to join them all together.
>>>
>>> Does that sound better to you?
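The header/uri/meta plan quoted above could be generated from a list of addresses along the following lines. This is a hedged sketch, not the published script: the function names `perl_quote` and `build_rules`, the default rule name, and the choice of OR in the meta rule are all assumptions made for illustration.

```python
import re


def perl_quote(text):
    """Backslash-escape every non-word character so that "@", ".", "#"
    and "/" are all safe inside a /pattern/ in a SpamAssassin rule."""
    return re.sub(r"([^A-Za-z0-9_])", r"\\\1", text)


def build_rules(addresses, name="PHISH_1"):
    """Return the three rule lines (header, uri, meta) for the addresses.

    The rule name and the OR in the meta rule are assumptions; the
    thread only says the two rules are joined by a meta rule.
    """
    unique = sorted({address.lower() for address in addresses})
    alternation = "|".join(perl_quote(address) for address in unique)
    return [
        "header {0}H ALL =~ /{1}/i".format(name, alternation),
        "uri {0}B /mailto:({1})/i".format(name, alternation),
        "meta {0} ({0}H || {0}B)".format(name),
    ]


if __name__ == "__main__":
    # Hypothetical feed contents, for demonstration only.
    for line in build_rules(["Bob@Example.com", "alice@example.org"]):
        print(line)
```

Escaping every non-word character is broader than strictly necessary, but it avoids having to track which characters are special in Perl regexps or in the rules file itself.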
>> I have published an improved, much faster version 2.01, which is
>> available from
>>
>> http://www.jules.fm/Logbook/files/anti-phishing-v2.html
>>
>> You might well want to upgrade...
>>
>> Jules
>>
> I assume the spamassassin rules generated by your improved script are
> different to those obtained via the 'spear.bastionmail.com' channel
> using sa-update.
Indeed, I don't know of anyone else who has the same data feed I do.
Jules
--
Julian Field MEng CITP CEng
www.MailScanner.info
Buy the MailScanner book at www.MailScanner.info/store
Follow me at twitter.com/JulesFM
MailScanner customisation, or any advanced system administration help?
Contact me at Jules at Jules.FM
PGP footprint: EE81 D763 3DB0 0BFD E1DC 7222 11F6 5947 1415 B654
PGP public key: http://www.jules.fm/julesfm.asc