Anti-spear-phishing, round 2

Julian Field MailScanner at
Sun Jan 11 18:09:27 GMT 2009

On 11/1/09 13:13, Denis Beauchemin wrote:
> Guy Story KC5GOI wrote:
>> Kai, I need to clarify my question then. I did read over the script
>> and, if I understand it (please bear in mind I do not pretend to
>> program), it downloads the data from Google and turns it into a
>> rule for SA. The rule itself provides inbound, outbound and content
>> filtering using the email addresses that are provided by the Google
>> list. Between Jules' postings and the comments in the script, if I am
>> understanding it correctly, then that is a huge testimony to Jules'
>> commenting in the file. That is a huge help for non-programmers and
>> I thank him.
Thanks! I tried to make it pretty clear to non-programmers. I don't add
comments as an afterthought; I document as I go.
>> I understand that since I do not have the current release of MS,
>> I cannot take full advantage of what Jules has done. I am currently
>> using Ubuntu 7.10, so I need to make sure that I can satisfy the
>> dependencies to perform the upgrade. This is a time issue since I am
>> a one-man department.
>> As a temporary solution I downloaded the list and used it to create a 
>> list that I added to my spam blacklist rule with FromOrTo so I can 
>> filter on two of three points.
>> The downside to my current approach is lack of content scanning and a
>> manual updating process instead of using Jules' script in
>> cron.hourly. Not ideal but a start. It takes me 5 minutes to do
>> this where Jules' script probably does it in less than 30 seconds
>> (download, convert, copy and restart MS) and is more current. I
>> might do this once a week. I understand that the address list could
>> update literally on an hourly basis. The rate of updates is up to
>> Google and I have not read through the project fully yet.
It's not up to Google. As far as I am aware, they don't have any 
connection with the project other than merely providing a place to host 
it, rather like Sourceforge does for many other people.
>> My original and poorly worded question was more along the lines of
>> how much work MS has to do using the list of addresses in the spam
>> blacklist versus an SA rule. Is it more work processing the blacklist
>> than the SA rule?
Due to the way I wrote the script, the cost of running that file in SA 
is actually pretty minimal. One large pattern containing many 
alternatives is hugely more efficient in SA (and in Perl) than having a 
separate SA rule for each address, which would be the naive implementation.

The way SA works is that every rule gets turned into the Perl source 
code for a function, and then SA calls each function (i.e. rule) with 
the text of each message. So if you cram 20 alternatives into 1 rule, 
it's only 1 function call per message instead of 20, so 20 times less
overhead.
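
As a rough illustration of the point about one rule with many
alternatives, here is a minimal Python sketch. It is not the actual
script, and the addresses are made-up placeholders. It only shows that a
single combined pattern gives the same answer as a set of per-address
patterns while needing one search per message instead of one per address.

```python
import re

# Hypothetical addresses, standing in for entries in the downloaded list.
addresses = ["alice@example.com", "bob@example.org", "carol@example.net"]

# Naive approach: one compiled pattern (one "rule") per address.
separate_rules = [re.compile(re.escape(a), re.IGNORECASE) for a in addresses]

# Combined approach: a single pattern with every address as an alternative.
combined_rule = re.compile(
    "|".join(re.escape(a) for a in addresses), re.IGNORECASE
)


def hits_naive(text):
    """Search with each per-address rule in turn: up to len(addresses) calls."""
    return any(rule.search(text) for rule in separate_rules)


def hits_combined(text):
    """Search with the single combined rule: one call per message."""
    return combined_rule.search(text) is not None


message = "From: Bob@Example.org\nSubject: please confirm your password"
print(hits_naive(message), hits_combined(message))
```

Both functions agree on every input; the difference is only in how many
pattern searches are made for each message.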

Additionally, the addresses are listed alphabetically sorted, so that 
when Perl is trying to match the huge expression, if all the alternative 
addresses in the expression (rule) start with an "a" then it will only 
check the first character. If that isn't an "a" then none of the 
alternatives can match and it can bail out instantly. It's not actually 
as simple as that, but the theory basically still holds true.

So it turns into (on my systems) about 100 SA rules, each of which can 
be processed very quickly compared with many other SA rules you may use. 
Most systems have many thousands of rules, so an extra 100 is a tiny 
cost for the benefit you get from them.
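
The sorting and splitting into a modest number of rules can be sketched
like this in Python. The group size, the function name and the addresses
are all assumptions made for the example; the post does not say how many
addresses the real script puts in each rule. The sketch shows only the
grouping, not how the regex engine uses it.

```python
import re


def build_rules(addresses, per_rule=3):
    """Sort the addresses, split them into fixed-size groups and turn
    each group into one alternation pattern (one "rule")."""
    ordered = sorted(set(a.lower() for a in addresses))
    patterns = []
    for start in range(0, len(ordered), per_rule):
        group = ordered[start:start + per_rule]
        patterns.append("|".join(re.escape(a) for a in group))
    return patterns


# Hypothetical list entries, deliberately out of order.
addresses = [
    "bert@example.com", "anna@example.com", "carl@example.com",
    "abel@example.com", "bill@example.com", "adam@example.com",
]

rules = build_rules(addresses)
for pattern in rules:
    # After sorting, each rule covers a narrow range of first characters.
    first_chars = sorted({alt[0] for alt in pattern.split("|")})
    print(first_chars, pattern)
```

Because the list is sorted before it is split, the alternatives inside
any one rule tend to share their leading characters, which is what gives
the matcher a chance to reject a non-matching position early.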

I did put quite a bit of thought into my code; it is very far from a
naive implementation, and contains a lot of measures to try to ensure 
that a rogue entry in the Google-hosted file cannot cause all your mail 
to get binned. If someone put "s at" in the file, it would *not* 
hit every message from "thomas at" for example!
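
One way to get that kind of protection is to insist that a match is a
whole address and not the tail end of a longer one. The Python sketch
below is an assumption about how such a guard could look, not a
description of what the script actually does, and the addresses are
placeholders.

```python
import re


def address_pattern(addresses):
    """Build one alternation that only matches complete addresses."""
    alternatives = "|".join(re.escape(a) for a in sorted(addresses))
    # (?<![\w.+-])     the character before the match must not be one that
    #                  could belong to the local part of a longer address
    # (?![\w-]|\.\w)   the match must not continue into a longer domain,
    #                  but a sentence-ending full stop is still allowed
    return re.compile(
        r"(?<![\w.+-])(?:" + alternatives + r")(?![\w-]|\.\w)",
        re.IGNORECASE,
    )


# A rogue one-letter entry, as in the example above.
rule = address_pattern(["s@example.com"])

print(bool(rule.search("From: s@example.com")))       # the listed address
print(bool(rule.search("From: thomas@example.com")))  # a longer address
```

With the guards in place the short entry matches only itself; without
them it would match any address ending in the same characters.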

> Guy,
> I'm pretty sure you can use Julian's script in an older version of MS 
> but you will have to use it to add to the SA score and then rely on 
> your Required SpamAssassin Score or High SpamAssassin Score to 
> quarantine/delete the emails.
Correct. Just use the SA score (which you can set at the top of the 
script) and make it count towards your normal Spam Actions or 
High-Scoring Spam Actions, just the same as you would for any other 
SpamAssassin rule.

I chose to use the "SpamAssassin Rule Actions", and a very low score, as 
I want to handle this mail in a very different way to normal spam, 
partly because it makes it easier for me to develop the code and to see 
how well it is working and if there are ways I could improve it.
> If you were to assign a score of, let's say, 15 to $SA_score in
> Julian's Spear.Phishing.Rules script, you could bump those emails into 
> high scoring spam and then do whatever you want to them without having 
> to use SpamAssassin Rule Actions at all.
Yes, that would work just fine. Just not the way *I* choose to use it. 
But you are more than welcome to :-)
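
For anyone wondering what "set the score" amounts to in the generated
file, here is a small Python sketch that writes out SpamAssassin
configuration lines for one header rule. The rule name, the description
text and the addresses are invented for the example, and it covers only
the From header; the real script's output will differ. The header,
describe and score keywords are standard SpamAssassin rule syntax.

```python
import re


def perl_quote(text):
    """Backslash-escape every character that is not a letter, digit or
    underscore, so "@", "." and "/" are safe inside a /.../ pattern."""
    return re.sub(r"([^A-Za-z0-9_])", r"\\\1", text)


def rule_text(name, addresses, score):
    """Return SpamAssassin configuration lines for one header rule."""
    alternatives = "|".join(perl_quote(a) for a in sorted(addresses))
    return "\n".join([
        f"header   {name} From =~ /(?:{alternatives})/i",
        f"describe {name} Sender is on the spear-phishing address list",
        f"score    {name} {score}",
    ])


# EXAMPLE_SPEAR_1 is a hypothetical rule name; 15 is the score suggested
# in the quoted message above.
print(rule_text("EXAMPLE_SPEAR_1",
                ["bob@example.org", "alice@example.com"], 15))
```

A high value such as 15 pushes a hit towards the high-scoring spam
actions, while a very low value leaves the total almost unchanged so the
hit can be handled separately.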


Julian Field MEng CITP CEng