idea for next version

Scott Silva ssilva at sgvwater.com
Wed Oct 11 18:14:02 IST 2006


mailscanner at berger.nl spake the following on 10/11/2006 1:35 AM:
> Scott Silva wrote ..
>> Logan Shaw spake the following on 10/10/2006 3:12 PM:
>>> Roger wrote:
>>>>> So I was checking mailwatch this evening and I found out that the
>>>>> spam / ham percentage is 60% / 40% at daytime and 95% / 5% at night.
>>>>> This is quite logical because at daytime everybody is working and at
>>>>> night (well, here in Europe) only spammers are working. This can be
>>>>> used for the spam filtering. I think if it is possible to do, for example,
>>>>> "spamscore * 1.2" between 11:00 pm and 7:00 am, it will hit more
>>>>> high-scoring spam at night. Of course it will also hit ham, but as
>>>>> there is much less ham at night the possibility is less.
>>> On Tue, 10 Oct 2006, Steve Campbell wrote:
>>>> I tend to look at this in a different light. Spam is spam, and should
>>>> be caught by rules, etc regardless of the time it arrives. Ham is the
>>>> same also regardless of its arrival time. A good set of rules should
>>>> work fine any time of the day. The percentages only indicate when
>>>> people are sending mail, so this is a useless figure for comparing
>>>> day/night averages.
>>> True enough, but every other rule that SpamAssassin uses
>>> is a heuristic as well.  They're all based on particular
>>> characteristics of the messages (or servers that send them)
>>> and some kind of statistical correlation between those
>>> characteristics and spamminess.
>>>
>>>> For instance, if the same message that came in at night were resent
>>>> during the day, how should the mail be treated? Different score and
>>>> action?
>>> While I share the feeling that it is a little bit odd that the
>>> time a message arrives could sway its score, this is already
>>> true to some extent:  real-time blacklists change over time
>>> (otherwise they wouldn't be real-time), and the score a message
>>> gets can be different one hour from what it is at the next hour.
>>>
>>> Overall, I think time of arrival could be safely used as
>>> yet another heuristic for determining if something is spam.
>>> The key thing is that the scores would need to be right, which
>>> I suspect means they'd need to be fairly low, something like
>>> 0.5 or so.  SpamAssassin already handles setting scores by
>>> running a genetic algorithm (or whatever it is that it uses
>>> that replaced the GA in 3.x), but since this varies so much
>>> by site (what time zone the site is located in, what type
>>> of usage patterns it sees, etc.), there would need to be a
>>> reliable method of determining site-specific scores for this.
>>>
>>> To go in a different direction, as long as we're talking about
>>> time, another possibility is to apply time in other places.
>>> For instance, you might have a time-dependent greylist.
>>> Make the greylist's delay much longer at night and shorter
>>> during the day.  You'd get a lot of the effectiveness of
>>> greylisting but without as much delay during the active periods.
>>>
>>> Overall, though, I think although looking at time does give
>>> you additional information, it is not clear at all that
>>> the positives of going with it will outweigh the negatives.
>>> Time is a trait of a message (or message delivery) that has a
>>> strong correlation with spamminess, but there is also a steady
>>> stream of exceptions.  So getting value out of looking at the
>>> time is likely to be that much harder because of that.
>>>
>>>   - Logan
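
A rough Python sketch of the time-dependent greylist Logan describes
above, only to make the idea concrete. The thread gives no numbers for
the delays, so the 5-minute and 60-minute values, the 11 pm to 7 am
window, and the names greylist_delay and may_accept are all assumptions
made for illustration, not settings of any existing greylisting tool.

```python
from datetime import datetime, timedelta

# Hypothetical delays: the thread does not say how long they should be.
DAY_DELAY = timedelta(minutes=5)
NIGHT_DELAY = timedelta(minutes=60)


def greylist_delay(first_seen, night_start_hour=23, night_end_hour=7):
    """Pick the greylist delay from the local hour of the first attempt."""
    hour = first_seen.hour
    # The night window wraps past midnight, so test both sides of it.
    at_night = hour >= night_start_hour or hour < night_end_hour
    return NIGHT_DELAY if at_night else DAY_DELAY


def may_accept(first_seen, retry_at):
    """Accept a retry only once the delay for the first attempt has passed."""
    return retry_at - first_seen >= greylist_delay(first_seen)
```

The delay is chosen from the time of the first delivery attempt, so a
sender who first tries at 2 am waits the long delay even if the retry
arrives after 7 am. That is one possible design choice among several.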
>> But many companies regularly have execs and others working late, or from
>> home. So you will be placing these people in the spammer class just because
>> they work late?
>> Or how about someone in Hawaii mailing something to New York at 5:00 PM
>> Hawaii time? That would be in the wee hours in New York, but not
>> necessarily spam.
>> Or if Julian sent me a message at 8:00 AM in the UK, it would be about
>> midnight here on the west coast of the US.
>>
>> -- 
>>
> Well, as long as you can change the time. If you set 11:00 pm till 7:00 am
> I think you won't hit many people working late, and even companies 5 hours
> away will be mainly closed at 6 pm.
> The idea is based on what I see for myself. This morning I had 51 spam
> mails which scored between 4 (low) and 9 (high). These were all real spam.
> Besides that I had 2 normal emails which had a score of -2.50 and were
> whitelisted. The problem is that I still had 51 messages tagged as {Spam?}
> which I had to check manually. I checked a few of them and they mostly had
> a score of about 7 or 8.
> If I could multiply the spam score by, for example, 1.2 between 11 pm and
> 7 am it would 'upgrade' about 20 messages to high-scoring, which means I
> receive about 40% less spam in the morning.
> I won't try this at daytime because the chance of hitting ham is too big.
> Of course these are my findings.
> 
> Maybe the real thought behind it is that I have a very different ratio of
> spam/ham at night and at daytime, and this can be used to filter spam
> somehow.
> 
> Or maybe MailScanner has spoiled me so much that I want too much ;-)
> 
> Roger 
> 
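For reference, the night-time multiplier Roger describes can be sketched
in a few lines of Python. This is only an illustration under stated
assumptions: the 1.2 factor and the 11 pm to 7 am window come from his
message, the thresholds of 4 and 9 are read from his "4 (low) and
9 (high)" remark, and the names is_night, adjusted_score and classify
are hypothetical, not MailScanner or SpamAssassin options.

```python
from datetime import time

# Values taken from the thread; treat them as site-specific assumptions.
NIGHT_START = time(23, 0)
NIGHT_END = time(7, 0)
NIGHT_FACTOR = 1.2


def is_night(arrival, start=NIGHT_START, end=NIGHT_END):
    """Return True if arrival falls inside the window, which may wrap midnight."""
    if start <= end:
        return start <= arrival < end
    return arrival >= start or arrival < end


def adjusted_score(score, arrival, factor=NIGHT_FACTOR):
    """Multiply the spam score by factor at night, leave it alone by day."""
    return score * factor if is_night(arrival) else score


def classify(score, low=4.0, high=9.0):
    """Map a score to clean / spam / high using the assumed thresholds."""
    if score >= high:
        return "high"
    if score >= low:
        return "spam"
    return "clean"
```

With these assumed thresholds, a message scoring 8 at 2 am would be
scaled past 9 and land in the high-scoring class, while the same message
at noon would stay in the normal spam range.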
My setup is just so different. Maybe it is the rules I have, or the use of
razor - DCC - pyzor, but I have a very small percentage of mail in the normal
spam range. Most is either high scoring or ham.
Looking at the current stats, I have 38.1% clean, 58.8% high-scoring spam, and
only 3.1% spam. I have only had one false positive in the last 2 weeks, and
that was only a technicality. The sender was forwarding a joke from a Yahoo
mail account. I said spam, the receiver didn't care either way, and the sender
probably didn't think it was spam. But I win, 'cause I'm root!

Between Razor, the URIBLs and the SARE rules, it is pretty close to making me
happy, and my bosses are happy, so I still tweak things, but not as often as I
used to.
I even got a message that scored 114.

-- 

MailScanner is like deodorant...
You hope everybody uses it, and
you notice quickly if they don't!!!!


