idea for next version
mailscanner at berger.nl
Wed Oct 11 09:35:51 IST 2006
Scott Silva wrote ..
> Logan Shaw spake the following on 10/10/2006 3:12 PM:
> > Roger wrote:
> >>> So I was checking MailWatch this evening and I found out that the
> >>> spam / ham percentage is 60% / 40% in the daytime and 95% / 5% at night.
> >>> This is quite logical, because in the daytime everybody is working and at
> >>> night (well, here in Europe) only spammers are working. This can be
> >>> used for spam filtering. I think if it is possible to, e.g., do
> >>> "spamscore * 1.2" between 11:00 PM and 7:00 AM, it will push more
> >>> spam into high-scoring at night. Of course it will also hit ham, but as
> >>> there is much less ham at night, that is less likely.
> >
> > On Tue, 10 Oct 2006, Steve Campbell wrote:
> >> I tend to look at this in a different light. Spam is spam, and should
> >> be caught by rules, etc regardless of the time it arrives. Ham is the
> >> same also regardless of its arrival time. A good set of rules should
> >> work fine any time of the day. The percentages only indicate when
> >> people are sending mail, so this is a useless figure for comparing
> >> day/night averages.
> >
> > True enough, but every other rule that SpamAssassin uses
> > is a heuristic as well. They're all based on particular
> > characteristics of the messages (or servers that send them)
> > and some kind of statistical correlation between those
> > characteristics and spamminess.
> >
> >> For instance, if the same message that came in at night were resent
> >> during the day, how should the mail be treated? Different score and
> >> action?
> >
> > While I share the feeling that it is a little bit odd that the
> > time a message arrives could sway its score, this is already
> > true to some extent: real-time blacklists change over time
> > (otherwise they wouldn't be real-time), and the score a message
> > gets can be different one hour from what it is at the next hour.
> >
> > Overall, I think time of arrival could be safely used as
> > yet another heuristic for determining if something is spam.
> > The key thing is that the scores would need to be right, which
> > I suspect means they'd need to be fairly low, something like
> > 0.5 or so. SpamAssassin already handles setting scores by
> > running a genetic algorithm (or whatever it is that it uses
> > that replaced the GA in 3.x), but since this varies so much
> > by site (what time zone the site is located in, what type
> > of usage patterns it sees, etc.), there would need to be a
> > reliable method of determining site-specific scores for this.
> >
> > To go in a different direction, as long as we're talking about
> > time, another possibility is to apply time other places.
> > For instance, you might have a time-dependent greylist.
> > Make the greylist's delay much longer at night and shorter
> > during the day. You'd get a lot of the effectiveness of
> > greylisting but without as much delay during the active periods.
> >
> > Overall, though, I think although looking at time does give
> > you additional information, it is not clear at all that
> > the positives of going with it will outweigh the negatives.
> > Time is a trait of a message (or message delivery) that has a
> > strong correlation with spamminess, but there is also a steady
> > stream of exceptions. So getting value out of looking at the
> > time is likely to be that much harder because of that.
> >
> > - Logan
> But many companies regularly have execs and others working late, or from
> home. So will you be placing these people in the spammer class just because
> they work late?
> Or how about someone in Hawaii mailing something to New York at 5:00 PM
> Hawaii time? That would be in the wee hours in New York, but not necessarily
> spam. Or if Julian sent me a message at 8:00 AM in the UK, it would be about
> midnight here on the west coast of the US.
>
Well, that works as long as you can configure the time window. If you set it from 11:00 PM to 7:00 AM, I think you won't hit many people working late, and even companies 5 hours away will mostly be closed by 6:00 PM.
The idea is based on what I see for myself. This morning I had 51 spam mails that scored between 4 (low) and 9 (high). These were all real spam. Besides that, I had 2 normal emails, which had a score of -2.50 and were whitelisted. The problem is that I still had 51 messages tagged as {Spam?} which I had to check manually. I checked a few of them and they mostly scored about 7 or 8.
If I could multiply the spam score by, e.g., 1.2 between 11 PM and 7 AM, it would 'upgrade' about 20 of those messages to high-scoring, which means I would receive about 40% less spam in the morning.
I wouldn't try this during the day, because the chance of hitting ham is too big.
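To make the idea concrete, here is a rough sketch in Python of what I mean. It is not MailScanner or SpamAssassin code, and neither of them offers such an option as far as I know. The function names, the 23:00 to 07:00 window, the 1.2 factor and the high-scoring threshold of 9 are all just assumptions taken from my own numbers above.

```python
from datetime import time

# Assumed values, taken from the figures in this message (not real settings).
NIGHT_START = time(23, 0)   # 11:00 PM
NIGHT_END = time(7, 0)      # 7:00 AM
NIGHT_FACTOR = 1.2          # hypothetical multiplier
HIGH_SCORE = 9.0            # assumed high-scoring spam threshold


def in_night_window(arrival, start=NIGHT_START, end=NIGHT_END):
    """Return True if arrival falls in [start, end), allowing a wrap past midnight."""
    if start <= end:
        return start <= arrival < end
    # Window wraps around midnight, e.g. 23:00 -> 07:00.
    return arrival >= start or arrival < end


def adjusted_score(score, arrival, factor=NIGHT_FACTOR):
    """Multiply the spam score by factor at night, leave it unchanged otherwise."""
    return score * factor if in_night_window(arrival) else score


def is_high_scoring(score, arrival, threshold=HIGH_SCORE):
    """Check whether the (possibly adjusted) score reaches the high-scoring threshold."""
    return adjusted_score(score, arrival) >= threshold
```

With these numbers, a message scoring 8 at 2:30 AM would become 9.6 and count as high-scoring, while the same message at noon would stay at 8. A message scoring 7 would only reach 8.4 at night, so it would still be tagged as {Spam?}.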
Of course, these are just my own findings.
Maybe the real thought behind it is that I have a very different spam/ham ratio at night than during the day, and this could be used to filter spam somehow.
Or maybe MailScanner has spoiled me so much that I want too much ;-)
Roger