idea for next version
Scott Silva
ssilva at sgvwater.com
Wed Oct 11 00:39:39 IST 2006
Logan Shaw spake the following on 10/10/2006 3:12 PM:
> Roger wrote:
>>> So I was checking MailWatch this evening and I found out that the
>>> spam / ham percentage is 60% / 40% in the daytime and 95% / 5% at night.
>>> This is quite logical, because in the daytime everybody is working and at
>>> night (well, here in Europe) only spammers are working. This can be
>>> used for spam filtering. I think if it is possible to do, e.g.,
>>> "spamscore * 1.2" between 11:00 pm and 7:00 am, it will hit more
>>> high-scoring spam at night. Of course it will also hit ham, but as
>>> there is much less ham at night, the chance of that is lower.
>
> On Tue, 10 Oct 2006, Steve Campbell wrote:
>> I tend to look at this in a different light. Spam is spam, and should
>> be caught by rules, etc., regardless of the time it arrives. Ham is the
>> same, also regardless of its arrival time. A good set of rules should
>> work fine any time of the day. The percentages only indicate when
>> people are sending mail, so this is a useless figure for comparing
>> day/night averages.
>
> True enough, but every other rule that SpamAssassin uses
> is a heuristic as well. They're all based on particular
> characteristics of the messages (or servers that send them)
> and some kind of statistical correlation between those
> characteristics and spamminess.
>
>> For instance, if the same message that came in at night were resent
>> during the day, how should the mail be treated? Different score and
>> action?
>
> While I share the feeling that it is a little bit odd that the
> time a message arrives could sway its score, this is already
> true to some extent: real-time blacklists change over time
> (otherwise they wouldn't be real-time), and the score a message
> gets can be different one hour from what it is at the next hour.
>
> Overall, I think time of arrival could be safely used as
> yet another heuristic for determining if something is spam.
> The key thing is that the scores would need to be right, which
> I suspect means they'd need to be fairly low, something like
> 0.5 or so. SpamAssassin already handles setting scores by
> running a genetic algorithm (or whatever it is that it uses
> that replaced the GA in 3.x), but since this varies so much
> by site (what time zone the site is located in, what type
> of usage patterns it sees, etc.), there would need to be a
> reliable method of determining site-specific scores for this.
>
> To go in a different direction, as long as we're talking about
> time, another possibility is to apply time in other places.
> For instance, you might have a time-dependent greylist.
> Make the greylist's delay much longer at night and shorter
> during the day. You'd get a lot of the effectiveness of
> greylisting but without as much delay during the active periods.
>
> Overall, though, I think that although looking at time does give
> you additional information, it is not at all clear that
> the positives of going with it will outweigh the negatives.
> Time is a trait of a message (or message delivery) that has a
> strong correlation with spamminess, but there is also a steady
> stream of exceptions. So getting value out of looking at the
> time is likely to be that much harder because of that.
>
> - Logan
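
To make the two ideas quoted above concrete, here is a minimal sketch of Roger's "spamscore * 1.2" night window and Logan's time-dependent greylist delay. The 11:00 pm to 7:00 am window and the 1.2 factor come from Roger's message; the function names and the 5 and 30 minute greylist delays are assumptions made up for illustration. This is not how MailScanner or SpamAssassin actually works.

```python
from datetime import time

NIGHT_START = time(23, 0)  # 11:00 pm, from Roger's example
NIGHT_END = time(7, 0)     # 7:00 am, from Roger's example
NIGHT_FACTOR = 1.2         # multiplier from Roger's example


def is_night(arrival: time) -> bool:
    """True if the arrival time falls in the window that wraps past midnight."""
    return arrival >= NIGHT_START or arrival < NIGHT_END


def adjusted_score(score: float, arrival: time) -> float:
    """Scale the spam score at night; leave it unchanged during the day."""
    return score * NIGHT_FACTOR if is_night(arrival) else score


def greylist_delay_minutes(arrival: time,
                           day_delay: int = 5,
                           night_delay: int = 30) -> int:
    """Hypothetical delays: a longer greylist delay at night, shorter by day."""
    return night_delay if is_night(arrival) else day_delay
```

Note that the same message gets a different score depending only on when it arrives, which is exactly the behaviour Steve questions above.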
But many companies regularly have execs and others working late, or from
home. So you will be placing these people in the spammer class just because
they work late?
Or how about someone in Hawaii mailing something to New York at 5:00 PM Hawaii
time? That would be late at night in New York, but not necessarily spam.
Or if Julian sent me a message at 8:00 AM in the UK, it would be about midnight
here on the west coast of the US.
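
The two examples above can be checked with a small sketch. It uses fixed UTC offsets that I am assuming for the date of this thread (October 2006, with daylight saving time in effect in New York, the UK and the US west coast), rather than a time zone database. The helper names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets assumed for October 2006 (daylight saving time in effect).
HAWAII = timezone(timedelta(hours=-10))     # HST
NEW_YORK = timezone(timedelta(hours=-4))    # EDT
UK = timezone(timedelta(hours=1))           # BST
US_PACIFIC = timezone(timedelta(hours=-7))  # PDT


def arrival_at(sent: datetime, receiver_zone: timezone) -> datetime:
    """Convert the sender's local send time to the receiver's local time."""
    return sent.astimezone(receiver_zone)


def in_night_window(local: datetime) -> bool:
    """The 11:00 pm to 7:00 am window proposed earlier in the thread."""
    return local.hour >= 23 or local.hour < 7


# 5:00 PM in Hawaii, received in New York.
from_hawaii = arrival_at(datetime(2006, 10, 10, 17, 0, tzinfo=HAWAII), NEW_YORK)
# 8:00 AM in the UK, received on the US west coast.
from_uk = arrival_at(datetime(2006, 10, 10, 8, 0, tzinfo=UK), US_PACIFIC)

for label, local in (("Hawaii -> New York", from_hawaii),
                     ("UK -> US west coast", from_uk)):
    print(label, local.strftime("%H:%M"), "night window:", in_night_window(local))
```

Under those offsets, both messages are sent during normal working hours and still arrive inside the receiver's night window.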
--
MailScanner is like deodorant...
You hope everybody uses it, and
you notice quickly if they don't!!!!