Is this really how bayes+autolearn works?
Scott Silva
ssilva at sgvwater.com
Wed Dec 13 17:38:56 GMT 2006
Furnish, Trever G spake the following on 12/13/2006 6:37 AM:
>
>
>> -----Original Message-----
>> From: mailscanner-bounces at lists.mailscanner.info
>> [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf
>> Of Scott Silva
>> Sent: Tuesday, December 12, 2006 5:45 PM
>> To: mailscanner at lists.mailscanner.info
>> Subject: Re: Is this really how bayes+autolearn works?
>
>> Furnish, Trever G spake the following on 12/12/2006 1:59 PM:
>>> So Bayes is getting lots of messages that SA doesn't detect as spam,
>>> and only a few similar messages that I train it to treat as spam. Is
>>> this a plausible explanation for why Bayes would consistently be
>>> misclassifying this mail?
>>>
>>> So far the floods start in the afternoon and the subject strings are
>>> consistent enough that I'm able to correct the damage by:
>>> - removing my bayes database and retraining from archived spam
>>>   corpus (slow)
>>> - creating custom rules to, for example, filter out
>>>   "Subject =~ /Good Morning/" (dangerous)
>
>> I also see a lot of spam coming from bots, but I consistently
>> catch most of it. Are you using some good add-on rules?
>> Do you have any samples that some of us could run through our
>> systems to see what we get?
>
> Requested samples are attached. They're very simple messages -- but
> they're flooding in without being caught and then Bayes starts to assign
> -2.60 to them. :-(
>
> --
> Trever
>
>
> ------------------------------------------------------------------------
>
> Subject:
> It ready
> From:
> "Sal Oakes" <predestineborn at abz1.freeserve.co.uk>
> Date:
> Wed, 13 Dec 2006 10:25:17 -0500
> To:
> <lavelez at herff-jones.com>
>
> Received:
> from inex3.herffjones.hj-int ([192.168.10.41]) by
> inex1.herffjones.hj-int with Microsoft SMTPSVC(6.0.3790.1830); Wed, 13
> Dec 2006 09:25:37 -0500
> MIME-Version:
> 1.0
> Content-Type:
> text/plain; charset="iso-8859-1"
> Content-Transfer-Encoding:
> quoted-printable
> Received:
> from relay2.public.herff-jones.com ([192.168.252.241]) by
> inex3.herffjones.hj-int with Microsoft SMTPSVC(6.0.3790.1830); Wed, 13
> Dec 2006 09:25:37 -0500
> X-MimeOLE:
> Produced By Microsoft Exchange V6.5
> Received:
> from Host (dsl54025CFD.pool.t-online.hu [84.2.92.253]) by
> relay2.public.herff-jones.com (8.12.11/8.12.11) with ESMTP id
> kBDEPI6h022465; Wed, 13 Dec 2006 09:25:20 -0500
> Received:
> from 193.252.22.141 (HELO mail-in.freeserve.com) by herff-jones.com with
> esmtp (H; 3DY5// >0N)) id 4/3*,V-N2*),Z-:, for lavelez at herff-jones.com;
> Wed, 13 Dec 2006 14:25:17 -0060
> Content-class:
> urn:content-classes:message
> Message-ID:
> <01c71ec2$830d63d0$6c822ecf at predestineborn>
> Thread-Topic:
> It ready
> Thread-Index:
> Aca6Q941UUZ9J7Z4L5A317Q=
>
>
> News Alert!
>
> Fueled by the possibility of an upcoming merger, Wild Brush
> Energy (WBRS) is gearing up for an explosion. Tension is
> building and soon the scramble to take a position will push
> this one off the charts.
>
> Wild Brush Energy
> Symbol: WBRS
> Current Price: $0.05
> Short Term Target: $0.32
> Long Term Target: $0.80
>
> WBRS is engaged in some of the most lucrative gas regions in North
> America. Major discoveries are happening all the time and WBRS is in
> the thick of it.
>
> With the array of drilling projects Wild Brush has going on at the moment
> tension is building. As the drilling gets closer to completion insiders are
> accumulating ahead of that major discovery announcement.
>
> Finally the market is ready for explosion
> Wednesday December 13 2006. will be a huge growth of WBRS at 1.00 am
> Get ready to make some cash today!
>
>
>
> ------------------------------------------------------------------------
>
> Subject:
> It ready
> From:
> "Young Gill" <Aesop'sLima's at acadia.eng.sun.com>
> Date:
> Wed, 13 Dec 2006 10:23:13 -0500
> To:
> <kokuehl at herff-jones.com>
>
> Received:
> from inex3.herffjones.hj-int ([192.168.10.41]) by
> inex1.herffjones.hj-int with Microsoft SMTPSVC(6.0.3790.1830); Wed, 13
> Dec 2006 09:24:10 -0500
> MIME-Version:
> 1.0
> Content-Type:
> text/plain; charset="windows-1250"
> Content-Transfer-Encoding:
> quoted-printable
> Received:
> from relay2.public.herff-jones.com ([192.168.252.241]) by
> inex3.herffjones.hj-int with Microsoft SMTPSVC(6.0.3790.1830); Wed, 13
> Dec 2006 09:24:11 -0500
> X-MimeOLE:
> Produced By Microsoft Exchange V6.5
> Received:
> from pD9FFEAD5.dip.t-dialin.net (pD9FFEF55.dip.t-dialin.net
> [217.255.239.85]) by relay2.public.herff-jones.com (8.12.11/8.12.11)
> with ESMTP id kBDEN0rx021231; Wed, 13 Dec 2006 09:23:05 -0500
> Received:
> from 192.5.209.6 (HELO btmx4.sun.com) by herff-jones.com with esmtp
> (+5YBR6(>K 03; >H) id +4*841-UQ4+74-)- for kokuehl at herff-jones.com; Wed,
> 13 Dec 2006 14:23:13 -0060
> Content-class:
> urn:content-classes:message
> Message-ID:
> <01c71ec2$395db640$6c822ecf at Aesop'sLima's>
> Thread-Topic:
> It ready
> Thread-Index:
> Aca6Q0Mwseft1NU=
>
>
> News Alert!
>
> Fueled by the possibility of an upcoming merger, Wild Brush
> Energy (WBRS) is gearing up for an explosion. Tension is
> building and soon the scramble to take a position will push
> this one off the charts.
>
> Wild Brush Energy
> Symbol: WBRS
> Current Price: $0.05
> Short Term Target: $0.32
> Long Term Target: $0.80
>
> WBRS is engaged in some of the most lucrative gas regions in North
> America. Major discoveries are happening all the time and WBRS is in
> the thick of it.
>
> With the array of drilling projects Wild Brush has going on at the moment
> tension is building. As the drilling gets closer to completion insiders are
> accumulating ahead of that major discovery announcement.
>
> Finally the market is ready for explosion
> Wednesday December 13 2006. will be a huge growth of WBRS at 1.00 am
> Get ready to make some cash today!
>
>
>
> ------------------------------------------------------------------------
>
> Subject:
> It ready
> From:
> "Hillary Raines" <incensesshoddiest at aais.org.uk>
> Date:
> Wed, 13 Dec 2006 10:31:25 -0500
> To:
> "Reeck, Alyssa A" <aareeck at herffjones.com>
>
> Received:
> from inex3.herffjones.hj-int ([192.168.10.41]) by
> leex1.herffjones.hj-int with Microsoft SMTPSVC(5.0.2195.6713); Wed, 13
> Dec 2006 08:30:10 -0600
> MIME-Version:
> 1.0
> Content-Type:
> text/plain; charset="windows-1250"
> Content-Transfer-Encoding:
> quoted-printable
> Received:
> from relay2.public.herff-jones.com ([192.168.252.241]) by
> inex3.herffjones.hj-int with Microsoft SMTPSVC(6.0.3790.1830); Wed, 13
> Dec 2006 09:30:07 -0500
> X-MimeOLE:
> Produced By Microsoft Exchange V6.5
> Received:
> from p548B7FC7.dip.t-dialin.net (p548B7FC7.dip.t-dialin.net
> [84.139.127.199]) by relay2.public.herff-jones.com (8.12.11/8.12.11)
> with ESMTP id kBDETifE025143; Wed, 13 Dec 2006 09:29:45 -0500
> Received:
> from 80.243.184.9 (HELO mail.aais.org.uk) by herffjones.com with esmtp
> ((770?4RNA 860LT) id 12<06.-E8/9/E-9- for aareeck at herffjones.com; Wed,
> 13 Dec 2006 14:31:25 -0060
> Content-class:
> urn:content-classes:message
> Message-ID:
> <01c71ec3$5ea3af30$6c822ecf at incensesshoddiest>
> Thread-Topic:
> It ready
> Thread-Index:
> Aca6QT06H1P/9M1B12T10R54NL4=
>
>
> News Alert!
>
> Fueled by the possibility of an upcoming merger, Wild Brush
> Energy (WBRS) is gearing up for an explosion. Tension is
> building and soon the scramble to take a position will push
> this one off the charts.
>
> Wild Brush Energy
> Symbol: WBRS
> Current Price: $0.05
> Short Term Target: $0.32
> Long Term Target: $0.80
>
> WBRS is engaged in some of the most lucrative gas regions in North
> America. Major discoveries are happening all the time and WBRS is in
> the thick of it.
>
> With the array of drilling projects Wild Brush has going on at the moment
> tension is building. As the drilling gets closer to completion insiders are
> accumulating ahead of that major discovery announcement.
>
> Finally the market is ready for explosion
> Wednesday December 13 2006. will be a huge growth of WBRS at 1.00 am
> Get ready to make some cash today!
>
>
>
> ------------------------------------------------------------------------
>
> Subject:
> It ready
> From:
> "Kelsey Gillis" <cuisine'sBiscay at accesswv.com>
> Date:
> Wed, 13 Dec 2006 10:31:53 -0500
> To:
> <aareeck at herff-jones.com>
>
> Received:
> from inex3.herffjones.hj-int ([192.168.10.41]) by
> leex1.herffjones.hj-int with Microsoft SMTPSVC(5.0.2195.6713); Wed, 13
> Dec 2006 08:31:07 -0600
> MIME-Version:
> 1.0
> Content-Type:
> text/plain; charset="iso-8859-1"
> Content-Transfer-Encoding:
> quoted-printable
> Received:
> from relay2.public.herff-jones.com ([192.168.252.241]) by
> inex3.herffjones.hj-int with Microsoft SMTPSVC(6.0.3790.1830); Wed, 13
> Dec 2006 09:31:05 -0500
> X-MimeOLE:
> Produced By Microsoft Exchange V6.5
> Received:
> from p548B7FC7.dip.t-dialin.net (p548B7FC7.dip.t-dialin.net
> [84.139.127.199]) by relay2.public.herff-jones.com (8.12.11/8.12.11)
> with ESMTP id kBDEUCco025494; Wed, 13 Dec 2006 09:30:13 -0500
> Received:
> from 198.185.2.67 (HELO mx2.business.mindspring.com) by herff-jones.com
> with esmtp (W/9UT0)(0 (H8)DO) id 9G65X2-OPQ/F at -C= for
> aareeck at herff-jones.com; Wed, 13 Dec 2006 14:31:53 -0060
> Content-class:
> urn:content-classes:message
> Message-ID:
> <01c71ec3$6edf4350$6c822ecf at cuisine'sBiscay>
> Thread-Topic:
> It ready
> Thread-Index:
> Aca6QQL6573SZ/1I7Y35KVU2
>
>
> News Alert!
>
> Fueled by the possibility of an upcoming merger, Wild Brush
> Energy (WBRS) is gearing up for an explosion. Tension is
> building and soon the scramble to take a position will push
> this one off the charts.
>
> Wild Brush Energy
> Symbol: WBRS
> Current Price: $0.05
> Short Term Target: $0.32
> Long Term Target: $0.80
>
> WBRS is engaged in some of the most lucrative gas regions in North
> America. Major discoveries are happening all the time and WBRS is in
> the thick of it.
>
> With the array of drilling projects Wild Brush has going on at the moment
> tension is building. As the drilling gets closer to completion insiders are
> accumulating ahead of that major discovery announcement.
>
> Finally the market is ready for explosion
> Wednesday December 13 2006. will be a huge growth of WBRS at 1.00 am
> Get ready to make some cash today!
>
>
Content analysis details:   (33.4 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.0 BOTNET_CLIENTWORDS     Hostname contains client-like substrings
 0.0 BOTNET_IPINHOSTNAME    Hostname contains its own IP address
 1.7 SARE_MLB_Stock1        BODY: SARE_MLB_Stock1
 1.7 SARE_MLB_Stock2        BODY: SARE_MLB_Stock2
 1.0 SARE_LWHUGE            BODY: SARE_LWHUGE
 0.8 SARE_LWSHORTT          BODY: SARE_LWSHORTT
 1.7 SARE_MLB_Stock6        BODY: ML obfuscated ticker symbols
 2.4 TVD_STOCK1             BODY: Message looks like it's pushing a stock...
 0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
                            [score: 0.5000]
 1.5 RAZOR2_CHECK           Listed in Razor2 (http://razor.sf.net/)
 1.5 RAZOR2_CF_RANGE_E4_51_100 Razor2 gives engine 4 confidence level
                            above 50%
                            [cf: 100]
 1.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
                            [cf: 100]
 3.7 PYZOR_CHECK            Listed in Pyzor (http://pyzor.sf.net/)
 2.2 DCC_CHECK              Listed in DCC (http://rhyolite.com/anti-spam/dcc/)
 2.0 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
                            [84.2.92.253 listed in dnsbl.sorbs.net]
 2.0 RCVD_IN_NJABL_DUL      RBL: NJABL: dialup sender did non-local SMTP
                            [84.2.92.253 listed in combined.njabl.org]
 2.5 DIGEST_MULTIPLE        Message hits more than one network digest check
 2.8 RATWARE_OUTLOOK_NONAME Bulk email fingerprint (Outlook no name)
                            found
 0.0 BOTNET_CLIENT          Hostname looks like a client hostname
 1.9 RATWARE_MS_HASH        Bulk email fingerprint (msgid ms hash) found
 1.7 MSGID_DOLLARS          Message-Id has pattern used in spam
 2.0 BOTNET                 The submitting mail server looks like part of a Botnet

Content analysis details:   (33.4 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 2.6 HELO_DYNAMIC_DIALIN    Relay HELO'd using suspicious hostname
                            (T-Dialin)
 0.0 BOTNET_CLIENTWORDS     Hostname contains client-like substrings
 0.0 BOTNET_IPINHOSTNAME    Hostname contains its own IP address
 1.7 SARE_MLB_Stock1        BODY: SARE_MLB_Stock1
 1.7 SARE_MLB_Stock2        BODY: SARE_MLB_Stock2
 1.0 SARE_LWHUGE            BODY: SARE_LWHUGE
 0.8 SARE_LWSHORTT          BODY: SARE_LWSHORTT
 1.7 SARE_MLB_Stock6        BODY: ML obfuscated ticker symbols
 2.4 TVD_STOCK1             BODY: Message looks like it's pushing a stock...
 0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
                            [score: 0.5005]
 1.5 RAZOR2_CHECK           Listed in Razor2 (http://razor.sf.net/)
 1.5 RAZOR2_CF_RANGE_E4_51_100 Razor2 gives engine 4 confidence level
                            above 50%
                            [cf: 100]
 1.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
                            [cf: 100]
 2.2 DCC_CHECK              Listed in DCC (http://rhyolite.com/anti-spam/dcc/)
 2.0 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
                            [217.255.239.85 listed in dnsbl.sorbs.net]
 2.0 RCVD_IN_NJABL_DUL      RBL: NJABL: dialup sender did non-local SMTP
                            [217.255.239.85 listed in combined.njabl.org]
 2.5 DIGEST_MULTIPLE        Message hits more than one network digest check
 2.8 RATWARE_OUTLOOK_NONAME Bulk email fingerprint (Outlook no name)
                            found
 0.0 BOTNET_CLIENT          Hostname looks like a client hostname
 1.9 RATWARE_MS_HASH        Bulk email fingerprint (msgid ms hash) found
 1.7 MSGID_DOLLARS          Message-Id has pattern used in spam
 2.0 BOTNET                 The submitting mail server looks like part of a Botnet

That is just two of them.
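
If you want to compare reports like these across systems, here is a rough Python sketch that pulls the per-rule points out of a "Content analysis details" block and separates what Bayes contributed from what everything else contributed. It is only an illustration: the names parse_report, split_bayes and SAMPLE_REPORT are made up for this example, the sample is a short excerpt of the second report, and the regular expression assumes the usual "pts, rule name, description" layout with indented continuation lines.

```python
import re

# One rule line looks like "<pts> <RULE_NAME> <description>".
# Continuation lines such as "[score: 0.5005]" do not start with a number,
# so they are skipped. (Assumed layout, based on the reports in this thread.)
RULE_LINE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s+([A-Z][A-Za-z0-9_]*)(?:\s+.*)?$")

# Excerpt of the second report, used only as sample input.
SAMPLE_REPORT = """\
 2.6 HELO_DYNAMIC_DIALIN    Relay HELO'd using suspicious hostname
                            (T-Dialin)
 2.4 TVD_STOCK1             BODY: Message looks like it's pushing a stock...
 0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
                            [score: 0.5005]
 2.2 DCC_CHECK              Listed in DCC (http://rhyolite.com/anti-spam/dcc/)
 2.0 BOTNET                 The submitting mail server looks like part of a Botnet
"""


def parse_report(text):
    """Return a list of (points, rule_name) pairs found in a report body."""
    rules = []
    for line in text.splitlines():
        match = RULE_LINE.match(line)
        if match:
            rules.append((float(match.group(1)), match.group(2)))
    return rules


def split_bayes(rules):
    """Return (points from BAYES_* rules, points from all other rules)."""
    bayes = sum(points for points, name in rules if name.startswith("BAYES_"))
    other = sum(points for points, name in rules if not name.startswith("BAYES_"))
    return bayes, other


if __name__ == "__main__":
    bayes_points, other_points = split_bayes(parse_report(SAMPLE_REPORT))
    print(f"Bayes: {bayes_points:.1f}  everything else: {other_points:.1f}")
```

On the excerpt, Bayes adds nothing (BAYES_50 scores 0.0), and all of the points come from the other rules. Running it on the full reports would show the same split, with the total carried by the add-on rules and network checks rather than Bayes.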
--
MailScanner is like deodorant...
You hope everybody uses it, and
you notice quickly if they don't!!!!