Is this really how bayes+autolearn works?

Scott Silva ssilva at sgvwater.com
Wed Dec 13 17:38:56 GMT 2006


Furnish, Trever G spake the following on 12/13/2006 6:37 AM:
>  
> 
>> -----Original Message-----
>> From: mailscanner-bounces at lists.mailscanner.info 
>> [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf 
>> Of Scott Silva
>> Sent: Tuesday, December 12, 2006 5:45 PM
>> To: mailscanner at lists.mailscanner.info
>> Subject: Re: Is this really how bayes+autolearn works?
> 
>> Furnish, Trever G spake the following on 12/12/2006 1:59 PM:
>>> So Bayes is getting lots of messages that SA doesn't detect 
>> as spam, 
>>> and only a few similar messages that I train it to treat as 
>> spam.  Is 
>>> this a plausible explanation for why Bayes would consistently be 
>>> misclassifying this mail?
>>>  
>>> So far the floods start in the afternoon and the subject 
>> strings are 
>>> consistent enough that I'm able to correct the damage by:
>>>     - removing my bayes database and retraining from archived spam 
>>> corpus (slow)
>>>     - creating custom rules to, for example, filter out "Subject =~ 
>>> /Good Morning/" (dangerous)
>  
>> I also see a lot of spam coming from bots, but I consistently 
>> catch most of it. Are you using some good add-on rules?
>> Do you have any samples that some of us could run through our 
>> systems to see what we get?
> 
> Requested samples are attached.  They're very simple messages -- but
> they're flooding in without being caught and then Bayes starts to assign
> -2.60 to them. :-(
> 
> --
> Trever
> 
> 
> ------------------------------------------------------------------------
> 
> Subject:
> It ready
> From:
> "Sal Oakes" <predestineborn at abz1.freeserve.co.uk>
> Date:
> Wed, 13 Dec 2006 10:25:17 -0500
> To:
> <lavelez at herff-jones.com>
> 
> Received:
> from inex3.herffjones.hj-int ([192.168.10.41]) by
> inex1.herffjones.hj-int with Microsoft SMTPSVC(6.0.3790.1830); Wed, 13
> Dec 2006 09:25:37 -0500
> MIME-Version:
> 1.0
> Content-Type:
> text/plain; charset="iso-8859-1"
> Content-Transfer-Encoding:
> quoted-printable
> Received:
> from relay2.public.herff-jones.com ([192.168.252.241]) by
> inex3.herffjones.hj-int with Microsoft SMTPSVC(6.0.3790.1830); Wed, 13
> Dec 2006 09:25:37 -0500
> X-MimeOLE:
> Produced By Microsoft Exchange V6.5
> Received:
> from Host (dsl54025CFD.pool.t-online.hu [84.2.92.253]) by
> relay2.public.herff-jones.com (8.12.11/8.12.11) with ESMTP id
> kBDEPI6h022465; Wed, 13 Dec 2006 09:25:20 -0500
> Received:
> from 193.252.22.141 (HELO mail-in.freeserve.com) by herff-jones.com with
> esmtp (H; 3DY5// >0N)) id 4/3*,V-N2*),Z-:, for lavelez at herff-jones.com;
> Wed, 13 Dec 2006 14:25:17 -0060
> Content-class:
> urn:content-classes:message
> Message-ID:
> <01c71ec2$830d63d0$6c822ecf at predestineborn>
> Thread-Topic:
> It ready
> Thread-Index:
> Aca6Q941UUZ9J7Z4L5A317Q=
> 
> 
> News Alert!
> 
> Fueled by the possibility of an upcoming merger, Wild Brush 
> Energy (WBRS) is gearing up for an explosion.  Tension is 
> building and soon the scramble to take a position will push 
> this one off the charts.
> 
> Wild Brush Energy
> Symbol: WBRS
> Current Price: $0.05
> Short Term Target: $0.32
> Long Term Target: $0.80
> 
> WBRS is engaged in some of the most lucrative gas regions in North 
> America.  Major discoveries are happening all the time and WBRS is in 
> the thick of it.
> 
> With the array of drilling projects Wild Brush has going on at the moment 
> tension is building.  As the drilling gets closer to completion insiders are 
> accumulating ahead of that major discovery announcement.
> 
> Finally the market is ready for explosion
> Wednesday December 13 2006. will be a huge growth of WBRS at 1.00 am
> Get ready to make some cash today!
> 
> 
> 
> ------------------------------------------------------------------------
> 
> Subject:
> It ready
> From:
> "Young Gill" <Aesop'sLima's at acadia.eng.sun.com>
> Date:
> Wed, 13 Dec 2006 10:23:13 -0500
> To:
> <kokuehl at herff-jones.com>
> 
> Received:
> from inex3.herffjones.hj-int ([192.168.10.41]) by
> inex1.herffjones.hj-int with Microsoft SMTPSVC(6.0.3790.1830); Wed, 13
> Dec 2006 09:24:10 -0500
> MIME-Version:
> 1.0
> Content-Type:
> text/plain; charset="windows-1250"
> Content-Transfer-Encoding:
> quoted-printable
> Received:
> from relay2.public.herff-jones.com ([192.168.252.241]) by
> inex3.herffjones.hj-int with Microsoft SMTPSVC(6.0.3790.1830); Wed, 13
> Dec 2006 09:24:11 -0500
> X-MimeOLE:
> Produced By Microsoft Exchange V6.5
> Received:
> from pD9FFEAD5.dip.t-dialin.net (pD9FFEF55.dip.t-dialin.net
> [217.255.239.85]) by relay2.public.herff-jones.com (8.12.11/8.12.11)
> with ESMTP id kBDEN0rx021231; Wed, 13 Dec 2006 09:23:05 -0500
> Received:
> from 192.5.209.6 (HELO btmx4.sun.com) by herff-jones.com with esmtp
> (+5YBR6(>K 03; >H) id +4*841-UQ4+74-)- for kokuehl at herff-jones.com; Wed,
> 13 Dec 2006 14:23:13 -0060
> Content-class:
> urn:content-classes:message
> Message-ID:
> <01c71ec2$395db640$6c822ecf at Aesop'sLima's>
> Thread-Topic:
> It ready
> Thread-Index:
> Aca6Q0Mwseft1NU=
> 
> 
> News Alert!
> 
> Fueled by the possibility of an upcoming merger, Wild Brush 
> Energy (WBRS) is gearing up for an explosion.  Tension is 
> building and soon the scramble to take a position will push 
> this one off the charts.
> 
> Wild Brush Energy
> Symbol: WBRS
> Current Price: $0.05
> Short Term Target: $0.32
> Long Term Target: $0.80
> 
> WBRS is engaged in some of the most lucrative gas regions in North 
> America.  Major discoveries are happening all the time and WBRS is in 
> the thick of it.
> 
> With the array of drilling projects Wild Brush has going on at the moment 
> tension is building.  As the drilling gets closer to completion insiders are 
> accumulating ahead of that major discovery announcement.
> 
> Finally the market is ready for explosion
> Wednesday December 13 2006. will be a huge growth of WBRS at 1.00 am
> Get ready to make some cash today!
> 
> 
> 
> ------------------------------------------------------------------------
> 
> Subject:
> It ready
> From:
> "Hillary Raines" <incensesshoddiest at aais.org.uk>
> Date:
> Wed, 13 Dec 2006 10:31:25 -0500
> To:
> "Reeck, Alyssa A" <aareeck at herffjones.com>
> 
> Received:
> from inex3.herffjones.hj-int ([192.168.10.41]) by
> leex1.herffjones.hj-int with Microsoft SMTPSVC(5.0.2195.6713); Wed, 13
> Dec 2006 08:30:10 -0600
> MIME-Version:
> 1.0
> Content-Type:
> text/plain; charset="windows-1250"
> Content-Transfer-Encoding:
> quoted-printable
> Received:
> from relay2.public.herff-jones.com ([192.168.252.241]) by
> inex3.herffjones.hj-int with Microsoft SMTPSVC(6.0.3790.1830); Wed, 13
> Dec 2006 09:30:07 -0500
> X-MimeOLE:
> Produced By Microsoft Exchange V6.5
> Received:
> from p548B7FC7.dip.t-dialin.net (p548B7FC7.dip.t-dialin.net
> [84.139.127.199]) by relay2.public.herff-jones.com (8.12.11/8.12.11)
> with ESMTP id kBDETifE025143; Wed, 13 Dec 2006 09:29:45 -0500
> Received:
> from 80.243.184.9 (HELO mail.aais.org.uk) by herffjones.com with esmtp
> ((770?4RNA 860LT) id 12<06.-E8/9/E-9- for aareeck at herffjones.com; Wed,
> 13 Dec 2006 14:31:25 -0060
> Content-class:
> urn:content-classes:message
> Message-ID:
> <01c71ec3$5ea3af30$6c822ecf at incensesshoddiest>
> Thread-Topic:
> It ready
> Thread-Index:
> Aca6QT06H1P/9M1B12T10R54NL4=
> 
> 
> News Alert!
> 
> Fueled by the possibility of an upcoming merger, Wild Brush 
> Energy (WBRS) is gearing up for an explosion.  Tension is 
> building and soon the scramble to take a position will push 
> this one off the charts.
> 
> Wild Brush Energy
> Symbol: WBRS
> Current Price: $0.05
> Short Term Target: $0.32
> Long Term Target: $0.80
> 
> WBRS is engaged in some of the most lucrative gas regions in North 
> America.  Major discoveries are happening all the time and WBRS is in 
> the thick of it.
> 
> With the array of drilling projects Wild Brush has going on at the moment 
> tension is building.  As the drilling gets closer to completion insiders are 
> accumulating ahead of that major discovery announcement.
> 
> Finally the market is ready for explosion
> Wednesday December 13 2006. will be a huge growth of WBRS at 1.00 am
> Get ready to make some cash today!
> 
> 
> 
> ------------------------------------------------------------------------
> 
> Subject:
> It ready
> From:
> "Kelsey Gillis" <cuisine'sBiscay at accesswv.com>
> Date:
> Wed, 13 Dec 2006 10:31:53 -0500
> To:
> <aareeck at herff-jones.com>
> 
> Received:
> from inex3.herffjones.hj-int ([192.168.10.41]) by
> leex1.herffjones.hj-int with Microsoft SMTPSVC(5.0.2195.6713); Wed, 13
> Dec 2006 08:31:07 -0600
> MIME-Version:
> 1.0
> Content-Type:
> text/plain; charset="iso-8859-1"
> Content-Transfer-Encoding:
> quoted-printable
> Received:
> from relay2.public.herff-jones.com ([192.168.252.241]) by
> inex3.herffjones.hj-int with Microsoft SMTPSVC(6.0.3790.1830); Wed, 13
> Dec 2006 09:31:05 -0500
> X-MimeOLE:
> Produced By Microsoft Exchange V6.5
> Received:
> from p548B7FC7.dip.t-dialin.net (p548B7FC7.dip.t-dialin.net
> [84.139.127.199]) by relay2.public.herff-jones.com (8.12.11/8.12.11)
> with ESMTP id kBDEUCco025494; Wed, 13 Dec 2006 09:30:13 -0500
> Received:
> from 198.185.2.67 (HELO mx2.business.mindspring.com) by herff-jones.com
> with esmtp (W/9UT0)(0 (H8)DO) id 9G65X2-OPQ/F at -C= for
> aareeck at herff-jones.com; Wed, 13 Dec 2006 14:31:53 -0060
> Content-class:
> urn:content-classes:message
> Message-ID:
> <01c71ec3$6edf4350$6c822ecf at cuisine'sBiscay>
> Thread-Topic:
> It ready
> Thread-Index:
> Aca6QQL6573SZ/1I7Y35KVU2
> 
> 
> News Alert!
> 
> Fueled by the possibility of an upcoming merger, Wild Brush 
> Energy (WBRS) is gearing up for an explosion.  Tension is 
> building and soon the scramble to take a position will push 
> this one off the charts.
> 
> Wild Brush Energy
> Symbol: WBRS
> Current Price: $0.05
> Short Term Target: $0.32
> Long Term Target: $0.80
> 
> WBRS is engaged in some of the most lucrative gas regions in North 
> America.  Major discoveries are happening all the time and WBRS is in 
> the thick of it.
> 
> With the array of drilling projects Wild Brush has going on at the moment 
> tension is building.  As the drilling gets closer to completion insiders are 
> accumulating ahead of that major discovery announcement.
> 
> Finally the market is ready for explosion
> Wednesday December 13 2006. will be a huge growth of WBRS at 1.00 am
> Get ready to make some cash today!
> 
> 
Content analysis details:   (33.4 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.0 BOTNET_CLIENTWORDS     Hostname contains client-like substrings
 0.0 BOTNET_IPINHOSTNAME    Hostname contains its own IP address
 1.7 SARE_MLB_Stock1        BODY: SARE_MLB_Stock1
 1.7 SARE_MLB_Stock2        BODY: SARE_MLB_Stock2
 1.0 SARE_LWHUGE            BODY: SARE_LWHUGE
 0.8 SARE_LWSHORTT          BODY: SARE_LWSHORTT
 1.7 SARE_MLB_Stock6        BODY: ML obfuscated ticker symbols
 2.4 TVD_STOCK1             BODY: Message looks like it's pushing a stock...
 0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
                            [score: 0.5000]
 1.5 RAZOR2_CHECK           Listed in Razor2 (http://razor.sf.net/)
 1.5 RAZOR2_CF_RANGE_E4_51_100 Razor2 gives engine 4 confidence level
                            above 50%
                            [cf: 100]
 1.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
                            [cf: 100]
 3.7 PYZOR_CHECK            Listed in Pyzor (http://pyzor.sf.net/)
 2.2 DCC_CHECK              Listed in DCC (http://rhyolite.com/anti-spam/dcc/)
 2.0 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
                            [84.2.92.253 listed in dnsbl.sorbs.net]
 2.0 RCVD_IN_NJABL_DUL      RBL: NJABL: dialup sender did non-local SMTP
                            [84.2.92.253 listed in combined.njabl.org]
 2.5 DIGEST_MULTIPLE        Message hits more than one network digest check
 2.8 RATWARE_OUTLOOK_NONAME Bulk email fingerprint (Outlook no name)
                            found
 0.0 BOTNET_CLIENT          Hostname looks like a client hostname
 1.9 RATWARE_MS_HASH        Bulk email fingerprint (msgid ms hash) found
 1.7 MSGID_DOLLARS          Message-Id has pattern used in spam
 2.0 BOTNET                 The submitting mail server looks like part of a Botnet


Content analysis details:   (33.4 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 2.6 HELO_DYNAMIC_DIALIN    Relay HELO'd using suspicious hostname
                            (T-Dialin)
 0.0 BOTNET_CLIENTWORDS     Hostname contains client-like substrings
 0.0 BOTNET_IPINHOSTNAME    Hostname contains its own IP address
 1.7 SARE_MLB_Stock1        BODY: SARE_MLB_Stock1
 1.7 SARE_MLB_Stock2        BODY: SARE_MLB_Stock2
 1.0 SARE_LWHUGE            BODY: SARE_LWHUGE
 0.8 SARE_LWSHORTT          BODY: SARE_LWSHORTT
 1.7 SARE_MLB_Stock6        BODY: ML obfuscated ticker symbols
 2.4 TVD_STOCK1             BODY: Message looks like it's pushing a stock...
 0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
                            [score: 0.5005]
 1.5 RAZOR2_CHECK           Listed in Razor2 (http://razor.sf.net/)
 1.5 RAZOR2_CF_RANGE_E4_51_100 Razor2 gives engine 4 confidence level
                            above 50%
                            [cf: 100]
 1.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
                            [cf: 100]
 2.2 DCC_CHECK              Listed in DCC (http://rhyolite.com/anti-spam/dcc/)
 2.0 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
                            [217.255.239.85 listed in dnsbl.sorbs.net]
 2.0 RCVD_IN_NJABL_DUL      RBL: NJABL: dialup sender did non-local SMTP
                            [217.255.239.85 listed in combined.njabl.org]
 2.5 DIGEST_MULTIPLE        Message hits more than one network digest check
 2.8 RATWARE_OUTLOOK_NONAME Bulk email fingerprint (Outlook no name)
                            found
 0.0 BOTNET_CLIENT          Hostname looks like a client hostname
 1.9 RATWARE_MS_HASH        Bulk email fingerprint (msgid ms hash) found
 1.7 MSGID_DOLLARS          Message-Id has pattern used in spam
 2.0 BOTNET                 The submitting mail server looks like part of a Botnet

Those are the reports for just 2 of the samples.
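
If anyone wants to compare reports like these across systems, here is a rough Python sketch that pulls the total, the required score and the per-rule points out of a "Content analysis details" block. The function names (parse_report, bayes_rules) and the regular expressions are my own assumptions about the layout shown above; this is not anything that ships with SpamAssassin or MailScanner. The sample text is a shortened excerpt of the second report.

```python
import re

# Assumed layout: " pts RULE_NAME   description", as in the reports above.
RULE_LINE = re.compile(r"^\s*(-?\d+\.\d+)\s+(\S+)\s+(.*)$")
# Assumed summary line: "Content analysis details:   (33.4 points, 5.0 required)"
SUMMARY = re.compile(r"\((-?\d+(?:\.\d+)?) points, (-?\d+(?:\.\d+)?) required\)")


def parse_report(text):
    """Return (total, required, [(rule_name, points), ...]) for one report."""
    total = required = None
    rules = []
    for line in text.splitlines():
        m = SUMMARY.search(line)
        if m:
            total, required = float(m.group(1)), float(m.group(2))
            continue
        m = RULE_LINE.match(line)
        if m:
            # Continuation lines (e.g. "[score: 0.5005]") do not match and are skipped.
            rules.append((m.group(2), float(m.group(1))))
    return total, required, rules


def bayes_rules(rules):
    """Names of the Bayes rules that fired, e.g. ['BAYES_50']."""
    return [name for name, _ in rules if name.startswith("BAYES_")]


# Shortened excerpt of the second report above.
SAMPLE = """\
Content analysis details:   (33.4 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 2.6 HELO_DYNAMIC_DIALIN    Relay HELO'd using suspicious hostname
                            (T-Dialin)
 0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
                            [score: 0.5005]
 2.0 BOTNET                 The submitting mail server looks like part of a Botnet
"""

if __name__ == "__main__":
    total, required, rules = parse_report(SAMPLE)
    print("total:", total, "required:", required)
    print("bayes rules hit:", bayes_rules(rules))
```

Running each system's report through something like this makes it easy to see which Bayes rule fired. In both reports above it is BAYES_50 at 0.0 points, so all of the score comes from the other rules and network checks.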

-- 

MailScanner is like deodorant...
You hope everybody uses it, and
you notice quickly if they don't!!!!


