MailScanner Deficiency: Multi-Ruleset Processing per Email Recipient

Glenn Steen glenn.steen at gmail.com
Tue Aug 5 18:43:24 IST 2014


Ah, that explains it! In reality, you work with a set of rather small
database files, which of course has a lot of impact on time-based (and
indeed most!:-) queries... A very sensible design (that wasn't possible for
Steve F even at the inception of MailWatch 1.0:-) and probably not that
much work to implement in my old setup (I confess, I've been...  slow... in
adapting to the latest/greatest:-).
If/when time permits experimentation...:-)

Cheers!
-- 
-- Glenn
On 5 Aug 2014 17:58, "Jerry Benton" <jerry.benton at mailborder.com> wrote:

> Caveat: You should partition the database by time. This is the Mailborder
> cp_maillog, which is slightly different than MailWatch, but the bit near
> the end is what you are looking for. You can adapt it for your table with
> an alter statement.
>
>
> CREATE TABLE IF NOT EXISTS `cp_maillog` (
>   `db_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
>   `timestamp` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
>   `id` varchar(30) NOT NULL,
>   `size` bigint(20) DEFAULT '0',
>   `from_address` varchar(255) DEFAULT NULL,
>   `from_domain` varchar(255) DEFAULT NULL,
>   `to_address` varchar(255) DEFAULT NULL,
>   `to_domain` varchar(255) DEFAULT NULL,
>   `subject` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
>   `clientip` varchar(15) DEFAULT NULL,
>   `archive` varchar(100) DEFAULT NULL,
>   `isspam` tinyint(1) DEFAULT '0',
>   `ishighspam` tinyint(1) DEFAULT '0',
>   `issaspam` tinyint(1) DEFAULT '0',
>   `isrblspam` tinyint(1) DEFAULT '0',
>   `spamwhitelisted` tinyint(1) DEFAULT '0',
>   `spamblacklisted` tinyint(1) DEFAULT '0',
>   `sascore` decimal(7,2) DEFAULT '0.00',
>   `spamreport` text,
>   `virusinfected` tinyint(1) DEFAULT '0',
>   `nameinfected` tinyint(1) DEFAULT '0',
>   `sizeinfected` tinyint(1) DEFAULT '0',
>   `otherinfected` tinyint(1) DEFAULT '0',
>   `report` text,
>   `ismcp` tinyint(1) DEFAULT '0',
>   `ishighmcp` tinyint(1) DEFAULT '0',
>   `issamcp` tinyint(1) DEFAULT '0',
>   `mcpwhitelisted` tinyint(1) DEFAULT '0',
>   `mcpblacklisted` tinyint(1) DEFAULT '0',
>   `mcpsascore` decimal(7,2) DEFAULT '0.00',
>   `mcpreport` text,
>   `hostname` varchar(100) DEFAULT NULL,
>   `date` date NOT NULL DEFAULT '0000-00-00',
>   `time` time DEFAULT NULL,
>   `headers` text,
>   `quarantined` tinyint(1) DEFAULT '0',
>   `released` tinyint(1) DEFAULT '0',
>   `guid` varchar(40) NOT NULL,
>   PRIMARY KEY (`db_id`,`date`),
>   KEY `id` (`id`),
>   KEY `timestamp` (`timestamp`),
>   KEY `from_address` (`from_address`),
>   KEY `from_domain` (`from_domain`),
>   KEY `to_address` (`to_address`),
>   KEY `to_domain` (`to_domain`),
>   KEY `guid` (`guid`),
>   KEY `isspam` (`isspam`),
>   KEY `ishighspam` (`ishighspam`),
>   KEY `issaspam` (`issaspam`),
>   KEY `isrblspam` (`isrblspam`),
>   KEY `spamwhitelisted` (`spamwhitelisted`),
>   KEY `spamblacklisted` (`spamblacklisted`),
>   KEY `virusinfected` (`virusinfected`),
>   KEY `nameinfected` (`nameinfected`),
>   KEY `otherinfected` (`otherinfected`),
>   KEY `quarantined` (`quarantined`),
>   KEY `sizeinfected` (`sizeinfected`),
>   KEY `ismcp` (`ismcp`),
>   KEY `ishighmcp` (`ishighmcp`),
>   KEY `issamcp` (`issamcp`),
>   KEY `mcpwhitelisted` (`mcpwhitelisted`),
>   KEY `mcpblacklisted` (`mcpblacklisted`),
>   KEY `released` (`released`),
>   KEY `size` (`size`)
> ) ENGINE=MyISAM DEFAULT CHARSET=utf8
> PARTITION BY HASH (YEAR(`date`) + MONTH(`date`)) PARTITIONS 70;
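
To make the last clause of the statement above concrete: MySQL HASH
partitioning stores each row in partition MOD(expr, N), so here the
partition number is (YEAR(date) + MONTH(date)) mod 70. The Python sketch
below only reproduces that arithmetic. The name partition_for is made up
for illustration, and nothing here talks to a database.

```python
from datetime import date

PARTITIONS = 70  # matches "PARTITIONS 70" in the CREATE TABLE statement


def partition_for(d: date, partitions: int = PARTITIONS) -> int:
    """Return the partition index for a row whose `date` column is d.

    Mirrors PARTITION BY HASH (YEAR(`date`) + MONTH(`date`)): the integer
    expression is evaluated and taken modulo the number of partitions.
    """
    return (d.year + d.month) % partitions


# A few sample dates (illustrative only).
for sample in (date(2014, 7, 31), date(2014, 8, 5), date(2014, 9, 1)):
    print(sample.isoformat(), "-> partition", partition_for(sample))
```

All rows from one calendar month land in the same partition, and
consecutive months land in neighbouring ones. The sum is not unique per
month, though (2014-08 and 2013-09 both give 2022), so a partition can hold
more than one month once the data spans more than a year.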
>
>
> -
> Jerry Benton
> www.mailborder.com
>
> On Aug 5, 2014, at 11:16 AM, Jerry Benton <jerry.benton at mailborder.com>
> wrote:
>
> Based on Mailborder design and testing (the DB structure of MailWatch is
> very similar), MyISAM has better performance once you start hitting
> millions of records.
>
> -
> Jerry Benton
> www.mailborder.com
>
> On Aug 5, 2014, at 10:23 AM, Randal, Phil <phil.randal at hoopleltd.co.uk>
> wrote:
>
> Does converting the MailWatch databases to InnoDB make a big difference in
> MailWatch performance?
>
> Just curious.
>
> Phil
>
>
> *From:* mailscanner-bounces at lists.mailscanner.info *On Behalf Of* Glenn Steen
> *Sent:* 05 August 2014 14:51
> *To:* MailScanner discussion
> *Subject:* Re: MailScanner Deficiency: Multi-Ruleset Processing per Email
> Recipient
>
> Can only agree with Martin and Alex, there is no way around either
> splitting mails per recipient (very feasible), or some major rework of both
> the MailScanner and MailWatch code (very infeasible).
> But I also have to agree that the increase in hardware seems quite
> excessive... I suppose you arrived at that figure by analysing the number
> of recipients per mail (and frequency of multi-recipient emails)? Well, the
> number isn't everything:-)
> Provided you use the normal caching-dns-thingy and also use "Cache
> SpamAssassin Results = yes", the actual processing time and resource use
> will be minimized (not to mention that the normal batch-processing style of
> MailScanner will ... help...:-).
> Introducing a "splitting MX" between the internet and your regular
> MailScanner hosts should be rather simple, as well as adjusting which
> Received: lines your MailScanner hosts should ignore (since they otherwise
> will perceive all messages as originating from the "splitting MX" host)...
> So why not try that, with the gear you have ATM, and see where that leads
> you? Depending on what mailstore hosts you eventually deliver to, the
> storage impact should be minimal or even non-existent, since even
> MS Exchange abandoned "single store" ... way back... so every
> recipient would eventually have their own copy in their own mailbox
> anyway;-).
>
> As Alex says, we know nothing about your actual mail volume, but my money
> is on there being much less of a problem than you think, even if you do
> have ... serious traffic... (more than a few thousand mails/hour). The
> likeliest bottleneck is your MailWatch database, so...
> keep an eye on that one, make sure you run it as InnoDB etc.
>
> Cheers!
> --
> -- Glenn
>
>
> On 11 July 2014 15:49, Martin Hepworth <maxsec at gmail.com> wrote:
> Might want to also consider having a more flexible approach as Alex had
> mentioned.
> Will also help with some of the hardware requirements as you can also
> reject non-valid recipients at MTA as well as splitting the emails up, so
> the core MailScanner farm has less to do.
>
> --
> Martin Hepworth, CISSP
> Oxford, UK
>
>
> On 11 July 2014 09:51, Sam Gelbart <samg at synaq.com> wrote:
> Hi All,
>
> We at SYNAQ use and have used MailScanner for many years. MailScanner has
> served us very well as an Email Hygiene provider.
> However, as we have grown (very rapidly in the past 6 months, to many more
> customer domains) we have noticed some deficiencies in MailScanner.
>
> Below is a brief description covering our problem areas:
>
> Overview
> The issue has arisen due to SYNAQ's ever growing client base and the fact
> that we're provisioning more and more customers (and email domains) on our
> hygiene platform, and that more than one of these customer
> recipients/domains (and their applicable rulesets) are being addressed in
> the same email.
>
> Problem 1
> 1) abc.co.za and xyz.co.za are both provisioned on our platform.
> 2) xyz.co.za has quarantining of SPAM configured, while abc.co.za does
> not (its spam action is delete).
> 3) MailScanner accepts a message addressed to both domains for processing
> but "chooses" user at abc.co.za and abc.co.za as the message's
> "to_address" and "to_domain".
> 4) MailScanner determines that the message is SPAM and, because it has
> "chosen" @abc.co.za as the email domain, it deletes the message, as the
> configured spam action for @abc.co.za is to delete.
> 5) However, the rule for xyz.co.za is to store/quarantine spam. This does
> not happen because of the actions above, and the data is also never logged
> via MailWatch.
> 6) The example above is based on a very simple scenario; as you are
> aware, this applies to many more complex rulesets (size, file type, etc.)
> across the system.
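
The mismatch in steps 4 and 5 can be sketched in a few lines of Python.
This only models the behaviour as described in this post, not MailScanner's
actual code: the table SPAM_ACTIONS, the fallback DEFAULT_ACTION and both
function names are hypothetical, and the addresses are the example ones
from the list above.

```python
# Hypothetical per-domain spam actions, taken from steps 4 and 5 above.
SPAM_ACTIONS = {"abc.co.za": "delete", "xyz.co.za": "store"}
DEFAULT_ACTION = "deliver"  # assumed fallback for domains without a rule


def domain_of(address: str) -> str:
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()


def action_first_recipient(recipients: list[str]) -> str:
    """One action for the whole message, keyed on the first recipient only."""
    return SPAM_ACTIONS.get(domain_of(recipients[0]), DEFAULT_ACTION)


def actions_per_recipient(recipients: list[str]) -> dict[str, str]:
    """The action each recipient's own domain ruleset asks for."""
    return {r: SPAM_ACTIONS.get(domain_of(r), DEFAULT_ACTION) for r in recipients}


recipients = ["user@abc.co.za", "user@xyz.co.za"]
print("applied to whole message:", action_first_recipient(recipients))
print("wanted per recipient:    ", actions_per_recipient(recipients))
```

With both recipients in one message, the single chosen action is "delete",
while the ruleset for xyz.co.za asked for "store". That is the gap
described in step 5.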
>
> Problem 2
> 1) abc.co.za and xyz.co.za are both provisioned on our platform.
> 2) A third party emails both user at abc.co.za and user at xyz.co.za in a
> single email message.
> 3) MailScanner accepts the message for processing but "chooses"
> user at abc.co.za and abc.co.za as the message's "to_address" and "to_domain".
> 4) When the message is processed, the MailWatch.pm script receives a
> message object for SQL logging with data only for user at abc.co.za and
> abc.co.za; xyz.co.za is never logged.
>
> Finally, we have considered splitting incoming messages by recipient at the
> MTA level to address this problem, but our calculations show that it would
> require 3.5x more hardware to process the increased mail load. So for us a
> MailScanner solution is ideal.
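
A rough Python sketch of what per-recipient splitting does to message
counts. The function names are hypothetical and the recipient counts in the
usage lines are invented sample numbers, not SYNAQ's traffic data. The
factor it computes is only the growth in the number of messages, which need
not equal the growth in hardware.

```python
def split_per_recipient(recipients: list[str], body: str) -> list[tuple[str, str]]:
    """Return one (recipient, body) copy per envelope recipient."""
    return [(rcpt, body) for rcpt in recipients]


def amplification(recipient_counts: list[int]) -> float:
    """Messages after splitting divided by messages before splitting.

    recipient_counts holds the number of envelope recipients of each
    incoming message, so the ratio is the mean recipients per message.
    """
    if not recipient_counts:
        raise ValueError("need at least one message")
    return sum(recipient_counts) / len(recipient_counts)


# Invented sample: four incoming messages with 1, 1, 2 and 10 recipients.
copies = split_per_recipient(["user@abc.co.za", "user@xyz.co.za"], "body text")
print(len(copies), "copies for a two-recipient message")
print("message-count factor:", amplification([1, 1, 2, 10]))
```

Each copy then carries exactly one recipient, so the ruleset lookup and the
MailWatch log entry are both unambiguous.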
>
> Based on the above, could you tell me if there is anything that can be
> done from a MailScanner community point of view to help develop MailScanner
> functionality to address these issues?
> We'd be very happy to give a nice donation for a fix or patch.
>
> Also if the community has any ideas on other ways we can remedy this
> problem we welcome your feedback.
>
> Thanks and regards,
>
> Sam Gelbart
> SYNAQ
>
>
> --
> MailScanner mailing list
> mailscanner at lists.mailscanner.info
> http://lists.mailscanner.info/mailman/listinfo/mailscanner
>
> Before posting, read http://wiki.mailscanner.info/posting
>
> Support MailScanner development - buy the book off the website!
>
> --
> -- Glenn
> email: glenn < dot > steen < at > gmail < dot > com
> work: glenn < dot > steen < at > ap1 < dot > se
> Hoople Ltd, Registered in England and Wales No. 7556595
> Registered office: Plough Lane, Hereford, HR4 0LE
>
> "Any opinion expressed in this e-mail or any attached files are those of
> the individual and not necessarily those of Hoople Ltd. You should be aware
> that Hoople Ltd. monitors its email service. This e-mail and any attached
> files are confidential and intended solely for the use of the addressee.
> This communication may contain material protected by law from being passed
> on. If you are not the intended recipient and have received this e-mail in
> error, you are advised that any use, dissemination, forwarding, printing or
> copying of this e-mail is strictly prohibited. If you have received this
> e-mail in error please contact the sender immediately and destroy all
> copies of it."