MailScanner Deficiency: Multi-Ruleset Processing per Email Recipient
Glenn Steen
glenn.steen at gmail.com
Wed Aug 6 10:06:40 IST 2014
Yes Jerry, quite true.
I thought I went InnoDB for performance, but it may well have been for
stability... As said, this was done quite a few years ago.
For the discussion at hand, this minor point has truly taken far too large
a place though... The dominant factor, vis-a-vis performance, is most likely
the execution of SpamAssassin on the message batch (and possibly the
AV-scanning, if you use something reeeaaally slow/cumbersome), and that
will be dealt with nicely by the SA results cache.
I'll still nick your partitioning idea (and revert to MyISAM) and see what
that gives me, when I get the time:-).
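
For anyone wanting to see how the quoted PARTITION BY HASH clause spreads
rows, here is a minimal Python sketch. It assumes MySQL's documented rule
that HASH partitioning stores a row in partition MOD(expr, number of
partitions); the function name partition_for is made up for illustration.

```python
from datetime import date

PARTITIONS = 70  # matches "PARTITIONS 70" in the quoted table definition


def partition_for(d: date, partitions: int = PARTITIONS) -> int:
    """Return the partition index a date would hash to.

    The partitioning expression in the quoted DDL is
    YEAR(date) + MONTH(date); HASH partitioning takes that value
    modulo the number of partitions.
    """
    return (d.year + d.month) % partitions


def partitions_used_in_year(year: int, partitions: int = PARTITIONS) -> set:
    """Return the set of partition indexes touched by the 12 months of a year."""
    return {partition_for(date(year, month, 1), partitions) for month in range(1, 13)}


if __name__ == "__main__":
    for d in (date(2014, 8, 6), date(2014, 12, 1), date(2015, 11, 1)):
        print(d.isoformat(), "-> partition", partition_for(d))
    print("partitions touched in 2014:", sorted(partitions_used_in_year(2014)))
```

One thing the sketch makes visible: YEAR + MONTH gives the same sum for
December 2014 and November 2015, so those two months land in the same
partition. Whether that matters depends on how you intend to purge or query
old data.
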
Cheers
--
-- Glenn
On 5 August 2014 20:06, Jerry Benton <jerry.benton at mailborder.com> wrote:
> From Baron Schwartz, the author of High Performance MySQL:
>
> "The reason is very simple. When you insert a row into MyISAM, it just
> puts it into the server's memory and hopes that the server will flush it to
> disk at some point in the future. Good luck if the server crashes.
>
> When you insert a row into InnoDB it syncs the transaction durably to
> disk, and that requires it to wait for the disk to spin. Do the math on
> your system and see how long that takes.
>
> You can improve this by relaxing innodb_flush_log_at_trx_commit or by
> batching rows within a transaction instead of doing one transaction per
> row."
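
The batching advice in the quote above can be sketched roughly as follows.
Python's built-in sqlite3 module is used only so the example runs without a
database server; the table name, columns and batch size are assumptions for
illustration. With MySQL/InnoDB the same commit-per-batch pattern would go
through a MySQL DB-API driver instead.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a tiny stand-in for a mail log table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE maillog_demo (id TEXT, to_address TEXT)")
    return conn


def insert_batched(conn, rows, batch_size=500):
    """Insert rows with one commit per batch rather than one commit per row.

    Returns the number of commits issued.
    """
    cur = conn.cursor()
    commits = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        cur.executemany(
            "INSERT INTO maillog_demo (id, to_address) VALUES (?, ?)", batch
        )
        conn.commit()  # a single commit covers the whole batch
        commits += 1
    return commits


if __name__ == "__main__":
    conn = make_demo_db()
    rows = [(f"msg{i}", "user@example.com") for i in range(1200)]
    print("commits issued:", insert_batched(conn, rows))
```

The trade-off is the usual one: a crash between commits can lose the rows of
the batch in flight, which may be acceptable for mail logs.
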
>
> In short, MyISAM is faster for inserts but InnoDB is more reliable. All of
> that ACID compliance and transaction rollback comes with an overhead cost.
> InnoDB also provides row-level locking instead of table-level locking like
> MyISAM, and InnoDB can automatically recover from crashes. So, if you want
> reliability over performance, go with InnoDB. If you want faster inserts
> and quite often faster search results, go with MyISAM.
>
> These are mail logs and not bank records. But I suppose the level of
> importance is relative.
>
>
> -
> Jerry Benton
> www.mailborder.com
>
> On Aug 5, 2014, at 11:27 AM, Jerry Benton <jerry.benton at mailborder.com>
> wrote:
>
> Caveat: You should partition the database by time. This is the Mailborder
> cp_maillog, which is slightly different than MailWatch, but the bit near
> the end is what you are looking for. You can adapt it for your table with
> an alter statement.
>
>
> CREATE TABLE IF NOT EXISTS `cp_maillog` (
> `db_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
> `timestamp` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
> `id` varchar(30) NOT NULL,
> `size` bigint(20) DEFAULT '0',
> `from_address` varchar(255) DEFAULT NULL,
> `from_domain` varchar(255) DEFAULT NULL,
> `to_address` varchar(255) DEFAULT NULL,
> `to_domain` varchar(255) DEFAULT NULL,
> `subject` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
> `clientip` varchar(15) DEFAULT NULL,
> `archive` varchar(100) DEFAULT NULL,
> `isspam` tinyint(1) DEFAULT '0',
> `ishighspam` tinyint(1) DEFAULT '0',
> `issaspam` tinyint(1) DEFAULT '0',
> `isrblspam` tinyint(1) DEFAULT '0',
> `spamwhitelisted` tinyint(1) DEFAULT '0',
> `spamblacklisted` tinyint(1) DEFAULT '0',
> `sascore` decimal(7,2) DEFAULT '0.00',
> `spamreport` text,
> `virusinfected` tinyint(1) DEFAULT '0',
> `nameinfected` tinyint(1) DEFAULT '0',
> `sizeinfected` tinyint(1) DEFAULT '0',
> `otherinfected` tinyint(1) DEFAULT '0',
> `report` text,
> `ismcp` tinyint(1) DEFAULT '0',
> `ishighmcp` tinyint(1) DEFAULT '0',
> `issamcp` tinyint(1) DEFAULT '0',
> `mcpwhitelisted` tinyint(1) DEFAULT '0',
> `mcpblacklisted` tinyint(1) DEFAULT '0',
> `mcpsascore` decimal(7,2) DEFAULT '0.00',
> `mcpreport` text,
> `hostname` varchar(100) DEFAULT NULL,
> `date` date NOT NULL DEFAULT '0000-00-00',
> `time` time DEFAULT NULL,
> `headers` text,
> `quarantined` tinyint(1) DEFAULT '0',
> `released` tinyint(1) DEFAULT '0',
> `guid` varchar(40) NOT NULL,
> PRIMARY KEY (`db_id`,`date`),
> KEY `id` (`id`),
> KEY `timestamp` (`timestamp`),
> KEY `from_address` (`from_address`),
> KEY `from_domain` (`from_domain`),
> KEY `to_address` (`to_address`),
> KEY `to_domain` (`to_domain`),
> KEY `guid` (`guid`),
> KEY `isspam` (`isspam`),
> KEY `ishighspam` (`ishighspam`),
> KEY `issaspam` (`issaspam`),
> KEY `isrblspam` (`isrblspam`),
> KEY `spamwhitelisted` (`spamwhitelisted`),
> KEY `spamblacklisted` (`spamblacklisted`),
> KEY `virusinfected` (`virusinfected`),
> KEY `nameinfected` (`nameinfected`),
> KEY `otherinfected` (`otherinfected`),
> KEY `quarantined` (`quarantined`),
> KEY `sizeinfected` (`sizeinfected`),
> KEY `ismcp` (`ismcp`),
> KEY `ishighmcp` (`ishighmcp`),
> KEY `issamcp` (`issamcp`),
> KEY `mcpwhitelisted` (`mcpwhitelisted`),
> KEY `mcpblacklisted` (`mcpblacklisted`),
> KEY `released` (`released`),
> KEY `size` (`size`)
> ) ENGINE=MyISAM DEFAULT CHARSET=utf8
> PARTITION BY HASH (YEAR(`date`) + MONTH(`date`)) PARTITIONS 70;
>
>
> -
> Jerry Benton
> www.mailborder.com
>
> On Aug 5, 2014, at 11:16 AM, Jerry Benton <jerry.benton at mailborder.com>
> wrote:
>
> Based on Mailborder design and testing (its DB structure is very similar
> to MailWatch's), MyISAM has better performance when you start
> hitting millions of records.
>
> -
> Jerry Benton
> www.mailborder.com
>
> On Aug 5, 2014, at 10:23 AM, Randal, Phil <phil.randal at hoopleltd.co.uk>
> wrote:
>
> Does converting the MailWatch databases to InnoDB make a big difference in
> MailWatch performance?
>
> Just curious.
>
> Phil
>
>
> From: mailscanner-bounces at lists.mailscanner.info On Behalf Of Glenn Steen
> Sent: 05 August 2014 14:51
> To: MailScanner discussion
> Subject: Re: MailScanner Deficiency: Multi-Ruleset Processing per Email
> Recipient
>
> Can only agree with Martin and Alex, there is no way around either
> splitting mails per recipient (very feasible), or some major rework of both
> the MailScanner and MailWatch code (very infeasible).
> But I also have to agree that the increase in hardware seems quite
> excessive... I suppose you arrived at that figure by analysing the number
> of recipients per mail (and frequency of multi-recipient emails)? Well, the
> number isn't everything:-)
> Provided you use the normal caching-dns-thingy and also use "Cache
> SpamAssassin Results = yes", the actual processing time and resource use
> will be minimized (not to mention that the normal batch-processing style of
> MailScanner will ... help...:-).
> Introducing a "splitting MX" between the internet and your regular
> MailScanner hosts should be rather simple, as well as adjusting which
> Received: lines your MailScanner hosts should ignore (since they otherwise
> will perceive all messages as originating from the "splitting MX" host)...
> So why not try that, with the gear you have ATM, and see where that leads
> you? Depending on what mailstore hosts you eventually deliver to, the
> storage impact should be minimal or even non-existent, since even
> MS Exchange has abandoned "single store" since ... way back... so every
> recipient would eventually have their own copy in their own mailbox
> anyway;-).
>
> As Alex says, we know nothing about your actual mail volume, but my money
> is on there being much less of a problem than you think, even if you do
> have ... serious traffic... (more than a few thousand mails/hour). The
> likeliest bottleneck is your MailWatch database, so...
> keep an eye on that one, make sure you run it as InnoDB etc.
>
> Cheers!
> --
> -- Glenn
>
>
> On 11 July 2014 15:49, Martin Hepworth <maxsec at gmail.com> wrote:
> Might want to also consider having a more flexible approach as Alex had
> mentioned.
> Will also help with some of the hardware requirements as you can also
> reject non-valid recipients at MTA as well as splitting the emails up, so
> the core MailScanner farm has less to do.
>
> --
> Martin Hepworth, CISSP
> Oxford, UK
>
>
> On 11 July 2014 09:51, Sam Gelbart <samg at synaq.com> wrote:
> Hi All,
>
> We at SYNAQ use and have used MailScanner for many years. As an Email
> Hygiene provider, we have found that MailScanner has served us very well.
> However, as we have grown (very rapidly in the past 6 months, to many more
> customer domains) we have noticed some deficiencies in MailScanner.
>
> Below is a brief description covering our problem areas:
>
> Overview
> The issue has arisen due to SYNAQ's ever growing client base and the fact
> that we're provisioning more and more customers (and email domains) on our
> hygiene platform, and that more than one of these customer
> recipients/domains (and their applicable rulesets) are being addressed in
> the same email.
>
> Problem 1
> 1) abc.co.za and xyz.co.za are both provisioned on our platform.
> 2) xyz.co.za has quarantining of SPAM configured, while abc.co.za does
> not (its configured spam action is to delete).
> 3) MailScanner accepts the message for processing but "chooses"
> user at abc.co.za and abc.co.za as the message's "to_address" and "to_domain".
> 4) MailScanner determines that the message is SPAM and, because it has
> "chosen" @abc.co.za as the email domain, it deletes the message, as the
> configured spam action for @abc.co.za is to delete.
> 5) However the rule for xyz.co.za is to store/quarantine spam. This does
> not happen because of the actions above and data is also never logged via
> MailWatch.
> 6) The example above is based on a very simple scenario, and as you are
> aware this applies to many more complex rulesets (size, file type etc.)
> across the system.
>
> Problem 2
> 1) abc.co.za and xyz.co.za are both provisioned on our platform.
> 2) A third party emails both user at abc.co.za and user at xyz.co.za in a
> single email message.
> 3) MailScanner accepts the message for processing but "chooses"
> user at abc.co.za and abc.co.za as the message's "to_address" and "to_domain".
> 4) When the message is processed, the MailWatch.pm script receives a
> message object for SQL logging with data only for user at abc.co.za and
> abc.co.za; xyz.co.za is never logged.
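
The two problems above can be illustrated with a toy model. This is only a
sketch of the behaviour as described in the report, not MailScanner's or
MailWatch's actual code: the SPAM_ACTIONS table and all function names are
hypothetical, and the actions follow steps 4 and 5 of Problem 1 (abc.co.za
deletes spam, xyz.co.za stores it).

```python
# Hypothetical per-domain spam actions, taken from steps 4 and 5 of Problem 1.
SPAM_ACTIONS = {"abc.co.za": "delete", "xyz.co.za": "store"}


def domain_of(address):
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[1].lower()


def first_recipient_result(recipients):
    """Model the reported behaviour: only one recipient is 'chosen'.

    The single (recipient, action) pair returned stands for both the spam
    action applied to the message and the one row that gets logged.
    """
    chosen = recipients[0]
    return [(chosen, SPAM_ACTIONS[domain_of(chosen)])]


def per_recipient_result(recipients):
    """Model splitting per recipient: each copy gets its own domain's action."""
    return [(rcpt, SPAM_ACTIONS[domain_of(rcpt)]) for rcpt in recipients]


if __name__ == "__main__":
    rcpts = ["user@abc.co.za", "user@xyz.co.za"]
    print("as reported:", first_recipient_result(rcpts))
    print("split      :", per_recipient_result(rcpts))
```

In the first case xyz.co.za gets neither its own spam action nor a log
entry; in the second, each recipient is handled and logged separately, at
the cost of processing one message per recipient.
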
>
> Finally we have considered splitting incoming messages by recipient at an
> MTA level to address this problem, but our calculations show that it would
> require 3.5x more hardware to process this increased mail load. So for us a
> MailScanner solution is ideal.
>
> Based on the above, could you tell me if there is anything that can be
> done from a MailScanner community point of view to help develop MailScanner
> functionality to address these issues?
> We'd be very happy to give a nice donation for a fix or patch.
>
> Also if the community has any ideas on other ways we can remedy this
> problem we welcome your feedback.
>
> Thanks and regards,
>
> Sam Gelbart
> SYNAQ
>
>
> --
> MailScanner mailing list
> mailscanner at lists.mailscanner.info
> http://lists.mailscanner.info/mailman/listinfo/mailscanner
>
> Before posting, read http://wiki.mailscanner.info/posting
>
> Support MailScanner development - buy the book off the website!
>
>
>
--
-- Glenn
email: glenn < dot > steen < at > gmail < dot > com
work: glenn < dot > steen < at > ap1 < dot > se