MailScanner Deficiency: Multi-Ruleset Processing per Email Recipient

Jerry Benton jerry.benton at mailborder.com
Tue Aug 5 16:27:27 IST 2014


Caveat: You should partition the table by time. This is the Mailborder cp_maillog table, which is slightly different from MailWatch's, but the PARTITION BY clause at the end is what you are looking for. You can adapt it for your table with an ALTER statement.


CREATE TABLE IF NOT EXISTS `cp_maillog` (
  `db_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `timestamp` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `id` varchar(30) NOT NULL,
  `size` bigint(20) DEFAULT '0',
  `from_address` varchar(255) DEFAULT NULL,
  `from_domain` varchar(255) DEFAULT NULL,
  `to_address` varchar(255) DEFAULT NULL,
  `to_domain` varchar(255) DEFAULT NULL,
  `subject` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `clientip` varchar(15) DEFAULT NULL,
  `archive` varchar(100) DEFAULT NULL,
  `isspam` tinyint(1) DEFAULT '0',
  `ishighspam` tinyint(1) DEFAULT '0',
  `issaspam` tinyint(1) DEFAULT '0',
  `isrblspam` tinyint(1) DEFAULT '0',
  `spamwhitelisted` tinyint(1) DEFAULT '0',
  `spamblacklisted` tinyint(1) DEFAULT '0',
  `sascore` decimal(7,2) DEFAULT '0.00',
  `spamreport` text,
  `virusinfected` tinyint(1) DEFAULT '0',
  `nameinfected` tinyint(1) DEFAULT '0',
  `sizeinfected` tinyint(1) DEFAULT '0',
  `otherinfected` tinyint(1) DEFAULT '0',
  `report` text,
  `ismcp` tinyint(1) DEFAULT '0',
  `ishighmcp` tinyint(1) DEFAULT '0',
  `issamcp` tinyint(1) DEFAULT '0',
  `mcpwhitelisted` tinyint(1) DEFAULT '0',
  `mcpblacklisted` tinyint(1) DEFAULT '0',
  `mcpsascore` decimal(7,2) DEFAULT '0.00',
  `mcpreport` text,
  `hostname` varchar(100) DEFAULT NULL,
  `date` date NOT NULL DEFAULT '0000-00-00',
  `time` time DEFAULT NULL,
  `headers` text,
  `quarantined` tinyint(1) DEFAULT '0',
  `released` tinyint(1) DEFAULT '0',
  `guid` varchar(40) NOT NULL,
  PRIMARY KEY (`db_id`,`date`),
  KEY `id` (`id`),
  KEY `timestamp` (`timestamp`),
  KEY `from_address` (`from_address`),
  KEY `from_domain` (`from_domain`),
  KEY `to_address` (`to_address`),
  KEY `to_domain` (`to_domain`),
  KEY `guid` (`guid`),
  KEY `isspam` (`isspam`),
  KEY `ishighspam` (`ishighspam`),
  KEY `issaspam` (`issaspam`),
  KEY `isrblspam` (`isrblspam`),
  KEY `spamwhitelisted` (`spamwhitelisted`),
  KEY `spamblacklisted` (`spamblacklisted`),
  KEY `virusinfected` (`virusinfected`),
  KEY `nameinfected` (`nameinfected`),
  KEY `otherinfected` (`otherinfected`),
  KEY `quarantined` (`quarantined`),
  KEY `sizeinfected` (`sizeinfected`),
  KEY `ismcp` (`ismcp`),
  KEY `ishighmcp` (`ishighmcp`),
  KEY `issamcp` (`issamcp`),
  KEY `mcpwhitelisted` (`mcpwhitelisted`),
  KEY `mcpblacklisted` (`mcpblacklisted`),
  KEY `released` (`released`),
  KEY `size` (`size`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 PARTITION BY HASH (( YEAR(`date`) + MONTH(`date`) )) PARTITIONS 70;
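
For an existing MailWatch table, the same clause can be applied with ALTER TABLE. What follows is a minimal sketch, not tested against any particular MailWatch release. It assumes the table is named `maillog`, that it has a `date` column of type DATE, and that any primary or unique key on it already includes `date` (MySQL requires every unique key to contain all columns used in the partitioning expression, which is why the primary key above is (`db_id`,`date`)). Back up the table first, since the ALTER rebuilds it.

```sql
-- Assumption: the MailWatch table is called `maillog` and has a DATE column `date`.
-- Rebuilds the table into 70 hash partitions, using the same expression as cp_maillog.
ALTER TABLE `maillog`
  PARTITION BY HASH (YEAR(`date`) + MONTH(`date`))
  PARTITIONS 70;

-- HASH partitioning places a row in partition MOD(expr, number_of_partitions).
-- For 2014-08-05 that is MOD(2014 + 8, 70).
SELECT MOD(YEAR('2014-08-05') + MONTH('2014-08-05'), 70) AS partition_number;

-- Check how rows are spread across the partitions afterwards.
SELECT PARTITION_NAME, TABLE_ROWS
  FROM information_schema.PARTITIONS
 WHERE TABLE_SCHEMA = DATABASE()
   AND TABLE_NAME = 'maillog';
```

Because the expression is year plus month, the value advances by one per month and the same sum recurs in later years (August 2014 and July 2015 both give 2022), so those months land in the same partition.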


-
Jerry Benton
www.mailborder.com

On Aug 5, 2014, at 11:16 AM, Jerry Benton <jerry.benton at mailborder.com> wrote:

> Based on Mailborder design and testing (its DB structure is very similar to MailWatch's), MyISAM performs better once you start hitting millions of records.
> 
> -
> Jerry Benton
> www.mailborder.com
> 
> On Aug 5, 2014, at 10:23 AM, Randal, Phil <phil.randal at hoopleltd.co.uk> wrote:
> 
>> Does converting the MailWatch databases to InnoDB make a big difference in MailWatch performance?
>>  
>> Just curious.
>>  
>> Phil
>>  
>>  
>> From: mailscanner-bounces at lists.mailscanner.info [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf Of Glenn Steen
>> Sent: 05 August 2014 14:51
>> To: MailScanner discussion
>> Subject: Re: MailScanner Deficiency: Multi-Ruleset Processing per Email Recipient
>>  
>> Can only agree with Martin and Alex, there is no way around either splitting mails per recipient (very feasible), or some major rework of both the MailScanner and MailWatch code (very infeasible).
>> But I also have to agree that the increase in hardware seems quite excessive... I suppose you arrived at that figure by analysing the number of recipients per mail (and frequency of multi-recipient emails)? Well, the number isn't everything :-)
>> Provided you use the normal caching-dns-thingy and also use "Cache SpamAssassin Results = yes", the actual processing time and resource use will be minimized (not to mention that the normal batch-processing style of MailScanner will ... help... :-).
>> Introducing a "splitting MX" between the internet and your regular MailScanner hosts should be rather simple, as well as adjusting which Received: lines your MailScanner hosts should ignore (since they otherwise will perceive all messages as originating from the "splitting MX" host)... So why not try that, with the gear you have ATM, and see where that leads you? Depending on what mailstore hosts you eventually deliver to, the storage impact should be minimal or even non-existent, since even MS Exchange has abandoned "single store" since ... way back... so every recipient would eventually have their own copy in their own mailbox anyway ;-).
>>  
>> As Alex says, we know nothing about your actual mail volume, but my money is on there being much less of a problem than you think, even if you do have ... serious traffic... (more than a few thousand mails/hour). The likeliest bottleneck is your MailWatch database, so... keep an eye on that one, make sure you run it as InnoDB etc.
>>  
>> Cheers!
>> -- 
>> -- Glenn
>>  
>> 
>> On 11 July 2014 15:49, Martin Hepworth <maxsec at gmail.com> wrote:
>> Might want to also consider having a more flexible approach as Alex had mentioned.
>> Will also help with some of the hardware requirements as you can also reject non-valid recipients at MTA as well as splitting the emails up, so the core MailScanner farm has less to do.
>> 
>> -- 
>> Martin Hepworth, CISSP
>> Oxford, UK
>>  
>> 
>> On 11 July 2014 09:51, Sam Gelbart <samg at synaq.com> wrote:
>> Hi All,
>> 
>> We at SYNAQ use and have used MailScanner for many years. As an Email Hygiene provider, we have been served very well by MailScanner.
>> However, as we have grown (very rapidly in the past 6 months, to many more customer domains) we have noticed some deficiencies in MailScanner.
>> 
>> Below is a brief description covering our problem areas:
>> 
>> Overview
>> The issue has arisen due to SYNAQ's ever growing client base and the fact that we're provisioning more and more customers (and email domains) on our hygiene platform, and that more than one of these customer recipients/domains (and their applicable rulesets) are being addressed in the same email.
>> 
>> Problem 1
>> 1) abc.co.za and xyz.co.za are both provisioned on our platform.
>> 2) abc.co.za has quarantining of SPAM configured, while xyz.co.za does not.
>> 3) MailScanner accepts the message for processing but "chooses" user at abc.co.za and abc.co.za as the message's "to_address" and "to_domain".
>> 4) MailScanner determines that the message is SPAM and because it has "chosen" @abc.co.za as the email domain it deletes the message as the configured spam action for @abc.co.za is to delete.
>> 5) However the rule for xyz.co.za is to store/quarantine spam. This does not happen because of the actions above and data is also never logged via MailWatch.
>> 6) The example above is based on a very simple scenario, and as you are aware this applies to many more complex rulesets (size, file type etc.) across the system.
>> 
>> Problem 2
>> 1) abc.co.za and xyz.co.za are both provisioned on our platform.
>> 2) A third party emails both user at abc.co.za and user at xyz.co.za in a single email message.
>> 3) MailScanner accepts the message for processing but "chooses" user at abc.co.za and abc.co.za as the message's "to_address" and "to_domain".
>> 4) When the message is processed, the MailWatch.pm script receives a message object for SQL logging with data only for user at abc.co.za and abc.co.za; xyz.co.za is never logged.
>> 
>> Finally, we have considered splitting incoming messages by recipient at the MTA level to address this problem, but our calculations show that it would require 3.5x more hardware to process the increased mail load. So for us a MailScanner solution is ideal.
>> 
>> Based on the above, could you tell me if there is anything that can be done from a MailScanner community point of view to help develop MailScanner functionality to address these issues?
>> We'd be very happy to give a nice donation for a fix or patch.
>> 
>> Also if the community has any ideas on other ways we can remedy this problem we welcome your feedback.
>> 
>> Thanks and regards,
>> 
>> Sam Gelbart
>> SYNAQ
>> 
>> 
>> --
>> MailScanner mailing list
>> mailscanner at lists.mailscanner.info
>> http://lists.mailscanner.info/mailman/listinfo/mailscanner
>> 
>> Before posting, read http://wiki.mailscanner.info/posting
>> 
>> Support MailScanner development - buy the book off the website!
>>  
>> 
>> 
>> 
>> 
>>  
>> -- 
>> -- Glenn
>> email: glenn < dot > steen < at > gmail < dot > com
>> work: glenn < dot > steen < at > ap1 < dot > se
>> Hoople Ltd, Registered in England and Wales No. 7556595
>> Registered office: Plough Lane, Hereford, HR4 0LE
>> 
>> "Any opinion expressed in this e-mail or any attached files are those of the individual and not necessarily those of Hoople Ltd. You should be aware that Hoople Ltd. monitors its email service. This e-mail and any attached files are confidential and intended solely for the use of the addressee. This communication may contain material protected by law from being passed on. If you are not the intended recipient and have received this e-mail in error, you are advised that any use, dissemination, forwarding, printing or copying of this e-mail is strictly prohibited. If you have received this e-mail in error please contact the sender immediately and destroy all copies of it." -- 
> 
