zmailer performance improvement

Leonardo Helman mailscanner at lists.com.ar
Thu May 11 14:27:29 IST 2006


Hi, I've been working on the zmailer part (and several things I encountered
on the way)

I've been working on the MailScanner-4.54.1 version

I'm attaching a patch (if you prefer any other format, please let me know)

Here is a ChangeLog

* ZMailer: Queue files access optimization
* ZMailer: New ReadMessageHandle function, to be used in 
           Message::Explode optimization like other MTA's
* Message.pm unused variables cleanup
* Moved chunks of code about MTA internal files to the *DiskStores.pm
* Adding of uba.ar and educ.ar to country.domains.conf 
  There should be an esc.edu.ar also, but I will not add it without 
  checking that part of the code. 
  For a more detailed discussion look http://article.gmane.org/gmane.mail.spam.rbl.surbl/2718
* The perl module Filesys::Df it's actualy not used. I remove it from MailScanner (not from the installer)


There is a little bug I've moved (not corrected) that you may fix
quicker than me.

In Message.pm, there are a few occurrences of $this->{dpath}
but 
  * dpath is an mta internal file (so Message.pm, should not know nothing about)
  * $this->{dpath} is always false because is never defined in this object.

So, each time a "$this->{dpath}" appears, I change it for store->spaceAvailable
and every time a "$this->{dpath}" is used in a "print" was changed for 
getDataFilename

I added spaceAvailable and getDataFilename to each *DiskStores.pm.


But $this->{dpath} IN store is always true, so there should be
a change in the way it works
after the patch:
  unless( $this->{dpath} ) {} # $this->{dpath} is not defined, so it will always enter the block

before the patch:
  unless( $this->{store}->spaceAvailable() ) {} # This returns $this->{dpath} (of the store) witch is always true, so it will never enter the block



I also have a patch for breaking a mail into several mails each one
with recipients with similar actions. 
I remember seeing that question for certain mailers sometimes on the list
but I'll have to work a little on that patch to addapt it to the newer 
MailScanners


Saludos
--  
Leonardo Helman
Pert Consultores
Argentina
-------------- next part --------------
diff -Naur MailScanner-4.54.1.ORIG/bin/MailScanner MailScanner-4.54.1/bin/MailScanner
--- MailScanner-4.54.1.ORIG/bin/MailScanner	Mon May  8 18:39:57 2006
+++ MailScanner-4.54.1/bin/MailScanner	Wed May 10 12:28:48 2006
@@ -63,7 +63,6 @@
 use IO::Handle;
 use Getopt::Long;
 use Time::HiRes qw ( time );
-use Filesys::Df;
 use MailScanner::Config;
 use MailScanner::CustomConfig;
 use MailScanner::GenericSpam;
@@ -162,7 +161,7 @@
 
 # Are we just printing version numbers and exiting?
 if ($Versions) {
-  my @Modules = qw/AnyDBM_File Archive::Zip Carp Convert::BinHex Convert::TNEF Data::Dumper DirHandle Fcntl File::Basename File::Copy FileHandle File::Path File::Temp Filesys::Df HTML::Entities HTML::Parser HTML::TokeParser IO IO::File IO::Pipe Mail::ClamAV Mail::Header Mail::SpamAssassin MIME::Base64 MIME::Decoder MIME::Decoder::UU MIME::Head MIME::Parser MIME::QuotedPrint MIME::Tools MIME::WordDecoder Net::CIDR POSIX SAVI Socket Sys::Syslog Time::HiRes Time::localtime/;
+  my @Modules = qw/AnyDBM_File Archive::Zip Carp Convert::BinHex Convert::TNEF Data::Dumper DirHandle Fcntl File::Basename File::Copy FileHandle File::Path File::Temp HTML::Entities HTML::Parser HTML::TokeParser IO IO::File IO::Pipe Mail::ClamAV Mail::Header Mail::SpamAssassin MIME::Base64 MIME::Decoder MIME::Decoder::UU MIME::Head MIME::Parser MIME::QuotedPrint MIME::Tools MIME::WordDecoder Net::CIDR POSIX SAVI Socket Sys::Syslog Time::HiRes Time::localtime/;
   my @Optional = qw#Convert/TNEF.pm DB_File.pm DBD/SQLite.pm DBI.pm Digest.pm Digest/HMAC.pm Digest/MD5.pm Digest/SHA1.pm Inline.pm Mail/ClamAV.pm Mail/SpamAssassin.pm Mail/SPF/Query.pm Net/CIDR/Lite.pm Net/DNS.pm Net/LDAP.pm Parse/RecDescent.pm SAVI.pm Sys/Hostname/Long.pm Test/Harness.pm Test/Simple.pm Text/Balanced.pm URI.pm#;
   my($module, $s, $v, $m);
 
diff -Naur MailScanner-4.54.1.ORIG/etc/country.domains.conf MailScanner-4.54.1/etc/country.domains.conf
--- MailScanner-4.54.1.ORIG/etc/country.domains.conf	Wed Apr 12 06:45:21 2006
+++ MailScanner-4.54.1/etc/country.domains.conf	Wed May 10 11:03:28 2006
@@ -43,11 +43,13 @@
 org.am
 com.ar
 edu.ar
+educ.ar
 gov.ar
 int.ar
 mil.ar
 net.ar
 org.ar
+uba.ar
 e164.arpa
 in-addr.arpa
 ip6.arpa
diff -Naur MailScanner-4.54.1.ORIG/lib/MailScanner/EximDiskStore.pm MailScanner-4.54.1/lib/MailScanner/EximDiskStore.pm
--- MailScanner-4.54.1.ORIG/lib/MailScanner/EximDiskStore.pm	Wed Apr 12 06:45:15 2006
+++ MailScanner-4.54.1/lib/MailScanner/EximDiskStore.pm	Tue May  9 16:54:49 2006
@@ -589,4 +589,23 @@
 #         $this->{hpath} . "\" \"$dir\"");
 #}
 
+# PERT-LEOH 2006/05/09 Julian Field said dpath should be unset when 
+# the disk is full (I couldn't find where)
+# The idea is that this function returns false if there was some kind
+# of problem with the space
+sub spaceAvailable {
+  my $this= shift;
+
+  return $this->{dpath};
+}
+
+# PERT-LEOH 2006/05/09 This function must return a printable name for
+# the data file, it should be here because not all the mailers use dpath
+sub getDataFilename {
+  my $this= shift;
+
+  return $this->{dpath};
+}
+
+
 1;
diff -Naur MailScanner-4.54.1.ORIG/lib/MailScanner/Message.pm MailScanner-4.54.1/lib/MailScanner/Message.pm
--- MailScanner-4.54.1.ORIG/lib/MailScanner/Message.pm	Fri May  5 09:54:12 2006
+++ MailScanner-4.54.1/lib/MailScanner/Message.pm	Tue May  9 16:44:25 2006
@@ -177,9 +177,9 @@
   my $type = shift;
   my($id, $queuedirname, $fake) = @_;
   my $this = {};
-  my ($queue, $workarea, $mta, $hpath, $dpath, $addr, $user, $domain);
+  # PERT-LEOH: 2006/05/09 Cleanup of unused variables: queue, workarea, hpath, dpath hfile
+  my ($mta, $addr, $user, $domain);
   my ($archiveplaces);
-  my $hfile = new FileHandle;
 
   #print STDERR "Creating message $id\n";
 
@@ -1431,7 +1431,7 @@
   
   # The whole parsing thing is totally different for sendmail & Exim for speed.
   # Many thanks for those who know themselves for this great improvement!
-  if (MailScanner::Config::Value('mta') =~ /sendmail|exim|postfix/i) {
+  if (MailScanner::Config::Value('mta') =~ /sendmail|exim|postfix|zmailer/i) {
 
     #
     # This is for sendmail and Exim systems
@@ -1454,7 +1454,10 @@
     close($handle);
 
     if (!$entity && !MIME::Entity::MailScannerCounter()>=$maxparts) {
-      unless ($this->{dpath}) {
+      # PERT-LEOH: 2006/05/09 We do not always have dpath file, so we ask to 
+      #                 the store if it ran out of disk space (besides dpath 
+      #                 is never set in $this)
+      unless ($this->{store}->spaceAvailable()) {
         # It probably ran out of disk space, drop this message from the batch
         MailScanner::Log::WarnLog("Failed to create message structures for %s" .
           ", dropping it from the batch", $this->{id});
@@ -1464,8 +1467,11 @@
         return;                                                         
       }
 
+      # PERT-LEOH: 2006/05/09 We do not always have dpath file, so we ask to 
+      #                 the store for a distinctive file name (besides dpath 
+      #                 is never set in $this)
       MailScanner::Log::WarnLog("Cannot parse " . $this->{headerspath} . " and " .
-                   $this->{dpath} . ", $@");
+                   $this->{store}->getDataFilename() . ", $@");
       $this->{entity} = $entity; # In case it failed due to too many attachments
       $this->{cantparse} = 1;
       $this->{otherinfected} = 1;
@@ -1514,7 +1520,10 @@
       #print STDERR "Found an error!\n";
       $pipe->close() if $pipe; # Don't close a pipe that failed to exist
       waitpid $pid, 0;
-      unless ($this->{dpath}) {
+      # PERT-LEOH: 2006/05/09 We do not always have dpath file, so we ask to 
+      #                 the store if it ran out of disk space (besides dpath 
+      #                 is never set in $this)
+      unless ($this->{store}->spaceAvailable()) {
         # It probably ran out of disk space, drop this message from the batch
         MailScanner::Log::WarnLog("Failed to create message structures for %s" .
           ", dropping it from the batch", $this->{id});
@@ -1524,8 +1533,11 @@
         return;                                                         
       }
 
+       # PERT-LEOH: 2006/05/09 We do not always have dpath file, so we ask to 
+       #                 the store if it ran out of disk space (besides dpath 
+       #                 is never set in $this)
        MailScanner::Log::WarnLog("Cannot parse " . $this->{headerspath} .
-                                 " and " .  $this->{dpath} . ", $@");
+                                 " and " .  $this->{store}->getDataFilename() . ", $@");
        $this->{entity} = $entity;# In case it failed due to too many attachments
        $this->{cantparse} = 1;
        $this->{otherinfected} = 1;
diff -Naur MailScanner-4.54.1.ORIG/lib/MailScanner/PFDiskStore.pm MailScanner-4.54.1/lib/MailScanner/PFDiskStore.pm
--- MailScanner-4.54.1.ORIG/lib/MailScanner/PFDiskStore.pm	Wed Apr 12 06:45:15 2006
+++ MailScanner-4.54.1/lib/MailScanner/PFDiskStore.pm	Tue May  9 16:55:24 2006
@@ -668,6 +668,24 @@
   return "$dir/$hdfile";
 }
 
+# PERT-LEOH 2006/05/09 Julian Field said dpath should be unset when 
+# the disk is full (I couldn't find where)
+# The idea is that this function returns false if there was some kind
+# of problem with the space
+sub spaceAvailable {
+  my $this= shift;
+
+  return $this->{dpath};
+}
+
+# PERT-LEOH 2006/05/09 This function must return a printable name for
+# the data file, it should be here because not all the mailers use dpath
+sub getDataFilename {
+  my $this= shift;
+
+  return $this->{dpath};
+}
+
 package Body;
 
 # Stefan Baltus, October 2003
diff -Naur MailScanner-4.54.1.ORIG/lib/MailScanner/SMDiskStore.pm MailScanner-4.54.1/lib/MailScanner/SMDiskStore.pm
--- MailScanner-4.54.1.ORIG/lib/MailScanner/SMDiskStore.pm	Wed Apr 12 06:45:15 2006
+++ MailScanner-4.54.1/lib/MailScanner/SMDiskStore.pm	Tue May  9 16:56:05 2006
@@ -543,4 +543,22 @@
 #         $this->{hpath} . "\" \"$dir\"");
 #}
 
+# PERT-LEOH 2006/05/09 Julian Field said dpath should be unset when 
+# the disk is full (I couldn't find where)
+# The idea is that this function returns false if there was some kind
+# of problem with the space
+sub spaceAvailable {
+  my $this= shift;
+
+  return $this->{dpath};
+}
+
+# PERT-LEOH 2006/05/09 This function must return a printable name for
+# the data file, it should be here because not all the mailers use dpath
+sub getDataFilename {
+  my $this= shift;
+
+  return $this->{dpath};
+}
+
 1;
diff -Naur MailScanner-4.54.1.ORIG/lib/MailScanner/ZMDiskStore.pm MailScanner-4.54.1/lib/MailScanner/ZMDiskStore.pm
--- MailScanner-4.54.1.ORIG/lib/MailScanner/ZMDiskStore.pm	Wed Apr 12 06:45:15 2006
+++ MailScanner-4.54.1/lib/MailScanner/ZMDiskStore.pm	Wed May 10 13:59:40 2006
@@ -221,11 +221,11 @@
                                 "message %s, %s", $message->{id}, $!);
 
   if( $this->{body}[0] eq "ORIGINAL" ) {
-    my $b= Body->new( $this->{hdpath} );
+    my $b= Body->new( $this->{inhdhandle} );
     $b->Start();
     my $line;
     #print STDERR "originalBody\n";
-    while( $line= $b->Next() ) {
+    while( defined ($line= $b->Next()) ) {
       $Tf->print($line);
     #print STDERR "BODY:  $line";
     }
@@ -300,7 +300,7 @@
   my $this = shift;
   my($body, $max) = @_;
 
-  my $b= Body->new( $this->{hdpath} );
+  my $b= Body->new( $this->{inhdhandle} );
   $b->Start();
   my $line;
   if ($max) {
@@ -423,38 +423,93 @@
   return "$dir/$hdfile";
 }
 
+
+# Writes the whole message to a handle.
+# Need to be passed the message to find the headers path
+# as it's not part of the DiskStore.
+# PERT-LEOH: We could optimize saving file position in ReadQf, 
+# and accessing the queue file, directly
+sub ReadMessageHandle {
+  my $this = shift;
+  my ($message, $handle) = @_;
+
+  # Where did we start?
+  my $oldpos = $this->{inhdhandle}->getpos();
+
+  # Write the whole message in RFC822 format to the handle.
+  # That means 1 CR-terminated line for every N record in the file.
+  my $b= Body->new( $this->{inhdhandle} );
+  $b->Start(1);
+  my $line;
+  #print STDERR "originalBody\n";
+  while(defined($line = $b->Next())) {
+    print $handle $line or MailScanner::Log::DieLog("Cannot print " . $this->{hdpath} . " into handle $!, $^E" );
+    #print STDERR "BODY:  $line";
+  }
+  $b->Done();
+
+  # rewind tmpfile to read it later
+  $handle->seek(0,0) or MailScanner::Log::DieLog("Cannot rewind handle $!, $^E" );
+
+  # rewind source files
+  $this->{inhdhandle}->setpos($oldpos);
+
+  #print STDERR "Done ReadMessageHandle\n";
+  return 1;
+}
+
+
+# PERT-LEOH 2006/05/09 Julian Field said dpath should be unset when 
+# the disk is full (I couldn't find where)
+# In the case of ZMailer, I have to write it myself
+# The idea is that this function returns false if there was some kind
+# of problem with the space
+sub spaceAvailable {
+  my $this= shift;
+
+  return $this->{hdpath};
+}
+
+# PERT-LEOH 2006/05/09 This function must return a printable name for
+# the data file, it should be here because not all the mailers use dpath
+sub getDataFilename {
+  my $this= shift;
+
+  return $this->{hdpath};
+}
+
 package Body;
 
 use FileHandle;
 
 sub new {
   my $type = shift;
-  my ( $hdpathname )=@_;
+  my ( $handle )=@_;
   my $self=();
-  my $handle= new FileHandle "<$hdpathname";
   if( defined $handle ) {
-    $self={ _hdpathname => $hdpathname, 
+    $self={ 
             _handle     => $handle,
             _startpos   => -1 };
 
     bless $self, $type;
   } else {
-    MailScanner::Log::DieLog("Cannot open %s, %s",
-                                $hdpathname, $!);
+    MailScanner::Log::DieLog("Cannot open handle, %s", $!);
   }
   return $self;
 }
 
 sub Start {
-  my ( $this )=@_;
+  my ( $this, $entiremessage )=@_;
   if( $$this{_startpos} == -1 ) {
+    seek $$this{_handle}, 0, 0;         # reset the handle
     my $InHeader = 0;
     #print STDERR "Start\n";
+    #TODO: OPTIMIZATION: Save startpos to header/body beginning in ReadQf, so we don't have to search it every time
     while($_=$$this{_handle}->getline) {
       #print STDERR "Start LEIDO: $_";
       chomp; # Chomp everything now. We can easily add it back later.
-      s/\015/ /g; # Sanitise everything by removing all embedded <CR>s
-      if ( /^env-end$|^env-eof$/i ) { # The envelope ends here, starting hdr
+      if ( /^env-end\015?$|^env-eof\015?$/i ) { # The envelope ends here, starting hdr
+	last if ($entiremessage);
         $InHeader=1;
         #print STDERR "InHeader\n";
         next;


More information about the MailScanner mailing list