zmailer performance improvement
Leonardo Helman
mailscanner at lists.com.ar
Thu May 11 14:27:29 IST 2006
Hi, I've been working on the zmailer part (and several things I encountered
on the way)
I've been working on the MailScanner-4.54.1 version
I'm attaching a patch (if you prefer any other format, please let me know)
Here is a ChangeLog
* ZMailer: Queue files access optimization
* ZMailer: New ReadMessageHandle function, to be used in
Message::Explode optimization like other MTA's
* Message.pm unused variables cleanup
* Moved chunks of code about MTA internal files to the *DiskStores.pm
* Adding of uba.ar and educ.ar to country.domains.conf
There should be an esc.edu.ar also, but I will not add it without
checking that part of the code.
For a more detailed discussion look http://article.gmane.org/gmane.mail.spam.rbl.surbl/2718
* The perl module Filesys::Df it's actualy not used. I remove it from MailScanner (not from the installer)
There is a little bug I've moved (not corrected) that you may fix
quicker than me.
In Message.pm, there are a few occurrences of $this->{dpath}
but
* dpath is an mta internal file (so Message.pm, should not know nothing about)
* $this->{dpath} is always false because is never defined in this object.
So, each time a "$this->{dpath}" appears, I change it for store->spaceAvailable
and every time a "$this->{dpath}" is used in a "print" was changed for
getDataFilename
I added spaceAvailable and getDataFilename to each *DiskStores.pm.
But $this->{dpath} IN store is always true, so there should be
a change in the way it works
after the patch:
unless( $this->{dpath} ) {} # $this->{dpath} is not defined, so it will always enter the block
before the patch:
unless( $this->{store}->spaceAvailable() ) {} # This returns $this->{dpath} (of the store) witch is always true, so it will never enter the block
I also have a patch for breaking a mail into several mails each one
with recipients with similar actions.
I remember seeing that question for certain mailers sometimes on the list
but I'll have to work a little on that patch to addapt it to the newer
MailScanners
Saludos
--
Leonardo Helman
Pert Consultores
Argentina
-------------- next part --------------
diff -Naur MailScanner-4.54.1.ORIG/bin/MailScanner MailScanner-4.54.1/bin/MailScanner
--- MailScanner-4.54.1.ORIG/bin/MailScanner Mon May 8 18:39:57 2006
+++ MailScanner-4.54.1/bin/MailScanner Wed May 10 12:28:48 2006
@@ -63,7 +63,6 @@
use IO::Handle;
use Getopt::Long;
use Time::HiRes qw ( time );
-use Filesys::Df;
use MailScanner::Config;
use MailScanner::CustomConfig;
use MailScanner::GenericSpam;
@@ -162,7 +161,7 @@
# Are we just printing version numbers and exiting?
if ($Versions) {
- my @Modules = qw/AnyDBM_File Archive::Zip Carp Convert::BinHex Convert::TNEF Data::Dumper DirHandle Fcntl File::Basename File::Copy FileHandle File::Path File::Temp Filesys::Df HTML::Entities HTML::Parser HTML::TokeParser IO IO::File IO::Pipe Mail::ClamAV Mail::Header Mail::SpamAssassin MIME::Base64 MIME::Decoder MIME::Decoder::UU MIME::Head MIME::Parser MIME::QuotedPrint MIME::Tools MIME::WordDecoder Net::CIDR POSIX SAVI Socket Sys::Syslog Time::HiRes Time::localtime/;
+ my @Modules = qw/AnyDBM_File Archive::Zip Carp Convert::BinHex Convert::TNEF Data::Dumper DirHandle Fcntl File::Basename File::Copy FileHandle File::Path File::Temp HTML::Entities HTML::Parser HTML::TokeParser IO IO::File IO::Pipe Mail::ClamAV Mail::Header Mail::SpamAssassin MIME::Base64 MIME::Decoder MIME::Decoder::UU MIME::Head MIME::Parser MIME::QuotedPrint MIME::Tools MIME::WordDecoder Net::CIDR POSIX SAVI Socket Sys::Syslog Time::HiRes Time::localtime/;
my @Optional = qw#Convert/TNEF.pm DB_File.pm DBD/SQLite.pm DBI.pm Digest.pm Digest/HMAC.pm Digest/MD5.pm Digest/SHA1.pm Inline.pm Mail/ClamAV.pm Mail/SpamAssassin.pm Mail/SPF/Query.pm Net/CIDR/Lite.pm Net/DNS.pm Net/LDAP.pm Parse/RecDescent.pm SAVI.pm Sys/Hostname/Long.pm Test/Harness.pm Test/Simple.pm Text/Balanced.pm URI.pm#;
my($module, $s, $v, $m);
diff -Naur MailScanner-4.54.1.ORIG/etc/country.domains.conf MailScanner-4.54.1/etc/country.domains.conf
--- MailScanner-4.54.1.ORIG/etc/country.domains.conf Wed Apr 12 06:45:21 2006
+++ MailScanner-4.54.1/etc/country.domains.conf Wed May 10 11:03:28 2006
@@ -43,11 +43,13 @@
org.am
com.ar
edu.ar
+educ.ar
gov.ar
int.ar
mil.ar
net.ar
org.ar
+uba.ar
e164.arpa
in-addr.arpa
ip6.arpa
diff -Naur MailScanner-4.54.1.ORIG/lib/MailScanner/EximDiskStore.pm MailScanner-4.54.1/lib/MailScanner/EximDiskStore.pm
--- MailScanner-4.54.1.ORIG/lib/MailScanner/EximDiskStore.pm Wed Apr 12 06:45:15 2006
+++ MailScanner-4.54.1/lib/MailScanner/EximDiskStore.pm Tue May 9 16:54:49 2006
@@ -589,4 +589,23 @@
# $this->{hpath} . "\" \"$dir\"");
#}
+# PERT-LEOH 2006/05/09 Julian Field said dpath should be unset when
+# the disk is full (I couldn't find where)
+# The idea is that this function returns false if there was some kind
+# of problem with the space
+sub spaceAvailable {
+ my $this= shift;
+
+ return $this->{dpath};
+}
+
+# PERT-LEOH 2006/05/09 This function must return a printable name for
+# the data file, it should be here because not all the mailers use dpath
+sub getDataFilename {
+ my $this= shift;
+
+ return $this->{dpath};
+}
+
+
1;
diff -Naur MailScanner-4.54.1.ORIG/lib/MailScanner/Message.pm MailScanner-4.54.1/lib/MailScanner/Message.pm
--- MailScanner-4.54.1.ORIG/lib/MailScanner/Message.pm Fri May 5 09:54:12 2006
+++ MailScanner-4.54.1/lib/MailScanner/Message.pm Tue May 9 16:44:25 2006
@@ -177,9 +177,9 @@
my $type = shift;
my($id, $queuedirname, $fake) = @_;
my $this = {};
- my ($queue, $workarea, $mta, $hpath, $dpath, $addr, $user, $domain);
+ # PERT-LEOH: 2006/05/09 Cleanup of unused variables: queue, workarea, hpath, dpath hfile
+ my ($mta, $addr, $user, $domain);
my ($archiveplaces);
- my $hfile = new FileHandle;
#print STDERR "Creating message $id\n";
@@ -1431,7 +1431,7 @@
# The whole parsing thing is totally different for sendmail & Exim for speed.
# Many thanks for those who know themselves for this great improvement!
- if (MailScanner::Config::Value('mta') =~ /sendmail|exim|postfix/i) {
+ if (MailScanner::Config::Value('mta') =~ /sendmail|exim|postfix|zmailer/i) {
#
# This is for sendmail and Exim systems
@@ -1454,7 +1454,10 @@
close($handle);
if (!$entity && !MIME::Entity::MailScannerCounter()>=$maxparts) {
- unless ($this->{dpath}) {
+ # PERT-LEOH: 2006/05/09 We do not always have dpath file, so we ask to
+ # the store if it ran out of disk space (besides dpath
+ # is never set in $this)
+ unless ($this->{store}->spaceAvailable()) {
# It probably ran out of disk space, drop this message from the batch
MailScanner::Log::WarnLog("Failed to create message structures for %s" .
", dropping it from the batch", $this->{id});
@@ -1464,8 +1467,11 @@
return;
}
+ # PERT-LEOH: 2006/05/09 We do not always have dpath file, so we ask to
+ # the store for a distinctive file name (besides dpath
+ # is never set in $this)
MailScanner::Log::WarnLog("Cannot parse " . $this->{headerspath} . " and " .
- $this->{dpath} . ", $@");
+ $this->{store}->getDataFilename() . ", $@");
$this->{entity} = $entity; # In case it failed due to too many attachments
$this->{cantparse} = 1;
$this->{otherinfected} = 1;
@@ -1514,7 +1520,10 @@
#print STDERR "Found an error!\n";
$pipe->close() if $pipe; # Don't close a pipe that failed to exist
waitpid $pid, 0;
- unless ($this->{dpath}) {
+ # PERT-LEOH: 2006/05/09 We do not always have dpath file, so we ask to
+ # the store if it ran out of disk space (besides dpath
+ # is never set in $this)
+ unless ($this->{store}->spaceAvailable()) {
# It probably ran out of disk space, drop this message from the batch
MailScanner::Log::WarnLog("Failed to create message structures for %s" .
", dropping it from the batch", $this->{id});
@@ -1524,8 +1533,11 @@
return;
}
+ # PERT-LEOH: 2006/05/09 We do not always have dpath file, so we ask to
+ # the store if it ran out of disk space (besides dpath
+ # is never set in $this)
MailScanner::Log::WarnLog("Cannot parse " . $this->{headerspath} .
- " and " . $this->{dpath} . ", $@");
+ " and " . $this->{store}->getDataFilename() . ", $@");
$this->{entity} = $entity;# In case it failed due to too many attachments
$this->{cantparse} = 1;
$this->{otherinfected} = 1;
diff -Naur MailScanner-4.54.1.ORIG/lib/MailScanner/PFDiskStore.pm MailScanner-4.54.1/lib/MailScanner/PFDiskStore.pm
--- MailScanner-4.54.1.ORIG/lib/MailScanner/PFDiskStore.pm Wed Apr 12 06:45:15 2006
+++ MailScanner-4.54.1/lib/MailScanner/PFDiskStore.pm Tue May 9 16:55:24 2006
@@ -668,6 +668,24 @@
return "$dir/$hdfile";
}
+# PERT-LEOH 2006/05/09 Julian Field said dpath should be unset when
+# the disk is full (I couldn't find where)
+# The idea is that this function returns false if there was some kind
+# of problem with the space
+sub spaceAvailable {
+ my $this= shift;
+
+ return $this->{dpath};
+}
+
+# PERT-LEOH 2006/05/09 This function must return a printable name for
+# the data file, it should be here because not all the mailers use dpath
+sub getDataFilename {
+ my $this= shift;
+
+ return $this->{dpath};
+}
+
package Body;
# Stefan Baltus, October 2003
diff -Naur MailScanner-4.54.1.ORIG/lib/MailScanner/SMDiskStore.pm MailScanner-4.54.1/lib/MailScanner/SMDiskStore.pm
--- MailScanner-4.54.1.ORIG/lib/MailScanner/SMDiskStore.pm Wed Apr 12 06:45:15 2006
+++ MailScanner-4.54.1/lib/MailScanner/SMDiskStore.pm Tue May 9 16:56:05 2006
@@ -543,4 +543,22 @@
# $this->{hpath} . "\" \"$dir\"");
#}
+# PERT-LEOH 2006/05/09 Julian Field said dpath should be unset when
+# the disk is full (I couldn't find where)
+# The idea is that this function returns false if there was some kind
+# of problem with the space
+sub spaceAvailable {
+ my $this= shift;
+
+ return $this->{dpath};
+}
+
+# PERT-LEOH 2006/05/09 This function must return a printable name for
+# the data file, it should be here because not all the mailers use dpath
+sub getDataFilename {
+ my $this= shift;
+
+ return $this->{dpath};
+}
+
1;
diff -Naur MailScanner-4.54.1.ORIG/lib/MailScanner/ZMDiskStore.pm MailScanner-4.54.1/lib/MailScanner/ZMDiskStore.pm
--- MailScanner-4.54.1.ORIG/lib/MailScanner/ZMDiskStore.pm Wed Apr 12 06:45:15 2006
+++ MailScanner-4.54.1/lib/MailScanner/ZMDiskStore.pm Wed May 10 13:59:40 2006
@@ -221,11 +221,11 @@
"message %s, %s", $message->{id}, $!);
if( $this->{body}[0] eq "ORIGINAL" ) {
- my $b= Body->new( $this->{hdpath} );
+ my $b= Body->new( $this->{inhdhandle} );
$b->Start();
my $line;
#print STDERR "originalBody\n";
- while( $line= $b->Next() ) {
+ while( defined ($line= $b->Next()) ) {
$Tf->print($line);
#print STDERR "BODY: $line";
}
@@ -300,7 +300,7 @@
my $this = shift;
my($body, $max) = @_;
- my $b= Body->new( $this->{hdpath} );
+ my $b= Body->new( $this->{inhdhandle} );
$b->Start();
my $line;
if ($max) {
@@ -423,38 +423,93 @@
return "$dir/$hdfile";
}
+
+# Writes the whole message to a handle.
+# Need to be passed the message to find the headers path
+# as it's not part of the DiskStore.
+# PERT-LEOH: We could optimize saving file position in ReadQf,
+# and accessing the queue file, directly
+sub ReadMessageHandle {
+ my $this = shift;
+ my ($message, $handle) = @_;
+
+ # Where did we start?
+ my $oldpos = $this->{inhdhandle}->getpos();
+
+ # Write the whole message in RFC822 format to the handle.
+ # That means 1 CR-terminated line for every N record in the file.
+ my $b= Body->new( $this->{inhdhandle} );
+ $b->Start(1);
+ my $line;
+ #print STDERR "originalBody\n";
+ while(defined($line = $b->Next())) {
+ print $handle $line or MailScanner::Log::DieLog("Cannot print " . $this->{hdpath} . " into handle $!, $^E" );
+ #print STDERR "BODY: $line";
+ }
+ $b->Done();
+
+ # rewind tmpfile to read it later
+ $handle->seek(0,0) or MailScanner::Log::DieLog("Cannot rewind handle $!, $^E" );
+
+ # rewind source files
+ $this->{inhdhandle}->setpos($oldpos);
+
+ #print STDERR "Done ReadMessageHandle\n";
+ return 1;
+}
+
+
+# PERT-LEOH 2006/05/09 Julian Field said dpath should be unset when
+# the disk is full (I couldn't find where)
+# In the case of ZMailer, I have to write it myself
+# The idea is that this function returns false if there was some kind
+# of problem with the space
+sub spaceAvailable {
+ my $this= shift;
+
+ return $this->{hdpath};
+}
+
+# PERT-LEOH 2006/05/09 This function must return a printable name for
+# the data file, it should be here because not all the mailers use dpath
+sub getDataFilename {
+ my $this= shift;
+
+ return $this->{hdpath};
+}
+
package Body;
use FileHandle;
sub new {
my $type = shift;
- my ( $hdpathname )=@_;
+ my ( $handle )=@_;
my $self=();
- my $handle= new FileHandle "<$hdpathname";
if( defined $handle ) {
- $self={ _hdpathname => $hdpathname,
+ $self={
_handle => $handle,
_startpos => -1 };
bless $self, $type;
} else {
- MailScanner::Log::DieLog("Cannot open %s, %s",
- $hdpathname, $!);
+ MailScanner::Log::DieLog("Cannot open handle, %s", $!);
}
return $self;
}
sub Start {
- my ( $this )=@_;
+ my ( $this, $entiremessage )=@_;
if( $$this{_startpos} == -1 ) {
+ seek $$this{_handle}, 0, 0; # reset the handle
my $InHeader = 0;
#print STDERR "Start\n";
+ #TODO: OPTIMIZATION: Save startpos to header/body beginning in ReadQf, so we don't have to search it every time
while($_=$$this{_handle}->getline) {
#print STDERR "Start LEIDO: $_";
chomp; # Chomp everything now. We can easily add it back later.
- s/\015/ /g; # Sanitise everything by removing all embedded <CR>s
- if ( /^env-end$|^env-eof$/i ) { # The envelope ends here, starting hdr
+ if ( /^env-end\015?$|^env-eof\015?$/i ) { # The envelope ends here, starting hdr
+ last if ($entiremessage);
$InHeader=1;
#print STDERR "InHeader\n";
next;
More information about the MailScanner
mailing list