Automatic download of extra SA rule sets

Gerry Doris gdoris at ROGERS.COM
Tue Jan 20 01:45:36 GMT 2004


On Mon, 19 Jan 2004, Stephen Swaney wrote:

> Chris Thielen has written a VERY complete and well thought out script to
> download the most commonly used SA rules files and posted a link to his
> script on the SA mail list:
>
>         http://sandgnat.com/cmos/rules_du_jour
>
> I have tested this script and it required only minor configuration changes
> to work with MailScanner. It would also be very easy to extend the script to
> get additional Rule Sets.
>
> A couple of caveats:
>
> 1. Test first with the Debug flag set.
> 2. My /etc/mail/spamassassin/local.cf was very old (and not needed). This
> kept spamassassin --lint from running without errors. I removed the file
> and all was well.
>
> 3. Saving the file from a web browser created some problems, run:
>
>         wget    http://sandgnat.com/cmos/rules_du_jour
>
> to get the file.
>
> Steve

For what it's worth, I've made a couple of changes to the script:
- there was a small typo in one of the weeds.cf download sections: you got
weeds.cf instead of weeds_2.cf if you activated WEEDS2.
- I changed the spamassassin restart to a MailScanner reload.
- and since we're going SpamAssassin rule crazy, I added their latest
evilnumbers.cf rule set.

The updated script is attached.
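
Whichever copy you end up using, it's worth a quick test by hand before
handing it to cron. Something like this (just a sketch; it assumes the
script is saved as rules_du_jour in the current directory):

        chmod +x rules_du_jour
        ./rules_du_jour -D
        spamassassin --lint

i.e. one debug run interactively, then check that --lint still comes back
clean before the first scheduled run.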

--
Gerry

"The lyfe so short, the craft so long to learne"  Chaucer
-------------- next part --------------
#!/bin/bash
# Version 1.04

## This file updates SpamAssassin RuleSet files from the internet.
## 
## It is important that you *only* automatically update 
## RuleSet files from people that you trust and that you
## *TEST* this.
## 
## Note: When running this script interactively, debug mode is enabled so that you can view the results.


# Usage instructions:
# 1) Choose rulesets to update (TRUSTED_RULESETS below)
# 2) Configure Local SpamAssassin settings (SA_DIR, MAIL_ADDRESS, SA_RESTART below)
# 3) Run this script periodically (manually or crontab)
# 3a) To run manually, first make it executable (chmod +x rules_du_jour), then execute it (./rules_du_jour)
# 3b) To run via cron, edit your cron (crontab -e) and add a line such as this:
#     28 2 * * *                                      /root/bin/rules_du_jour
#     The crontab line above runs /root/bin/rules_du_jour at 2:28AM every day. (choose a different time, please)
#     Make sure the user whose crontab you are editing has permission to write files to the SA config dir.
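#     As a sketch, a complete crontab stanza might look like the following
#     (MAILTO is standard Vixie-cron syntax controlling where cron mails any
#     output from the job; the path and address are examples, adjust for your system):
#         MAILTO=root
#         28 2 * * *                                  /root/bin/rules_du_jour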


# Choose Rulesets from this list: 
# BIGEVIL TRIPWIRE POPCORN BACKHAIR WEEDS1 WEEDS2 CHICKENPOX

# IMPORTANT: Edit this line to choose which RuleSets to update
TRUSTED_RULESETS="BIGEVIL TRIPWIRE POPCORN BACKHAIR WEEDS2 CHICKENPOX EVILNUMBERS";


#### Local SpamAssassin/system Settings ####
#### Modify these to match your system. ####
SA_DIR="/etc/mail/spamassassin";                     # Change this to your SA local config 
                                                # directory, probably /etc/mail/spamassassin.
						# For amavisd chrooted, this may be:
						# /var/amavisd/etc/mail/spamassassin
MAIL_ADDRESS="root";                            # Where email notifications go
SA_RESTART="/etc/rc.d/init.d/MailScanner reload";  # Command used to restart spamd
                                                # May be /etc/rc.d/init.d/spamassassin restart
						# For amavisd, may be /etc/init.d/amavisd restart

# DEBUG="true";                                   # Uncomment this to turn debug mode on (or use -D)
#### End Local SpamAssassin Settings    ####



TMPDIR="${SA_DIR}/RulesDuJour";                 # Where we store old rulesets.  If you delete
                                                # this directory, RuleSets may be detected as
						# out of date the next time you run rules_du_jour.

#### CF Files information ####
# These are bash Array Variables ("man bash" for more information)
declare -a CF_URLS;                             # Array that contains URLs of the files.
declare -a CF_FILES;                            # Local name of the CF file; eg: bigevil.cf
declare -a CF_NAMES;                            # Happy Name of CF file; eg: "Big Evil"
declare -a PARSE_NEW_VER_SCRIPTS;               # Command to run on the file to retrieve new version info
declare -a CF_MUNGE_SCRIPTS;                    # This (optionally) modifies the file; eg: lower scores


#########################################
####     Begin Rules File Registry   ####
#########################################

# If you add more RuleSets to your own registry, please contribute the settings to the www.exit0.us wiki
# http://www.exit0.us/index.php/RulesDuJourRuleSets
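#
# If you want to register an additional RuleSet yourself, the pattern is simply
# another index into the arrays above. A template (sketch only; the index, URL
# and names below are placeholders, not a real ruleset):
#
#   MYRULESET=8;
#                 CF_URLS[8]="http://example.com/myruleset.cf";
#                CF_FILES[8]="myruleset.cf";
#                CF_NAMES[8]="My RuleSet";
#   PARSE_NEW_VER_SCRIPTS[8]="head -n1";
#
# Remember to add MYRULESET to TRUSTED_RULESETS at the top of this script.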

#### Here are settings for Tripwire. ####

TRIPWIRE=0; # Index of Tripwire data into the arrays is 0

              CF_URLS[0]="http://www.merchantsoverseas.com/wwwroot/gorilla/99_FVGT_Tripwire.cf";
             CF_FILES[0]="tripwire.cf";
             CF_NAMES[0]="TripWire";
PARSE_NEW_VER_SCRIPTS[0]="grep -i '^[         ]*#.*version' | sort | tail -n1";
     CF_MUNGE_SCRIPTS[0]="sed -e s/FVGT_TRIPWIRE_/TW_/g"; # shorten long names to workaround large mail header length

#### Here are settings for Big Evil. ####

BIGEVIL=1; # Index of Big Evil is 1

              CF_URLS[1]="http://www.merchantsoverseas.com/wwwroot/gorilla/bigevil.cf";
             CF_FILES[1]="bigevil.cf";
             CF_NAMES[1]="Big Evil";
PARSE_NEW_VER_SCRIPTS[1]="head -n1";

#### Here are settings for Popcorn. ####

POPCORN=2; # Index of Popcorn is 2

              CF_URLS[2]="http://www.emtinc.net/includes/popcorn.cf";
             CF_FILES[2]="popcorn.cf";
             CF_NAMES[2]="Jennifer's Popcorn";
PARSE_NEW_VER_SCRIPTS[2]="grep -i '^[         ]*#.*version[         ]*[0-9]' | sort | tail -n1";
#    CF_MUNGE_SCRIPTS[2]="nothing, yet";
# TODO: Manipulate the scores.

#### Here are settings for Backhair. ####

BACKHAIR=3; # Index of Backhair is 3

              CF_URLS[3]="http://www.emtinc.net/includes/backhair.cf";
             CF_FILES[3]="backhair.cf";
             CF_NAMES[3]="Jennifer's Backhair"; # ;-)
PARSE_NEW_VER_SCRIPTS[3]="grep -i '^[         ]*#.*version[         ]*[0-9]' | sort | tail -n1";
#    CF_MUNGE_SCRIPTS[3]="nothing, yet";
# TODO: Manipulate the scores.

#### Here are settings for Weeds 1. ####

WEEDS1=4; # Index of Weeds Set 1 is 4

              CF_URLS[4]="http://www.emtinc.net/includes/weeds.cf";
             CF_FILES[4]="weeds.cf";
             CF_NAMES[4]="Jennifer's Weeds Set (1)";
PARSE_NEW_VER_SCRIPTS[4]="grep -i '^[         ]*#.*version[         ]*[0-9]' | sort | tail -n1";
#    CF_MUNGE_SCRIPTS[4]="nothing, yet";
# TODO: Manipulate the scores.

#### Here are settings for Weeds 2. ####

WEEDS2=5; # Index of Weeds Set 2 is 5

              CF_URLS[5]="http://www.emtinc.net/includes/weeds_2.cf";
             CF_FILES[5]="weeds_2.cf";
             CF_NAMES[5]="Jennifer's Weeds Set (2)";
PARSE_NEW_VER_SCRIPTS[5]="grep -i '^[         ]*#.*version[         ]*[0-9]' | sort | tail -n1";
#    CF_MUNGE_SCRIPTS[5]="nothing, yet";
# TODO: Manipulate the scores.

#### Here are settings for ChickenPox. ####

CHICKENPOX=6; # Index of ChickenPox is 6

              CF_URLS[6]="http://www.emtinc.net/includes/chickenpox.cf";
             CF_FILES[6]="chickenpox.cf";
             CF_NAMES[6]="Jennifer's ChickenPox";
PARSE_NEW_VER_SCRIPTS[6]="grep -i '^[         ]*#.*version[         ]*[0-9]' | sort | tail -n1";
#    CF_MUNGE_SCRIPTS[6]="nothing, yet";
# TODO: Manipulate the scores.


#### Here are settings for EvilNumbers. ####

EVILNUMBERS=7; # Index of EvilNumbers is 7

              CF_URLS[7]="http://www.merchantsoverseas.com/wwwroot/gorilla/evilnumbers.cf";
             CF_FILES[7]="evilnumbers.cf";
             CF_NAMES[7]="Yackley's EvilNumbers";
PARSE_NEW_VER_SCRIPTS[7]="grep -i '^[         ]*#.*version' | sort | tail -n1";
#    CF_MUNGE_SCRIPTS[7]="nothing, yet";
# TODO: Manipulate the scores.


#########################################
####     End Rules File Registry     ####
#########################################

# Do not edit beyond this line unless you know what you are doing.






#########################################
####     Begin rules update code     ####
#########################################

# if invoked with -D, enable DEBUG here.
[ "$1" = "-D" ] && DEBUG="true";
# if running interactively, enable DEBUG here.
[ -t 0 ] && DEBUG="true";

# If we're not running interactively, add a random delay here. This should
# help reduce spikes on the servers hosting the rulesets (Thanks, Bob)
MAXDELAY=3600;
DELAY=0;
[ ! -t 0 ] && [ ${MAXDELAY} -gt 0 ] && let DELAY="${RANDOM} % ${MAXDELAY}";
[ "${DEBUG}" ] && [ ${DELAY} -gt 0 ] && echo "Probably running from cron... sleeping for a random interval (${DELAY} seconds)";
[ ${DELAY} -gt 0 ] && sleep ${DELAY};


# Save old working dir
OLDDIR=`pwd`;

# This variable is used to indicate if we should restart spamd. Currently empty (false).
RESTART_REQUIRED="";

[ "${DEBUG}" ] && [ -e ${TMPDIR} ] && echo "Temporary directory already existed: ${TMPDIR}";
[ "${DEBUG}" ] && [ ! -e ${TMPDIR} ] && echo "Temporary directory doesn't exist; creating: ${TMPDIR}";
[ ! -e ${TMPDIR} ] && mkdir ${TMPDIR};

[ "${DEBUG}" ] && echo "Changing to temporary directory: ${TMPDIR}";
cd ${TMPDIR};

for RULESET_NAME in ${TRUSTED_RULESETS} ; do

    INDEX=${!RULESET_NAME};
    CF_URL=${CF_URLS[${INDEX}]};
    CF_FILE=${CF_FILES[${INDEX}]};
    CF_NAME=${CF_NAMES[${INDEX}]};
    PARSE_NEW_VER_SCRIPT=${PARSE_NEW_VER_SCRIPTS[${INDEX}]};
    CF_MUNGE_SCRIPT=${CF_MUNGE_SCRIPTS[${INDEX}]};

    CF_BASENAME=`basename ${CF_URL}`;

    DATE=`date +"%Y%m%d-%H%M"`

    if [ "${DEBUG}" ] ; then
        echo "";
        echo "------ ${RULESET_NAME} ------";
        echo "RULESET_NAME=${RULESET_NAME}";
        echo "INDEX=${INDEX}";
        echo "CF_URL=${CF_URL}";
        echo "CF_FILE=${CF_FILE}";
        echo "CF_NAME=${CF_NAME}";
        echo "PARSE_NEW_VER_SCRIPT=${PARSE_NEW_VER_SCRIPT}";
        echo "CF_MUNGE_SCRIPT=${CF_MUNGE_SCRIPT}";
    fi

    [ "${DEBUG}" ] && [ -f ${TMPDIR}/${CF_BASENAME} ] && echo "Old ${CF_BASENAME} already existed in ${TMPDIR}...";
    [ "${DEBUG}" ] && [ ! -f ${TMPDIR}/${CF_BASENAME} ] && [ ! -f ${SA_DIR}/${CF_FILE} ] && \
        echo "This is the first time downloading ${CF_BASENAME}...";

    [ "${DEBUG}" ] && [ ! -f ${TMPDIR}/${CF_BASENAME} ] && [ -f ${SA_DIR}/${CF_FILE} ] && \
        echo "Copying from ${SA_DIR}/${CF_FILE} to ${TMPDIR}/${CF_BASENAME}...";
    [ ! -f ${TMPDIR}/${CF_BASENAME} ] && [ -f ${SA_DIR}/${CF_FILE} ] && cp ${SA_DIR}/${CF_FILE} ${TMPDIR}/${CF_BASENAME} && touch -r ${SA_DIR}/${CF_FILE} ${TMPDIR}/${CF_BASENAME};

    [ "${DEBUG}" ] && echo "Retrieving file from ${CF_URL}...";
    wget -N ${CF_URL} > ${TMPDIR}/wget.log 2>&1;
    grep -q 'saved' ${TMPDIR}/wget.log;
    DOWNLOADED=$?;

    # Check for 4xx
    grep -q 'ERROR 4[0-9][0-9]' ${TMPDIR}/wget.log;
    WAS404=$?;

    # Check for random failure (dns doesn't exist, etc)
    grep -i -q 'failed: ' ${TMPDIR}/wget.log;
    FAILED=$?;

    # Unset WAS404 if the file didn't return 404.
    [ ! ${WAS404} = 0 ] && WAS404=;

    # Unset FAILED if wget succeeded
    [ ! ${FAILED} = 0 ] && FAILED=;

    [ "${FAILED}" ] && RULES_THAT_404ED="${RULES_THAT_404ED}\n${CF_NAME} had an unknown error: `cat ${TMPDIR}/wget.log`";
    [ "${WAS404}" ] && RULES_THAT_404ED="${RULES_THAT_404ED}\n${CF_NAME} not found at ${CF_URL}";

    [ "${DEBUG}" ] && [ ${WAS404} ] && echo "Got 404 from ${CF_NAME} (${CF_URL})...";
    [ "${DEBUG}" ] && [ ! ${WAS404} ] && ([ ${DOWNLOADED} = 0 ] && echo "New version downloaded..." || echo "${CF_BASENAME} was up to date (skipped downloading of ${CF_URL})...");

    if [ ${DOWNLOADED} = 0 ] ; then
       if [ "${CF_MUNGE_SCRIPT}" ] ; then
         [ "${DEBUG}" ] && echo "Munging output using command: ${CF_MUNGE_SCRIPT}";
         sh -c "${CF_MUNGE_SCRIPT}" < ${TMPDIR}/${CF_BASENAME} > ${TMPDIR}/${CF_BASENAME}.2;
       else 
         cp ${TMPDIR}/${CF_BASENAME} ${TMPDIR}/${CF_BASENAME}.2;
       fi

       # Set munged file to same timestamp as downloaded file...
       touch -r ${TMPDIR}/${CF_BASENAME} ${TMPDIR}/${CF_BASENAME}.2;
       [ -f ${SA_DIR}/${CF_FILE} ] && cmp -s ${TMPDIR}/${CF_BASENAME}.2 ${SA_DIR}/${CF_FILE} || {
         [ "${DEBUG}" ] && echo "Old version ${SA_DIR}/${CF_FILE} differs from new version ${TMPDIR}/${CF_BASENAME}.2" ;

         [ "${DEBUG}" ] && [ -f ${SA_DIR}/${CF_FILE} ] && echo "Backing up old version...";
         [ -f ${SA_DIR}/${CF_FILE} ] && mv -f ${SA_DIR}/${CF_FILE} ${TMPDIR}/${CF_FILE}.${DATE};

	 # Save the command that can be used to undo this change, if rules won't --lint
	 [ -f ${TMPDIR}/${CF_FILE}.${DATE} ] && UNDO_COMMAND="${UNDO_COMMAND} mv -f ${TMPDIR}/${CF_FILE}.${DATE} ${SA_DIR}/${CF_FILE};";
	 [ ! -f ${TMPDIR}/${CF_FILE}.${DATE} ] && UNDO_COMMAND="${UNDO_COMMAND} rm -f ${SA_DIR}/${CF_FILE};";

         [ "${DEBUG}" ] && [ -f ${TMPDIR}/${CF_BASENAME}.2 ] && echo "Installing new version...";
         [ -f ${TMPDIR}/${CF_BASENAME}.2 ] && mv -f ${TMPDIR}/${CF_BASENAME}.2 ${SA_DIR}/${CF_FILE};

         NEWVER=`sh -c "cat ${SA_DIR}/${CF_FILE} | ${PARSE_NEW_VER_SCRIPT}"`;

         [ "${DEBUG}" ] && echo "${CF_NAME} has changed on `hostname`.  The new ${CF_NAME} is ${NEWVER}";
         echo -e "${CF_NAME} has changed on `hostname`.  The new ${CF_NAME} is ${NEWVER}" \
            | mail -s "RulesDuJour/`hostname`: ${CF_NAME} RuleSet has been updated" ${MAIL_ADDRESS}

         RESTART_REQUIRED="true";
       }
       [ -f ${TMPDIR}/${CF_BASENAME}.2 ] && rm -f ${TMPDIR}/${CF_BASENAME}.2;
    fi
done

[ "${DEBUG}" ] && echo "" && echo "";
[ "${RULES_THAT_404ED}" ] && echo -e "The following rules had 404 errors:${RULES_THAT_404ED}" | mail -s "RulesDuJour/`hostname`: 404 errors" ${MAIL_ADDRESS};
[ "${DEBUG}" ] && [ "${RULES_THAT_404ED}" ] && echo -e "The following rules had 404 errors:${RULES_THAT_404ED}" && echo "";

[ "${RESTART_REQUIRED}" ] && {
    sleep 1
    [ "${DEBUG}" ] && echo "Attempting to --lint the rules.";
    spamassassin --lint > /dev/null 2>&1 ;
    LINTFAILED=$?;

    # Unset LINTFAILED if lint didn't fail.
    [ "${LINTFAILED}" = "0" ] && LINTFAILED=;

    [ "${DEBUG}" ] && [ "${LINTFAILED}" ] && echo "WARNING: spamassassin --lint failed." && echo "Rolling configuration files back, not restarting SpamAssassin." && echo "Rollback command is: ${UNDO_COMMAND}";
    [ "${LINTFAILED}" ] && RESTART_REQUIRED= && sh -c "${UNDO_COMMAND}";
    [ "${LINTFAILED}" ] && echo "spamassassin --lint failed. Rolling configuration files back, not restarting SpamAssassin.  Rollback command was: ${UNDO_COMMAND}" | mail -s "RulesDuJour/`hostname`: lint failed. Updates rolled back." ${MAIL_ADDRESS};
    
    [ "${DEBUG}" ] && [ "${RESTART_REQUIRED}" ] && echo "Restarting SpamAssassin using: ${SA_RESTART}";
    [ "${RESTART_REQUIRED}" ] && ${SA_RESTART} > /dev/null 2>&1
}
[ "${DEBUG}" ] && [ ! "${RESTART_REQUIRED}" ] && echo "No files updated; No restart required.";

    [ "${DEBUG}" ] && echo "Changing back to old working directory: ${OLDDIR}";
cd ${OLDDIR};

