Best Training Practices

  • 7020527
  • 29-Jul-2008
  • 07-Aug-2017


GWAVA 4 on NetWare and Linux (build 103 and later)


What should I use as ham and spam for training?


For GWAVA's Probability spam scanning engine to work well, you need to feed it ham (good mail) and spam (unwanted mail). While we do recommend some manual gathering of ham and spam (see the GWAVA 4 Spam Guide for more information), GWAVA can gather a large percentage of the corpus automatically. To do this, choose the GWAVA events that the engine should learn from.

Let's start with spam.  In your GWAVA 4 Configuration Manager, browse to Server/Scanner Management --> [your server] --> Manage Scanners --> [your scanner] --> Scanning Configuration --> Antispam --> Spam Autolearn.

If you click on 'Training sources for Spam', you'll see that you can choose any event in GWAVA to train from. Some events, like the filters, are also expandable, so that you can select specific filters to learn from rather than the whole set. The main events that we recommend using are SURBL and RBL. We have found that our recommended SURBL and RBL services produce very few false positives. SURBL is especially accurate and is the most reliable source for spam. RBL is highly accurate as well, but may occasionally have false positives. Depending on your experience with RBL in your work environment, you may or may not decide to use RBL for spam training. Keep in mind that a few false positives are not going to have a drastic impact on your system. Some other common events to use for training are filters (message, body, etc.) and address blocks. If you decide to use these events, be very particular with the filters and blocks you create.
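To illustrate why a few false positives in an autolearned spam corpus do little harm, here is a minimal sketch of a generic token-probability (naive Bayes style) filter in Python. It is not GWAVA's actual engine. The class name, the sample messages, and the message counts are all made up for illustration.

```python
import math
from collections import Counter


class TokenProbabilityFilter:
    """Generic token-probability filter. Hypothetical; NOT GWAVA's engine."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Count each distinct token once per trained message.
        self.counts[label].update(set(text.lower().split()))
        self.messages[label] += 1

    def spam_probability(self, text):
        # Start from the prior odds, then add the log-odds of each token.
        # Add-one smoothing keeps unseen tokens from producing log(0).
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for token in set(text.lower().split()):
            p_spam = (self.counts["spam"][token] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][token] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


def build_filter(mislabelled_ham=0):
    """Train on a made-up corpus, optionally with ham wrongly learned as spam."""
    f = TokenProbabilityFilter()
    good = "meeting agenda attached for the quarterly budget review"
    bad = "cheap pills buy now limited offer"
    for _ in range(200):
        f.train(bad, "spam")
        f.train(good, "ham")
    # Simulate false positives from an autolearn source such as RBL.
    for _ in range(mislabelled_ham):
        f.train(good, "spam")
    return f


clean = build_filter(mislabelled_ham=0)
noisy = build_filter(mislabelled_ham=3)

ham_msg = "quarterly budget meeting agenda"
spam_msg = "buy cheap pills now"

for name, f in (("clean", clean), ("noisy", noisy)):
    print(name, f.spam_probability(ham_msg), f.spam_probability(spam_msg))
```

In this toy corpus, the three good messages wrongly trained as spam raise the spam score of similar good mail only slightly, and it stays far below any sensible blocking threshold. A source that produces false positives regularly would shift the token counts much more, which is why the accuracy of the chosen training events matters.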


You'll find the Non-Spam Autolearn page directly below the Spam Autolearn page, at Server/Scanner Management --> [your server] --> Manage Scanners --> [your scanner] --> Scanning Configuration --> Antispam --> Non-Spam Autolearn.


If you click on 'Training sources for Ham', you'll see the same events that were listed in the Spam sources. Because of GWAVA's new Conversation Tracking (more information below), it isn't necessary to select any events here to train for ham. Some users choose to train ham from Outbound mail. While this can generate a lot of ham for your corpus, you should only use this option if you are comfortable with what your users are sending outbound. If there is a lot of non-business e-mail traffic in your work environment, then this is not a good option. Other events can be used here as well. For instance, some users set up "good" filters for this purpose. These filters aren't set to block messages when triggered, but they can be used here as training sources. Once again, be careful if you decide to do this: your filters should be very specific, to avoid false positives.


The best way to gather ham automatically is to use a new GWAVA 4 feature called Conversation Tracking. See the Conversation Tracking article listed below for more information on how it works and how to enable it.


See also these related articles:

Outbound scanning and Training (Important)

Conversation Tracking

Additional Information

This article was originally published in the GWAVA knowledgebase as article ID 353.