Regular Expression Syntax in GWAVA

  • 7019785
  • 14-Apr-2010
  • 11-Sep-2017

Environment

GWAVA

Situation

Problem/Question: How do I leverage regular expressions when creating text filters?

Resolution

This article shows how to use regular expressions (regex) and how regex has been implemented in GWAVA.
GWAVA support will not write these regex patterns (rules) for you. If you have an issue with a rule and you believe GWAVA is malfunctioning, support will assist you in testing your regex pattern.

Regex is designed to find specific text patterns quickly and to account for variations that may occur in the desired text string. If you are searching for a simple word, you do NOT need to use a regular expression. Note: This feature is provided as a convenience to customers. GWAVA does not provide technical support for the use of regular expressions.

1) Regular expression basic syntax (supported)


The following regular expression special characters are implemented in GWAVA and will be recognized when entered in a field.
() Option group

  • Usage: (ItemA|ItemB|ItemC)

[] Character class

  • Usage: [ABCabc] [A-Za-z]

[^] Inverse character class

  • Usage: [^0-9A-Za-z]
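The grouping constructs above can be tried out in any regex tester. The sketch below uses Python's `re` module as a stand-in for GWAVA's engine, assuming it treats these three constructs the same way; it only illustrates the syntax and is not how GWAVA itself evaluates rules.

```python
import re

# Option group: matches any one of the alternatives inside ( )
option_group = re.compile(r"(ItemA|ItemB|ItemC)")
# Character class: matches a single character from the set
char_class = re.compile(r"[A-Za-z]")
# Inverse character class: matches a single character NOT in the set
inverse_class = re.compile(r"[^0-9A-Za-z]")

print(bool(option_group.search("order ItemB today")))  # True
print(bool(option_group.search("order ItemD today")))  # False
print(bool(char_class.search("1234")))                 # False: no letters
print(inverse_class.findall("abc-123_x!"))             # ['-', '_', '!']
```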

Special Characters

  • ^ Anchor start of line
  • $ Anchor end of line
  • | Or
  • \ Escape
  • ? Optional
  • . Any character
  • + Count one or more
  • * Count zero or more
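The special characters above are illustrated below with Python's `re` module. This is an approximation, not GWAVA's engine: Python's ^ and $ anchor to the whole text unless `re.MULTILINE` is set, so that flag is passed here on the assumption that it mirrors GWAVA's "start/end of line" behavior. The sample strings are made up for illustration.

```python
import re

# (pattern, sample text) pairs, one per special character
cases = [
    (r"^free", "hello\nfree money"),        # ^ : "free" starts line 2
    (r"^free", "get free money"),           # ^ : "free" is not at a line start
    (r"offer$", "free offer\nclick here"),  # $ : "offer" ends line 1
    (r"cheap|free", "free offer"),          # | : either alternative
    (r"\$5\.00", "only $5.00"),             # \ : "$" and "." matched literally
    (r"colou?r", "color"),                  # ? : the "u" is optional
    (r"v.agra", "v1agra"),                  # . : any single character (not a newline in Python)
    (r"go+al", "gal"),                      # + : needs at least one "o"
    (r"go*al", "gal"),                      # * : zero "o"s is fine
]

for pattern, text in cases:
    found = re.search(pattern, text, re.MULTILINE) is not None
    print(f"{pattern:<14} {text!r:<24} {found}")
```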

Macros

  • \s Whitespace
  • \S Non-whitespace
  • \b Word boundary
  • \d Numeric
  • \D Non-numeric
  • \w Text character
  • \W Non-text character
  • \A Anchor to start of file
  • \z Anchor to end of file
  • \r Carriage return
  • \n Line feed
  • \f Form feed
  • \t Tab
  • \x Hex number follows
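The sketch below exercises several of the macros with Python's `re` module. Python accepts \s, \S, \b, \d, \D, \w, \W, \A and the \r, \n, \f, \t escapes, but its definitions may not match GWAVA's exactly (for example, Python's \w also includes digits and the underscore, and its \x requires exactly two hex digits). \z is left out because not all Python versions accept it.

```python
import re

sample = "Order 66 ships\tMonday"

print(re.findall(r"\d+", sample))        # ['66']            \d: digits
print(re.findall(r"\s", sample))         # [' ', ' ', '\t']  \s: spaces and tabs
print(re.findall(r"\bship\w*", sample))  # ['ships']         \b + \w: word starting "ship"

# \b keeps "cash" from matching inside a longer word such as "cashew"
print(bool(re.search(r"\bcash\b", "free cash now")))  # True
print(bool(re.search(r"\bcash\b", "cashew nuts")))    # False

# \A anchors to the start of the whole text; ^ (with re.MULTILINE) to every line
text = "From a\nFrom b"
print(re.findall(r"\AFrom", text))               # ['From']
print(re.findall(r"^From", text, re.MULTILINE))  # ['From', 'From']

# \x41 is the hex code for "A"
print(re.findall(r"\x41", "ABA"))                # ['A', 'A']
```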

Optional Switches

  • i Ignore case

--Note-- For those familiar with regex, you'll notice that the {} quantifier is not included. This is due to memory and performance constraints in GWAVA.
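Because {} is not available, a repetition such as \d{3} has to be written out as \d\d\d. The hypothetical helper below, `uses_brace_quantifier`, sketches one way to catch this before pasting a rule into GWAVA: it flags any unescaped { outside a character class. It is simplified (for example, it does not handle a ] placed first inside a class) and assumes GWAVA treats an escaped \{ as a literal brace.

```python
import re

def uses_brace_quantifier(pattern: str) -> bool:
    """Hypothetical check: True if the pattern has an unescaped '{' outside
    a character class, i.e. a {n,m}-style quantifier GWAVA does not support."""
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if in_class:
            if ch == "]":
                in_class = False  # end of [...] character class
        elif ch == "[":
            in_class = True
        elif ch == "{":
            return True
        i += 1
    return False

print(uses_brace_quantifier(r"\d{3}-\d{4}"))      # True: uses {}
print(uses_brace_quantifier(r"\d\d\d-\d\d\d\d"))  # False: repetition written out
# The written-out form matches the same text (checked with Python's engine)
print(bool(re.fullmatch(r"\d\d\d-\d\d\d\d", "555-1234")))  # True
```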


2) Simple example blocking the word "viagra"


To enter a pattern as a regex, it must begin with a '/' and end with a '/', followed by any options.

/<pattern>/<options>
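The /<pattern>/<options> format can be sketched as a small parser. `compile_rule` is a hypothetical helper, not part of GWAVA: it splits the rule at the last '/', accepts only the i switch listed in section 1, and compiles the result with Python's `re` module (adding `re.MULTILINE` on the assumption that GWAVA's ^ and $ anchor to lines).

```python
import re

# Only the "i" switch from the Optional Switches list is accepted
FLAG_MAP = {"i": re.IGNORECASE}

def compile_rule(rule: str) -> re.Pattern:
    """Hypothetical helper: compile a '/pattern/options' rule with Python's re."""
    if not rule.startswith("/"):
        raise ValueError("rule must start with '/'")
    end = rule.rfind("/")
    if end == 0:
        raise ValueError("rule must end with '/' followed by any options")
    pattern, options = rule[1:end], rule[end + 1:]
    flags = re.MULTILINE  # assumption: ^ and $ anchor to lines, as in GWAVA
    for opt in options:
        if opt not in FLAG_MAP:
            raise ValueError(f"unsupported option: {opt!r}")
        flags |= FLAG_MAP[opt]
    return re.compile(pattern, flags)

print(bool(compile_rule("/viagra/i").search("Buy VIAGRA now")))  # True
print(bool(compile_rule("/viagra/").search("Buy VIAGRA now")))   # False
```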

The resulting regex pattern (rule) can then be entered in any text filter field (subject, subject + body, body) or MIME filter field (raw message, header). These filters differ only in which part of the message they search.

/viagra/ looks only for the word viagra in lowercase; that is all it will find. If that is all you want, there is no point in using a regular expression; simply enter the word 'viagra' as the filter.

Adding the i switch makes the rule case-insensitive: '/viagra/i' matches 'Viagra', 'vIaGrA', 'VIAGRA', and 'viagra'. However, this still does not use any of the capabilities regex is meant for, because entering the plain word 'viagra' without regex already checks for all of those variations. This example is given because it is important to recognize when regex is actually necessary; the next example shows a case where it is.
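The difference between /viagra/ and /viagra/i can be reproduced with Python's `re` module, where `re.IGNORECASE` plays the role of the i switch. This approximates GWAVA's behavior rather than running GWAVA itself.

```python
import re

words = ["viagra", "Viagra", "vIaGrA", "VIAGRA"]

case_sensitive = re.compile(r"viagra")                   # like /viagra/
case_insensitive = re.compile(r"viagra", re.IGNORECASE)  # like /viagra/i

print([w for w in words if case_sensitive.search(w)])    # ['viagra']
print([w for w in words if case_insensitive.search(w)])  # all four words
```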


3) Blocking additional variations of the word "viagra"


Because almost every anti-spam product blocks the word viagra outright, spammers have started inserting odd spaces or punctuation between the letters, or replacing letters with similar-looking characters.
Consider the following variations of the word:

V-i-a-g-r-a
V.i.a.g.r.a
v1@gra

The filter created in section 2 would not block these variations. Extra characters have been injected to fool the filter, but the word is still readable. This is where regular expressions come in handy.
The following modification of the rule from section 2 accounts for these variations:

/v[.-]?[i1][.-]?[a@][.-]?g[.-]?r[.-]?a/i

Notice how quickly the rule becomes complicated; this is why support is unable to write the expressions for you. One small mistake can produce many false positives, so use regular expressions only after careful consideration and testing.
This expression looks for a v, followed by either a . or a - (the [] character class), although that . or - does not have to be present (the ? makes it optional). Next comes either an i or a 1, then possibly another . or -, and so on for the remaining letters.
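Before relying on the rule, it is worth running it against the variations it should catch. The sketch below does this with Python's `re` module as a stand-in for GWAVA, using the pattern exactly as written above with `re.IGNORECASE` for the i switch; the sample strings are made up for illustration.

```python
import re

# The rule from this section, minus the surrounding /.../i
rule = re.compile(r"v[.-]?[i1][.-]?[a@][.-]?g[.-]?r[.-]?a", re.IGNORECASE)

samples = ["V-i-a-g-r-a", "V.i.a.g.r.a", "v1@gra", "viagra", "V i a g r a"]
for s in samples:
    print(f"{s:<12} {bool(rule.search(s))}")
# The first four print True; "V i a g r a" prints False
```

The last sample shows a limit of the rule: the character classes contain only . and -, so letters separated by spaces are not caught. Widening the classes also widens what the rule can match, so any change should be retested the same way.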


4) Additional resources

This article is not intended to teach regex completely. The following resources explain the regular expression syntax in more detail:
http://regexlib.com/CheatSheet.aspx  <-- basic syntax with easy-to-understand explanations
http://www.regular-expressions.info/tutorialcnt.html  <-- tutorial covering the special characters in more detail

Once you have learned more, you will want to test your expressions. Here are a few free options:
http://www.radsoftware.com.au/regexdesigner/  <-- RAD Software Regex Designer
https://addons.mozilla.org/en-US/firefox/addon/2077  <-- Firefox add-on

There are hundreds of regular expression resources available online, so feel free to find tools you like. Remember, though, that GWAVA recognizes only the syntax outlined in section 1; a pattern that works in a full-featured tester (for example, one using {}) may not work in GWAVA.
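Whichever tester you use, it helps to keep a list of text the rule must match and text it must not match, and to rerun both after every change. `check_rule` below is a hypothetical helper sketching that workflow in Python; Python's `re` supports more syntax than GWAVA, so this checks matching behavior only, not whether GWAVA will accept the pattern.

```python
import re

def check_rule(pattern, flags, should_match, should_not_match):
    """Run a candidate rule over sample texts and return a list of problems.
    An empty list means every sample behaved as expected."""
    rx = re.compile(pattern, flags)
    problems = []
    for text in should_match:
        if rx.search(text) is None:
            problems.append(f"missed: {text!r}")
    for text in should_not_match:
        if rx.search(text) is not None:
            problems.append(f"false positive: {text!r}")
    return problems

spam = ["FREE offer", "get it free"]
legit = ["freedom of choice", "carefree"]

print(check_rule(r"free", re.IGNORECASE, spam, legit))
# ["false positive: 'freedom of choice'", "false positive: 'carefree'"]
print(check_rule(r"\bfree\b", re.IGNORECASE, spam, legit))
# []
```

In this sample set, adding the \b word-boundary macro on both sides removes both false positives while still catching the spam samples.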

Additional Information

This article was originally published in the GWAVA knowledgebase as article ID 1687.