Regular Expression Searches


What Are Regular Expressions?

Here is the situation: you are working on your homepage late at night when Netscape complains that your HTML file is missing an ending tag! "Blasted piece of crap," you curse, "what does it mean I am missing a tag?" If you know a little something about regular expressions, you know that the expression listed below will produce the lines in the HTML file with zero or one tag.

    
(<[^>]*>)?[^<]*
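As a rough illustration, the expression can be tried with Python's re module. This is only a sketch: the helper name and the sample lines are made up, and the pattern is applied to the whole line with fullmatch, which is an assumption about how the search is meant to be used. Under that assumption it accepts a line with no tag, or with a single tag at the start of the line followed by tag-free text.

```python
import re

# The expression from the text, unchanged.
TAG_LINE = re.compile(r"(<[^>]*>)?[^<]*")


def has_at_most_one_leading_tag(line: str) -> bool:
    """True if the whole line is an optional tag followed by tag-free text."""
    return TAG_LINE.fullmatch(line) is not None


# Hypothetical sample lines, not taken from any real page.
samples = ["plain text", "<p>one tag", "<p>two tags</p>", "text <b>"]
results = [has_at_most_one_leading_tag(s) for s in samples]
print(results)  # [True, True, False, False]
```

Note that a line such as "text <b>" is rejected even though it holds only one tag, because the optional tag in the expression has to come first.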

This is a nice expression, but remember that this section is by no means a complete, or even half-hearted, description of regular expressions. Instead, it introduces the concept and the syntax of some commonly used expressions. For a complete text on regular expressions, see the reference given below.

Mastering Regular Expressions
Jeffrey E. F. Friedl
O'Reilly & Associates, Inc.
ISBN 1-56592-257-3

If you use regular expressions all the time, whether through grep, Perl, Tcl, or some other tool, you need this book. It will be a great addition to your library. The following book also has a complete chapter devoted to grep, which you might find useful.

Sed & Awk
Dale Dougherty
O'Reilly & Associates, Inc.
ISBN 0-937175-59-5
Chapter 3, Pgs. 27-55

Syntax

Before we get started with regular expressions, we must define the syntax. Listed below are some of the most common meta-characters which form regular expressions.

.      
Dot - Matches any single character except the newline character "\n".

*      
Star - Matches zero or more copies of the preceding expression.

+      
Plus - Matches one or more copies of the preceding expression.

?      
Question - Matches zero or one copy of the preceding expression.

[...]  
Character Class - Matches any single character within the brackets.

[^...] 
Negated Character Class - Matches any single character not within the brackets.

^      
Caret - Matches the beginning of a line (the position before the first character).

$      
Dollar - Matches the end of a line (the position after the last character).

\{n,m\}
Interval - Matches the preceding expression at least n and at most m times.

\      
Escape - Used to escape meta-characters.

|      
Pipe - Matches either the expression to the left or right of the pipe.

"..."  
Quotes - Interprets everything within the quotes literally.

()     
Parentheses - Group a series of regexps together into a new regexp.
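To make the list above concrete, here is a small sketch that tries one pattern per meta-character with Python's re module. The sample strings and the first_match helper are invented for illustration. Two entries differ from the list because Python's dialect differs: the interval is written {n,m} without backslashes, and there is no quoting operator, so re.escape is used as a stand-in for "...".

```python
import re

# (pattern, text, expected first match) triples; all sample strings are made up.
examples = [
    (r"a.c", "abc", "abc"),            # dot: any single character
    (r"ab*", "abbb", "abbb"),          # star: zero or more b
    (r"ab+", "a", None),               # plus: needs at least one b
    (r"colou?r", "color", "color"),    # question: the u is optional
    (r"[abc]", "xbz", "b"),            # character class
    (r"[^abc]", "abcd", "d"),          # negated character class
    (r"^cat", "concat", None),         # caret: cat is not at the start
    (r"cat$", "concat", "cat"),        # dollar: cat is at the end
    (r"a{2,3}", "aaaa", "aaa"),        # interval: two to three a's
    (r"\.", "a.b", "."),               # escape: a literal dot
    (r"cat|dog", "hotdog", "dog"),     # pipe: either alternative
    (r"(ab)+", "ababab", "ababab"),    # parentheses: repeat a group
    (re.escape("a.b*"), "xa.b*y", "a.b*"),  # stand-in for "..." quoting
]


def first_match(pattern, text):
    """Return the text of the first match, or None when nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


actual = [first_match(pattern, text) for pattern, text, _ in examples]
for (pattern, text, expected), got in zip(examples, actual):
    print(f"{pattern!r:12} on {text!r:10} -> {got!r}")
```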

Let's begin with the simplest example: match any single character (.), zero or more times (*). Notice that the regular expression .* returns the entire example file highlighted in yellow. Now put a space in front of the dot (.) and see what you get.
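The two searches just described can be imitated in Python. This is a sketch only: the list of lines is a hypothetical stand-in for the example file, and the highlighted helper simply returns the part of a line that a search would mark.

```python
import re

# Hypothetical stand-in for the example file.
lines = ["Rami Jones", "", "gabi smith", "nospace"]


def highlighted(pattern, line):
    """Return the part of the line a search would highlight, or None."""
    m = re.search(pattern, line)
    return m.group(0) if m else None


# .* matches every line in full, including the empty one.
any_chars = [highlighted(r".*", line) for line in lines]

# With a leading space, the match starts at the first space in the line,
# and lines without a space are not matched at all.
space_then_any = [highlighted(r" .*", line) for line in lines]

print(any_chars)       # ['Rami Jones', '', 'gabi smith', 'nospace']
print(space_then_any)  # [' Jones', None, ' smith', None]
```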

How would you find a blank line in a file? Try using the positional meta-characters ^$ on the example file. What do you find?
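A minimal sketch of the blank-line search in Python, using made-up lines in place of the example file. It also shows a point that is easy to miss: ^$ finds only truly empty lines, so a line holding nothing but spaces is not reported. The second pattern, which also accepts spaces and tabs, is an addition for comparison and not part of the original exercise.

```python
import re

# Hypothetical file contents; line 3 holds only spaces.
lines = ["first line", "", "   ", "last line"]

EMPTY = re.compile(r"^$")             # nothing between line start and line end
WHITESPACE = re.compile(r"^[ \t]*$")  # assumption: also count space/tab-only lines

empty_numbers = [n for n, line in enumerate(lines, start=1) if EMPTY.search(line)]
blank_numbers = [n for n, line in enumerate(lines, start=1) if WHITESPACE.search(line)]

print(empty_numbers)  # [2]
print(blank_numbers)  # [2, 3]
```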

Now turn on the invert matching button.

Notice that every line in the example file is returned, but none is highlighted! Why are the lines not highlighted? Highlighting marks the text that matched the pattern. With matching inverted, the search returns the lines that do not match the blank-line pattern, so there is nothing in them to highlight.

Suppose you wanted to see if Rami or Gabi appear in the example file. How would you find out? You could search for each one individually, or you could use an Extended Regular Expression in the following manner.

with the following options

Notice that it returns the two rows in the example file that contain any form of Rami or Gabi. Both lower-case and mixed-case strings matched.
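The inverted search and the Rami-or-Gabi search can be approximated with one small grep-like function in Python. Everything here is a sketch under assumptions: the function name, its ignore_case and invert parameters, the sample lines, and the pattern rami|gabi are illustrative guesses, not the tool's actual settings or the contents of the real example file.

```python
import re


def grep_lines(pattern, lines, ignore_case=False, invert=False):
    """Return (line_number, line) pairs for the lines the search selects."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    selected = []
    for number, line in enumerate(lines, start=1):
        matched = regex.search(line) is not None
        # Keep matching lines normally, non-matching lines when inverted.
        if matched != invert:
            selected.append((number, line))
    return selected


# Hypothetical stand-in for the example file.
lines = ["Rami Jones", "dana lee", "gabi smith", "Omar Khan"]

# Alternation with case ignored finds both spellings.
names = grep_lines(r"rami|gabi", lines, ignore_case=True)

# Inverting the blank-line pattern returns every non-blank line.
non_blank = grep_lines(r"^$", lines, invert=True)

print(names)      # [(1, 'Rami Jones'), (3, 'gabi smith')]
print(non_blank)  # all four lines, numbered 1 to 4
```

Without ignore_case=True, the lower-case pattern would find only the "gabi smith" line in this sample.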

The option Print Line Number In File does just what it says and really doesn't require an example. To find out how to customize your options, read the section tkREM, the Tk Regular Expression Maker. This tool gives help to people who use Grep, Perl, Tcl, Python, Lex & Yacc, and Sed & Awk. If you use one or more of these, then read up on tkREM!