Compound Character Classes

Character classes become far more versatile when combined with other pieces of the regex syntax, such as quantifiers, grouping, and escapes. Compound character classes can help define sophisticated searches, test for certain conditions in a program, and separate wanted e-mail from spam. This section uses compound character classes to build practical expressions with the regex syntax.

Using compound character classes with the regex syntax.

Example 1: Find a partial e-mail address. Use a character class to match any digit from 0 to 9. Use an interval quantifier to require exactly two digits.

  • Regex:
    smith[0-9]{2}@
  • Matches:
    smith44@
    smith42@
  • Doesn't Match:
    Smith34
    smith6
    Smith0a
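
The behavior of Example 1 can be checked with a short Python sketch. It assumes Python's standard re module; the names PARTIAL_ADDRESS and has_partial_address are hypothetical and chosen only for illustration.

```python
import re

# Pattern copied from Example 1: "smith", exactly two digits, then "@".
PARTIAL_ADDRESS = re.compile(r"smith[0-9]{2}@")


def has_partial_address(text: str) -> bool:
    """Return True if the pattern occurs anywhere in text (hypothetical helper)."""
    return PARTIAL_ADDRESS.search(text) is not None


# The sample strings are the ones listed under Matches and Doesn't Match.
samples = ["smith44@", "smith42@", "Smith34", "smith6", "Smith0a"]
results = {s: has_partial_address(s) for s in samples}
print(results)
```

The pattern is case-sensitive, so the strings that start with a capital "S" fail, and "smith6" fails because it has one digit and no "@".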

Example 2: Search an HTML file to find each instance of a heading tag. Allow any number of spaces after the tag name but before the ">".

  • Regex:
    (<[Hh][1-6] *>)
  • Matches:
    <H1>
    <h6>
    <H3  >
    <h2    >
  • Doesn't Match:
    <H1
    <   h2>
    <a1>
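
As a rough illustration of Example 2, the sketch below scans a string for heading tags with Python's re module. The sample_html string is a hypothetical input assembled from the strings listed above, and the names HEADING_TAG and found_tags are illustrative assumptions.

```python
import re

# Pattern copied from Example 2. The parentheses form a single capture
# group, so findall returns the text of each matched tag.
HEADING_TAG = re.compile(r"(<[Hh][1-6] *>)")

# Hypothetical sample input; not taken from a real HTML file.
sample_html = "<H1>Title</H1> <h2    >Part</h2> <   h2> <a1> <H3  >"

found_tags = HEADING_TAG.findall(sample_html)
print(found_tags)  # ['<H1>', '<h2    >', '<H3  >']
```

Closing tags such as "</H1>" are skipped because the character after "<" must be "H" or "h".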

Example 3: Match a regular 7-digit phone number written as three digits, a hyphen, and four digits. Prevent the digit "0" from leading the string.

  • Regex:
    ([1-9][0-9]{2}-[0-9]{4})
  • Matches:
    555-5555
    123-4567
  • Doesn't Match:
    555.5555
    1234-567
    023-1234
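
One way to apply Example 3 as a validation check is sketched below in Python. It assumes the whole input should be a phone number, so it uses re.fullmatch; the helper name is_phone_number is hypothetical.

```python
import re

# Pattern copied from Example 3: a leading digit 1-9, two more digits,
# a hyphen, then four digits.
PHONE = re.compile(r"([1-9][0-9]{2}-[0-9]{4})")


def is_phone_number(text: str) -> bool:
    """Return True only if the entire string fits the pattern (hypothetical helper)."""
    return PHONE.fullmatch(text) is not None


# The sample strings are the ones listed under Matches and Doesn't Match.
samples = ["555-5555", "123-4567", "555.5555", "1234-567", "023-1234"]
results = {s: is_phone_number(s) for s in samples}
print(results)
```

The choice of fullmatch over search is deliberate: an unanchored search would accept a longer string such as "1234-5678" by matching the "234-5678" part inside it.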

Example 4: Match the protocol prefix of a URL, such as "http://". Escape the two forward slashes.

  • Regex:
    [a-z]+:\/\/
  • Matches:
    http://
    ftp://
    tcl://
    https://
  • Doesn't Match:
    http
    http:
    1a3://
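
The following Python sketch tries Example 4 against the strings listed above. It assumes the prefix must appear at the start of the input, so it uses the match method of the compiled pattern; the helper name protocol_prefix is hypothetical.

```python
import re
from typing import Optional

# Pattern copied from Example 4: one or more lowercase letters, then "://".
PROTOCOL_PREFIX = re.compile(r"[a-z]+:\/\/")


def protocol_prefix(text: str) -> Optional[str]:
    """Return the leading protocol prefix, or None (hypothetical helper)."""
    found = PROTOCOL_PREFIX.match(text)  # match() only tries position 0
    return found.group(0) if found else None


# The sample strings are the ones listed under Matches and Doesn't Match.
samples = ["http://", "ftp://", "tcl://", "https://", "http", "http:", "1a3://"]
results = {s: protocol_prefix(s) for s in samples}
print(results)
```

In Python the forward slash has no special meaning, so r"[a-z]+://" behaves the same way. The escapes matter in tools that use "/" as the pattern delimiter, such as Perl, sed, or JavaScript regex literals.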

Example 5: Match a valid e-mail address.

  • Regex:
    [a-z0-9_-]+(\.[a-z0-9_-]+)*@[a-z0-9_-]+(\.[a-z0-9_-]+)+
  • Matches:
    j_smith@foo.com
    j.smith@bc.canada.ca
    smith99@foo.co.uk
    1234@mydomain.net
  • Doesn't Match:
    @foo.com
    .smith@foo.net
    smith.@foo.org
    www.myemail.com

Note: Because this regular expression is not anchored, it still matches the smith@foo.net part of the .smith@foo.net example. Anchor the expression with ^ and $ to reject the whole string.
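
The difference between finding an address inside a string and validating a whole string can be seen in a short Python sketch. It assumes Python's re module; the helper names find_address and is_address are hypothetical.

```python
import re
from typing import Optional

# Pattern copied from Example 5.
EMAIL = re.compile(r"[a-z0-9_-]+(\.[a-z0-9_-]+)*@[a-z0-9_-]+(\.[a-z0-9_-]+)+")


def find_address(text: str) -> Optional[str]:
    """Return the first address found anywhere in text, or None (hypothetical helper)."""
    found = EMAIL.search(text)
    return found.group(0) if found else None


def is_address(text: str) -> bool:
    """Return True only if the entire string is an address (hypothetical helper)."""
    return EMAIL.fullmatch(text) is not None


# Unanchored search skips the leading dot and matches the rest.
print(find_address(".smith@foo.net"))  # smith@foo.net
# Requiring a full match rejects the same string.
print(is_address(".smith@foo.net"))    # False
print(is_address("j.smith@bc.canada.ca"))  # True
```

Using fullmatch here has the same effect as wrapping the expression in ^ and $, which is usually what a validation check wants.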