Validating the Zip Code

Validating a zip code and a little bit about regular expressions.

By Bob Ray  |  December 27, 2022  |  7 min read

In the last article, we looked at how to use a Plugin to put users into a user group based on their zip codes (or whatever is in the 'zip' field). In this one, we’ll refine that Plugin a little to normalize and validate the zip code. The article assumes that you’re dealing with USA-style zip (postal) codes, but some of the principles would apply elsewhere.

The Problem

In the code of the previous article, we’re taking the value of the zip field for granted. We check to make sure it’s not blank, but do nothing further. In the USA, postal “zip” codes can be five or nine digits (nine-digit ones being in this form: 55113-2009). In the latter case, the last four digits indicate a sub-area within the larger area specified by the first five digits.

What would happen if we just used the (non-empty) value of the zip field? For one thing, we’d have many more user groups—most likely more than we want. Worse yet, two users with the zip codes 55123 and 55123-3009 would be in different user groups even though they might live next door to each other.

To handle this, we’ll add a simple three-line function that checks to make sure the first five characters are all digits, ignores the rest, and always returns a five-digit code, or false in the case of invalid codes or an error. We’ll also have to add an extra line to our Plugin to call the function.

The Code

function getZip($value) {
    /* Remove leading and trailing spaces, tabs
       and carriage returns */
    $value = trim($value);

    /* Return the five digits, or false if there are
       not five leading digits */
    $returnVal = preg_match('/^\d{5}/', $value, $matches);
    return $returnVal? $matches[0] : false;
}

/* UserGroupFromZip plugin --
   attached to OnUserFormSave event */

/* Do nothing if it's not a new user */
if ($mode !== modSystemEvent::MODE_NEW) {
    return;
}

$profile = $user->getOne('Profile');

/* Make sure the user has a profile */
if ($profile) {
    $zip = $profile->get('zip');

    /* Call our function to get a valid zip or false
       if the zip field is invalid */
    $zip = getZip($zip);

    if (! empty($zip)) { /* valid */
        $groupName = 'Group' . $zip;
        /* See if the group already exists */
        $userGroup = $modx->getObject('modUserGroup',
            array('name' => $groupName));
        /* Create the group if necessary */
        if (! $userGroup) {
            $userGroup = $modx->newObject('modUserGroup');
            $userGroup->set('name', $groupName);
            $userGroup->save();
        }
        $user->joinGroup($groupName);
    }
}
return;

The Function

function getZip($value) {
    $value = trim($value);
    $returnVal = preg_match('/^\d{5}/', $value, $matches);
    return $returnVal? $matches[0] : false;
}

The first line of our function calls the PHP trim() function. This function just trims both ends of the string, removing spaces, tabs, carriage returns, and line feeds. There probably aren’t any, but we want to clean things up in case the user has accidentally put a space or two at the beginning or end of the zip code. A leading space would make the zip code invalid in our function, because the regex requires the digits to come first.
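To see why the trim() call matters, here is a minimal sketch. The $raw value is a made-up input, not something taken from the plugin.

```php
<?php
// Hypothetical raw input with stray whitespace around the code.
$raw = "  55123\t\r\n";

// Without trim(), the leading spaces stop ^\d{5} from matching.
$beforeTrim = preg_match('/^\d{5}/', $raw);

// After trim(), the same pattern matches.
$afterTrim = preg_match('/^\d{5}/', trim($raw));

var_dump($beforeTrim, $afterTrim);
```

Only the leading whitespace causes the failure here; the trailing whitespace would be ignored anyway, since the pattern does not look past the first five characters.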

In order to understand the rest of our function, we need to explain a bit about regular expressions and PHP's preg_match() function. A regular expression is usually referred to as a “regex” for short. We’ll follow that convention here.

The preg_match() function uses a regular expression (regex) to search a string. In this case, the regular expression itself is /^\d{5}/ (it’s a PHP string, so it’s enclosed in quotes in the code above).

The slash (/) at each end is called a delimiter, and it’s required. It tells the function where the actual regex begins and ends. The delimiter can be any character that is not alphanumeric, a backslash, or whitespace; if it also appears inside the expression, it must be escaped with a backslash there. Slashes are the most common, but if the expression contains slashes (e.g., a URL or path), other symbols such as # or @ are often used. You must use the same delimiter at each end of the regex and both must be inside the quotation marks.
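As a rough illustration of why an alternative delimiter can be handy, the sketch below searches a made-up path (the $path value and the variable names are assumptions for this example only) with both styles.

```php
<?php
// Hypothetical path to search; it contains slashes.
$path = '/assets/components/example/js/app.js';

// With / as the delimiter, every slash in the pattern must be escaped.
$withSlashes = preg_match('/^\/assets\/components\/(\w+)\//', $path, $a);

// With # as the delimiter, the slashes can be written as-is.
$withHashes = preg_match('#^/assets/components/(\w+)/#', $path, $b);

var_dump($a[1], $b[1]);
```

Both calls find the same thing; the second pattern is simply easier to read.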

The tokens in a regex stand for characters or groups of characters to search for. Our regex is quite simple. The first character (^) stands for the beginning of the string (or of each line, if the m modifier is used). It’s optional, but in this case, we want to make sure that the first five characters are digits (the value a55123, for example, would be invalid, but would pass the regex test without the leading ^).

The next token is \d. In a regex this stands for any single decimal digit. We could have used the character class [0-9] here, which will match any digit between 0 and 9, but \d is shorter. (In PHP, the two match the same characters unless the u modifier is added to the pattern.)

Next, we have {5}. This specifies that we want to match exactly five instances of the previous token (here, \d).

So, to sum up, our regex will match only values that begin with five decimal digits; whatever follows those digits is ignored. Here are some cases that match and don’t match our regex:

Matches

13425
13425-2354
13425somegarbagehere
13425abc

Non-matches

1342
135-2354
a13425
1343a
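One way to check the two lists above is to run every value through the function. This is just a quick test sketch; the $shouldMatch and $shouldFail arrays are names made up for this example.

```php
<?php
function getZip($value) {
    $value = trim($value);
    $returnVal = preg_match('/^\d{5}/', $value, $matches);
    return $returnVal ? $matches[0] : false;
}

$shouldMatch = array('13425', '13425-2354', '13425somegarbagehere', '13425abc');
$shouldFail  = array('1342', '135-2354', 'a13425', '1343a');

/* Print each input next to what getZip() returns for it */
foreach (array_merge($shouldMatch, $shouldFail) as $case) {
    printf("%-22s => %s\n", $case, var_export(getZip($case), true));
}
```

Every value in the first list should come back as the string 13425, and every value in the second list should come back as false.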

What’s Up With the $matches Variable?

The preg_match() function is unusual in that it both returns a value and sets a reference variable. The first argument is the regex, the second is the string to be searched, and the (optional but almost always used) third argument is the reference variable—in this case $matches.

The function returns 1 if the pattern is found, 0 if it isn’t, and false if there’s an error. We need that return value to see if the pattern was found, but we also want to get rid of anything beyond the first five digits. That’s where the $matches variable comes in. If there is a match, it gets set to an array: $matches[0] holds the text that matched the whole pattern, and any further elements hold the text matched by capture groups (parenthesized parts of the pattern). Our regex has no capture groups, so $matches[0] is the only element and the only one we’re interested in. The ^ token matches the beginning of the string, but it will not contribute to the value in $matches[0]. Similarly, if we wanted to accept only values that consist of exactly five digits, we’d add the end-of-string token ($) to the end of our regex. Like the ^ token, it would not be included in the results.
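The sketch below shows what ends up in the third argument in three situations: our prefix pattern, a stricter variant with a trailing $, and a pattern with capture groups. The two extra patterns and the variable names are illustrations only; they are not part of the plugin.

```php
<?php
$value = '55113-2009';

// Prefix match: $matches[0] holds only the text the pattern matched.
$found = preg_match('/^\d{5}/', $value, $matches);

// Strict match: the trailing $ rejects anything after the five digits,
// so this value fails and $strictMatches is left as an empty array.
$strict = preg_match('/^\d{5}$/', $value, $strictMatches);

// Capture groups: each parenthesized part gets its own array element.
$parts = preg_match('/^(\d{5})-(\d{4})$/', $value, $groups);

var_dump($found, $matches, $strict, $strictMatches, $parts, $groups);
```

Note that the anchors themselves never show up in any of the arrays; they only constrain where a match may occur.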

So, if there’s a match, $matches[0] will contain the five-digit code and our function will return a string of exactly five decimal digits. If there is no match or there’s an error, $returnVal will be 0 or false, the ternary test in the last line of the function will fail, and our function will return false.

We’ve also added this line to our Plugin code:

$zip = getZip($zip);

That line simply calls our function. If it has returned false, the if (! empty($zip)) test in the next line will fail and the group won’t be created or joined. We can be sure that all of our group names will be the word 'Group' followed by exactly five digits.
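To see the effect on group names without a running MODX site, here is a stand-alone sketch. The input values are made up, and the $names array stands in for the calls to $modx->getObject() and $user->joinGroup() in the real plugin.

```php
<?php
function getZip($value) {
    $value = trim($value);
    $returnVal = preg_match('/^\d{5}/', $value, $matches);
    return $returnVal ? $matches[0] : false;
}

/* Hypothetical zip field values from three users */
$rawValues = array('55123', ' 55123-3009 ', 'abc');

$names = array();
foreach ($rawValues as $raw) {
    $zip = getZip($raw);
    if (! empty($zip)) { /* valid */
        $names[] = 'Group' . $zip;
    }
}

var_dump($names);
```

The first two users end up with the same group name, and the third is skipped because the function returns false.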

Internationalization

If you are handling postal codes for some other country, the regex will have to be rewritten to deal with the form of that country’s postal codes. Unfortunately, there is no regex that will handle the codes of all countries. They are too different. One solution is an array of regex patterns, one for each country you need to deal with. You can also use the Google Places API for validation, but that’s well beyond the scope of this article.
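The array-of-patterns idea might look something like the sketch below. The three patterns are simplified illustrations rather than complete rules for those countries, and the isValidPostalCode() helper is a hypothetical name invented for this example.

```php
<?php
/* Illustrative, simplified patterns; real-world rules have more exceptions */
$postalPatterns = array(
    'US' => '/^\d{5}(-\d{4})?$/',
    'CA' => '/^[A-Za-z]\d[A-Za-z] ?\d[A-Za-z]\d$/',
    'DE' => '/^\d{5}$/',
);

/* Hypothetical helper: returns true or false, or null if
   there is no pattern for the country */
function isValidPostalCode($code, $country, array $patterns) {
    if (! isset($patterns[$country])) {
        return null;
    }
    return preg_match($patterns[$country], trim($code)) === 1;
}

var_dump(isValidPostalCode('55113-2009', 'US', $postalPatterns));
var_dump(isValidPostalCode('K1A 0B1', 'CA', $postalPatterns));
var_dump(isValidPostalCode('1234', 'DE', $postalPatterns));
```

Returning null for an unknown country lets the caller decide whether to accept the value unchecked or reject it.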

More Complex Regex Operations

Regular expressions can be extremely complex and can involve patterns that capture various parts of a matched string. You can then use those parts to create your own string. For example, you could use a single regex to convert “Smith, John” to “John Smith” by capturing the first and last name separately. It’s trickier than you might think because names can contain hyphens and single quotes and there might or might not be a middle name or a title (e.g., Jr., III).
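For the simplest form of the name example, a sketch using preg_replace() with two capture groups could look like this. It deliberately handles only a single-word last name and a single-word first name; the pattern is an illustration, not a general solution.

```php
<?php
/* $1 is the last name, $2 is the first name */
$pattern = '/^(\w+),\s*(\w+)$/';

$swapped = preg_replace($pattern, '$2 $1', 'Smith, John');

/* A name with an apostrophe does not match this simple pattern,
   so preg_replace() returns it unchanged */
$unchanged = preg_replace($pattern, '$2 $1', "O'Brien, Mary");

var_dump($swapped, $unchanged);
```

The second call shows the kind of case that makes the real problem trickier: the pattern would need to allow apostrophes, hyphens, middle names, and suffixes.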

Here’s an example of a regular expression pattern used in MyComponent’s LexiconHelper class to pull the Lexicon topics out of the code of a getLanguageTopics() method:

$pattern = '#function getLanguageTopics\(\)\s*\{\s*return\s*array\([\'\"]([^\"\']+)[\"\']\)#';
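Here is a rough sketch of how a pattern like that could be applied. The $source string is a made-up snippet for illustration and is not taken from MyComponent.

```php
<?php
$pattern = '#function getLanguageTopics\(\)\s*\{\s*return\s*array\([\'\"]([^\"\']+)[\"\']\)#';

/* Hypothetical source text to scan */
$source = "public function getLanguageTopics() {\n"
    . "    return array('example:default');\n"
    . "}";

/* The parentheses in the pattern capture the text between
   the quotes, which ends up in $matches[1] */
$topics = preg_match($pattern, $source, $matches) ? $matches[1] : null;

var_dump($topics);
```

Notice that # is used as the delimiter and that the literal parentheses and brace are escaped with backslashes, while the unescaped parentheses form the capture group.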

There is almost always more than one way to solve a problem with a regular expression. If you Google “Regex StackOverflow” (without the quotes), you can see many extended arguments about the best way to solve a given regex problem. There’s a great example here.

Future articles may cover some more sophisticated regular expressions.


Bob Ray is the author of MODX: The Official Guide and dozens of MODX Extras including QuickEmail, NewsPublisher, SiteCheck, GoRevo, Personalize, EZfaq, MyComponent and many more. His website is Bob’s Guides. It includes not only a plethora of MODX tutorials but also some really great bread recipes.