Overview of This Lesson

This lesson introduces you to basic regular expressions and how you can use them for data validation in Java.

What is RegEx?

Regular expressions are special expressions that perform pattern matching. In other words, you use a special syntax of codes and characters to search for patterns in a string. For example, you could search a string to see if it contains a valid floating-point value or a proper Canadian postal code.

Here is an example of a regular expression:

/\d{3}/g

The two forward slashes act as a container for the main part of the expression. The stuff between the slashes is the pattern you want to search for. In this case, \d matches a single digit, and {3} means 3 of the previous thing, so this expression looks for exactly 3 digits. After the second forward slash is where you'll see special flags that affect how the expression should work. In this case, you'll see a g, which stands for "global match". It means that this expression should search the entire string and look for multiple matches. Without the global flag, the expression would stop when it found the first match.
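Java does not use the slash container or the g flag. As a rough sketch of how the same search could look in Java (using the Pattern and Matcher classes covered later in this lesson), each call to find() moves on to the next match, which plays the role of the global flag. The class and method names here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GlobalMatchDemo {

    // Collects every run of three digits, the way /\d{3}/g would.
    static List<String> findAllThreeDigitRuns(String text) {
        // Inside a Java string literal the backslash is doubled: "\\d{3}"
        Pattern threeDigits = Pattern.compile("\\d{3}");
        Matcher matcher = threeDigits.matcher(text);

        List<String> matches = new ArrayList<>();
        // find() returns false once there are no more matches
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // prints [555, 123, 456]
        System.out.println(findAllThreeDigitRuns("555.123.4567"));
    }
}
```

Note that the last match is 456, not 4567: the pattern takes three digits at a time and the leftover 7 is not enough for another match.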

The best way to learn regular expressions is to try things, experiment with different patterns and strings, and explore what different patterns do with different strings. There are several regex testers you can use for free online, such as RegExr.

Regular Expressions Tutorial

Go through the regular expressions presentation tutorial below. Make sure you have RegExr (or another regex tester) open in another window so you can try the examples and exercises. You will need some good "dummy text" to practice on.

Copy the text below and paste it into the text area of the regex tester you're using:

abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
abc def ghi jkl mno pqrs tuv wxyz 
ABC DEF GHI JKL MNO PQRS TUV WXYZ 
0123456789 _+-.,!@#$%^&*();\/|<>"'
12345 -98.7 3.141 .6180 9,000 +42
555.123.4567 +1-(800)-555-2468
foo@demo.net	bar.ba@text.co.uk
foo@ @foo foo@foo
www.demo.com	http://foo.co.uk/
http://regexr.com/foo.html?q=bar
http://regexr.com/foo.html?x=bar&y=fred+flintstone
https://regexr.com/foo.html
https://regexr.com/foo.html
https://www.regexr.com/foo.html
http://www.foo.co.uk/
Jaunary 20th 2020 2:45pm  25/06/69  25:06:69:15:15:00
20.01.2020 20/01/2020 20-02-2020 
20.01.20 20/01/20 20-02-20
2,238,388.4880 0001 -16.2349238 1.4e-3 1.5E4
! "§ $%& /() =?* '<> #|; ²³~ @`´ ©«» ¤¼× {} 
<html>
<head>
<meta charset="utf-8">
<title>Foo</title>
</head>
<body></body>
</html>

1430 Trafalgar Road
Oakville, ON
L6H 2L1  L6H2L1  n0B2K0 w2n 5k9
l6h2l1 z1z 2z3 v1b2x3 h0h 0h0 h0h0h0
l6h1f2 L6H 1i0 r2d2s8

------FOR EXERCISES IN TUTORIAL-------
Slide 54:
There was a cat. The cat loved to sleep on the desk. Why 
do cats like desks? Because they're cats!  Cats are fun, but cats 
are also divas.

Slide 55:
a)
cat cats cots concatenate tic tac
ct Cat cart mat ca t
b)
Monday romance 987m8ny 2m1n3 Mon:
m:n 2m2m2 meany moon ham n cheese
---------------------------------------
Cat ipsum dolor sit amet, 2 o'clock am howl on top of tall thing eat all the 
power cords? ignore the human until she needs to get up, then climb on her 
lap and sprawl.  Stare at owner accusingly then wink chase imaginary bugs, 
yet ooh, are those your $250 dollar sandals? lemme use that as my litter box
for jump launch to pounce upon little yarn mouse, bare fangs at toy run hide 
until treats are fed stare at imaginary bug. do not try to mix old food with 
new one to fool me! run as fast as i can into another room for no reason. 
The door is opening! how exciting oh, it's you, meh chew master's slippers 
cats are the world so murr i hate humans they are so annoying. Scream for 
no reason at 4 am mew? Cat is love, cat is life mark territory, so purrr purr
littel cat, little cat purr purr. Give me attention or face the wrath of my 
claws cat fur is the new black , so sit in window and stare oooh, a bird, yum.
Catasstrophe meow to be let out ask to be pet then attack owners hand and it's
3am, time to create some chaos . Find a way to fit in tiny box claw at 
curtains stretch and yawn nibble on tuna ignore human bite human hand for 
behind the couch if it fits, i sits for pet me pet me don't pet me, yet knock
over christmas tree yet run off table persian cat jump eat fish. Human is
washing you why halp oh the horror flee scratch hiss bite poop on the floor,
break a planter, sprint, eat own hair, vomit hair, hiss, chirp at birds, eat 
a squirrel, hide from fireworks, lick toe beans, attack christmas tree. 

(If the embedded presentation doesn't work, you can view it at Regular Expressions Tutorial)

Solutions to Exercises

Slide 53 Solutions

  1. c[aeiou]
  2. \w+\?
  3. [A-Z]\w+
  4. \(\d{3}\)-\d{3}-\d{4}
  5. https?:\/\/(www)?(\w+[\.\/\?=&+]?)+ (there are more advanced versions of this that are better)
  6. <\w+>
  7. \d{1,2}\/\d{1,2}\/\d{4}
  8. [abceghjklmnprstvxy]\d[abceghjklmnprstvwxyz] ?\d[abceghjklmnprstvwxyz]\d (with case insensitive flag turned on)
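
As a rough check of solution 8, the sketch below runs it in Java against a few of the postal codes from the dummy text. The class and method names (PostalCodeDemo, isPostalCode) are made up for illustration, and the Pattern class it relies on is explained later in this lesson under Technique #2.

```java
import java.util.regex.Pattern;

class PostalCodeDemo {

    // Solution 8, compiled with the case-insensitive flag turned on
    static final Pattern POSTAL_CODE = Pattern.compile(
            "[abceghjklmnprstvxy]\\d[abceghjklmnprstvwxyz] ?\\d[abceghjklmnprstvwxyz]\\d",
            Pattern.CASE_INSENSITIVE);

    // True only if the whole string is a postal code
    static boolean isPostalCode(String text) {
        return POSTAL_CODE.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"L6H 2L1", "n0B2K0", "w2n 5k9", "L6H 1i0", "r2d2s8"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isPostalCode(sample));
        }
    }
}
```

The first two samples are accepted. The other three are rejected because they use a letter the pattern leaves out (w in the first position, i, and d).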

Slide 54

\w+[\.\?!]

Explanation: one or more word-characters followed by either a period, question mark, or exclamation point.

Slide 55

A:

/c[aeiou ]t/g

Explanation: A "c" followed by any vowel or a space, followed by a "t".

B:

/m\wn/gi

Explanation: A letter "m" followed by any word character, followed by a letter "n", with the case-insensitive flag on.

RegEx in Java Programs

So now that you know the basics of RegEx, how do you use RegEx in Java? There are two techniques:

  1. The quick-and-dirty way that's really easy to learn and use, using the String class's matches() method.
  2. The slightly-more-work but way more modular and re-usable way, using the Pattern and Matcher classes.

Technique #1 is the easiest to get started with and will do just fine when you want to do a quick one-time pattern match in an application. For example, you might have a short program where you ask the user for a quantity and a price, and you just want to make sure the quantity is a valid integer and the price is a valid floating-point number.

Use technique #2 if you're matching the same pattern over and over. For example, suppose you're asking the user for a list of several inventory item quantities and prices, and you need to validate every single quantity and price for every single inventory item. Here you'd be using the same two patterns several times to make sure you have a valid integer and a valid floating-point number, so it's more efficient to use technique #2.

Technique #1: String.matches()

Using the String class's matches() method is a super easy way to do a pattern match with RegEx. The method accepts a regular expression as a String and returns true if the regular expression matches the String object's entire value or false if it doesn't match. For example:

String studentId = scanner.next();
if (studentId.matches("\\d{9}")) {
    // valid student ID
} else {
    System.out.println("Invalid Student ID!");
}

Notes:

Use String.matches() as a "one-off": if you're going to match a pattern more than once, use the Matcher class with a Pattern object instead (which is covered in technique #2). And speaking of that, String.matches() is equivalent to Pattern.matches(regex, string), which uses the Pattern class from technique #2, so file that information away for later.
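
Here is a small sketch of that behaviour, using the same 9-digit student ID pattern. The class and method names are made up for illustration. It shows that matches() needs the whole string to fit the pattern, and that Pattern.matches() gives the same answer.

```java
import java.util.regex.Pattern;

class StringMatchesDemo {

    // One-off check: true only if the input is exactly 9 digits
    static boolean isValidStudentId(String input) {
        return input.matches("\\d{9}");
    }

    public static void main(String[] args) {
        System.out.println(isValidStudentId("123456789"));    // true
        System.out.println(isValidStudentId("12345678"));     // false: only 8 digits
        System.out.println(isValidStudentId("abc123456789")); // false: extra characters

        // Same result as the String.matches() call above
        System.out.println(Pattern.matches("\\d{9}", "123456789")); // true
    }
}
```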

Validating Numeric Data

Now, think back to previous programs you've written that ask for numeric inputs. For example, in one program you asked the user for a package weight in kilograms as a floating-point number. How can you use regex to validate this with String.matches()? The matches() method works only on String objects, so we wouldn't be able to do something like:

double weight = in.nextDouble();
// NOPE!!
if (weight.matches("your regex")) {
   // stuff to do if valid
}

That code isn't valid because weight is a double primitive, not a String object. But that doesn't even matter, because the real problem happens earlier: when you invoke in.nextDouble() and the user types something that can't be parsed or converted into a double value, the nextDouble() method throws an InputMismatchException, which crashes the program if it isn't caught. For example, if the user entered "six", the program would crash because the input "six" can't be converted to a double value.

The same thing happens if you use nextInt() and type a value that doesn't contain only digits. For example, entering an input such as "two" or "1.5" will also cause an InputMismatchException and crash the program, because neither of those values can be parsed into an int value.
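
To see this without typing anything, the sketch below builds a Scanner over a String instead of the keyboard and catches the exception so that it can report what happened. The class and method names are made up for illustration, and the locale is set to US only so that "2.5" is read the same way on every machine.

```java
import java.util.InputMismatchException;
import java.util.Locale;
import java.util.Scanner;

class InputMismatchDemo {

    // Reports what nextDouble() does with the given "typed" text
    static String tryNextDouble(String typed) {
        // A Scanner over a String stands in for keyboard input here
        Scanner in = new Scanner(typed).useLocale(Locale.US);
        try {
            return "parsed " + in.nextDouble();
        } catch (InputMismatchException e) {
            return "InputMismatchException";
        }
    }

    // Reports what nextInt() does with the given "typed" text
    static String tryNextInt(String typed) {
        Scanner in = new Scanner(typed).useLocale(Locale.US);
        try {
            return "parsed " + in.nextInt();
        } catch (InputMismatchException e) {
            return "InputMismatchException";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryNextDouble("2.5")); // parsed 2.5
        System.out.println(tryNextDouble("six")); // InputMismatchException
        System.out.println(tryNextInt("7"));      // parsed 7
        System.out.println(tryNextInt("1.5"));    // InputMismatchException
    }
}
```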

So how do we use regex to solve the problem? What we'd like to do is retrieve the input as a String to start with: after all, a String can contain anything - any combination of letters, digits, and other symbols and characters. Then we can use regex to check whether the user-entered string matches a pattern for all digits, or digits with only a single decimal point. For example, something like this:

String strWeight = in.next();
if (strWeight.matches("your regex")) {
   // now we know strWeight is a valid floating point number
   double weight = Double.parseDouble(strWeight);
   // do rest of things...
}

Notes:

Double (with an upper-case D) is a wrapper class. You'll learn about wrapper classes in detail in term 2 Java, but for now it's enough to know that every primitive type has a wrapper class: each wrapper has the same name as the primitive type, but with an upper-case first letter (except int, whose wrapper is Integer, and char, whose wrapper is Character). The numeric wrappers have parse methods that you can use to convert a String object into a primitive numeric type. The parse methods throw a NumberFormatException if you give them something that's not valid, much like the Scanner methods throw an InputMismatchException. So make sure you check for valid values before you give a String to any of the parse methods.
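
The sketch below shows the check-then-parse idea for an integer quantity, leaving the floating-point pattern for the practice exercises. The class and method names are made up for illustration. The pattern \d{1,9} is an assumption chosen for this example: it allows at most nine digits so that the value always fits in an int.

```java
class ParseAfterValidateDemo {

    // Validates first, so Integer.parseInt() never sees bad input
    static String describeQuantity(String text) {
        if (text.matches("\\d{1,9}")) {
            int quantity = Integer.parseInt(text);
            return "valid: " + quantity;
        }
        return "invalid";
    }

    // Shows what happens when a String is parsed without checking it first
    static boolean parseThrows(String text) {
        try {
            Integer.parseInt(text);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(describeQuantity("42"));  // valid: 42
        System.out.println(describeQuantity("two")); // invalid
        System.out.println(describeQuantity("1.5")); // invalid
        System.out.println(parseThrows("two"));      // true
    }
}
```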

Technique #2: Pattern and Matcher Classes

This technique is a bit more complicated (but not really hard to learn and use, just more coding) but it's also more modular and contains more re-usable code. It's also more efficient if you're performing the same validation more than once.

This technique involves 2 classes:

  1. java.util.regex.Pattern - models a regex pattern
  2. java.util.regex.Matcher - an engine that performs pattern matching

The Pattern class models a regular expression as an object. In other words, you just construct your Pattern object using a specific regex pattern as a String, and the Pattern will use that regular expression to match with.

To create a Pattern object for a specific regular expression, you use the Pattern class's compile() method:

Pattern validIdRegex = Pattern.compile("\\d{9}");

This statement constructs a Pattern object referenced by the variable "validIdRegex", and it contains the pattern "\\d{9}". In fact, it even compiles your regex to make sure it's valid: it throws a PatternSyntaxException if the syntax of your regex pattern is incorrect.
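
As a small sketch of that syntax check, the hypothetical helper below tries to compile a regex and reports whether it succeeded. The second call in main() is missing a closing parenthesis, so compile() rejects it.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class PatternSyntaxDemo {

    // True if the regex compiles, false if its syntax is incorrect
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // getDescription() gives a short explanation of the problem
            System.out.println("Bad regex: " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("\\d{9}"));  // true
        System.out.println(isValidRegex("(\\d{9}")); // false: the group is never closed
    }
}
```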

You can also pass flags into the compile() method. The flags are constants that belong to the Pattern class. For example:

// string we're validating has multiple lines of text:
// MULTILINE makes ^ and $ match at every line break (it only matters if the pattern uses ^ or $)
Pattern validIdRegex = Pattern.compile("\\d{9}", Pattern.MULTILINE);

// we want to do a case-insensitive match
Pattern validUserRegex = Pattern.compile("\\w{8,12}", Pattern.CASE_INSENSITIVE);

// combine flags with |: treat the whole pattern as literal text (so this matches
// the characters \w{8,12} themselves) and also do a case-insensitive match
Pattern literalUserRegex = Pattern.compile("\\w{8,12}", Pattern.LITERAL | Pattern.CASE_INSENSITIVE);
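
To see what these flags actually change, here is a small sketch with patterns picked so that each flag makes a visible difference. The patterns, class name, and method names are assumptions made for this example and are not part of the lesson's own code.

```java
import java.util.regex.Pattern;

class PatternFlagsDemo {

    // Looks for a line that is exactly 9 digits, with or without MULTILINE
    static boolean hasIdLine(String text, boolean multiline) {
        int flags = multiline ? Pattern.MULTILINE : 0;
        return Pattern.compile("^\\d{9}$", flags).matcher(text).find();
    }

    // [a-z]+ normally rejects upper-case letters; the flag lets them through
    static boolean isLettersIgnoringCase(String text) {
        return Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE).matcher(text).matches();
    }

    // With LITERAL, the dot is just a dot and no longer means "any character"
    static boolean matchesLiterally(String text) {
        return Pattern.compile("a.c", Pattern.LITERAL).matcher(text).matches();
    }

    public static void main(String[] args) {
        String lines = "abc\n123456789\ndef";
        System.out.println(hasIdLine(lines, true));         // true
        System.out.println(hasIdLine(lines, false));        // false
        System.out.println(isLettersIgnoringCase("Hello")); // true
        System.out.println(matchesLiterally("abc"));        // false
        System.out.println(matchesLiterally("a.c"));        // true
    }
}
```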

Once you have your pattern, you then use the Matcher class to do the matching. A Matcher object is created from a Pattern object, so it's created with that specific pattern in mind. To create a Matcher, we use the Pattern object's matcher() method and give it the string we want to match to the pattern:

String id = in.next();  // gets an ID input from user

// create a matcher that can check our valid ID regex with the user-entered id value
Matcher validateId = validIdRegex.matcher(id);

The code above constructs a Matcher object using our validIdRegex pattern that we created previously, and it's going to use the id input (which contains whatever the user typed) to match against. So we are using this validateId matcher to validate our user-entered id, and see if it matches the validIdRegex pattern or not.

Note that this statement only creates the Matcher. It hasn't actually performed any matching yet!

There are 3 different types of matches that the Matcher can perform:

  1. Using Matcher's matches() method
    • matches() tries to match entire string to the Pattern, starting from the beginning
    • the entire string must match, as if the expression was enclosed inside ^$ boundary markers
    • Example:
      boolean isValid = validateId.matches();
      will store true in isValid if id is 123456789 but not if it's abc123456789def
  2. Using Matcher's lookingAt() method
    • tries to match the string from the beginning, but the entire string does not have to match
    • Example:
      boolean isValid = validateId.lookingAt();
      will store true in isValid if id is 123456789 or 123456789def but not if it's abc123456789def
  3. Using Matcher's find() method
    • tries to find a pattern match inside the string, but doesn't have to match the entire string
    • Example:
      boolean isValid = validateId.find();
      will store true in isValid if id is 123456789 or 123456789def or abc123456789def or 123456789_987654321
    • If this is the second/subsequent call on the same string after a previous successful call, it keeps searching, for example if id contains the value "abc123456789def987654321ghi":
      boolean isValid = validateId.find();
      will store true in isValid, and executing
      isValid = validateId.find();
      immediately after will store true in isValid, because it matches the second set of 9 digits (987654321)

There's some nice flexibility there, but most of the time you'll probably just use the matches() method.
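
The three methods can be compared side by side with a sketch like the one below, which uses the same 9-digit pattern and example strings as the list above. The class and method names are made up for illustration. Each check gets a fresh Matcher so that one call cannot affect the next.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {

    static final Pattern NINE_DIGITS = Pattern.compile("\\d{9}");

    // Returns the results of matches(), lookingAt() and find(), in that order
    static String describe(String id) {
        boolean whole = NINE_DIGITS.matcher(id).matches();
        boolean atStart = NINE_DIGITS.matcher(id).lookingAt();
        boolean anywhere = NINE_DIGITS.matcher(id).find();
        return whole + " " + atStart + " " + anywhere;
    }

    // Calls find() repeatedly on one Matcher to count every match
    static int countMatches(String text) {
        Matcher matcher = NINE_DIGITS.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(describe("123456789"));       // true true true
        System.out.println(describe("123456789def"));    // false true true
        System.out.println(describe("abc123456789def")); // false false true
        System.out.println(countMatches("abc123456789def987654321ghi")); // 2
    }
}
```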

When you're using Pattern and Matcher for a set of inputs, you will likely want to re-use the same pattern and matcher objects. That's fine, but to perform a match on a new input value, you need to reset the Matcher to use the new input:

String value = in.next();
Pattern patt = Pattern.compile("\\d+");
Matcher matcher = patt.matcher(value);
...
value = in.next(); // get a new value
matcher.reset(value); // use new value on the matcher
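
A runnable sketch of the same idea is shown below, with an array standing in for the values a user might type. The class and method names are made up for illustration. One Pattern and one Matcher are created, and the Matcher is reset for each new value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherResetDemo {

    // Counts how many of the values consist only of digits
    static int countValid(String[] values) {
        Pattern digits = Pattern.compile("\\d+");
        // Start with an empty input; reset() supplies the real values below
        Matcher matcher = digits.matcher("");

        int valid = 0;
        for (String value : values) {
            matcher.reset(value); // point the same Matcher at the new value
            if (matcher.matches()) {
                valid++;
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        String[] inputs = {"12", "abc", "007", "1.5"};
        System.out.println(countValid(inputs)); // 2
    }
}
```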

Summary

In summary, to validate some input using Pattern and Matcher, perform the following steps (the fourth applies only when you validate more than one input):

  1. Create a Pattern object for a specific regex.
    Pattern p = Pattern.compile("\\d{9}");
  2. Create a Matcher object for your Pattern and a specific input string
    Matcher m = p.matcher(studentId);
  3. Execute one of the matcher methods to validate your input string
    boolean isValid = m.matches();
  4. If you want to use the same pattern on another input, reset the matcher
    m.reset(newStudentId);

Practice Exercises

  1. Come up with a regular expression that checks for valid numeric values as follows:
    • integer value: one or more digits and nothing else
    • floating-point value: zero or more digits, which may or may not be followed by a decimal point and one or more digits. You might want to play with this one in a regex tester:
      • should match values 1 1.1 1.12 .1 .12 0.0 1.0
      • would not match 1. 1.a a.1 ab.c . 1.12.12 1.1.1 (or anything with non-digits)
  2. Create a program that asks the user for an integer and a floating point value. Display success/fail messages for each e.g. "Number is a valid integer." or "Invalid floating point value."
  3. Write a program that validates a series of user names: Your program should repeatedly request a user name as a String. For each user name entered, display "Valid" if the user name is between 8 and 12 characters and contains only upper- or lower-case letters, otherwise display "Invalid".
  4. A course has 3 exams: Exam 1 is worth 15% of the final grade, Exam 2 is worth 25% of the final grade, and Exam 3 is worth 35% of the final grade. All three exams are each out of 50 marks. Write a program that requests the grades out of 50 for each exam and then calculates the percentage weighting earned and the total weighting earned. All exam grades entered must be valid floating-point values. Sample run #1:
    Enter grade for Exam #1: two
    Enter grade for Exam #2: 22
    Enter grade for Exam #3: 19.2.2
    Error: All exam grades must be valid numbers.

    Sample run #2:

    Enter grade for Exam #1: 23.5
    Enter grade for Exam #2: 32
    Enter grade for Exam #3: 39.5
    Exam Average: 67.60

    You could easily make several different versions of this program: The validation could be done individually for each exam grade, for example. If you're familiar with loops and/or arrays, you could even create a list for the three grades (a regular array would be more efficient than a collection object, for this program) and use iteration to populate and process it. Feel free to explore different ways of writing this program, just make sure your program is intuitive and user-friendly.

  5. Write a program that asks for donations for three local businesses that need help during pandemic lockdowns. Ask for a donation to the pub, a donation to the bakery, and a donation to the pizza shop. Determine if each donation amount is valid. If all three are valid, display the total donation amount. If any of the three amounts is not valid, display an error message such as "Donation amount must be a valid decimal value." or "Donation amount must be greater than 0."