This lesson introduces you to basic regular expressions and
how you can use them for data validation in Java. You might
find the following links useful:
RegExr: load up this site in a new tab so you can try the examples
while you go through the notes.
OverAPI RegEx Cheat Sheet (OverAPI has lots of useful reference
sheets, so have a look around the site later).
Regular expressions are special expressions that perform pattern
matching. In other words, you use a special syntax of codes and characters
to search for patterns in a string. For example, you could search
a string to see if it contains a valid floating point value or to see
if it contains a proper Canadian postal code.
Here is an example of a regular expression:
/\d{3}/g
The two forward slashes act as a container for the main part of the
expression. The stuff between the slashes is the pattern you want to
search for. In this case, \d matches a single digit, and {3} means
3 of the previous thing, so this expression looks for exactly 3 digits.
After the second forward slash is where you'll see special flags that
affect how the expression should work. In this case, you'll see a g,
which stands for "global match". It means that this expression should
search the entire string and look for multiple matches. Without
the global flag, the expression would stop when it found the first match.
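Java strings don't use the slash delimiters or the g flag. As a rough sketch of the same idea, the code below uses the java.util.regex classes (covered later in this lesson) and calls find() repeatedly to collect every match. The names GlobalMatchDemo and findAll are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GlobalMatchDemo {

    // Collects every non-overlapping match of the regex in the text,
    // which is roughly what the "g" flag does in a regex tester.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {          // each call moves on to the next match
            matches.add(matcher.group()); // the text that was just matched
        }
        return matches;
    }

    public static void main(String[] args) {
        // \d{3} is written "\\d{3}" inside a Java string literal
        System.out.println(findAll("\\d{3}", "call 555 or 1234567"));
        // prints [555, 123, 456]
    }
}
```

Stopping after the first successful find() would behave like the expression without the global flag.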
The best way to learn regular expressions is to try things, experiment
with different patterns and strings, and explore what different patterns
do with different strings. There are several regex testers you can use
for free online. I recommend any of the following:
Go through the regular expressions presentation tutorial below. Make sure
you have RegExr (or other
regex tester) open in another window so you can try the examples and
exercises. You will need some good "dummy text" to practice on, so
if you click the button below you can grab some dummy text to paste
into the text area of the regex tester.
Copy the text below and paste it into the text area of the regex
tester you're using:
https?:\/\/(www)?(\w+[\.\/\?=&+]?)+
(there are more advanced versions of this that are better)
<\w+>
\d{1,2}\/\d{1,2}\/\d{4}
[abceghjklmnprstvxy]\d[abceghjklmnprstvwxyz] ?\d[abceghjklmnprstvwxyz]\d
(with case insensitive flag turned on)
Slide 54
\w+[\.\?!]
Explanation: one or more word-characters followed by either a
period, question mark, or exclamation point.
Slide 55
A:
/c[aeiou ]t/g
Explanation: A "c" followed by any vowel or a space, followed by
a "t".
B:
/m\wn/gi
Explanation: A letter "m" followed by any word character,
followed by a letter "n", with the case-insensitive flag on.
RegEx in Java Programs
So now that you know the basics of RegEx, how do you use
RegEx in Java? There are two techniques:
The quick-and-dirty way that's really easy to learn
and use, using the String class's matches() method.
The slightly-more-work but way more modular and
re-usable way, using the Pattern and Matcher classes.
Technique #1 is the easiest to get started and will do
just fine when you want to do a quick one-time pattern
match in an application. For example, in a short program
where you ask the user for a quantity and price, and you
just want to make sure the quantity is a valid integer
and the price is a valid floating point.
Use technique #2 if you're doing several matches of the
same pattern over and over. For example, you might be
asking the user for a list of several inventory item
quantities and prices, and you need to validate every
single quantity and price for every single inventory
item. Here you'd be using the same two patterns to
make sure you have a valid integer and a valid floating
point number several times, so it's more efficient
to use technique #2.
Technique #1: String.matches()
Using the String class's matches() method is a super easy
way to do a pattern match with RegEx.
The method accepts a regular expression as a String and returns true
if the regular expression matches the String object's value
or false if it doesn't match. For example:
String input = scanner.next(); // the student ID typed by the user
if (input.matches("\\d{9}")) {
// valid student ID
} else {
System.out.println("Invalid Student ID!");
}
Notes:
The matches() method takes the regular expression as
a String input, and the string it's matching it against
is the value of the String object that's invoking
the matches() method. So in this case, we're checking
to see if the string value of the input
variable matches the pattern "\\d{9}" (exactly 9 digits).
input.matches("\\d{9}") returns a boolean true or
false, so there's no need for any other relational operators
(e.g. adding == true would be redundant).
Because the regular expression string contains \d,
the backslash must be escaped by adding an extra
backslash character (which is why you see \\d). If you
don't use the extra backslash, Java sees \d as an escape
sequence inside the string literal. \d isn't a valid Java
escape sequence, so the code won't compile.
String.matches() always checks the entire string
from the beginning, so you don't need to use boundaries
like ^ and $.
Use String.matches() as a "one-off": if you're going to
match a pattern more than once, use the Matcher class with
a Pattern object instead (which is covered in technique #2).
And speaking of that, String.matches() is
equivalent to Pattern.matches(regex, string), which is
also covered in technique #2, so file that information
away for later.
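Here is a small runnable sketch that pulls the notes above together. The class name StudentIdCheck and the helper isValidStudentId are made up for this example; the pattern is the same "\\d{9}" used above.

```java
import java.util.regex.Pattern;

class StudentIdCheck {

    // Hypothetical helper: true only when the whole string is exactly 9 digits.
    static boolean isValidStudentId(String input) {
        return input.matches("\\d{9}");
    }

    public static void main(String[] args) {
        System.out.println(isValidStudentId("123456789"));    // true
        System.out.println(isValidStudentId("12345678"));     // false: only 8 digits
        System.out.println(isValidStudentId("abc123456789")); // false: the whole string must match

        // Pattern.matches(regex, string) gives the same answer as string.matches(regex)
        System.out.println(Pattern.matches("\\d{9}", "123456789")); // true
    }
}
```

Notice that "abc123456789" fails even though it contains 9 digits, because matches() checks the entire string.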
Validating Numeric Data
Now, think back to previous programs you've written that ask
for numeric inputs. For example, in one program you asked
the user for a package weight in kilograms as a floating-point
number. How can you use regex to validate this with String.matches()?
The matches() method works only on String objects, so we
wouldn't be able to do something like:
double weight = in.nextDouble();
// NOPE!!
if (weight.matches("your regex")) {
// stuff to do if valid
}
That code isn't valid because weight is a double primitive,
not a String object.
But that doesn't even matter, because the problem is that
when you invoke in.nextDouble() and the user types something
that can't be parsed or converted into a double value,
the nextDouble() method throws an InputMismatchException,
which crashes the program. For example, if the
user entered "six", the program would crash because the
input "six" can't be converted to a double value.
The same thing happens if you use nextInt() and type a value
that doesn't contain only digits. For example, entering
an input such as "two" or "1.5" will also cause an
InputMismatchException and crash the program, because neither
of those values can be parsed into an int value.
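If you'd like to see these exceptions without typing anything, here is a minimal sketch. It assumes two things that aren't part of this lesson: a Scanner that reads from a String instead of the keyboard, and a try/catch block so the demo can report the exception instead of crashing. The class and method names are made up.

```java
import java.util.InputMismatchException;
import java.util.Scanner;

class MismatchDemo {

    // Returns true if nextDouble() rejects the given text.
    static boolean nextDoubleFails(String typed) {
        Scanner in = new Scanner(typed); // reads from the String, as if the user typed it
        try {
            in.nextDouble();
            return false;                // the text was parsed as a double
        } catch (InputMismatchException e) {
            return true;                 // without the catch, the program would end here
        }
    }

    // Returns true if nextInt() rejects the given text.
    static boolean nextIntFails(String typed) {
        Scanner in = new Scanner(typed);
        try {
            in.nextInt();
            return false;
        } catch (InputMismatchException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(nextDoubleFails("six")); // true
        System.out.println(nextIntFails("two"));    // true
        System.out.println(nextIntFails("1.5"));    // true
        System.out.println(nextIntFails("42"));     // false
    }
}
```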
So how do we use regex to solve the problem? What we'd like
to do is to retrieve the input as a String to start with: after all,
a String can contain anything - any combination of letters, digits,
and other symbols and characters. Then we can use regex to
check and see if the user-entered string matches a pattern for
all digits, or digits with only a single decimal point.
For example, something like this:
String strWeight = in.next();
if (strWeight.matches("your regex")) {
// now we know strWeight is a valid floating point number
double weight = Double.parseDouble(strWeight);
// do rest of things...
}
Notes:
We use the next() method to get a String input, and
we're only interested in one "word". If you used nextLine(),
it would include spaces, which we don't want.
We invoke the matches() method on strWeight, because
strWeight is a String object.
Inside the if block, we parse the String version of weight
(strWeight) into a double using a parseDouble() method, which
belongs to a special class called "Double".
Double (with an upper-case D) is a wrapper class.
You'll actually learn about wrapper classes in detail in term 2
Java, but for now it's enough to know that every primitive
type has a wrapper class: each wrapper has the same name
as the primitive type, but with an upper-case first letter (except
int, whose wrapper is Integer, and char, whose wrapper is
Character). The numeric wrappers have parse-methods that
you can use to convert a String object into a primitive
numeric type. The parse methods will throw a NumberFormatException
if you give them something that's not valid, much like the
Scanner methods throw an InputMismatchException. So
you want to make sure you do a proper check for valid values
before you give a String to any of the parse-methods.
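The same validate-then-parse idea works for integers with Integer.parseInt(). The sketch below uses an integer so the floating-point pattern is still yours to work out in the practice exercises. The pattern "\\d{1,9}" is my assumption: limiting the input to nine digits keeps the value inside the int range. The class and helper names are made up, and the try/catch is only there to show the exception without crashing.

```java
class ParseAfterValidate {

    // Hypothetical helper: returns the quantity, or -1 when the text
    // is not made up of 1 to 9 digits.
    static int parseQuantity(String text) {
        if (text.matches("\\d{1,9}")) {     // digits only, short enough to fit in an int
            return Integer.parseInt(text);  // safe: we already know the text is valid
        }
        return -1;
    }

    // Shows what happens when you parse WITHOUT validating first.
    static boolean parseIntFails(String text) {
        try {
            Integer.parseInt(text);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseQuantity("12"));  // 12
        System.out.println(parseQuantity("two")); // -1
        System.out.println(parseQuantity("1.5")); // -1
        System.out.println(parseIntFails("two")); // true
    }
}
```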
Technique #2: Pattern and Matcher Classes
This technique is a bit more complicated (but not really hard
to learn and use, just more coding) but it's also more
modular and contains more re-usable code. It's also more
efficient if you're performing the same validation more
than once.
This technique involves 2 classes:
java.util.regex.Pattern - models a regex pattern
java.util.regex.Matcher - an engine that performs pattern matching
The Pattern class models a regular expression as an object.
In other words, you just construct your Pattern object using
a specific regex pattern as a String, and the Pattern will
use that regular expression to match with.
To create a Pattern object for a specific regular expression,
you use the Pattern class's compile() method:
Pattern validIdRegex = Pattern.compile("\\d{9}");
This statement constructs a Pattern object referenced by the
variable "validIdRegex", and it contains the pattern "\\d{9}".
In fact, it even compiles your regex to make sure it's
valid: it will throw a PatternSyntaxException
if the syntax of your regex pattern is incorrect.
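To see the syntax check in action, here is a minimal sketch. The helper isValidRegex is made up for this example, and it uses a try/catch block (not covered in this lesson) so it can report the problem instead of crashing.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CompileCheck {

    // Hypothetical helper: true if the regex compiles without a syntax error.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false; // e.getMessage() describes what is wrong with the pattern
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("\\d{9}"));  // true
        System.out.println(isValidRegex("(\\d{9}")); // false: the ( is never closed
    }
}
```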
You can also pass flags into the compile() method. The flags are
constants that belong to the Pattern class. For example:
// string we're validating has multiple lines of text
Pattern validIdRegex = Pattern.compile("\\d{9}", Pattern.MULTILINE);
// we want to do a case-insensitive match
Pattern validUserRegex = Pattern.compile("\\w{8,12}", Pattern.CASE_INSENSITIVE);
// treat special characters like literals and also do a case-insensitive match
// (with LITERAL, this pattern matches only the literal text \w{8,12})
Pattern literalUserRegex = Pattern.compile("\\w{8,12}", Pattern.LITERAL | Pattern.CASE_INSENSITIVE);
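Here is a short sketch of what two of those flags actually change. The helper matchesWith is made up for this example. It chains compile(), matcher() and matches() into one line (the Matcher part is explained next), and a flags value of 0 means "no flags".

```java
import java.util.regex.Pattern;

class FlagDemo {

    // Hypothetical helper: does the whole text match the regex with these flags?
    static boolean matchesWith(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).matches();
    }

    public static void main(String[] args) {
        // CASE_INSENSITIVE: [a-z] also accepts upper-case letters
        System.out.println(matchesWith("[a-z]+", 0, "Hello"));                        // false
        System.out.println(matchesWith("[a-z]+", Pattern.CASE_INSENSITIVE, "Hello")); // true

        // LITERAL: the . stops meaning "any character" and is just a period
        System.out.println(matchesWith("a.c", 0, "abc"));               // true
        System.out.println(matchesWith("a.c", Pattern.LITERAL, "abc")); // false

        // Flags are combined with the | operator
        System.out.println(matchesWith("a.c", Pattern.LITERAL | Pattern.CASE_INSENSITIVE, "A.C")); // true
    }
}
```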
Once you have your pattern, you then use the Matcher class
to do the matching. The Matcher class is created from the Pattern
class, so it's created with that specific pattern in mind.
To create a Matcher, we use the Pattern object's matcher()
method and give it the string we want to match to the pattern:
String id = in.next(); // gets an ID input from user
// create a matcher that can check our valid ID regex with the user-entered id value
Matcher validateId = validIdRegex.matcher(id);
The code above constructs a Matcher object using our validIdRegex pattern
that we created previously, and it's going to use the id input
(which contains whatever the user typed) to match against. So we
are using this validateId matcher to validate our user-entered id,
and see if it matches the validIdRegex pattern or not.
Note that this statement only creates the Matcher, it hasn't
actually performed any matching, yet!
There are 3 different types of matches that the Matcher can perform:
Using Matcher's matches() method
matches() tries to match the entire string to the Pattern,
starting from the beginning
the entire string must match, as if the expression were enclosed inside ^ and $
boundary markers
Example:
boolean isValid = validateId.matches();
will store true in isValid if id is 123456789 but not
if it's abc123456789def
Using Matcher's lookingAt() method
tries to match the string from the beginning,
but the entire string does not have to match
Example:
boolean isValid = validateId.lookingAt();
will store true in isValid if id is 123456789 or 123456789def but
not if it's abc123456789def
Using Matcher's find() method
tries to find a pattern match inside the string, but doesn't
have to match the entire string
Example:
boolean isValid = validateId.find();
will store true in isValid if id is 123456789 or 123456789def or
abc123456789def or 123456789_987654321
If this is a second or subsequent call on the same string after a
previous successful call, find() keeps searching from where the last match ended.
For example, if id contains the value "abc123456789def987654321ghi":
boolean isValid = validateId.find();
will store true in isValid, and executing
isValid = validateId.find();
immediately after will store true in isValid, because it matches the
second set of 9 digits (987654321)
There's some nice flexibility there, but most of the time you'll
probably just use the matches() method.
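The three methods can be compared side by side with the example values from above. This is only a sketch: the class and helper names are made up, and each helper creates a fresh Matcher so the calls don't affect each other.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchMethodsDemo {

    static final Pattern NINE_DIGITS = Pattern.compile("\\d{9}");

    // The entire string must be 9 digits.
    static boolean wholeMatch(String s) {
        return NINE_DIGITS.matcher(s).matches();
    }

    // The string must start with 9 digits.
    static boolean prefixMatch(String s) {
        return NINE_DIGITS.matcher(s).lookingAt();
    }

    // The string must contain 9 digits somewhere.
    static boolean anywhereMatch(String s) {
        return NINE_DIGITS.matcher(s).find();
    }

    // Calls find() over and over on ONE matcher to count all the matches.
    static int countMatches(String s) {
        Matcher m = NINE_DIGITS.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("123456789"));          // true
        System.out.println(wholeMatch("abc123456789def"));    // false
        System.out.println(prefixMatch("123456789def"));      // true
        System.out.println(prefixMatch("abc123456789def"));   // false
        System.out.println(anywhereMatch("abc123456789def")); // true
        System.out.println(countMatches("abc123456789def987654321ghi")); // 2
    }
}
```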
When you're using Pattern and Matcher for a set of inputs, you will
likely want to re-use the same pattern and matcher objects. That's fine,
but to perform a match on a new input value, you need
to reset the Matcher so that it uses the new input:
String value = in.next();
Pattern patt = Pattern.compile("\\d+");
Matcher matcher = patt.matcher(value);
...
value = in.next(); // get a new value
matcher.reset(value); // use new value on the matcher
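Here is one way that re-use might look with several values. This sketch assumes you're comfortable with arrays and for-each loops, and it uses a fixed array in place of real user input. The names ReuseMatcherDemo and countValid are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReuseMatcherDemo {

    // Counts how many of the values are made up of digits only,
    // re-using one Pattern and one Matcher for every value.
    static int countValid(String[] values) {
        Pattern digits = Pattern.compile("\\d+");
        Matcher matcher = digits.matcher(""); // start with an empty input
        int valid = 0;
        for (String value : values) {
            matcher.reset(value);             // point the matcher at the new value
            if (matcher.matches()) {
                valid++;
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        String[] inputs = {"12", "abc", "7", "1.5"};
        System.out.println(countValid(inputs)); // 2
    }
}
```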
Summary
In summary, to validate some input using Pattern and Matcher,
perform 3 steps:
Create a Pattern object for a specific regex.
Pattern p = Pattern.compile("\\d{9}");
Create a Matcher object for your Pattern and a
specific input string
Matcher m = p.matcher(studentId);
Execute one of the matcher methods to validate your
input string
boolean isValid = m.matches();
If you want to use the same pattern on another input,
reset the matcher
m.reset(newStudentId);
Practice Exercises
Come up with a regular expression that checks for valid
numeric values as follows:
integer value: one or more digits and nothing else
floating-point value: zero or more digits,
which may or may not be followed by a decimal
point and one or more digits. You
might want to play with this one in a regex tester:
should match values 1 1.1 1.12 .1 .12 0.0 1.0
would not match 1. 1.a a.1 ab.c . 1.12.12 1.1.1
(or anything with non-digits)
Create a program that asks the user for an integer and a
floating point value. Display success/fail messages for each
e.g. "Number is a valid integer." or "Invalid floating point value."
Write a program that validates a series of user names:
Your program should repeatedly request a user name as a String.
For each user name entered, display "Valid" if the user name is between 8 and 12
characters long and contains only upper- or lower-case letters, otherwise display
"Invalid".
A course has 3 exams: Exam 1 is worth 15% of the final grade, Exam 2 is worth
25% of the final grade, and Exam 3 is worth 35% of the final grade. All three
exams are each out of 50 marks. Write a program that requests the grades out of
50 for each exam and then calculates the percentage weighting earned and the
total weighting earned. All exam grades entered must be valid floating-point
values. Sample run #1:
Enter grade for Exam #1: two
Enter grade for Exam #2: 22
Enter grade for Exam #3: 19.2.2
Error: All exam grades must be valid numbers.
Sample run #2:
Enter grade for Exam #1: 23.5
Enter grade for Exam #2: 32
Enter grade for Exam #3: 39.5
Exam Average: 67.60
You could easily make several different versions of this program:
The validation could be done individually for each exam grade, for example.
If you're familiar with loops and/or arrays, you could even create
a list for the three grades (a regular array would be more efficient
than a collection object, for this program) and use iteration to
populate and process it. Feel free to explore different ways
of writing this program, just make sure your program is intuitive
and user-friendly.
Write a program that asks for donations for three local businesses that need help
during pandemic lockdowns.
Ask for a donation to the pub, a donation to the bakery, and a donation to the pizza shop.
Determine if each donation amount is valid. If they are, display the total donation amount.
If any of the three amounts are not valid, display an error message such as "Donation
amount must be a valid decimal value." or "Donation amount must be greater than 0."