Regular expressions are special expressions that perform pattern matching. In other words, you use a special syntax of codes and characters to search for patterns in a string. For example, you could search a string to see if it contains a valid floating point value or a proper Canadian postal code.
Here is an example of a regular expression:
/\d{3}/g
The two forward slashes act as a container for the main part of the expression. The stuff between the slashes is the pattern you want to search for. In this case, \d matches a single digit, and {3} means 3 of the previous thing, so this expression looks for exactly 3 digits. After the second forward slash is where you'll see special flags that affect how the expression should work. In this case, you'll see a g, which stands for "global match". It means that this expression should search the entire string and look for multiple matches. Without the global flag, the expression would stop when it found the first match.
The best way to learn regular expressions is to try things, experiment with different patterns and strings, and explore what different patterns do with different strings. There are several regex testers you can use for free online. I recommend any of the following:
Go through the regular expressions presentation tutorial below. Make sure you have RegExr (or another regex tester) open in another window so you can try the examples and exercises. You will need some good "dummy text" to practice on, so some sample text is provided below.
Copy the text below and paste it into the text area of the regex tester you're using:
abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ abc def ghi jkl mno pqrs tuv wxyz ABC DEF GHI JKL MNO PQRS TUV WXYZ 0123456789 _+-.,!@#$%^&*();\/|<>"' 12345 -98.7 3.141 .6180 9,000 +42 555.123.4567 +1-(800)-555-2468 foo@demo.net bar.ba@text.co.uk foo@ @foo foo@foo www.demo.com http://foo.co.uk/ http://regexr.com/foo.html?q=bar http://regexr.com/foo.html?x=bar&y=fred+flintstone https://regexr.com/foo.html https://regexr.com/foo.html https://www.regexr.com/foo.html http://www.foo.co.uk/ Jaunary 20th 2020 2:45pm 25/06/69 25:06:69:15:15:00 20.01.2020 20/01/2020 20-02-2020 20.01.20 20/01/20 20-02-20 2,238,388.4880 0001 -16.2349238 1.4e-3 1.5E4 ! "§ $%& /() =?* '<> #|; ²³~ @`´ ©«» ¤¼× {} <html> <head> <meta charset="utf-8"> <title>Foo</title> </head> <body></body> </html> 1430 Trafalgar Road Oakville, ON L6H 2L1 L6H2L1 n0B2K0 w2n 5k9 l6h2l1 z1z 2z3 v1b2x3 h0h 0h0 h0h0h0 l6h1f2 L6H 1i0 r2d2s8 ------FOR EXERCISES IN TUTORIAL------- Slide 54: There was a cat. The cat loved to sleep on the desk. Why do cats like desks? Because they're cats! Cats are fun, but cats are also divas. Slide 55: a) cat cats cots concatenate tic tac ct Cat cart mat ca t b) Monday romance 987m8ny 2m1n3 Mon: m:n 2m2m2 meany moon ham n cheese --------------------------------------- Cat ipsum dolor sit amet, 2 o'clock am howl on top of tall thing eat all the power cords? ignore the human until she needs to get up, then climb on her lap and sprawl. Stare at owner accusingly then wink chase imaginary bugs, yet ooh, are those your $250 dollar sandals? lemme use that as my litter box for jump launch to pounce upon little yarn mouse, bare fangs at toy run hide until treats are fed stare at imaginary bug. do not try to mix old food with new one to fool me! run as fast as i can into another room for no reason. The door is opening! how exciting oh, it's you, meh chew master's slippers cats are the world so murr i hate humans they are so annoying. Scream for no reason at 4 am mew? 
Cat is love, cat is life mark territory, so purrr purr littel cat, little cat purr purr. Give me attention or face the wrath of my claws cat fur is the new black , so sit in window and stare oooh, a bird, yum. Catasstrophe meow to be let out ask to be pet then attack owners hand and it's 3am, time to create some chaos . Find a way to fit in tiny box claw at curtains stretch and yawn nibble on tuna ignore human bite human hand for behind the couch if it fits, i sits for pet me pet me don't pet me, yet knock over christmas tree yet run off table persian cat jump eat fish. Human is washing you why halp oh the horror flee scratch hiss bite poop on the floor, break a planter, sprint, eat own hair, vomit hair, hiss, chirp at birds, eat a squirrel, hide from fireworks, lick toe beans, attack christmas tree.
(If the embedded presentation doesn't work, you can view it at Regular Expressions Tutorial)
c[aeiou]
\w+\?
[A-Z]\w+
\(\d{3}\)-\d{3}-\d{4}
https?:\/\/(www)?(\w+[\.\/\?=&+]?)+
(there are more advanced versions of this that are better)
<\w+>
\d{1,2}\/\d{1,2}\/\d{4}
[abceghjklmnprstvxy]\d[abceghjklmnprstvwxyz] ?\d[abceghjklmnprstvwxyz]\d
(with case insensitive flag turned on)
\w+[\.\?!]
Explanation: one or more word-characters followed by either a period, question mark, or exclamation point.
A:
/c[aeiou ]t/g
Explanation: A "c" followed by any vowel or a space, followed by a "t".
B:
/m\wn/gi
Explanation: A letter "m" followed by any word character, followed by a letter "n", with the case-insensitive flag on.
So now that you know the basics of RegEx, how do you use RegEx in Java? There are two techniques:
1. The String class's matches() method
2. The Pattern and Matcher classes
Technique #1 is the easiest to get started and will do just fine when you want to do a quick one-time pattern match in an application. For example, in a short program where you ask the user for a quantity and price, and you just want to make sure the quantity is a valid integer and the price is a valid floating point.
Use technique #2 if you're matching the same pattern over and over. For example, if you're asking the user for a list of several inventory items' quantities and prices, you need to validate every single quantity and price for every single inventory item. Here you'd be using the same two patterns several times to make sure you have a valid integer and a valid floating point number, so it's more efficient to use technique #2.
Using the String class's matches() method is a super easy way to do a pattern match with RegEx. The method accepts a regular expression as a String and returns true if the regular expression matches the String object's value or false if it doesn't match. For example:
String input = scanner.next();
if (input.matches("\\d{9}")) {
    // valid student ID
} else {
    System.out.println("Invalid Student ID!");
}
Notes:
The condition checks whether the entire value of the input variable matches the pattern "\\d{9}" (exactly 9 digits). The backslash is doubled because a single backslash is an escape character inside a Java string literal.
Use String.matches() as a "one-off": if you're going to match a pattern more than once, use the Matcher class with a Pattern object instead (which is covered in technique #2).
And speaking of that, String.matches() is equivalent to Pattern.matches(regex, string), which is also covered in technique #2, so file that information away for later.
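As a small illustration of the notes above, the sketch below shows that matches() only returns true when the whole string fits the pattern, and that Pattern.matches() gives the same answer. The class name and the isValidStudentId helper are made up for this example.

```java
import java.util.regex.Pattern;

class StringMatchesDemo {

    // Hypothetical helper: true only when the whole string is exactly 9 digits.
    static boolean isValidStudentId(String input) {
        return input.matches("\\d{9}");
    }

    public static void main(String[] args) {
        System.out.println(isValidStudentId("123456789"));    // true
        System.out.println(isValidStudentId("12345678"));     // false: only 8 digits
        System.out.println(isValidStudentId("abc123456789")); // false: extra characters

        // Pattern.matches(regex, string) is the equivalent static call.
        System.out.println(Pattern.matches("\\d{9}", "123456789")); // true
    }
}
```

Because the whole string has to match, there is no need to add ^ and $ anchors to the pattern when you use matches().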
Now, think back to previous programs you've written that ask for numeric inputs. For example, in one program you asked the user for a package weight in kilograms as a floating-point number. How can you use regex to validate this with String.matches()? The matches() method works only on String objects, so we wouldn't be able to do something like:
double weight = in.nextDouble();
// NOPE!!
if (weight.matches("your regex")) {
    // stuff to do if valid
}
That code isn't valid because weight is a double primitive, not a String object. But that doesn't even matter, because the real problem is that when you invoke in.nextDouble() and the user types something that can't be parsed or converted into a double value, the nextDouble() method throws an InputMismatchException, which crashes the program. For example, if the user entered "six", the program would crash because the input "six" can't be converted to a double value.
The same thing happens if you use nextInt() and type a value that doesn't contain only digits. For example, entering an input such as "two" or "1.5" will also cause an InputMismatchException and crash the program, because neither of those values can be parsed into an int value.
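You can see this behaviour for yourself without typing anything. The sketch below builds a Scanner from a String instead of the keyboard (an assumption made only so the example runs on its own) and catches the exception so you can see when it occurs. The class and method names are hypothetical, and the locale is fixed to Locale.US so that "2.5" is read with a period as the decimal point.

```java
import java.util.InputMismatchException;
import java.util.Locale;
import java.util.Scanner;

class ScannerMismatchDemo {

    // Hypothetical helper: can nextDouble() read this text?
    static boolean canReadDouble(String typed) {
        try (Scanner in = new Scanner(typed).useLocale(Locale.US)) {
            in.nextDouble();
            return true;
        } catch (InputMismatchException e) {
            return false; // this is the exception that would crash the program
        }
    }

    // Hypothetical helper: can nextInt() read this text?
    static boolean canReadInt(String typed) {
        try (Scanner in = new Scanner(typed).useLocale(Locale.US)) {
            in.nextInt();
            return true;
        } catch (InputMismatchException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canReadDouble("2.5")); // true
        System.out.println(canReadDouble("six")); // false
        System.out.println(canReadInt("1.5"));    // false
        System.out.println(canReadInt("two"));    // false
    }
}
```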
So how do we use regex to solve the problem? What we'd like to do is retrieve the input as a String to start with: after all, a String can contain anything - any combination of letters, digits, and other symbols and characters. Then we can use regex to check whether the user-entered string matches a pattern for all digits, or digits with only a single decimal point. For example, something like this:
String strWeight = in.next();
if (strWeight.matches("your regex")) {
    // now we know strWeight is a valid floating point number
    double weight = Double.parseDouble(strWeight);
    // do rest of things...
}
Notes:
Double (with an upper-case D) is a wrapper class. You'll learn about wrapper classes in detail in term 2 Java, but for now it's enough to know that every primitive type has a wrapper class: each wrapper has the same name as the primitive type, but with an upper-case first letter (except int, whose wrapper is Integer, and char, whose wrapper is Character). The numeric wrappers have parse methods that you can use to convert a String object into a primitive numeric type. The parse methods throw a NumberFormatException if you give them something that's not valid, much like the Scanner methods do with InputMismatchException. So make sure you do a proper check for valid values before you give a String to any of the parse methods.
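Here is a rough sketch of the check-then-parse idea. The pattern in WEIGHT_REGEX is a deliberately simplified assumption (digits with an optional fractional part, no sign and no exponent), not the only or best answer, and the class and method names are invented for the example.

```java
class ParseAfterValidateDemo {

    // Assumed, simplified pattern: one or more digits, optionally followed
    // by a decimal point and more digits. Signs and exponents are ignored.
    static final String WEIGHT_REGEX = "\\d+(\\.\\d+)?";

    // Hypothetical helper: returns the parsed weight, or -1 if the text fails the check.
    static double parseWeight(String strWeight) {
        if (strWeight.matches(WEIGHT_REGEX)) {
            // safe: the regex check already passed
            return Double.parseDouble(strWeight);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(parseWeight("2.5")); // 2.5
        System.out.println(parseWeight("six")); // -1.0

        // Without the check, the parse method throws instead:
        try {
            Double.parseDouble("six");
        } catch (NumberFormatException e) {
            System.out.println("parseDouble rejected the input");
        }
    }
}
```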
Come up with a regular expression that checks for valid numeric values as follows:
Create a program that asks the user for an integer and a floating point value. Display success/fail messages for each e.g. "Number is a valid integer." or "Invalid floating point value."
This technique is a bit more complicated (not hard to learn and use, just more coding), but it's also more modular and gives you more re-usable code. It's also more efficient if you're performing the same validation more than once.
This technique involves 2 classes, both in the java.util.regex package: Pattern and Matcher.
The Pattern class models a regular expression as an object. In other words, you just construct your Pattern object using a specific regex pattern as a String, and the Pattern will use that regular expression to match with.
To create a Pattern object for a specific regular expression, you use the Pattern class's compile() method:
Pattern validIdRegex = Pattern.compile("\\d{9}");
This statement constructs a Pattern object referenced by the variable "validIdRegex", and it contains the pattern "\\d{9}". The compile() method also compiles your regex to make sure it's valid: if the syntax of your regex pattern is incorrect, it throws a PatternSyntaxException.
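To see the syntax check in action, here is a minimal sketch that tries to compile a few patterns and reports whether compile() accepted them. The class name and the isValidRegex helper are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CompileCheckDemo {

    // Hypothetical helper: true if Pattern.compile() accepts the regex.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("\\d{9}"));  // true
        System.out.println(isValidRegex("(\\d{9}")); // false: the group is never closed
        System.out.println(isValidRegex("[0-9"));    // false: the character class is never closed
    }
}
```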
You can also pass flags into the compile() method. The flags are constants that belong to the Pattern class. For example:
// string we're validating has multiple lines of text
// (MULTILINE makes ^ and $ match at the start and end of each line)
Pattern validIdRegex = Pattern.compile("\\d{9}", Pattern.MULTILINE);

// we want to do a case-insensitive match
Pattern validUserRegex = Pattern.compile("\\w{8,12}", Pattern.CASE_INSENSITIVE);

// treat special characters like literals and also do a case-insensitive match
Pattern literalUserRegex = Pattern.compile("\\w{8,12}", Pattern.LITERAL | Pattern.CASE_INSENSITIVE);
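The effect of each flag is easier to see with patterns where the flag actually changes the result. The sketch below uses small patterns chosen only for illustration (they are assumptions, not patterns from this lesson's programs), and the class and field names are hypothetical.

```java
import java.util.regex.Pattern;

class FlagsDemo {

    // Same pattern with and without CASE_INSENSITIVE.
    static final Pattern CASE_SENSITIVE = Pattern.compile("m\\wn");
    static final Pattern IGNORE_CASE =
            Pattern.compile("m\\wn", Pattern.CASE_INSENSITIVE);

    // With LITERAL the dot is an ordinary character, not "any character".
    static final Pattern LITERAL_IGNORE_CASE =
            Pattern.compile("a.c", Pattern.LITERAL | Pattern.CASE_INSENSITIVE);

    // MULTILINE lets ^ and $ match at the start and end of every line.
    static final Pattern ANCHORED = Pattern.compile("^\\d{9}$");
    static final Pattern ANCHORED_MULTILINE =
            Pattern.compile("^\\d{9}$", Pattern.MULTILINE);

    public static void main(String[] args) {
        System.out.println(CASE_SENSITIVE.matcher("Mon").matches());      // false
        System.out.println(IGNORE_CASE.matcher("Mon").matches());         // true

        System.out.println(LITERAL_IGNORE_CASE.matcher("A.C").matches()); // true
        System.out.println(LITERAL_IGNORE_CASE.matcher("abc").matches()); // false

        String twoLines = "abc\n123456789";
        System.out.println(ANCHORED.matcher(twoLines).find());            // false
        System.out.println(ANCHORED_MULTILINE.matcher(twoLines).find());  // true
    }
}
```

Note that MULTILINE only matters when the pattern contains ^ or $, and that flags are combined with the | operator.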
Once you have your pattern, you then use the Matcher class to do the matching. A Matcher object is created from a Pattern object, so it's created with that specific pattern in mind. To create a Matcher, we use the Pattern object's matcher() method and give it the string we want to match against the pattern:
String id = in.next(); // gets an ID input from user

// create a matcher that can check our valid ID regex with the user-entered id value
Matcher validateId = validIdRegex.matcher(id);
The code above constructs a Matcher object using our validIdRegex pattern that we created previously, and it's going to use the id input (which contains whatever the user typed) to match against. So we are using this validateId matcher to validate our user-entered id, and see if it matches the validIdRegex pattern or not.
Note that this statement only creates the Matcher; it hasn't actually performed any matching yet!
There are 3 different types of matches that the Matcher can perform:
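matches() checks whether the entire string matches the pattern:
boolean isValid = validateId.matches();
will store true in isValid if id is 123456789 but not if it's abc123456789def
lookingAt() checks whether the beginning of the string matches the pattern, ignoring anything after the match:
boolean isValid = validateId.lookingAt();
will store true in isValid if id is 123456789 or 123456789def but not if it's abc123456789def
find() looks for the next match anywhere in the string:
boolean isValid = validateId.find();
will store true in isValid if id is 123456789 or 123456789def or abc123456789def or 123456789_987654321
Each call to find() continues from where the previous match ended. For example, if id is 123456789_987654321,
boolean isValid = validateId.find();
will store true in isValid, and executing
isValid = validateId.find();
immediately after will also store true in isValid, because it matches the second set of 9 digits (987654321)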
There's some nice flexibility there, but most of the time you'll probably just use the matches() method.
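The three kinds of match can be compared side by side with a short sketch like the one below. It uses the same \d{9} pattern and sample values as the descriptions above. The class and helper method names are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchTypesDemo {

    static final Pattern NINE_DIGITS = Pattern.compile("\\d{9}");

    // matches(): the entire string must fit the pattern.
    static boolean wholeMatch(String s) {
        return NINE_DIGITS.matcher(s).matches();
    }

    // lookingAt(): the string must start with a match.
    static boolean prefixMatch(String s) {
        return NINE_DIGITS.matcher(s).lookingAt();
    }

    // find(): each call moves on to the next match, so a loop can count them.
    static int countMatches(String s) {
        Matcher m = NINE_DIGITS.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("123456789"));            // true
        System.out.println(wholeMatch("abc123456789def"));      // false
        System.out.println(prefixMatch("123456789def"));        // true
        System.out.println(prefixMatch("abc123456789def"));     // false
        System.out.println(countMatches("abc123456789def"));    // 1
        System.out.println(countMatches("123456789_987654321")); // 2
    }
}
```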
When you're using Pattern and Matcher for a set of inputs, you will likely want to re-use the same pattern and matcher objects. That's fine, but to perform a match on a new input value, you need to reset the Matcher so that it uses the new input:
String value = in.next();
Pattern patt = Pattern.compile("\\d+");
Matcher matcher = patt.matcher(value);
...
value = in.next();     // get a new value
matcher.reset(value);  // use new value on the matcher
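When there are many inputs, the reset usually ends up inside a loop. The sketch below assumes the inputs are already in an array (in a real program they would come from the Scanner) and counts how many of them are made up of digits only. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo {

    // Hypothetical helper: counts the values that consist only of digits.
    static int countValid(String[] values) {
        Pattern patt = Pattern.compile("\\d+");
        // Create the Matcher once, with empty input as a placeholder.
        Matcher matcher = patt.matcher("");
        int valid = 0;
        for (String value : values) {
            matcher.reset(value); // point the same Matcher at the new value
            if (matcher.matches()) {
                valid++;
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        String[] inputs = {"12", "abc", "7", "1.5"};
        System.out.println(countValid(inputs)); // 2
    }
}
```

The Pattern is compiled once and the Matcher is created once, which is where the efficiency of this technique comes from.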
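In summary, to validate some input using Pattern and Matcher, perform 3 steps:
1. Create the Pattern: Pattern p = Pattern.compile("\\d{9}");
2. Create the Matcher for the input: Matcher m = p.matcher(studentId);
3. Perform the match: boolean isValid = m.matches();
To validate another input with the same Matcher, reset it first: m.reset(newStudentId);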
1. Write a program that asks the user to enter an inventory item ID, a quantity of that item, and a unit price for that item. The inventory item ID must be in the form AB-12345, where AB are any 2 alphabetical letters in upper- or lower-case and 12345 are any 3, 4, or 5 digits. The quantity must be a valid integer value greater than 0 and the price must be a valid floating point value greater than 0. Display error messages if any of the inputs are invalid. If everything is valid, calculate and display the total as:
inventoryId: $x.xx
2. Write a program that asks the user for a postal code in the form A1A 1A1. The space is optional. The letters can be in upper- or lower-case. None of the letters can be D, F, I, O, Q or U. The first letter also can't be W or Z.
The program should display a message indicating whether or not the postal code is valid.
3. Write a program that asks the user for a starting date and then an ending date as strings in the form dd/mm/yyyy or dd/mm/yy. The day numbers and month numbers can be 1 or 2 digits, and the years must be 4 digits or 2 digits. For each of the two dates, display a message indicating whether or not it is in the correct format.
4. Write a program that asks the user to choose a login name for your web site. Their login name must be only letters and digits, but the first character can't be a digit. The number of total characters must be between 8 and 12. If the user chooses an invalid login, display an appropriate message.