Simple Data Validation

Validating User Input

When you ask a user to input data, you should always validate that data and make sure it's correct. Never assume that the user will enter the data correctly, even if you give clear instructions (users often don't read instructions :P ).

Copy and paste the following program into your editor:

import java.util.Scanner;

public class DataValidation {
    
    public static void main(String[] args) {

        Scanner in = new Scanner(System.in);
        
        System.out.print("Enter the number of items in inventory: ");
        int numItems = in.nextInt();
        System.out.print("Enter the cost per item: ");
        double itemCost = in.nextDouble();
        
        double totalCost = numItems * itemCost;
        
        System.out.printf("Total Cost: $%.2f%n", totalCost);
    }
}

What kinds of things could go wrong in this program? What kinds of user-entered values would crash the program?

If you're not sure, try entering the following:

Which sets of input values crash the program?

Which sets of input values produce output that doesn't make sense?

There are two kinds of problems here:

  1. The user could enter a value that isn't compatible with the data type being retrieved, e.g. entering non-digits or a decimal value for the number of items. In that case nextInt() or nextDouble() throws an InputMismatchException and the program crashes.
  2. The user could enter values that the program accepts but that don't make sense, like 0 or negative values.
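
To see the first kind of problem without typing anything, the sketch below feeds text to a Scanner built from a String instead of System.in. The class name CrashDemo and the helper crashesNextInt() are made up for this illustration; InputMismatchException is the exception nextInt() throws when the next token isn't a valid int.

```java
import java.util.InputMismatchException;
import java.util.Scanner;

class CrashDemo {

    // Returns true if nextInt() rejects the given text.
    // (Hypothetical helper, for illustration only.)
    static boolean crashesNextInt(String text) {
        Scanner in = new Scanner(text);
        try {
            in.nextInt();
            return false;   // nextInt() accepted the text
        } catch (InputMismatchException e) {
            return true;    // this is the crash an uncaught exception would cause
        }
    }

    public static void main(String[] args) {
        System.out.println("12  rejected? " + crashesNextInt("12"));
        System.out.println("abc rejected? " + crashesNextInt("abc"));
        System.out.println("2.5 rejected? " + crashesNextInt("2.5"));
        System.out.println("-3  rejected? " + crashesNextInt("-3"));
    }
}
```

Note that -3 is not rejected: nextInt() happily reads it, which is exactly the second kind of problem.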

Solving problem #1 can be done by using regular expressions:

  1. Retrieve the user input as a String
  2. Use a regular expression to check that the String is a valid value (i.e. a valid integer or a valid floating-point)
  3. If the input is valid, go ahead and parse it into an integer or a floating-point.
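
For an integer, the three steps above might look like the following minimal sketch. The class name RegexCheck and the method isValidInt() are hypothetical, and the inputs are hard-coded Strings (rather than read from a Scanner) so the example runs by itself.

```java
class RegexCheck {

    // Step 2: true if the text is one or more digits and nothing else.
    static boolean isValidInt(String text) {
        return text.matches("\\d+");
    }

    public static void main(String[] args) {
        // Step 1: each input arrives as a String.
        String[] samples = { "12", "abc", "2.5", "-3" };

        for (String input : samples) {
            if (isValidInt(input)) {
                // Step 3: the check passed, so parse the text into an int.
                int value = Integer.parseInt(input);
                System.out.println(input + " -> parsed as " + value);
            } else {
                System.out.println(input + " -> not a valid integer");
            }
        }
    }
}
```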

The second problem (which assumes that the input data was successfully checked with regular expressions) can be solved with boolean expressions: if the input is within the valid range (e.g. a number is greater than 0, or a number is between 1 and 10), it's valid; otherwise, it isn't.
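
As a rough illustration, the two range checks mentioned above could be written as boolean expressions like these. The class and method names (RangeCheck, isPositive(), isBetween()) are invented for this sketch.

```java
class RangeCheck {

    // true if value is greater than 0
    static boolean isPositive(double value) {
        return value > 0;
    }

    // true if min <= value <= max (both ends inclusive)
    static boolean isBetween(int value, int min, int max) {
        return value >= min && value <= max;
    }

    public static void main(String[] args) {
        System.out.println("0 is positive? " + isPositive(0));
        System.out.println("10 is between 1 and 10? " + isBetween(10, 1, 10));
        System.out.println("11 is between 1 and 10? " + isBetween(11, 1, 10));
    }
}
```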

Both of these require the use of logical structures like selections and iteration.

Exercise

[Solutions]

1. Modify the above program: Create a regular expression that matches a valid integer value (at least one digit and no other characters/symbols). Then use an if statement to check and see if the number of items input matches the regex. If it does, ask for the price, otherwise display an informative error message.

When asking for the price, create a regular expression that matches a valid floating point value. Then use an if statement to check and see if the price input matches the regex. If it does, calculate and display the total cost, otherwise display an error message.

2. Modify the previous program: Add the code to check and see if the number of items is more than 0: if it isn't, display an informative error message, otherwise continue with the price input. Do the same thing for price: ensure it's greater than 0 before you calculate and display the total, and make sure you display an appropriate error message if the price is invalid.

Repeating Validation

What if we wanted the program to continue running, and just ask the user over and over again for the inputs until those inputs are entered correctly?

Enter a number: two
Error: you can only enter digits.
Enter a number: 0
Error: number must be between 1 and 100.
Enter a number: five
Error: you can only enter digits.
Enter a number: 9
Valid Number Entered

For example, let's write some code to ask the user for an integer number.

  1. Make sure the number matches an integer regex (at least 1 digit, nothing else).
  2. If it doesn't, display the error message "Error: you can only enter digits."
  3. But if the number does match the regex, parse it into an int value.
  4. After parsing, make sure the value is between 1 and 100, inclusive.
  5. If it isn't, display the message "Error: number must be between 1 and 100."
  6. If it is, print the message "Valid Number Entered"

So that could give us this code to validate the number:

System.out.print("Enter a number: ");
String strNum = in.next(); 
if (!strNum.matches("\\d+")) { // note, you could use [+\\-]?\\d+ for negatives
    System.out.println("Error: you can only enter digits.");
} else {
    int num = Integer.parseInt(strNum);
    if (num <= 0 || num > 100) {  // num <= 0 only catches 0 here: the regex allows no minus sign
        System.out.println("Error: number must be between 1 and 100.");
    } else {
        System.out.println("Valid Number Entered");
    }
}
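
Side note: \\d+ accepts any number of digits, so an input such as 99999999999 passes the regex but is too large for an int, and Integer.parseInt() throws a NumberFormatException. One possible guard (an assumption of this sketch, not part of the lesson's code) is to cap the input at 9 digits, since every number with at most 9 digits fits in an int. The names DigitLimit and isSafeInt() are made up for the example.

```java
class DigitLimit {

    // At most 9 digits always fits in an int (the largest int is 2147483647).
    static boolean isSafeInt(String text) {
        return text.matches("\\d{1,9}");
    }

    public static void main(String[] args) {
        String tooBig = "99999999999";
        System.out.println("plain regex accepts it?  " + tooBig.matches("\\d+"));
        System.out.println("capped regex accepts it? " + isSafeInt(tooBig));
        System.out.println("capped regex accepts 100? " + isSafeInt("100"));
    }
}
```

The examples that follow stick with the plain \\d+ pattern from the code above.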

Now what we'd like to do is repeat this code until the user enters a valid value: both a valid integer and an integer between 1 and 100. Clearly if we want to repeat this code, we need a loop, but what kind of loop? And what would our loop condition be?

The second question is easier to answer: we want to perform these tasks while the user's input is invalid. But how can we word this as a boolean expression?

The problem is that we have two criteria: does the string contain a valid int, and does the int value fit the desired range? Unfortunately, we can't check the range of values until we parse the input into an int, and we can't parse the input into an int until we know the string contains a valid int value (i.e. matches the regex).

We can't use two different loops, either. Why? Look at this:

System.out.print("Enter a number: ");
String strNum = in.next(); 
while (!strNum.matches("\\d+")) {
    System.out.print("Error: you can only enter digits.\nEnter a number: ");
    strNum = in.next();
} 
num = Integer.parseInt(strNum);
while (num <= 0 || num > 100) {
    System.out.print("Error: number must be between 1 and 100.\nEnter a number: ");
    strNum = in.next();  // oops: this is a string, we need to update num
}

What if the user enters 999? The first while loop is not executed because !strNum.matches("\\d+") is false: the input 999 matches the regex pattern.

Then Integer.parseInt(strNum) converts the input 999 into an int value and stores 999 in the num variable.

Now we reach the second while loop: num <= 0 || num > 100 is true, so the body of the loop executes.

However, the in.next() call inside that loop, which grabs a new input from the user, stores it in the String variable strNum. The value of num never changes, so the loop condition stays true forever. We would now need to parse the new input into an integer so we can compare the new num at the top of our while loop, right?

System.out.print("Enter a number: ");
String strNum = in.next(); 
while (!strNum.matches("\\d+")) {
    System.out.print("Error: you can only enter digits.\nEnter a number: ");
    strNum = in.next();
} 
num = Integer.parseInt(strNum);
while (num <= 0 || num > 100) {
    System.out.print("Error: number must be between 1 and 100.\nEnter a number: ");
    strNum = in.next();  
    num = Integer.parseInt(strNum);  // what if strNum is invalid??
}

But wait: what if the user enters an invalid integer at the in.next() inside the second loop? For example, what if they type "9.9"? That will cause parseInt() to crash with a NumberFormatException, which we don't want: that's what we're trying to avoid in the first place!

We have to check the new strNum value with our regex again, but how can we do that?

Should we copy the first while loop into the second one?

System.out.print("Enter a number: ");
String strNum = in.next(); 
    
while (!strNum.matches("\\d+")) {
    System.out.print("Error: you can only enter digits.\nEnter a number: ");
    strNum = in.next();
}                    
num = Integer.parseInt(strNum);
while (num <= 0 || num > 100) {
    System.out.print("Error: number must be between 1 and 100.\nEnter a number: ");
    strNum = in.next();
                
    while (!strNum.matches("\\d+")) {
        System.out.print("Error: you can only enter digits.\nEnter a number: ");
        strNum = in.next();
    }  
    num = Integer.parseInt(strNum);
}

That's redundant: we don't want to copy and paste code like that because if we have to edit how that part of the validation works, we'd have to change the code in two places.

What we really want to be able to do is say, if the user enters a non-numeric value this time, then just stop and let's start over by asking them for a new input.

This means we can change our 2nd while loop to an if statement:

System.out.print("Enter a number: ");
String strNum = in.next(); 
    
while (!strNum.matches("\\d+")) { 
    System.out.print("Error: you can only enter digits.\nEnter a number: ");
    strNum = in.next();
}                    
num = Integer.parseInt(strNum);
if (num <= 0 || num > 100) {
    System.out.println("Error: number must be between 1 and 100.");
}
// at this point we want to start over

We want to repeat the entire block of code: ask for a number, make sure it matches the regex, parse it, make sure it's between 1 and 100. So we need an outer loop:

while () {     // repeat while INVALID     
    System.out.print("Enter a number: ");
    String strNum = in.next(); 
        
    while (!strNum.matches("\\d+")) {
        System.out.print("Error: you can only enter digits.\nEnter a number: ");
        strNum = in.next();
    }                    
    num = Integer.parseInt(strNum);
    if (num <= 0 || num > 100) {
        System.out.println("Error: number must be between 1 and 100.");
    }
}

So what would our condition be? We want to iterate until we finally get a valid num value, so why not use that? We could do this in a few different ways:

while (num <= 0 || num > 100)

If we choose this, we'll have to initialize num to an invalid value first, so that we are forced into the loop. Initializing num to 0 works:

int num = 0;
while (num <= 0 || num > 100) { ... }

If the user does manage to get to the parse statement and we then find that the integer value of the input is not between 1 and 100, the condition is still true and the loop iterates again, so that works.
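
Putting the pieces together, this first option might look like the sketch below. It wraps the loop in a method (the names RangeLoop and readNumber() are invented here) so it can be run against scripted input through a Scanner built from a String; passing new Scanner(System.in) instead would make it interactive.

```java
import java.util.Scanner;

class RangeLoop {

    // Repeats until the input is all digits AND between 1 and 100.
    static int readNumber(Scanner in) {
        int num = 0;   // invalid on purpose, so the loop runs at least once
        while (num <= 0 || num > 100) {
            System.out.print("Enter a number: ");
            String strNum = in.next();

            while (!strNum.matches("\\d+")) {
                System.out.print("Error: you can only enter digits.\nEnter a number: ");
                strNum = in.next();
            }
            num = Integer.parseInt(strNum);
            if (num <= 0 || num > 100) {
                System.out.println("Error: number must be between 1 and 100.");
            }
        }
        return num;
    }

    public static void main(String[] args) {
        // Scripted input standing in for what a user might type.
        Scanner in = new Scanner("two 0 five 9");
        int num = readNumber(in);
        System.out.println("Valid Number Entered: " + num);
    }
}
```

Because the input is scripted, the typed values don't appear in the output the way they would in a real console session.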

Some people don't like repeating the same condition in both the while and the if, though, and that's legitimate. Instead, you could create a boolean flag and use that to keep track of a valid/invalid input:

int num = 0;
boolean isValid = false; // assume it's invalid to start
while (!isValid) {          
    System.out.print("Enter a number: ");
    String strNum = in.next(); 
        
    while (!strNum.matches("\\d+")) {
        System.out.print("Error: you can only enter digits.\nEnter a number: ");
        strNum = in.next();
    }                    
    num = Integer.parseInt(strNum);
    if (num <= 0 || num > 100) {
        System.out.println("Error: number must be between 1 and 100.");
    } else {
        // if we get here, it must be valid by now
        isValid = true;  // terminates loop
    }
}

Note that you can also do this with a bottom-checking loop:

int num = 0;
boolean isValid = false; // assume it's invalid to start
do {          
    System.out.print("Enter a number: ");
    String strNum = in.next(); 
        
    while (!strNum.matches("\\d+")) {
        System.out.print("Error: you can only enter digits.\nEnter a number: ");
        strNum = in.next();
    }                    
    num = Integer.parseInt(strNum);
    if (num <= 0 || num > 100) {
        System.out.println("Error: number must be between 1 and 100.");
    } else {
        // if we get here, it must be valid by now
        isValid = true;  // terminates loop
    }
} while (!isValid);

I actually prefer the bottom-checking loop in this example: it makes more sense to see if we got a valid number AFTER we check the user's first input instead of doing it before they even get a chance to input anything.

Note that this isn't the only solution to this problem. You could also use nested if statements for both the regex and the valid range inside your loop:

int num = 0;
boolean isValid = false; // assume false until we discover otherwise
do {    
    System.out.print("Enter a number: ");
    String strNum = in.next();  
            
    // check regex first
    if (!strNum.matches("\\d+")) {
        System.out.println("Error: you can only enter digits.");

        // after this, there's nothing left to execute so we jump to
        // while(!isValid) which will be true still, so we do the loop again
                
    } else { // number matches the regex: it's a valid number
            
        num = Integer.parseInt(strNum);
                
        // is it between 1 and 100?
        if (num <= 0 || num > 100) {
            System.out.println("Error: number must be between 1 and 100.");
            // after this, there's nothing left to execute so we jump to 
            // while (!isValid) which will be true still, so we do the loop again
                    
        } else {
            // if we get here, everything checked out fine!
            isValid = true;
                    
        }  // end of if not between 1 and 100
                
    } // end of if !matches()
            
} while (!isValid);  // still invalid?
// if we get here, we've now got a valid num!

The user won't notice any difference between the two.
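
If you need the same kind of check for more than one input, one option is to move the loop into a method. The sketch below does that with the nested-if version; the class InputHelper, the method readIntInRange(), and its prompt/min/max parameters are all hypothetical names chosen for this example, and the input is scripted through a Scanner built from a String so the sketch runs by itself.

```java
import java.util.Scanner;

class InputHelper {

    // Keeps asking until the user enters an integer between min and max
    // (inclusive), then returns it.
    static int readIntInRange(Scanner in, String prompt, int min, int max) {
        int num = 0;
        boolean isValid = false;   // assume false until we discover otherwise
        do {
            System.out.print(prompt);
            String strNum = in.next();

            if (!strNum.matches("\\d+")) {
                System.out.println("Error: you can only enter digits.");
            } else {
                num = Integer.parseInt(strNum);
                if (num < min || num > max) {
                    System.out.println("Error: number must be between "
                            + min + " and " + max + ".");
                } else {
                    isValid = true;   // terminates loop
                }
            }
        } while (!isValid);
        return num;
    }

    public static void main(String[] args) {
        // Scripted input standing in for what a user might type.
        Scanner in = new Scanner("abc 0 101 42");
        int num = readIntInRange(in, "Enter a number: ", 1, 100);
        System.out.println("Valid Number Entered: " + num);
    }
}
```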

Exercises

[Solutions]

Modify the item quantity / price program from earlier to ask the user repeatedly until they enter the number of items and price correctly. Include user-friendly error messages. Only calculate/display the total cost if the price and number of items are valid.