Data Types - String and Character

Strings

In programming, when we refer to textual data, we use the word string. A string is any sequence of letters, digits, or symbols that is treated as "text" and is not used for calculation. For example, a customer's first name would be stored as a string. A customer's outstanding balance would not be stored as a string because at some point we'd probably calculate with that value, so we'd store it as a number (likely a floating-point number). A customer's phone number is usually stored as a string. Even though a phone number can be stored as just a set of digits (your program code could format it with brackets and dashes, so you don't need to store those), you don't calculate with phone numbers (e.g. you don't add two phone numbers together or multiply a phone number by some other value). The same is sometimes true of ID numbers such as a product ID or student ID. A programmer who is concerned with saving storage space might store a phone number or ID value as an integer instead of a string, but keep in mind that an integer cannot preserve leading zeros.
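As a rough sketch of the formatting idea mentioned above, the program below stores a phone number as a string of digits and adds the brackets and dash only when displaying it. The method name formatPhone and the 10-digit layout are assumptions made for this illustration. substring is a standard String method that extracts part of a string.

```java
class PhoneFormatDemo {

    // Hypothetical helper: assumes the string holds exactly 10 digits.
    static String formatPhone(String digits) {
        return "(" + digits.substring(0, 3) + ") "
                + digits.substring(3, 6) + "-"
                + digits.substring(6);
    }

    public static void main(String[] args) {
        // Stored as text: we never do arithmetic with a phone number.
        String phone = "9055550123";
        System.out.println(formatPhone(phone)); // (905) 555-0123
    }
}
```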

In Java, strings are not actually considered a primitive data type; strings are an object type. We will learn more about objects and classes later in the course, but it is important to know that a string is technically an object, not a primitive data type. You can often treat a string like a primitive data type, which can make things a bit confusing! For now, we will keep in mind that a string is an object, although we will cover some aspects of strings as we cover the rest of the primitive data types.

Note
Java is full of "just type this now and we'll learn it later" moments, and strings are one of them. We will learn some things about strings now that we will add to or change later in the course.

A string literal is a string value that you write explicitly in your code. We surround a string literal in double-quotes like "this". When you start writing Java code, this becomes extremely important! If you don't put double-quotes around your string values in your code, Java won't understand what you're typing!

Recall that you can perform the concatenation operation on a String using the + operator. In programming, using + on Strings simply places the strings next to each other. For example:

String s1 = "Hello";
String s2 = "World";
String s3 = s1 + s2;
System.out.println(s3);

If you try this code, you'll see that you get the output "HelloWorld" on the screen. The statement that evaluates s1 + s2 just "glues" the values in s1 and s2 together. Since s1 did not have a space after the letter "o" and s2 did not contain a space in front of the letter "W", there was no space in between the Hello and World in s3.
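If you do want a space between the two words, one option is to concatenate a string literal containing a single space between them. This is a minimal sketch; the class name and the joinWithSpace method are made up for this example.

```java
class ConcatSpaceDemo {

    // Hypothetical helper: joins two strings with one space in between.
    static String joinWithSpace(String first, String second) {
        return first + " " + second;
    }

    public static void main(String[] args) {
        String s1 = "Hello";
        String s2 = "World";
        System.out.println(s1 + s2);               // HelloWorld
        System.out.println(joinWithSpace(s1, s2)); // Hello World
    }
}
```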

Be careful when mixing Strings with numerics in expressions, such as:

String s4 = "10 plus 2 is " + 10 + 2;
System.out.println(s4);

When Java evaluates the expression "10 plus 2 is " + 10 + 2, it sees two + operators. Operators with the same precedence are performed in the order in which they appear, from left to right. Therefore, Java first evaluates "10 plus 2 is " + 10. The first operand in this expression, "10 plus 2 is ", is a String value, so Java knows it's doing concatenation and not addition. It concatenates the string with the value 10, resulting in the string "10 plus 2 is 10". Next, it evaluates the second + operator. On the left side it has the result of the previous operation, "10 plus 2 is 10". This is a String, so Java again treats the + as concatenation and appends the 2, giving the final result "10 plus 2 is 102".

To force the 10 + 2 part of the expression to evaluate first, you'd do the same thing you would do in an expression like 3 * 4 + 2 if you wanted the 4 + 2 to evaluate first: add parentheses (). So change the statement to:

String s4 = "10 plus 2 is " + (10 + 2);

You should then see the proper output of:

10 plus 2 is 12

What if you wrote the expression this way:

String s4 = 10 + 2 + " is 10 plus 2";
System.out.println(s4);

Here, the 10 + 2 is evaluated first. Since Java sees two integer values, it knows the + is an addition operator, so it adds 10 and 2 for a result of 12. Then it sees the second + followed by the string " is 10 plus 2". It has the integer 12 on one side and a string on the other, so it knows that it must concatenate the two, since it can't perform addition on a String. That's how you end up with the proper output:

12 is 10 plus 2
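To compare the three expressions side by side, you could put them in one small program like the sketch below. The class and method names are invented for this illustration; the expressions themselves are the ones discussed above.

```java
class MixedConcatDemo {

    // Evaluated left to right: string + 10 first, then + 2.
    static String leftToRight() {
        return "10 plus 2 is " + 10 + 2;
    }

    // Parentheses force the addition to happen first.
    static String withParentheses() {
        return "10 plus 2 is " + (10 + 2);
    }

    // Both leading operands are ints, so the first + is addition.
    static String numbersFirst() {
        return 10 + 2 + " is 10 plus 2";
    }

    public static void main(String[] args) {
        System.out.println(leftToRight());     // 10 plus 2 is 102
        System.out.println(withParentheses()); // 10 plus 2 is 12
        System.out.println(numbersFirst());    // 12 is 10 plus 2
    }
}
```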

Try the following expression:

String s4 = "10 times 2 is " + 10 * 2;
System.out.println(s4);

What output do you get? Why?

Characters and Character Operations

There is an additional primitive type in Java called the char type. The char type is used for storing and representing single characters, as opposed to a set of characters (a set of characters is a string!). When we write out a char value, we surround it in single-quotes instead of double-quotes. This helps us to visually see that a value is a char and not a string, but it's also required Java syntax, just as double-quotes are the proper syntax for string values.

One interesting thing to note about the char type is that it is considered one of the integer types. In a later session, when we talk about casting and converting data from one type to another, this will become apparent. For now, just keep in mind that the char type is in the category of integer primitives.

The char type in Java is 2 bytes (16 bits), so it can represent 65,536 different values using 16-bit Unicode (Java also supports the Unicode supplementary characters, but they are beyond the scope of this course). Special Unicode characters won't usually display on the console, but you can display them in a message dialog (and on GUI components, which you'll learn in term 2).
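If you'd like to confirm these sizes yourself, the standard Character class exposes them as constants. The sketch below assumes Java 8 or later (Character.BYTES was added in that version); the class name and the distinctValues method are made up for this example.

```java
class CharSizeDemo {

    // Number of different values a char can hold.
    static int distinctValues() {
        return Character.MAX_VALUE - Character.MIN_VALUE + 1;
    }

    public static void main(String[] args) {
        System.out.println(Character.BYTES);           // 2
        System.out.println(Character.SIZE);            // 16
        System.out.println((int) Character.MAX_VALUE); // 65535
        System.out.println(distinctValues());          // 65536
    }
}
```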

To display a specific Unicode character, use the \u escape sequence followed by the character's four-digit hexadecimal code. For example, \u00f7 will display the division sign:

System.out.println('\u00f7');

One common use for the char type is for representing special "escape sequences". You can view a table of the escape sequences in Java in chapter 4.3 (table 4.5). We will refer back to this table when we start writing Java code. You've already used the \n (newline) escape sequence in a previous class and the \u (Unicode) escape in the example in the previous paragraph.
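Here is a short sketch of a few common escape sequences in action. It is only an illustration, not a replacement for the table in your text; the class name and the quoted method are invented for this example.

```java
class EscapeDemo {

    // Hypothetical helper: wraps text in double-quote characters.
    static String quoted(String text) {
        return "\"" + text + "\"";
    }

    public static void main(String[] args) {
        System.out.println("Line one\nLine two"); // \n starts a new line
        System.out.println("Name:\tAda");         // \t inserts a tab
        System.out.println(quoted("hi"));         // prints "hi" including the quotes
        System.out.println("C:\\temp");           // \\ prints a single backslash

        char tab = '\t'; // an escape sequence counts as one char
        System.out.println((int) tab);            // 9
    }
}
```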

To help differentiate between strings and chars, study these examples:

Note that it doesn't matter if the character or string is made up of digits! Any digits enclosed in quotes are either a character or a string, not a number. For example, 123.456 is a number, but "123.456" is a string. Note that '123.456' is invalid because a char value can consist of only one character or escape sequence.
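The difference shows up as soon as you use the + operator. In the sketch below (class and method names are made up for illustration), the same digit behaves three different ways depending on whether it is written as a number, a string, or a char.

```java
class DigitLiteralDemo {

    // A real number: + is ordinary addition.
    static int numberPlusOne() {
        return 7 + 1;
    }

    // A string: + is concatenation.
    static String stringPlusOne() {
        return "7" + 1;
    }

    // A char: Java uses the character's code (55 for '7') in the addition.
    static int charPlusOne() {
        return '7' + 1;
    }

    public static void main(String[] args) {
        System.out.println(numberPlusOne()); // 8
        System.out.println(stringPlusOne()); // 71
        System.out.println(charPlusOne());   // 56
    }
}
```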

Read!
Useful links: The Unicode Homepage and What is ASCII?

Furthermore, when dealing with numeric values, you must never use symbols or commas as you would when writing numbers on paper. For example, if you were to write down the value twelve thousand dollars, you might put:

$12,000

In a programming statement where you are writing a numeric value, this would be invalid! The dollar sign and the comma are not valid parts of a number. A decimal point is allowed (but only one), and the positive and negative signs are also allowed.
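A common approach is to store the plain number and add the dollar sign and commas only when you display it. The sketch below uses String.format with Locale.US as one possible way to do that; the class name and the asDollars method are assumptions for this example.

```java
import java.util.Locale;

class CurrencyDemo {

    // Hypothetical helper: formats a number with a $ sign, commas, and 2 decimals.
    static String asDollars(double amount) {
        return String.format(Locale.US, "$%,.2f", amount);
    }

    public static void main(String[] args) {
        // double amount = $12,000;  // would not compile: $ and , are not allowed
        double amount = 12000;       // the numeric value contains digits only
        System.out.println(asDollars(amount)); // $12,000.00
    }
}
```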

Character Operations

Because char is a primitive type in the integer category of numbers, you can perform some interesting operations on characters. Try the following code statements in a Java program:

public class CharacterStuff {

    public static void main(String[] args) {

        char c = 'a';
        int intChar = c;
        System.out.println("char: " + c);
        System.out.println("int: " + intChar);

        c++;
        System.out.println(c);

        c -= 32;
        System.out.println(c);
    }
}

Your output should appear as:

char: a
int: 97
b
B

The first two statements in main create a char variable and an int variable, both initialized from the value 'a'. What's interesting is the statement int intChar = c;. Here we are assigning the value of the char variable to the int variable, so Java automatically converts the value 'a' into an int value. As a result, the ASCII/Unicode value 97 is stored in the intChar variable (check any ASCII or Unicode table and you'll see that the letter "a" has a value of 97).

Notice the statement c++;. This increments the value of c by 1, but c is a char, so how could this work? Remember that char is in the category of integers, so adding 1 to the char variable c is really adding 1 to the ASCII value of the character 'a': 97 + 1 = 98. 98 is the ASCII/Unicode value for the letter "b", which is why you see the output "b" for the print statement.
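One detail worth knowing: c++ works directly on a char, but writing the addition out as c + 1 produces an int, so storing that result back into a char needs an explicit cast. The sketch below illustrates this; the class name and the next method are invented for the example.

```java
class CharIncrementDemo {

    // c + 1 is an int expression, so it must be cast back to char.
    static char next(char c) {
        return (char) (c + 1);
    }

    public static void main(String[] args) {
        char c = 'a';
        c++;                         // ++ converts the result back to char for us
        System.out.println(c);       // b

        // c = c + 1;                // would not compile: c + 1 has type int
        System.out.println(next(c)); // c
        System.out.println(c + 1);   // 99 (printed as an int, not a char)
    }
}
```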

Similarly, c -= 32; subtracts 32 from the ASCII/Unicode value in c. The variable c was 98 (b); 98 - 32 = 66. 66 is the ASCII/Unicode value for the letter "B". In fact, if you subtract 32 from any lower-case letter from a to z, you'll get its upper-case equivalent. This does not work for other characters, such as digits or letters that are already upper-case.
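The subtraction trick only makes sense for the letters a to z. As a comparison, the sketch below puts it next to the standard Character.toUpperCase method, which leaves other characters alone. The class name and the naiveUpper method are made up for this illustration.

```java
class UpperCaseDemo {

    // The "subtract 32" trick: only correct for the letters 'a' to 'z'.
    static char naiveUpper(char c) {
        return (char) (c - 32);
    }

    public static void main(String[] args) {
        System.out.println(naiveUpper('z'));            // Z
        System.out.println(Character.toUpperCase('z')); // Z

        // An upper-case letter is pushed to an unrelated character.
        System.out.println(naiveUpper('A'));            // !
        System.out.println(Character.toUpperCase('A')); // A
        System.out.println(Character.toUpperCase('7')); // 7
    }
}
```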

Exercises

1. Write a program that finds the ASCII/Unicode code for each of the following character values:

  1. '7'
  2. '1'
  3. 'a'
  4. 'A'
  5. 'z'
  6. 'Z'
  7. '*'

2. Open the following chart in a new browser window/tab: Simple ASCII Table.

  1. What is the decimal value for the character 'A'?
  2. What is the decimal value for the character 'a'?
  3. What character has a decimal value of 0?

[solutions]

Null Values for Strings and Characters

Recall that when we learned about numeric primitive types, we learned that 0 is the null value for any numeric.

Things get more complicated when we want to store a null value in a string. For example, you might have a customer's last name, but you might not yet have their first name, so you need to be able to leave the first name field empty. In this case, you would store the customer's first name as a null string, more commonly called an empty string. It is represented as a set of double-quotes with nothing between them like this: "". Be careful not to confuse this with Java's null keyword, which means that a variable doesn't refer to any String object at all.

An empty string is not the same as a space; "" is not the same as " ". The space is an actual character and has a value to the computer (see Appendix B of your text to understand how this works or check out this Simple ASCII Table). The "" string truly contains no characters, so it's the true null value for strings.

A char's null value is represented as '\0', the character whose ASCII/Unicode value is 0, to indicate that a char value is empty.
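You can check these differences with a few print statements. The sketch below is only an illustration (the class name and the isEmptyField method are invented); length and isEmpty are standard String methods.

```java
class EmptyValueDemo {

    // Hypothetical helper: true when the field contains no characters at all.
    static boolean isEmptyField(String field) {
        return field.isEmpty();
    }

    public static void main(String[] args) {
        String empty = "";
        String space = " ";
        char nullChar = '\0';

        System.out.println(empty.length());       // 0
        System.out.println(space.length());       // 1
        System.out.println(isEmptyField(empty));  // true
        System.out.println(isEmptyField(space));  // false
        System.out.println(empty.equals(space));  // false
        System.out.println((int) nullChar);       // 0
        System.out.println((int) ' ');            // 32
    }
}
```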