Lesson Overview

Most programming languages have reserved words (also called keywords), and many also have reserved characters. These are words and characters that the language itself uses for special purposes. For example, in Java, the reserved words public, class, and int have a special meaning in the language, and the " (double-quote) and \ (backslash) are used for specific tasks, so they are reserved characters.

You have to be careful when using reserved words and characters in your code. For example, if you want to display the text My name is "Sydney". in a Java program, you'll have to escape the double-quotes so that Java interprets them as part of your output, and not as the marks that begin and end a string.
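The paragraph above describes Java, but the same idea shows up in most languages. Here is a minimal sketch in Python, used purely for illustration (the variable name message is made up for this example). The backslash in front of each inner double-quote tells the language that the quote is part of the text:

```python
# Illustrative only: escaping double-quotes inside a double-quoted string.
# The backslash tells Python that the next double-quote is part of the
# text and does not end the string.
message = "My name is \"Sydney\"."

print(message)  # prints: My name is "Sydney".
```

HTML has no backslash escapes like this. It uses a different mechanism for the same job, which is what the rest of this lesson covers.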

In HTML, there are several reserved characters that have special meaning in HTML code. For example, when you type the < symbol in your code, the browser assumes you're opening an element tag. You might be wondering, how then am I able to display the < symbol without my browser assuming that everything after the < is an element? That's the secret to this whole lesson ;)

Prerequisites

Before you start, make sure you understand the syntax of HTML, the minimal HTML code for a valid web document, the basics of HTML elements, and standard web site structure.

HTML Character Entities

When you want to include reserved characters like <, >, and & inside the content of a page, your browser might accidentally interpret these characters as HTML code, and your page won't render correctly.

Furthermore, we learned in an earlier lesson that HTML interprets several contiguous whitespace characters as a single space: If your content contains a series of spaces, tabs, and new-lines all in a row, they'll be interpreted as a single space. What if we wanted to show several whitespace characters in our content?

Try it out - start a new HTML project with an index file and add the code below to the body of your main index page (don't forget the minimal HTML and the header and footer elements):

<p>I learned to use the <p></p> tags.</p>
<p>Ten spaces here:          Done.</p>

Now save your page and, if necessary, upload it to your server. Then load your page into a browser. Notice the following things:


[Image: the page doesn't appear as we would expect]
How the page looks in the browser
  1. The first paragraph broke into 2 paragraphs.
  2. The second paragraph shows only one space instead of 10 spaces.

For the first issue, there's more going on than we actually see: right-click the nested paragraphs in the first sentence and choose INSPECT or INSPECT ELEMENT. Make sure you're on the ELEMENTS tab. If necessary, expand the BODY tag in the Elements tab so you can see the paragraphs.


[Image: the source view of the code we added to our page]
The inspector shows that there are extra paragraph elements we didn't add.

You'll notice that the browser actually rendered the paragraphs differently than we typed them: this is because in HTML, you're not allowed to nest <p></p> tags inside each other. The browser will instead interpret them as several individual paragraph tags.

What happens is that the browser first sees the opening <p> tag in the <p>I learned to use the ... part of the code, and starts a paragraph.

When the browser encounters the <p></p> tags in the middle of the paragraph, it realizes that it's not allowed to nest paragraphs, so instead it closes the first <p>:
<p>I learned to use the</p>
and then it renders the empty paragraph elements:
<p></p>

Next, the browser renders the remaining content, but it notices that it ends in a </p> tag. It closed the outer <p> tag earlier, so now it has an unmatched </p> tag. To fix this, it adds an opening <p> tag for the unmatched closing </p> tag:
tags.<p></p>

Clearly this is not the output we intended: we wanted to actually display HTML tags in our page content (much like I'm doing here throughout these pages!)

For the second issue, we see only one space, not 10. Again, if you right-click and inspect the document, you'll see that there are spaces in the actual source, but the browser renders those spaces as one space. Recall that the browser will always render several contiguous whitespace characters as one space.
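To get a feel for what the browser does with those 10 spaces, here is a rough sketch in Python. It is only an approximation of the display behaviour described above, not the browser's actual algorithm, and the function name collapse_whitespace is made up for this example:

```python
import re


def collapse_whitespace(text):
    """Roughly mimic how a browser displays runs of ordinary whitespace."""
    # Replace every run of spaces, tabs, and line breaks with one space.
    return re.sub(r"[ \t\r\n\f]+", " ", text)


# Build the sentence from the example, with 10 ordinary spaces in the middle.
source = "Ten spaces here:" + " " * 10 + "Done."

print(collapse_whitespace(source))  # prints: Ten spaces here: Done.
```

The 10 spaces are still in the source, but only one survives in what you see, which matches what the inspector shows.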

So how do we display special characters like HTML tags and spaces in our HTML page content? By using HTML Entities.

HTML Entities are special codes we use to display special characters: the browser reads these codes and renders them as the actual character instead of interpreting them as HTML code. Character entity codes start with the ampersand character & (which is itself a reserved character in HTML, for this reason) and end with a semi-colon (;). The value between the ampersand and the semi-colon defines which character the entity represents. For example, &lt; has the ampersand, the letter "l", the letter "t", and then the semi-colon: the letters "l" and "t" are short for "less than", so &lt; is the character entity for the < (less-than symbol).

For example, when you want to display an HTML tag in the actual content of your page, use the code &lt; for the < less-than symbol and &gt; for the > greater-than symbol.
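If you'd like to see the difference between a real tag and an escaped one, here is a small sketch using the HTMLParser class from Python's standard library. Keep in mind that this parser only splits markup into tags and text; it does not repair nesting the way a browser does. The names TagCounter and inspect_markup are made up for this example:

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts opening tags and collects the text found between tags."""

    def __init__(self):
        super().__init__()  # entities in text are decoded by default
        self.start_tags = 0
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        self.start_tags += 1

    def handle_data(self, data):
        self.text_parts.append(data)


def inspect_markup(markup):
    """Return (number of opening tags, text content) for a piece of HTML."""
    parser = TagCounter()
    parser.feed(markup)
    parser.close()
    return parser.start_tags, "".join(parser.text_parts)


raw = "<p>I learned to use the <p></p> tags.</p>"
escaped = "<p>I learned to use the &lt;p&gt;&lt;/p&gt; tags.</p>"

print(inspect_markup(raw))      # 2 opening tags; the inner <p> is markup
print(inspect_markup(escaped))  # 1 opening tag; "<p></p>" is now text
```

With the entities in place, the inner <p></p> is no longer counted as a tag: it arrives as ordinary text, which is exactly what we want the reader to see.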

To display a space that the browser won't collapse, use the &nbsp; entity (nbsp stands for "no-break space", also called a non-breaking space).

Edit your previous example so that the page renders the inner paragraph tag properly and also shows all 10 spaces.

<p>I learned to use the &lt;p&gt;&lt;/p&gt; tags.</p>
<p>Ten spaces here:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Done.</p>

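Tip: in the second line of the code above, you don't need every single &nbsp;. A single ordinary space is always displayed; only a run of several ordinary spaces in a row gets collapsed. That means you can alternate &nbsp; with ordinary spaces and get the same result: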

<p>I learned to use the &lt;p&gt;&lt;/p&gt; tags.</p>
<p>Ten spaces here:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Done.</p>
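As a quick check that the alternating version really adds up to 10, here is a short sketch that decodes the entities with html.unescape from Python's standard library and counts the characters in the gap. The variable names are made up for this example:

```python
import html

encoded = "Ten spaces here:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Done."

# html.unescape turns each &nbsp; into the no-break space character U+00A0.
decoded = html.unescape(encoded)

# Keep only the characters between "Ten spaces here:" and "Done."
gap = decoded[len("Ten spaces here:"):-len("Done.")]

print(len(gap))             # prints: 10
print(gap.count("\u00a0"))  # prints: 5 (no-break spaces)
print(gap.count(" "))       # prints: 5 (ordinary spaces)
```

No two ordinary spaces sit next to each other, so there is no run for the browser to collapse.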

Which Characters Must Use Entities and How?

This should be the next obvious question: which characters do we need to escape, and how do we know what those escape codes are? In general, you should always use character entities for the following characters:

In All HTML Code

Character   Description            Entity Code
<           Left pointy bracket    &lt;
>           Right pointy bracket   &gt;
&           Ampersand (and)        &amp;

Inside Attribute Values Only

Character   Description            Entity Code
"           double-quote           &quot;
'           single-quote           &apos;

Note that you likely won't have any need to use the character entities for the single- and double-quotes in any of these HTML tutorials but it's good to know, just in case you're also exploring JavaScript or PHP while you're going through these tutorials.
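If you ever generate HTML from a program, you usually don't type these entities by hand. As one hedged illustration, Python's standard library has an html.escape function that handles the reserved characters for you. The helper names escape_content and escape_attribute are made up for this sketch:

```python
import html


def escape_content(text):
    """Escape text that goes between tags: only &, <, and > are replaced."""
    return html.escape(text, quote=False)


def escape_attribute(value):
    """Escape text that goes inside an attribute value: quotes are replaced too."""
    return html.escape(value, quote=True)


print(escape_content("Use <p> & </p>"))
# prints: Use &lt;p&gt; &amp; &lt;/p&gt;

print(escape_attribute('Say "hi" & wave'))
# prints: Say &quot;hi&quot; &amp; wave
```

One detail worth knowing: html.escape writes the single-quote as the numeric code &#x27; and not as a named entity. Both forms stand for the same character.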

It's also helpful to memorize the following commonly-used entities, as they might be needed often in some pages or applications:

Character   Description                                                  Entity Code
(space)     non-breaking space: use where you want a space that will     &nbsp;
            not cause a soft line-break when the line is wrapped in
            the browser
‑           non-breaking hyphen: similar to a non-breaking space; used   &#8209;
            for a hyphen that doesn't cause a soft line-break when the
            line is wrapped in the browser
©           copyright symbol                                             &copy;
®           registered trademark                                         &reg;

Additionally, if your document encoding scheme doesn't support a certain character, you'll need to escape it. This is actually quite common with many special symbols, as we'll see in the next section of this tutorial.
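To make the encoding point concrete, here is a sketch that assumes a document limited to plain ASCII, which is an assumption chosen only for illustration. Python's standard "xmlcharrefreplace" error handler replaces any character the encoding can't hold with its decimal entity. The function name to_ascii_html is made up for this example:

```python
def to_ascii_html(text):
    """Replace every non-ASCII character with a decimal character entity."""
    # Characters that ASCII cannot represent become &#NNNN; codes.
    encoded_bytes = text.encode("ascii", errors="xmlcharrefreplace")
    return encoded_bytes.decode("ascii")


print(to_ascii_html("Today, 1$ Canadian = 0.68€"))
# prints: Today, 1$ Canadian = 0.68&#8364;
```

Characters that ASCII already supports are left alone, so only the symbols that would otherwise be lost get escaped.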

In the next section, I'll also show you some common resources for figuring out what the entity codes are for various characters and special symbols you'll want to display in HTML.

Exercise

See if you can write the code to display the following on your page, exactly as shown here:

I learned to use HTML Entities
such as &lt; and &gt; for the < and > characters.

When you're done, you can check your answer.

HTML Symbol Entities

In addition to displaying reserved characters and whitespace characters, HTML Entities can be used to display special symbols that aren't normally available on the keyboard. For example:


Please recycle ♼
My mood today: ☺
To calculate the area of a circle, use ℼr²
Today, 1$ Canadian = 0.68€

I produced the output above with the following code:

<p>Please recycle &#x267c;<br>
My mood today: &#x263a;<br>
To calculate the area of a circle, use &#x213C;r<sup>2</sup><br>
Today, 1$ Canadian = 0.68&euro;</p>
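If you're curious which character a given entity code stands for, one way to check is to decode it and ask for its official Unicode name. This is a small sketch using Python's standard html and unicodedata modules; the function name describe is made up for this example:

```python
import html
import unicodedata


def describe(entity):
    """Decode one entity and return its code point and Unicode name."""
    char = html.unescape(entity)
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


# The four entities used in the code above.
for entity in ["&#x267c;", "&#x263a;", "&#x213C;", "&euro;"]:
    print(entity, "is", describe(entity))
```

For example, describe("&euro;") gives U+20AC EURO SIGN. Notice that the hexadecimal digits in a code can be written in upper or lower case.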

Entities can be written with special names or with numeric codes, and numeric codes can be written as a decimal (base-10) code or a hexadecimal (base-16) code. For example, the < symbol can be displayed with the named entity &lt; or with the decimal entity &#60; or with the hex entity &#x003c; (the 60 comes from the decimal ASCII code for the < symbol, and 003c is the hexadecimal value of 60).

When using the decimal version of the entity code, always use the # after the & symbol. When using the hexadecimal version of the entity code, always use the #x after the & symbol.

For example, the summation symbol ∑ can be displayed with the named entity &sum; or the decimal entity &#8721; or the hex entity &#x2211;
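The three forms are closely related, so they can be worked out from the character itself. Here is a hedged sketch that builds all three. It relies on the codepoint2name table in Python's standard html.entities module, which only covers the older HTML 4 names, so a missing name here does not prove that no name exists. The function name entity_forms is made up for this example:

```python
from html.entities import codepoint2name


def entity_forms(char):
    """Return the (named, decimal, hexadecimal) entity codes for one character."""
    code = ord(char)
    name = codepoint2name.get(code)  # None when the table has no name
    named = f"&{name};" if name else None
    return named, f"&#{code};", f"&#x{code:x};"


print(entity_forms("<"))  # prints: ('&lt;', '&#60;', '&#x3c;')
print(entity_forms("∑"))  # prints: ('&sum;', '&#8721;', '&#x2211;')
print(entity_forms("♼"))  # prints: (None, '&#9852;', '&#x267c;')
```

Leading zeros in a hex code are optional, so &#x3c; and &#x003c; mean the same thing.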

This now opens up a huge set of characters that you can display, so how do you know what codes to use for a character or symbol? Thankfully, there are several lists available for free on the Internet. Here are some that I use:

You can use the character and symbol entities to display any character that's not on the standard keyboard. Note that some browsers don't display all of the characters: in this case, you'll usually see an empty box. Sometimes this is also an operating system/fonts issue - when I installed a recent version of Windows on my home computer, I noticed my computer could no longer display certain special symbols.


[Image: some special symbols (heart, curly leaf) appear but other symbols appear as an empty box]
When your computer can't display certain characters, they sometimes appear as an empty box.

Some symbols only have decimal/hex entities and don't have named entities. Named entities are easier to remember, so we usually use those when we can, but there are always lots of charts available online with the decimal/hex entity codes, so you don't have to memorize them (we do tend to memorize frequently used ones simply by using them often).
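If you have Python available, you can also look names up yourself and skip the chart. This sketch searches the html5 table in the standard html.entities module, which maps entity names to characters. The function name names_for is made up for this example:

```python
from html.entities import html5


def names_for(char):
    """Return the sorted entity names (without & and ;) that produce char."""
    # Most keys in the table end with ";", so strip it before collecting.
    return sorted({name.rstrip(";") for name, value in html5.items() if value == char})


print(names_for("©"))  # the list includes 'copy'
print(names_for("♥"))  # the list includes 'hearts'
print(names_for("♼"))  # prints: [] (use the decimal or hex code instead)
```

An empty list means the symbol has no named entity in that table, so you'd fall back to its decimal or hex code.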

Exercise

Create the HTML for the content in each of the following images (each block of text is enclosed within a paragraph element). Use entities where appropriate.


[Image: "Today there is a 70% chance of rain." There's a rain icon in front of the word "rain".]
Exercise 1 uses an emoji for "rain"

[Image: "I have learned to play the scales in C, G, D, B flat, and E flat. By the end of the month I hope to be able to play them all, including C sharp and F sharp."]
Exercise 2 uses the flat and sharp symbols

[Image: "Which element contains a piece of meta-data for a document?" followed by options for the elements head, meta, data, and link]
Exercise 3 displays actual HTML tags using pointy brackets

If you want, you can check your solutions.