In any programming language, you often have
reserved words or keywords.
Additionally, there are often reserved characters.
Reserved words/characters and keywords are special words or
characters that are used by the programming language.
For example, in Java, the reserved words
public,
class, and
int have a special meaning
in the language, and the
" (double-quotes) and
\ (back-slash) are
used for specific tasks so they are reserved characters.
You have to be careful when using reserved words and
characters in your code. For example, if you want
to display the text My name is "Sydney". in a Java
program, you'll have to escape
the double-quotes so that Java interprets them as part
of your output, and not as the end of the string.
In HTML, there are several reserved characters that
have special meaning in HTML code. For example,
when you type the < symbol in your code, the browser
assumes you're opening an element tag. You might be
wondering, how then am I able to display the < symbol
without my browser assuming that everything after the <
is an element? That's the secret to this whole lesson ;)
When you want to include reserved characters like <, >,
and & inside
the content of a page, your browser might accidentally
interpret these characters as HTML code, and your page
won't render correctly.
Furthermore, we learned in an earlier lesson that HTML
interprets several contiguous whitespace characters
as a single space: If your content contains a series of
spaces, tabs, and new-lines all in a row, they'll be interpreted
as a single space. What if we wanted to show several
whitespace characters in our content?
Try it out - start a new HTML project with an index file
and add the code below to the body of your main index page (don't forget
the minimal HTML and the header and footer elements):
<p>I learned to use the <p></p> tags.</p>
<p>Ten spaces here:          Done.</p>
Now save your page and, if necessary, upload it to your server.
Then load your page into a browser. Notice
the following things:
The first paragraph broke into 2 paragraphs.
The second paragraph shows only one space instead
of 10 spaces.
For the first issue, there's more going on than we actually
see: right-click the nested paragraphs in the first sentence
and choose INSPECT or INSPECT ELEMENT. Make sure you're on the ELEMENTS
tab. If necessary, expand the BODY tag in the Elements
tab so you can see the paragraphs.
You'll notice that the browser actually rendered the
paragraphs differently than we typed them: this is because
in HTML, you're not allowed to nest <p></p> tags
inside each other.
The browser will instead interpret them as several individual
paragraph tags.
What happens is that the browser first sees the opening
<p> tag in the
<p>I learned to use the ...
part of the code.
When the browser encounters the
<p></p> tags
in the middle of the paragraph, it realizes that it's
not allowed to nest paragraphs, so instead it closes
the first <p>: <p>I learned to use the</p>
and then it renders the empty paragraph elements: <p></p>
Next, the browser renders the remaining content, but it
notices that it ends in a </p> tag. It closed the outer
<p> tag earlier, so now it has an unmatched </p> tag.
To fix this, it adds an opening <p> tag for the unmatched
closing </p> tag: tags.<p></p>
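Putting those steps together, the Elements tab should show something roughly like the markup below for the first line. This is only a sketch of the parsed result, assuming a browser that follows the standard HTML parsing rules; the exact display varies between browsers' developer tools.

```html
<!-- What was typed: <p>I learned to use the <p></p> tags.</p> -->
<!-- What the browser builds instead: -->
<p>I learned to use the </p>  <!-- the outer paragraph, closed early -->
<p></p>                       <!-- the empty paragraph from the middle -->
 tags.                        <!-- leftover text, no longer inside any paragraph -->
<p></p>                       <!-- empty paragraph created for the unmatched closing tag -->
```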
Clearly this is not the output we intended: we wanted to
actually display HTML tags in our page content (much like
I'm doing here throughout these pages!)
For the second issue, we see only one space, not 10.
Again, if you right-click and inspect the document, you'll see
that there are spaces in the actual source, but the
browser renders those spaces as one space. Recall that the
browser will always render several contiguous whitespace
characters as one space.
So how do we display special characters like HTML tags
and spaces in our HTML page content? By using
HTML Entities.
HTML Entities are special codes
we use to display special characters: the browser reads these codes and
renders them as the actual character instead of interpreting
them as HTML code. Character entity codes start with the ampersand
character & (which is itself a reserved character in HTML, for
this reason) and end with a semi-colon (;). The value between the
ampersand and the semi-colon defines which character the entity represents.
For example, &lt; has the ampersand, the letter "l", the letter "t", and then
the semi-colon: the letters "l" and "t" are short for "less than", so &lt;
is the character entity for the < (less-than symbol).
So, when you want to display an HTML tag in the
actual content of your page, use the code
&lt; for the <
less-than symbol and
&gt;
for the > greater-than symbol.
To display a space, use the &nbsp;
entity (nbsp stands
for "no-break space").
Edit your previous example so that the page renders the
inner paragraph tag properly and also shows all 10 spaces.
<p>I learned to use the &lt;p&gt;&lt;/p&gt; tags.</p>
<p>Ten spaces here:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Done.</p>
Tip: in the 2nd line of code, you don't need every single &nbsp;.
Since a single regular space between two non-breaking spaces is rendered
normally, you can alternate the two and get the same result:
<p>I learned to use the &lt;p&gt;&lt;/p&gt; tags.</p>
<p>Ten spaces here:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Done.</p>
Which Characters Must Use Entities and How?
This should be the next obvious question: which characters do
we need to escape, and how do we know what those escape codes are?
In general, you should always use character
entities for the following characters:
In All HTML Code
Character   Description             Entity Code
<           Left pointy bracket     &lt;
>           Right pointy bracket    &gt;
&           Ampersand (and)         &amp;
Inside Attribute Values Only
Character   Description             Entity Code
"           double-quote            &quot;
'           single-quote            &apos;
Note that you likely won't have any need to use the
character entities for the single- and double-quotes in
any of these HTML tutorials but it's good to know, just
in case you're also exploring JavaScript or PHP while
you're going through these tutorials.
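In case it helps to see where the quote entities matter, here is a small sketch of attribute values that contain the same kind of quote that surrounds them. The file names and text are made up for illustration.

```html
<!-- The attribute value is wrapped in double-quotes, so the inner
     double-quotes are written as &quot; to keep the value from ending early. -->
<img src="sydney.jpg" alt="My name is &quot;Sydney&quot;">

<!-- The attribute value is wrapped in single-quotes, so the apostrophe
     is written as &apos; for the same reason. -->
<a href="about.html" title='Sydney&apos;s page'>About me</a>
```

Inside normal page content (between tags), plain " and ' characters can be typed directly.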
It's also helpful to memorize the following commonly-used
entities, as they might be needed often in some pages or
applications:
Character   Description                                                          Entity Code
(space)     non-breaking space: use where you want a space that will not         &nbsp;
            cause a soft line-break when the line is wrapped in the browser
‑           non-breaking hyphen: similar to a non-breaking space, used for a     &#8209;
            hyphen that doesn't cause a soft line-break when the line is
            wrapped in the browser
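As a rough illustration of both entities, the paragraph below keeps a phone number and a time together when the line wraps. The phone number and wording are invented for the example, and &#8209; is the decimal code for the non-breaking hyphen character.

```html
<!-- &#8209; keeps "555‑1234" on one line; &nbsp; keeps "5" and "p.m." together -->
<p>Please call 555&#8209;1234 before 5&nbsp;p.m. to book your appointment.</p>
```

Try narrowing your browser window: the line should wrap between other words, but not inside the phone number or between "5" and "p.m."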
Additionally, if your document encoding scheme doesn't support
a certain character, you'll need to escape it. This is actually
quite common with many special symbols, as we'll see in the
next section of this tutorial.
In the next section, I'll also show you some common resources
for figuring out what the entity codes are for various characters
and special symbols you'll want to display in HTML.
Exercise
See if you can write the code to display the following
on your page, exactly as shown here:
I learned to use HTML Entities
such as &lt; and &gt; for the < and > characters.
In addition to displaying reserved characters and whitespace
characters, HTML Entities
can be used to display special symbols that aren't normally
available on the keyboard. For example:
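Please recycle ♼
My mood today: ☺
To calculate the area of a circle, use ℼr²
Today, 1$ Canadian = 0.68€
I produced the output above with the following code:
<p>Please recycle &#9852;<br>
My mood today: &#9786;<br>
To calculate the area of a circle, use &#8508;r<sup>2</sup><br>
Today, 1$ Canadian = 0.68&euro;</p>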
Entities can be written with special names or with coded
numbers, and coded numbers can be written using a decimal
(base-10) code or a hexadecimal (base-16) code.
For example, the < symbol can be displayed with
the named entity &lt;
or with the decimal entity &#60;
or with the hex entity &#x003c;
(the 60 comes from the
numeric base-10 ASCII code for the < symbol and the 003c is the
hexadecimal value of 60).
When using the decimal version of the entity code, always
use the # after the & symbol. When using the hexadecimal
version of the entity code, always use the #x after the &
symbol.
For example, the summation symbol ∑ can be displayed
with the named entity &sum; or the decimal entity
&#8721; or the hex entity &#x2211;.
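To compare the three forms side by side, you could try a small test like the one below. It is just a sketch; each paragraph should render the same two symbols, < and ∑.

```html
<!-- Named entities: & + name + ; -->
<p>Named: &lt; and &sum;</p>

<!-- Decimal entities: &# + base-10 number + ; -->
<p>Decimal: &#60; and &#8721;</p>

<!-- Hex entities: &#x + base-16 number + ; (leading zeros are optional) -->
<p>Hex: &#x3c; and &#x2211;</p>
```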
This now opens up a huge set of characters that you can display,
so how do you know what codes to use for a character or symbol?
Thankfully, there are several lists available for free on the
Internet. Here are some that I use:
W3Schools:
HTML Character Sets - Lots of information about encoding and
character sets in addition to a categorized list of entity codes
(left-side menu, under the headings "HTML Symbols" and "HTML Entities").
&What is a
tool that allows you to search for special characters and emojis
and their entity codes. For example, search for "weather", "smile",
"currency", or "math".
FreeFormatter HTML Entity
List - is a great reference because
not only do they have lists of entity codes, but they also have some really
useful free tools along the left side, such as the
HTML Escape /
Unescape tool (give it some text and it will replace all the reserved HTML
characters with entity codes, or vice versa).
Unicode Table
also has a categorized list of basic and common symbols, so it's easy and fast
if you're looking for something basic.
HTML Entity Encoder/Decoder
is not a reference but another tool that you can use to replace special
characters and symbols with HTML entities, or to take code containing HTML
character entities and convert it to characters and symbols.
You can use the character and symbol entities to display any
character that's not
on the standard keyboard. Note that some browsers don't display
all of the characters: in this case, you'll usually see an empty
box. Sometimes this is also an operating system/fonts issue -
when I installed a recent version of Windows on my home computer,
I noticed my computer could no longer display certain special symbols.
Some symbols only have decimal/hex entities and don't have
named entities. Named entities are easier to remember, so we
usually use those when we can, but there are always lots of
charts available online with the decimal/hex entity codes, so
you don't have to memorize them (we do tend to memorize
frequently used ones simply by using them often).
Exercise
Create the HTML for the content in each of the following
images (each block of text is enclosed within a paragraph
element). Use entities where appropriate.