Lesson Overview

This lesson introduces you to HTML: What it is, why we use it, and how we use it. We'll look at where HTML came from and why we use it to structure a document and its content. We'll examine the basic syntax of HTML and learn about elements and attributes. Lastly, we'll discuss the HTML standard: where does it come from, who defines it, and how are HTML elements categorized?

Prerequisites

It's important that you understand the basics of how web pages work, how we develop web pages and web applications, progressive enhancement, and the basic structure of a web site. You should also have already installed and set up all the necessary software for developing basic web pages.

Introducing HTML

HTML, Hypertext Markup Language, is the language we use to create web documents and applications. HTML was originally defined as an application of something called SGML (Standard Generalized Markup Language). SGML defines how other markup languages should work and be defined. SGML says that a markup language has elements (tags), attributes, comments, and character references/entities (ways to represent special characters such as the © symbol).
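As a quick illustration of character references, here is a minimal sketch (the sentence text is made up for the example). &copy; produces the © symbol, while &lt;, &gt;, and &amp; produce the characters <, >, and &, which would otherwise be read as markup:

```html
<!-- &copy; is displayed as the copyright symbol -->
<p>&copy; 2017 Sydney Greenstreet the Cat</p>

<!-- &lt; &gt; and &amp; let you show markup characters as ordinary text -->
<p>Use &lt;p&gt; for paragraphs &amp; &lt;br&gt; for line breaks.</p>
```

The second paragraph is displayed as: Use <p> for paragraphs & <br> for line breaks.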

SGML uses a DTD (Document Type Definition) to define a markup language. For example, each version of HTML up to HTML 4.01 has an SGML DTD that states that the elements of HTML are defined as tags that have names, and that those tags are identified by the tag name inside pointy brackets (the current HTML standard no longer uses a DTD, but the syntax described here is the same). So an element with the name "span" should be represented by the tag <span>. The DTD also states that any HTML element that can contain text or other HTML elements denotes the boundaries of its content with an opening tag and a closing tag. For example, if you wanted to code the "span" element and give it the text "foo", it must be coded as:

<span>foo</span>

The DTD for HTML also indicates that the closing tag for an element must include the forward-slash in front of the element's name, like you see in the example.

I'm simplifying things a bit: in reality, an SGML DTD is a syntax all its own. Thankfully, we don't need to learn SGML in order to learn HTML! Learning HTML is just a matter of learning the different elements and what they do. To be honest, it's a lot of memorization: there are over a hundred elements in HTML. But learning when, where, and how to use each element is going to be the bigger challenge.

You might have also heard of XML (Extensible Markup Language). XML is also derived from SGML and is used to structure data. In fact, there is a version of HTML based on XML called XHTML, but it's not as flexible as HTML and still endures a lot of criticism today. XML has stricter rules, but it's used in a variety of applications that need to share data over networks. XML is also used as a basis for more specific languages, such as XHTML, FXML (used to create user interfaces in JavaFX), and EPUB (used for e-book files that can be read on smart devices and e-readers). You might learn XML in later courses, but it's actually very simple. For example, here's a bit of XML that defines some of my cats:


<cats>
    <cat id="1">
        <name>Arti</name>
        <breed>Tabby</breed>
        <colour>Tawny and White</colour>
    </cat>
    <cat id="2">
        <name>Sydney</name>
        <breed>Tabby</breed>
        <colour>Grey</colour>
    </cat>
    <cat id="3">
        <name>Mr. Bibs</name>
        <breed>Domestic Shorthair</breed>
        <colour>Black</colour>
    </cat>
    <cat id="4">
        <name>Kaluha</name>
        <breed>Tabby</breed>
        <colour>Brown</colour>
    </cat>
</cats>
An example of some XML code that defines cat data

HTML looks very similar to the XML you see above, so you'll see quickly why they are related through SGML.

HTML is written in plain text, so you can use any editor, even if it's just Notepad. However, editors that have syntax highlighting are helpful because they make it easier to read and edit your code.

HTML Syntax

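HTML is made up of elements, attributes, and comments. Element and attribute names should always be written in all lower-case letters. HTML itself is not case-sensitive about these names, but lower case is the convention used throughout the HTML standard and by most developers (and it is required in XHTML), so avoid upper-case letters in elements or attributes.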

Elements

HTML consists of a set of pre-defined elements or tags that represent the structure of your web page. Many of these elements can contain text, which contains the content of your web page.

For example, a paragraph of text can be contained inside a <p> element like so:


<p>This is a paragraph of text.
HTML consists of a set of pre-defined elements or tags that represent the
structure of your web page.  Many of these elements can contain
text, which contains the content of your web page.</p>
The <p> Element, or paragraph element, is used to add a paragraph of text to an HTML document.

There are elements you will use for paragraphs, document headers and footers, articles, sidebars, lists, tables, forms, and many other standard parts of a document.

Elements don't contain styling properties such as colours, backgrounds, borders, fonts, and layout. These things are configured with CSS (Cascading Style Sheets), which you'll learn in a different set of tutorials.

Most elements have an opening tag and a closing tag. For example, the paragraph tags <p></p> enclose a paragraph of text and/or other HTML content. A few elements, known as void elements, have only an opening tag, such as the line-break tag <br>, which adds a line break or new-line into a document.
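As a small sketch of the difference (the address text is made up for illustration), the paragraph below has both an opening and a closing tag, while each <br> inside it stands alone with no closing tag:

```html
<!-- The p element: opening tag, content, closing tag -->
<p>
  123 Example Street<br>
  Springfield<br>
  Canada
</p>
```

Without the <br> tags, the browser would display all three parts of the address on one line, because line breaks in the source code are treated as ordinary spaces.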

How do you learn all the different tags? Practice!! Practice!!! Practice!!!

Attributes

Attributes appear inside an element's opening tag. Attributes add extra information to the element. Attributes are generally assigned a value. When using an attribute that has a value, always place the value in single- or double-quotes.

For example,

<a href="https://terminallearning.com">Visit TerminalLearning for more tutorials</a>

This example uses the <a> element, or the anchor element. The anchor element is also known as the "link element" because it is used to add a hyperlink (or clickable link) in a document. But be careful, because there is also a <link> element that has nothing to do with creating hyperlinks on a web page (you'll learn about the <link> element when you learn about CSS files).

The content in between the anchor element's opening and closing tags is the link text: the text that appears as the clickable link on the page.

This <a> element's opening tag has an attribute called href. The <a> element's href attribute defines the URL/page/file you want the user to go to when they click the link text. The user will see the text "Visit TerminalLearning for more tutorials" on the page as a clickable link. When the user clicks that text, their browser will load the URL https://terminallearning.com.

You could also use single quotes around the href attribute's value and write it as

<a href='https://terminallearning.com'>Visit TerminalLearning for more tutorials</a>

Right now it doesn't matter if you use single-quotes or double-quotes, but you should choose one and be consistent. This will matter later if you start coding with JavaScript and PHP.

HTML has many different attributes. Many are Global Attributes, which can be added to any HTML element/tag. Examples include the accesskey attribute, which identifies a special shortcut key a user can press to go to (or focus on) that element, and the id attribute, which assigns a unique identifier to an element (these are often needed for CSS and/or JavaScript).
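Here is a minimal sketch of those two global attributes in use. The id value "intro", the access key "h", and the text content are arbitrary choices made for this example:

```html
<!-- id gives this paragraph a unique identifier within the page -->
<p id="intro">Welcome to my page about cats.</p>

<!-- accesskey assigns a keyboard shortcut to this link -->
<a href="https://terminallearning.com" accesskey="h">Home</a>
```

The exact key combination a user presses to trigger an access key (for example, Alt plus the key) depends on the browser and operating system.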

Other attributes are specific to certain elements. For example, the href attribute is used for <a> and <link> elements to specify the URL or path to a link or file, and the src attribute is used in <img> elements to specify the path to an image file.
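A short sketch of these element-specific attributes follows. The file paths css/main.css and images/sydney.jpg are hypothetical, and the rel and alt attributes are included only because these elements normally need them:

```html
<!-- href on a link element: the path to a stylesheet file
     (link elements go in the head of the document) -->
<link rel="stylesheet" href="css/main.css">

<!-- src on an img element: the path to the image file;
     alt provides a text description of the image -->
<img src="images/sydney.jpg" alt="Sydney the grey tabby cat">
```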

As you learn various HTML elements, you'll also learn the different attributes available. Especially if you practice!!

Comments

Comments are special bits of code that are ignored by compilers, parsers, browsers, etc. If you have learned a programming language, you've likely learned how to add comments to your program code. We do the same thing in HTML, although the reasons for adding comments and documentation are different.

The purpose of a comment in programming languages is to describe WHY you're doing something, especially when you're commenting a complex piece of program code or algorithm. We also use comments in HTML to label the different parts of our page structure and to explain why we're using certain elements/attributes for certain things.

Comments are very important, and are unfortunately under-taught and not emphasized enough. If you ever hear other developers say things like "No one comments in industry" or "You shouldn't comment because then only you can understand your code so they can't fire you", know that those things are NOT TRUE! In fact, those things were likely said by someone who mistakenly thinks that comments are to describe WHAT the code is doing, and not WHY it was written a certain way.

It's absolutely vital that you comment your code, and if you don't comment your code, or you comment it badly, that will make you look unprofessional and could even get you demoted or fired. Comments help other developers understand WHY you chose to use a certain technique, logic, or structure. The purpose of comments is not necessarily to describe WHAT you're doing: any coder can read your code and tell what you're doing! The idea that not commenting can save your job is just B.S. When someone is trying to understand your code, it's much easier when that code is commented and documented well, describing why the code is written the way it is and what you were trying to accomplish. We will focus a lot on good documentation in this course.

To add a comment or to document your HTML code, you use the <!-- symbols to open the comment and then the --> symbols to close the comment. Your comment text goes in between. For example:

<body>

  <!-- HEADER area with main menu navigation -->
  <header>

    <h1>An Example of Commenting</h1>
    <h2>Sydney Greenstreet</h2>

    <nav>
      ... imaginary menu stuff here: catalog, browse, order history, etc...
    </nav>
  </header>

  <!-- MAIN content area with recent articles -->
  <main>

    ... imaginary content here...

    <!-- social media feeds -->
    <aside>
       ... tweets and facebook posts and things...
    </aside>
    
    ... imaginary content here...
  </main>

  <!-- FOOTER area with copyright, contact info, and support/contact navigation -->
  <footer>
    
    <address>&copy; 2017 Sydney Greenstreet the Cat</address>
    
    <!-- TODO: add email link to contact Sydney -->
    <nav>
      ... imaginary links to contact us, support, about us, etc...
    </nav>
  </footer>

</body>

In the example, you can see that the main sections of the document have been documented to describe their purpose. You'll also notice a special TODO comment inside the <footer> element. A TODO is one of a few special key words used in documentation to identify tasks that still need to be performed. You can use these in Java, too!

The HTML Standard

An important part of learning HTML is understanding industry standards. Such standards for web technologies are defined by two main organizations: the World Wide Web Consortium (W3C) and the Web Hypertext Application Technology Working Group (WHATWG).

The W3C was founded in 1994 by Tim Berners-Lee after he left CERN. The objective of the W3C was to develop standards for the World Wide Web (WWW). They remained the sole community for web standards until 2004.

In 2004, several of the web's main players (including Apple, Mozilla, and Opera) were at a conference discussing their concerns about where the W3C was going in terms of HTML (this was around the days of XHTML, which many developers felt was becoming an ungainly monstrosity). These folks decided to form their own consortium (WHATWG) and developed their own standards for HTML based on the pre-XHTML standards. WHATWG has maintained its own HTML standards ever since.

The Drama Between W3C and WHATWG

Eventually, W3C abandoned the XHTML standard and decided to focus on maintaining the new HTML standards, which were also maintained by WHATWG at the same time. W3C began working on the HTML5 draft in 2007 and accepted HTML5 as the current standard in 2014. Both W3C and WHATWG had similar standards for HTML5 with some minor differences, but the number of differences grew over the years, and in some cases the two standards completely contradicted each other. This became very confusing and problematic for developers.

Additionally, WHATWG preferred a Living Standard for HTML: no version numbers are assigned because the standards documents grow as the language grows and expands. When items are changed or new things are added, the documentation is updated as necessary. W3C preferred retiring standards: a version number is assigned and the documentation for that version is static. Changes and new items are added to a draft document of the next version, but this draft is not yet the accepted standard. When there are enough new additions and changes to the language, the draft of the new standard is published as the current standard with a new version number. Then the old standard is retired.

There are advantages and disadvantages of both: for example, browsers need time to update to accept new standards so sometimes a living standard that changes quickly can be difficult to keep up with. But maintaining a living standard for a language that changes quickly, as is the case with HTML, is much easier. A living standard also allows developers to start implementing changes right away because they know the browsers will also be ensuring their software supports the new changes as they are added to the living standard. With a static standard, the changes that browsers and developers make are less gradual and generally happen only when a current standard is published. This means that sometimes developers have to spend large amounts of time implementing many changes at once, rather than being able to implement them gradually over time.

In May 2019, W3C and WHATWG signed an agreement that they would collaborate on a single HTML Living Standard and DOM Specification so that developers could be assured that there was one set of standards for HTML and DOM specifications (we cover DOM in a future lesson). They agreed that both the HTML and DOM standards would be maintained by WHATWG and that W3C would no longer publish its own specifications for HTML and DOM (although it continues to maintain other specifications). If you're interested in the finer details of the agreement, you can read the announcement published on May 28, 2019 in the W3C blog.

The HTML Living Standard

The HTML Living Standard can be found at WHATWG: HTML Living Standard. This document outlines the standard structure of HTML documents, the syntax for elements and attributes, and describes the intended use for all the HTML elements and attributes. There are several other sections of the specification that are beyond the scope of this particular set of tutorials, but some of those will appear in other courses! Feel free to explore the document as you become more comfortable with HTML.

The HTML elements are organized into categories, and you can see these in Section 4 of the Living Standard. The categories of elements we're covering in this course are:

| Section # | Category | Description | Examples of Elements |
| --- | --- | --- | --- |
| 4.1, 4.2 | Document Structure and Meta Data | Elements that define the main structure of the page and the meta data for the document. Note that we already covered these elements in the Minimal HTML lesson. | HTML, HEAD, TITLE, META |
| 4.3 | Document Sectioning | Elements that logically structure the various sections of the page/document. These are semantic elements that tell you something about that section of the page. | BODY, SECTION, NAV, HEADER, FOOTER, H1, H2, etc. |
| 4.4 | Content Grouping | Elements that organize the actual content of the page. These elements contain actual content (e.g. text, figures). These are block elements, and the element name helps define the type of content it contains, in some cases. | P, BLOCKQUOTE, PRE, MAIN, DIV, FIGURE, MENU, elements for lists |
| 4.5 | Text-Level Semantics | In-line elements that help define the style or meaning of a piece of in-line text. | EM, STRONG, CODE, SPAN, BR |
| 4.6 | Links | Elements used to create hypertext links in a document. | A, AREA |
| 4.8 | Embedded Content | Elements that contain media such as images, audio, and video. | IMG, PICTURE, EMBED, AUDIO, VIDEO |
| 4.9 | Tables | The various elements that make up a table of data with rows and columns. | TABLE, TR, TD, TH |
| 4.10 | Forms | The various elements that make up a form that can be used to collect inputs from users. | FORM, INPUT, BUTTON, FIELDSET, LEGEND |
| 4.11 | Interactive Elements | Elements used to create interactive components such as a details/disclosure box or a dialog box. | DETAILS, SUMMARY, DIALOG |
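To give a rough feel for how these categories fit together, here is a minimal sketch of one small page that uses at least one element from most of the categories above. The page content, the image path images/sydney.jpg, and the input name catName are all made up for this illustration; each of these elements is explained properly in later tutorials.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- 4.2: document meta data -->
  <meta charset="utf-8">
  <title>My Cats</title>
</head>
<body>
  <!-- 4.3: document sectioning -->
  <header>
    <h1>My Cats</h1>
  </header>

  <main>
    <!-- 4.4 content grouping (p), 4.5 text-level semantics (em), 4.6 links (a) -->
    <p>Sydney is a <em>very</em> talkative cat. Read more tutorials at
      <a href="https://terminallearning.com">TerminalLearning</a>.</p>

    <!-- 4.8: embedded content -->
    <img src="images/sydney.jpg" alt="Sydney the grey tabby cat">

    <!-- 4.9: tables -->
    <table>
      <tr><th>Name</th><th>Breed</th></tr>
      <tr><td>Arti</td><td>Tabby</td></tr>
    </table>

    <!-- 4.10: forms -->
    <form>
      <label>Cat name: <input type="text" name="catName"></label>
      <button>Add Cat</button>
    </form>

    <!-- 4.11: interactive elements -->
    <details>
      <summary>About Mr. Bibs</summary>
      <p>Mr. Bibs is a black Domestic Shorthair.</p>
    </details>
  </main>
</body>
</html>
```

You could save this in a file with an .html extension and open it in a browser to see how each category of element is displayed.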

The next set of tutorials covers each of the above sections, although some sets of elements require their own tutorials!