HTML5 Tutorial: the basics

Tags & Semantics

One of the most important developments on the World Wide Web is undoubtedly the introduction of HTML5, and there is no doubt that this new version will replace the existing HTML4 and XHTML. Although HTML5 is still in development, modern browsers are rapidly adding support for more and more HTML5 elements. But which elements are already supported, fully or to a limited extent? Which elements have disappeared and which have remained?

The obvious question here is: what is the added value of HTML5 with respect to HTML4/XHTML? Well, this:

  • simplification, cleanup and repair of HTML4 and XHTML;
  • richer semantic markup;
  • new possibilities when working with forms;
  • native multimedia capabilities (e.g. video);
  • powerful APIs.

Caution. HTML5 is not an official standard yet; it is still in development and therefore in an experimental phase. However, according to the W3C, HTML5 will become an official recommendation in 2014. Standardization will probably follow soon.

1. A new DOCTYPE

In HTML4 or XHTML1, the markup listed below was commonly used at the top of the page to declare the document type:

<!DOCTYPE HTML PUBLIC
"-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

In HTML5 this is drastically shortened to:

<!DOCTYPE html>

The declaration that tells the browser the page is encoded in UTF-8 (Unicode) looked like this in HTML4/XHTML:

<meta http-equiv="content-type" content="text/html; charset=utf-8">

In HTML5 this has changed into:

<meta charset="utf-8">

So your first HTML5 document might look like this:

<!DOCTYPE html>
<meta charset="utf-8">
<title>Hello World!</title>
<p>Hello World. I salute you!</p>

(Image: the first HTML5 page)

But wait a minute...! What happened to the <html>, <head>, and <body> tags? Is it really that easy? The truth is that the <html>, <head>, and <body> tags were never really compulsory in HTML4, and they still aren't in HTML5. The HTML page renders fine without them. That is to say, the elements are still there in the background: the browser inserts them by itself. They are implicitly present. This is true for HTML5 pages, but not for XHTML5 pages (the XML variant of HTML5).

The XML variant of HTML5: XHTML5

In HTML5, developers are free to choose a flavor, since HTML5 can be written in either HTML or XML syntax. In the latter case, the markup is called XHTML5: the XML serialization of HTML5. Keep in mind, however, that XHTML5 is an application of XML. In other words, HTML5 and XHTML5 have an identical vocabulary but different parsing rules. XHTML5 has a stricter syntax and is not forgiving when you make markup mistakes. Furthermore, some parts of XHTML5, such as processing instructions, are not valid in HTML5. Below you find the XHTML5 variant of the Hello World! page. Mind the self-closing tag in the line <meta charset="utf-8"/>. This is typical XHTML markup. Also note the XML declaration before the DOCTYPE declaration, and the xmlns attribute on the <html> tag, which places the elements in the XHTML namespace. A browser only treats the page as XHTML5 when it is served with an XML media type such as application/xhtml+xml.

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8"/>
<title>Hello World!</title>
</head>
<body>
<p>Hello World. I salute you!</p>
</body>
</html>

So in XHTML5 inserting the <html>, <head>, and <body> tags is absolutely required.

I suggest that you also place the <html>, <head>, and <body> tags in your HTML5 markup. It just looks ...neat.
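
As a minimal sketch of that suggestion, the HTML5 Hello World page can be written with all three tags spelled out. Nothing else is changed compared with the short version shown earlier.

```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Hello World!</title>
</head>
<body>
<p>Hello World. I salute you!</p>
</body>
</html>
```

A browser ends up with the same <html>, <head>, and <body> elements for this page as for the short version; the explicit tags only make the structure visible in the source.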

2. Converting to HTML5

2a. Remove all presentational markup:

If you want to apply HTML5 consistently, remove all markup that is only intended to adjust the layout or presentation of the page, and replace it with CSS rules in your stylesheet:

Obsolete presentational markup                                  CSS alternative
<basefont> <big> <font> <tt>                                    font properties
<s> <strike> <u>                                                text-decoration
<center> and align=center                                       text-align:center and margin:auto
align=left, right, or justify on <div> and other text elements  text-align:left, right, justify respectively
align=left or align=right on <img> and other replaced elements  float:left and float:right
<body text link alink vlink>                                    color property and :link, :visited, :active pseudo-classes
<body background>                                               background-image
bgcolor attribute                                               background-color
border attribute and <table frame rules>                        border properties
<table cellspacing>                                             border-spacing
<table cellpadding>                                             padding on the table cells (<th> and <td>)
<br clear>                                                      clear
hspace, vspace, marginheight, marginwidth                       margin properties
<hr noshade size>                                               border-style:solid and border-width
nowrap attribute                                                white-space:nowrap
valign attribute                                                vertical-align
height and width attributes                                     height and width properties
<plaintext>                                                     no CSS; use <pre> instead, or serve the file as text/plain rather than HTML
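
To make the table concrete, here is a minimal sketch of a few of these conversions on one page. The class names intro and photo, the file name photo.jpg, and all colors and sizes are made up for this example, and the rules sit in a <style> element only to keep the sketch in one file; in a real site they would go in your stylesheet.

```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Presentational markup moved to CSS</title>
<style>
/* was: <body bgcolor="#ffffff" text="#333333"> */
body { background-color: #ffffff; color: #333333; }

/* was: <center><font face="Arial" size="4"> ... </font></center> */
.intro { text-align: center; font-family: Arial, sans-serif; font-size: large; }

/* was: <img align="right" hspace="10"> */
.photo { float: right; margin-left: 10px; margin-right: 10px; }

/* was: <table cellspacing="0" cellpadding="5"> */
table { border-spacing: 0; }
th, td { padding: 5px; }
</style>
</head>
<body>
<p class="intro">Hello World. I salute you!</p>
<img class="photo" src="photo.jpg" alt="A photo">
<table>
<tr>
<td> row 1 cell 1 </td> <td> row 1 cell 2 </td>
</tr>
</table>
</body>
</html>
```

The markup itself now only describes what the content is; every decision about how it looks lives in the CSS rules.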
  

2b. Remove "deprecated" tags and attributes

Some HTML elements and attributes are used by very few sites and are therefore also referred to as "failed" features. If you come across them somewhere in your site and don't want to make the impression of an absolute amateur, remove them.

Failed HTML feature   Use in its place
<img longdesc>        Use a visible description, or a link to one
<frameset> <frame>    Redesign the page content, with <iframe> if necessary
<html version>        Nothing, just remove the version attribute
<meta scheme>         Avoid invisible metadata; add microformats to the visible content instead
rev attribute         Use rel microformats
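
As a rough illustration of the <frameset> row, a two-frame layout could be rebuilt as an ordinary page that embeds the second document in an <iframe>. The file names menu.html and content.html, the frame name content, and the column sizes are invented for this sketch.

```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Page without frames</title>
</head>
<body>
<!-- was:
<frameset cols="25%,75%">
<frame src="menu.html">
<frame name="content" src="content.html">
</frameset>
-->
<ul>
<li><a href="content.html" target="content">Content</a></li>
</ul>
<iframe name="content" src="content.html" title="Content"></iframe>
</body>
</html>
```

The menu is now part of the page itself, and only the part that really has to be a separate document stays in a frame.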

3a. Still allowed: some HTML4 attributes retained for backwards compatibility

An HTML4 presentational attribute that is maintained (for compatibility reasons) in HTML5 is the border attribute on the <img> tag. The reason is that it removes the rather annoying blue border that older browsers draw around images within a link. So HTML5 supports border="0" (and no other values) purely for that reason.

   <a href="..."><img border="0"></a>
   
Actually, the border="0" attribute on linked images is unnecessary, because the problem can also be solved in a different, more elegant manner with CSS, namely as follows:

:link img,
:visited img {
  border: 0;
}
 

A second attribute that has been preserved is the type attribute on the <script> tag. Nowadays there is no need to indicate that the content of a <script> tag is JavaScript (JavaScript has already won the battle). Older browsers, however, still expect it. For that reason, and only that reason, HTML5 allows type="text/javascript" on the <script> tag:

<script type="text/javascript">
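
For comparison, here is a small sketch of a page that leaves the type attribute out. The id greeting and the script itself are invented for this example.

```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Script without a type attribute</title>
</head>
<body>
<p id="greeting"></p>
<script>
// No type attribute: HTML5 browsers treat the content as JavaScript by default.
document.getElementById("greeting").textContent = "Hello World. I salute you!";
</script>
</body>
</html>
```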

5. XHTML5 and HTML5

The good news in the transition to HTML5 is that compatibility with XML remains, in the form of XHTML5. You can still write your web page in XHTML5 so your code is more accessible to XML processors, which is beneficial for exchanging the content of your website with other applications.
It does mean, however, that you have to work slightly more accurately when preparing your XHTML5 markup, as XHTML5 is simply less forgiving and is bound by stricter rules. Below are some simple rules for writing valid XHTML5. Even if you have never been a fan of XHTML5 or XML, applying these rules certainly cannot hurt, because it keeps your code clean and consistent. It can also make the page easier for search engines to read.

5a. Self-closing tags

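Some HTML elements are empty and have no closing tag. In order to be processed correctly by XML processors, they should be written as self-closing tags (<... />), e.g.:

<br/>
<hr/>
<img/>
<input/>
<link/>
<meta/>

Note that <option> does not belong in this list: it is not an empty element, and only its closing tag is optional in HTML. In XHTML5, write it out in full as <option> ... </option>.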

5b. Use quotes around attribute values

XHTML requires that all attribute values are placed in quotes, and that still holds for XHTML5. In HTML5 the quotes can often be omitted, but keeping them certainly cannot hurt: it is a good habit that avoids unintentional errors with, for example, attribute values that contain spaces or other punctuation. The following examples illustrate the use of quotes around attribute values:

Without quotes:

<img src=plaatje1.jpg alt=image>

With quotes (better):

<img src="plaatje1.jpg" alt="image">
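
The kind of error that quotes prevent shows up as soon as a value contains a space. The sketch below reuses the file name from the example above; the alt text "my image" is made up.

```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Quoted attribute values</title>
</head>
<body>
<!-- Unquoted: the value ends at the space, so alt becomes "my"
     and "image" is read as a separate, empty attribute. -->
<img src=plaatje1.jpg alt=my image>

<!-- Quoted: alt is "my image", as intended. -->
<img src="plaatje1.jpg" alt="my image">
</body>
</html>
```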

5c. Use explicit tags for a consistent DOM

In a previous section, we constructed an HTML5 document:

<!DOCTYPE html>
<meta charset="utf-8">
<title>Hello, my first HTML5 page!</title>
<p>Hello, my first HTML5 page. I salute you!</p>
 

We found that the <html>, <head>, and <body> tags were missing because they are optional in HTML4 and HTML5, and that an HTML document can do well without them because the browser adds the elements implicitly when it parses the page. However, in XML (and therefore XHTML5) there are no implicit tags, so they must be written in the document. The <html>, <head>, and <body> tags are required in XHTML5:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8"/>
<title>Hello, my first HTML5 page!</title>
</head>
<body>
<p>Hello, my first HTML5 page. I salute you!</p>
</body>
</html>
 
Creating a table is another well-known case where tags can be implicitly present in HTML (and therefore have to be added explicitly in XHTML5 / XML to get the same DOM).
HTML allows tables to be formatted as follows:
<table>
<tr>
<td> row 1 cell 1 </td> <td> row 1 cell 2 </td>
</tr>
<tr>
<td> row 2 cell 1 </td> <td> row 2 cell 2 </td>
</tr>
</table>
 
which implies a <tbody> tag around the rows.
Again, an XML parser does not add this tag by itself, so in XHTML5 / XML the <tbody> tag should be written explicitly:
 
<table>
<tbody>
<tr>
<td> row 1 cell 1 </td> <td> row 1 cell 2 </td>
</tr>
<tr>
<td> row 2 cell 1 </td> <td> row 2 cell 2 </td>
</tr>
</tbody>
</table>
 

5d. Update obsolete markup

Now that you have removed all presentational markup and failed features, cleaned everything up, and written your tags explicitly for XHTML5 / XML compatibility, it is time to remove the last traces of obsolete markup.

HTML5 has finally done away with some deprecated HTML4 tags and attributes that do not fall into any of the above categories. If your code uses any of the following, replace it with the equivalent in the right column.

Removed HTML4        HTML5 replacement
<acronym>            <abbr>
<applet>             <object>
<dir>                <ul>
<a name="a1"></a>    <div id="a1"> ... </div>
<img name="i1">      <img id="i1" />
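
A short sketch of a few of these replacements on one page follows. The link text, heading, list item, and abbreviation are invented for illustration, and the id is placed on a heading rather than a <div>, since an id works as a link target on any element.

```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Updated markup</title>
</head>
<body>
<p><a href="#a1">Jump to the section</a></p>

<!-- was: <a name="a1"></a> -->
<h2 id="a1">Section title</h2>

<!-- was: <acronym title="HyperText Markup Language">HTML</acronym> -->
<p><abbr title="HyperText Markup Language">HTML</abbr> is a markup language.</p>

<!-- was: <dir> -->
<ul>
<li>A list item</li>
</ul>

<!-- was: <img name="i1"> -->
<img id="i1" src="plaatje1.jpg" alt="image">
</body>
</html>
```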
