An Introduction to HTML 5

Brief History and Specification Characteristics


PAULO AMARAL
Universidade de Évora

HTML 5 has been one of the most frequently mentioned buzzwords in technology in recent
years. Its additions to a very mature specification have been thoroughly discussed in the
media, and its developers’ promises of plugin independence have marked the technology as
the next destination in the web’s life. This paper discusses the history of the HTML
specification and its structure, followed by a description of HTML 5’s characteristics
and novelties.
Categories and Subject Descriptors: 1 [I.7.2]: HTML; 2 [I.7.1]: WEB
General Terms: Standardization
Additional Key Words and Phrases: html5, specification

1. INTRODUCTION
HTML 5 is the new version of the standard specification of the markup language
that has been the building block of web pages since the beginning of the World
Wide Web as we know it. Over the years, the HTML specification has matured
through the constant addition of new elements and revisions of its structure. After
many years in which it received only minor adjustments, the specification is now
being thoroughly overhauled to meet the challenges created by the growth of web
applications and multimedia content.
This paper briefly describes the history and evolution of HTML and then studies
its new version. However, it is not meant to be an exhaustive study of the HTML 5
specification. Instead, it tries to answer the question of whether HTML 5 can give
its users a mature web experience without depending on third-party plugins while
still offering everything those plugins made possible.

2. A BRIEF LOOK INTO HTML


2.1 It all began with the ARPAnet
In August of 1962 a computer scientist named J. C. R. Licklider, of MIT, wrote a
series of memos discussing a concept he called the “Intergalactic Computer Network”.
In these memos, Licklider envisioned a system of connected machines through which
any user could access any data from any geographical location. Later on, he became
the first head of the computer research program at DARPA (Defense Advanced
Research Projects Agency), where he was joined by Ivan Sutherland, Bob Taylor and
Lawrence G. Roberts. After Leonard Kleinrock published the first paper on packet
switching theory, which would later be fundamental to the development of the
Internet, DARPA began to explore the concept of computer networking by putting
together a plan for the ARPAnet.
Over the next couple of years, the DARPA community and its collaborators connected
the first nodes (machines) of the network, which linked three universities (UCLA,
UCSB and the University of Utah) and the Stanford Research Institute’s Augmentation
Research Center. By this point, the communication model associated with the ARPAnet
had already moved from circuit switching (a communication method in which two nodes
establish a dedicated communication channel) to packet switching, a method that is
still in use today and in which all data is transmitted in suitably sized blocks.
This allowed the ARPAnet to expand the number of nodes in its network. Throughout
the following years, more computers were added to the network and, by December
1970, the NCP (Network Control Protocol, the first host-to-host protocol of the
ARPAnet) was completed. The successful implementation of this protocol made possible
the development of the first network applications, which eventually led in 1972 to
the first major network application: electronic mail.[Leiner et al. 2011]

Paulo Amaral, m5682@alunos.uevora.pt
Universidade de Évora, MSc in Computer Engineering (Mestrado em Eng. Informática).
State of the Art Reports 2011.
SemEv 2011, Pages 1–10.

As the years passed, more and more networks were connected to the ARPAnet, which
led its researchers to conclude that the existing communication protocol, NCP,
could not meet the demands of interconnecting multiple networks. This prompted the
development of the TCP/IP protocol suite, which was specifically designed to
support communication across multiple networks. The 1980s also saw the birth of one
of the most important technologies for the Internet, the DNS (Domain Name System),
a system that organizes machines into domains and maps each host name to its IP
address.[Tanenbaum 1996]
2.2 The world after Tim Berners-Lee
In 1989, the researchers of CERN, the European Center for Nuclear Research, felt
the need for a system that would help them share their reports, blueprints,
drawings, photos and other documents across different countries. In March of the
same year, a physicist from CERN, Tim Berners-Lee, proposed a web of linked
documents, whose first implementation was finished 18 months later. This network of
documents was named the World Wide Web (WWW) and in its first form consisted only
of a text-based prototype, which led to the development of the first web browser
(an application that retrieves, presents and traverses information on the World
Wide Web). During the development of the web browser, Berners-Lee understood that a
specification was needed so that every web browser could display the data in its
intended format. This idea gave birth to the HTML specification, a standardized
language for describing web pages.[Tanenbaum 1996] Berners-Lee’s forward-looking
vision led to the standardization of this specification, which allowed any web
browser to read and format any web page it received. It is because of this standard
language that anyone connected to the Internet can nowadays easily access a world
of information that has revolutionized the way we communicate, work and even play.
HTML was the mechanism through which the World Wide Web spread to a much broader
audience, thanks to the simplicity it introduced in how we transmit information to
the world.


2.3 HTML language structure
When you write a web page, the language that you use is HTML. With this language,
you can incorporate text, graphics, links, etc. into a web page. But what really is
HTML? HTML is a markup language whose objective is to describe how a document is
formatted. Markup languages use markup terms to specify formatting commands. For
example, if you want to specify in HTML that the word “paper” is to be presented in
bold, all you have to do is use the markup terms (in this case also called HTML
tags) <b> and </b> to begin and end the HTML expression in the following manner:
<b> paper </b>
This formatting method has the advantage of telling the browser precisely how the
information should be presented. This means that any browser that “understands” the
HTML specification can easily reformat an element for whatever screen size or
resolution is needed.
The HTML specification requires that a web page begins with an <html> tag and ends
with an </html> tag, and that it consists of a head and a body, which are delimited
by the <head></head> and <body></body> tags. HTML tags are case insensitive, which
means that <head> and <HEAD> mean exactly the same thing. Extra “blank” spaces are
ignored when the page is displayed, but there are tags for specifying explicit line
breaks or paragraphs, for example.[Tanenbaum 1996]
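As a minimal sketch of the structure just described, the page below nests a head and a body inside the html element. The title and the sentence in the body are placeholders chosen for illustration, and the first line is the HTML 5 document type declaration, which is an addition not discussed in this section.

```html
<!DOCTYPE html>
<html>
<head>
<title>Example page</title>
</head>
<BODY>
<!-- Tag names are case insensitive, so <BODY> is treated like <body>. -->
<p>This    <b>paper</b>    is about HTML.</p>
<!-- The runs of spaces in the paragraph above are displayed as single spaces. -->
</BODY>
</html>
```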
It is also important to note that some tags take parameters, which are called
attributes. To demonstrate this, imagine that you want to include an image called
“dog.png” in your web page. To do this, you need the HTML tag <img> and the
attribute src, which points to the source of the image:
<img src="dog.png">
A further study of the HTML structure is outside the main focus of this paper.
Instead, we will now take a look at the evolution of the specification before the
arrival of HTML 5.
2.4 HTML evolution
The Document Type Definition (DTD, a “set of markup declarations that define a
document type for SGML-family markup languages”[Wikipedia 2011a]) of HTML has
undergone several changes over its years of existence. At its roots, the Level 0
HTML specification had no graphical or display elements because its goal was merely
to provide a platform-independent way of supplying data to its users and retrieving
it.
Level 1 HTML introduced the container concept with the HEAD and BODY elements.
Opening and closing tags were also introduced at this level, but not for all tags.
It was with the HTML+ specification that graphical and display elements were first
seen in HTML. An HTML+ document can use header, paragraph, list, table and figure
elements.
With a much more mature specification in hand, the HTML 2 developers were able to
add the FORM element to the tag list. It also became possible to insert line breaks
between elements by using the BR tag. This version of HTML also saw the birth of an
element that is very important for the indexing and cataloging of web pages: the
META element. META elements are tags whose objective is to describe the content of
a web page.
The fifth version of HTML is called HTML 3, and it introduced the FIG element,
which allows the content around a figure to flow around it if there is sufficient
room on the page. Some additional attributes were also introduced for background
images, tabs, footnotes and banners, and the ALIGN attribute made it possible to
justify the elements of a page.
The HTML 3.2 specification brought graphical and sound additions to the previous
one, as it introduced the SCRIPT and STYLE tags, which allowed web pages to be
filled with animations, colors and sounds.
Finally, HTML 4 brought style sheets into the specification. With style sheets, the
content and the presentation of a web page can be separated into different
documents. The STYLE, DIV and SPAN elements were also included with the goal of
being used with style sheets.[Galaxy 2011]
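The separation that style sheets make possible can be sketched as follows. The file name site.css and the class names are hypothetical, and the rules inside the STYLE element could equally live in that external file.

```html
<!DOCTYPE html>
<html>
<head>
<title>Style sheet sketch</title>
<!-- External style sheet (hypothetical file name): presentation kept in a separate document. -->
<link rel="stylesheet" href="site.css">
<!-- Embedded rules, shown here only so that the example is self-contained. -->
<style>
  .sidebar { float: right; width: 30%; }
  .keyword { font-weight: bold; }
</style>
</head>
<body>
<!-- DIV groups block-level content; SPAN marks up a run of text inside a line. -->
<div class="sidebar">Related links</div>
<p>This paper is about <span class="keyword">HTML</span>.</p>
</body>
</html>
```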

3. A DIVE INTO THE HTML 5 REVOLUTION


3.1 From HTML 4 to HTML 5
After the release of HTML 4 in December 1997, the World Wide Web Consortium (W3C)
announced that it would not continue to evolve the HTML specification. A couple of
months later, W3C published XML 1.0 (a markup language for encoding documents in
machine-readable form[Wikipedia 2011b]), and shortly afterwards announced that the
HTML specification would be reformulated in XML without the addition of new
elements or features. These ideas later materialized in the XHTML 1.0
specification, which would go on to receive a few minor new features and evolve
into XHTML 1.1.[Pilgrim 2010][Lawson and Sharp 2011]
However, history showed that the XHTML specification would not become a widely
adopted one. But why? The answer has less to do with the specification itself than
with its implementation. It is commonly accepted that web browsers are quite
forgiving about the “correctness” of HTML pages, i.e., even if someone writes an
HTML page with a few syntax mistakes, web browsers are still able to interpret the
page and display its content. This lack of control over HTML syntax led to a
situation in which over 99% of existing web pages contain at least one syntax
error. When W3C published the XHTML specification, it was decided that XML parsers
would not accept syntax errors, which means that a web browser would refuse to
display a web page containing even a single badly formed line. This obviously
created a backward-compatibility problem for the abundance of web pages that
contained badly formed HTML: following this logic, virtually none of them would be
parsed by any web browser, and thus all these web pages would be considered
obsolete.
To address this issue, the main browser vendors, web development companies, and
other W3C members held the Workshop on Web Applications and Compound Documents in
June 2004, at which they discussed the evolution of the HTML standard. As a result,
seven principles were drawn up around which the new specification should be built:
(1) Backwards compatibility, clear migration path
(2) Well-defined error handling
(3) Users should not be exposed to authoring errors
(4) Practical use
(5) Scripting is here to stay
(6) Device-specific profiling should be avoided
(7) Open process
Although these principles were well received in the community, the push for a new
HTML specification had an unexpected result: the W3C was not keen on the idea of
further developments to HTML, and in response a group of individuals from Apple,
Mozilla and Opera founded the WHAT Working Group (WHATWG), which set out to address
these concerns. The key goal of this group was to develop a new specification (Web
Applications 1.0) without disregarding backward compatibility with the previous
HTML specifications, so that web pages written according to the old specifications’
guidelines could still be displayed by browsers that interpret the new one.
After some years of working alone, WHATWG was joined by W3C to finally evolve HTML
to a new level, a move that soon showed results: Web Applications 1.0 was renamed
HTML 5, and both parties worked together from this point on to evolve the
specification.[Pilgrim 2010]
3.2 HTML 5 principles
Although backward compatibility was the major issue when development of the HTML 5
specification started, WHATWG and W3C stated that there were other fundamental
issues to be tackled in this new version of HTML. The four core principles of the
new specification are:
(1) Compatibility
(2) Utility
(3) Interoperability
(4) Universal access
Each one of them will now be individually discussed.
3.2.1 Compatibility. As mentioned earlier in this paper, HTML 5 was developed to be
backwards compatible with the previous specifications. Although some presentational
elements of older versions of the specification are not present in this one,
browsers that interpret HTML 5 pages are required to support these elements and
attributes, i.e., browsers are supposed to display the information as it was
originally written.
Besides the backwards compatibility issue, millions of web pages were also analyzed
to discover the most common ID names for DIV tags. The results showed that there is
a great amount of repetition on the web, as many of these elements had exactly the
same names in a large percentage of the pages. Because of these findings, new
elements (like the <header> element) were introduced into the specification with
the intention of reducing repetition and simplifying the structure of web pages for
their authors.[Lubbers et al. 2010][W3C 2011]
3.2.2 Utility. The HTML 5 specification was written to put the user before the
author. This means that the specification is extremely practical in its code
definition, though in some cases less than perfect. Let’s look at an example that
demonstrates this property:
id="prohtml5"
id=prohtml5
ID="prohtml5"
These code snippets are equally valid in HTML 5. The example shows that, although
the specification allows some code to be written in a sloppy way, it visibly
focuses more on the user than on the developer. By accepting all of these forms, it
ensures that the user is not presented with a coding error that prevents him from
seeing parts of the web page.[Lubbers et al. 2010]
3.2.3 Interoperability. The HTML 5 specification is a document more than 900 pages
long, but this complexity has a purpose: simplification. With new features such as
native browser capabilities that replace complex JavaScript code, a simplified
DOCTYPE, a simplified character set declaration and powerful yet simple APIs, the
HTML 5 specification achieves a level of simplicity that allows it to behave the
same way on every implementation it runs on. Besides that, its error recovery is
defined so that browsers can display “broken” markup in a standard way.[Lubbers et
al. 2010]
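The simplified declarations mentioned above can be illustrated with a short sketch. The comments quote the HTML 4.01 Strict forms for comparison; the title and body text are placeholders.

```html
<!DOCTYPE html>
<!-- The HTML 4.01 Strict equivalent of the line above was:
     <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> -->
<html lang="en">
<head>
<meta charset="utf-8">
<!-- Older pages declared the character set as:
     <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> -->
<title>Simplified declarations</title>
</head>
<body>
<p>Hello, HTML 5.</p>
</body>
</html>
```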
3.2.4 Universal access. HTML 5 uses the Web Accessibility Initiative (WAI)
Accessible Rich Internet Applications (ARIA) standard to support access for users
with disabilities. For example, WAI-ARIA roles and attributes that support screen
readers can be added to HTML 5 code.
Besides its accessibility features, HTML 5 functionality also works across
different devices and platforms and provides support for all world
languages.[Lubbers et al. 2010]
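A hedged sketch of how WAI-ARIA roles and attributes can be attached to ordinary markup follows. The ids, labels, link targets and text are made up for illustration; role, aria-label, aria-expanded, aria-controls and aria-live are attributes defined by WAI-ARIA.

```html
<!-- A landmark role lets a screen reader jump straight to the navigation. -->
<div role="navigation" aria-label="Main menu">
  <button aria-expanded="false" aria-controls="menu-items">Menu</button>
  <ul id="menu-items" hidden>
    <li><a href="/">Home</a></li>
    <li><a href="/about">About</a></li>
  </ul>
</div>

<!-- A live region: changes to its text are announced without moving focus. -->
<p id="status" role="status" aria-live="polite">3 new messages</p>
```

A script would normally toggle the hidden attribute and aria-expanded together when the button is pressed; that part is left out of the sketch.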

3.3 Goodbye Plugins


Up until now, web browsing required users to install many plugins in order to
support the features of many web pages. Plugins have some problems, though:
(1) They cannot always be installed;
(2) They can be disabled or blocked;
(3) They are difficult to integrate with the rest of an HTML document (because of
plugin boundaries, clipping, and transparency issues);
(4) They present some performance issues.
To address these issues, the HTML 5 specification comes with native functionality
(through its powerful APIs) and enhanced interaction with CSS and scripting that
allow authors to design their web documents without third-party plugins. To
demonstrate this, take Flash video as an example: Flash is a very common way to
display video files on web pages, but this task is normally a resource-consuming
one[Ozer 2011], which can slow down the system or even crash it, obviously harming
the user experience. Because of these problems, HTML 5 has a video tag (<video>)
that allows a web page to play a video without the need for a plugin and with
considerably better performance.[Ozer 2011]
The following list shows a group of features that belong to the HTML 5 APIs and
clearly reflect the departure from the plugin dependence of previous versions of
the specification:
(1) Canvas (2D and 3D)
(2) Channel messaging
(3) Cross-document messaging
(4) Geolocation
(5) MathML
(6) Microdata
(7) Server-Sent events
(8) Scalable Vector Graphics (SVG)
(9) WebSocket API and protocol
(10) Web origin concept
(11) Web storage
(12) Web SQL database
(13) Web Workers
(14) XMLHttpRequest Level 2
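To give a feel for two entries of this list, the page below is a small sketch that uses the 2D canvas and web storage directly from a script, with no plugin involved. The element ids, the storage key "visits" and the drawn shape are arbitrary choices made for this illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Native API sketch</title>
</head>
<body>
<canvas id="box" width="200" height="100">Your browser does not support canvas.</canvas>
<p id="visits"></p>
<script>
  // Canvas 2D: draw a filled rectangle.
  var canvas = document.getElementById("box");
  if (canvas.getContext) {
    var ctx = canvas.getContext("2d");
    ctx.fillStyle = "steelblue";
    ctx.fillRect(20, 20, 160, 60);
  }

  // Web storage: keep a per-browser counter between visits.
  var output = document.getElementById("visits");
  try {
    var count = Number(localStorage.getItem("visits") || "0") + 1;
    localStorage.setItem("visits", String(count));
    output.textContent = "Visits recorded in this browser: " + count;
  } catch (e) {
    // Storage can be unavailable, for example when the user has disabled it.
    output.textContent = "Web storage is not available.";
  }
</script>
</body>
</html>
```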

3.4 The HTML 5 structure


At the time of this writing, the HTML 5 specification is still a work in progress,
which means that the features presented below could change in the future. The goal
here is to give a brief description of the current status of the HTML 5 structure.
As mentioned earlier in this paper, authors who worked with previous versions of
HTML usually structured their pages with div elements, dividing them into a variety
of sections such as header, footer, sidebar, and so on (Fig. 1).
The HTML 5 specification introduces new elements to represent each of these
different sections. We can now use the <header> and <footer> tags, for example, to
structure our web pages (Fig. 2).
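The move from named div elements to the new structural elements can be sketched as below. The ids mentioned in the comments, the headings and the link targets are assumptions made for illustration rather than a reproduction of the figures.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Two-column layout</title>
</head>
<body>
<!-- Previously: <div id="header"> -->
<header>
  <h1>Site title</h1>
</header>
<!-- Previously: <div id="nav"> -->
<nav>
  <a href="/">Home</a>
  <a href="/archive">Archive</a>
</nav>
<!-- Previously: <div id="content"> with <div class="post"> inside -->
<section>
  <h2>Latest posts</h2>
  <article>
    <h3>First post</h3>
    <p>Post text.</p>
  </article>
</section>
<!-- Previously: <div id="sidebar"> -->
<aside>
  <p>Related links</p>
</aside>
<!-- Previously: <div id="footer"> -->
<footer>
  <p>Copyright notice</p>
</footer>
</body>
</html>
```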
Fig. 1. Typical two-column layout marked up using divs

Fig. 2. New elements: header, nav, section, article, aside, and footer

But why should we use these new elements instead of the usual named div tags? When
we use these new elements in conjunction with the heading elements (h1 to h6), we
can mark up nested sections with heading levels, for example. This feature allows
us to easily apply some properties to a group of contents without having to apply
them to each piece of content individually. As an example, observe the structure
with nested sections and h1 elements in Fig. 3.

Fig. 3. Applying h1 headings to nested sections
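
The figure itself is not reproduced here. A sketch of the kind of markup it describes, with placeholder titles, could look like this: each nested section starts again at h1, and the nesting depth rather than the tag name is meant to convey the heading level.

```html
<section>
  <h1>Chapter 1</h1>
  <p>Chapter introduction.</p>
  <section>
    <h1>Section 1.1</h1>
    <p>Section text.</p>
    <section>
      <h1>Subsection 1.1.1</h1>
      <p>Subsection text.</p>
    </section>
  </section>
</section>
```

In practice, support for deriving heading levels from nesting has been limited in browsers and assistive technology, so many authors choose h2 and h3 for the inner headings instead.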

The header element represents the heading of a section and may contain subheadings,
the page’s version history, bylines, etc. The footer element represents the footer
of a section and generally contains information about the author of the section,
links to related documents, copyright information and so on. The nav element
represents a section of navigation links and can be used as a table of contents or
for the usual site navigation. The aside element is used for content that is only
tangentially related to the section it appears in, and is typically used for
marking up sidebars. The section element represents a generic section of a
document, like a chapter, for example. Finally, the article element can be seen as
an independent section of a document that is suitable for content like news items,
blog posts, comments, etc.[Hunt 2011]

3.5 Multimedia novelties of the specification


As discussed before, HTML 5 introduced a series of new elements into its
specification that largely remove the dependence on third-party plugins. Two of the
most important elements of the specification are the video and audio tags, which
enable multimedia content to be presented on a web page without the help of
plugins. This way, if you want to embed a video on a web page, all you have to do
is write:

Fig. 4. The video tag
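
The figure's markup is not reproduced here. A minimal sketch of such a snippet, assuming hypothetical file names (movie.webm and preview.jpg) and arbitrary dimensions, might be:

```html
<!-- controls and poster are the attributes discussed in the text;
     the content inside the element is shown only by browsers without video support. -->
<video src="movie.webm" controls poster="preview.jpg" width="640" height="360">
  <a href="movie.webm">Download the video</a>
</video>
```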

In this code snippet, controls is a boolean attribute whose presence tells the
browser to display its built-in playback controls (play, pause, volume and so on).
The poster attribute specifies an image to be displayed as a preview of the video,
i.e., it stands in for the video picture before the video itself starts playing.
Embedding an audio element in a web page is just as simple as it was with the video
tag (Fig. 5).

Fig. 5. The audio tag
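
As with the video example, the original figure is not available, so the following is only a sketch with a hypothetical file name (song.ogg):

```html
<!-- The link inside the element is a fallback for browsers without audio support. -->
<audio src="song.ogg" controls>
  <a href="song.ogg">Download the song</a>
</audio>
```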

The source element can be placed inside a video or audio element to specify a list
of alternative formats of the same media. To use this element, the src attribute
must be omitted from the list of attributes of the audio or video tag (Fig. 6).
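A sketch of the source element in use is given below. The file names and the choice of formats are assumptions made for illustration; the browser goes through the list in order and plays the first format it supports.

```html
<!-- No src attribute on the video element itself: the sources are listed inside it. -->
<video controls poster="preview.jpg">
  <source src="movie.mp4" type="video/mp4">
  <source src="movie.webm" type="video/webm">
  <a href="movie.mp4">Download the video</a>
</video>
```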
There are other video and audio features in the specification that are not
discussed here, but the elements presented in this paper should be enough to
demonstrate the power and simplicity of HTML 5’s multimedia elements.[Hunt 2011]

4. CONCLUSION
By following its principle of backwards compatibility, HTML 5 has shown that it can
add new and fundamental elements to its specification without ignoring the enormous
number of web pages that currently exist on the Web. The addition of new features
like the multimedia elements and the geolocation API has given this new version of
the specification new weapons in the fight for independence from the undesirable
plugins that we have grown used to over the last couple of years. The new type of
organization allowed by this technology has also given us a better document
structure and a simpler way of formatting these documents.

Fig. 6. The source tag

However, the HTML 5 specification is still a draft, and many of its ideas are still
at an experimental stage within the two organizations responsible for its
development.[Lawson and Sharp 2011] Still, it is fair to say that HTML 5 has
brought us fundamental innovations and simplifications that in the future, with its
diverse native APIs, will replace the traditional third-party mechanisms that are
usually the first choice for developing web applications and multimedia content.
REFERENCES
Leiner, B. M. et al. 2011. A brief history of the Internet. http://www.isoc.org/internet/history/brief.shtml#LK61.
Galaxy, L. 2011. The evolution of HTML. http://www.layoutgalaxy.com/html-tutorial/evolution.php4.
Hunt, L. 2011. A preview of HTML 5. http://www.alistapart.com/articles/previewofhtml5.
Lawson, B. and Sharp, R. 2011. Introducing HTML5. New Riders, 1249 Eighth Street, Berkeley, CA 94710.
Ozer, J. 2011. Flash player: CPU hog or hot tamale? It depends. http://www.streaminglearningcenter.com/articles/flash-player-cpu-hog-or-hot-tamale-it-depends-.html.
Lubbers, P., Albers, B., and Salim, F. 2010. Pro HTML5 Programming: Powerful APIs for Richer Internet Application Development. Apress.
Pilgrim, M. 2010. HTML5: Up and Running. O’Reilly Media.
Tanenbaum, A. 1996. Computer Networks. Prentice Hall, Upper Saddle River, New Jersey 07458.
W3C. 2011. HTML5 differences from HTML4. http://www.w3.org/TR/html5-diff/.
Wikipedia. 2011a. Document type definition. http://en.wikipedia.org/wiki/Document_Type_Definition.
Wikipedia. 2011b. XML. http://en.wikipedia.org/wiki/Xml.
