Who will read your Semantic HTML?

I've talked about Semantic HTML before, and many other people have. But the one thing I find missing in these discussions is an explanation of why we should use Semantic HTML, or more specifically, who or what it will be that later reads your Semantic HTML to extract meaning from it.

Semantic HTML is all about adding meaning to your document by using appropriate HTML elements. It's a great concept. Why waste time and space with unnecessary <div> and <span> elements when other more meaningful elements are available like <h1> or <label>?

Certainly, Semantic HTML has many practical benefits. <label> has usability and accessibility benefits, like allowing people to click some text to check a checkbox, or allowing people with screenreaders understand what a text input is for. Headers like <h1> give a document structure by allowing hierarchical naming of the sections of a document. <title> lets you give a document a title. And of course, <a> lets you create hyperlinks which tie web pages together.

All these are very clear and understandable benefits and ways of using Semantic HTML. I find there are other benefits to using a variety of elements, like being able to work with the HTML and CSS of a document more easily. If a document was make completely with <div>s, you'd spend a lot of time giving things class names and IDs unnecessarily, and you spend a lot of time trying to figure which </div> goes with which <div>.

Now what really bugs me is when people start arguing about the semantic meaning of a certain element. Recently on Snook was a discussion of the Use of ADDRESS Element. Now, I understand where such discussions are coming from. The W3C HTML specs do try to define what each element is supposed to be used for. But if such a definition isn't totally clear, then you really can't use the element for anything. And you know a definition is vague when a dozen semantic and standards enthusiasts can't quite agree. And if these people can't agree, then what about the millions of other people who make web pages and haven't even heard of the W3C?

I believe it comes down to practicalities. For example, with the <address> element, you could argue that the element should only be used for web page contact information because the specs imply this. But what will this allow us to do? Is someone going to make a tool that looks for <address> elements on a page and uses it to let you contact the owner of the page? I doubt it, except maybe spam email harvesters. And even if there was, the tool would find itself to be nearly useless since so many web pages are likely using <address> for a wide variety of purposes that have little to do with web page contact info.

And what about other elements that don't even have a specific use like ordered, unordered and definition lists? I find it hard to imagine a scenario where the semantic meaning implicit in a list of items can be utilized. It's possible Google Sets uses lists this way, but chances are it mostly uses comma-separated words. And maybe the definitions feature of Google could use definition lists, but most of the results come from sites that don't use definition lists at all.

Well this brings me to the point I wanted to make. We need to think about who or what it is that will actually be extracting the meaning we're adding to our documents by using Semantic HTML. And basically I can think of three groups:

Web developers
Yourself or others that will actually be reading and working with the HTML you produce. For this purpose, class names, IDs and elements all add semantic meaning or at least readability to a document. This makes it easier to work with the HTML, and understand what each element represents structurally.
Search engine spiders and other bots
These are tools that read a large number of web pages and try to extract some meaning from them. Search engines understand that text in titles, meta tags, links and headers is special. Technorati's Microformats Search is another great example of semantics being utilized.
Web browsers, screen readers and other clients
These understand what many of the different elements are for and allow the visitor to interact with these elements in a unique way, like with a checkbox or link. Also, the client can communicate semantics to a visitor by displaying elements a certain way, like numbering the items of an ordered list. However, this semantic communication can be messed with by using CSS. An ordered list with list-style: none and list items floated will communicate no semantics to a user of a visual web browser.

In terms of these three groups of web page users, try to think of what difference it will make if a <dd> gets used for a blog post body instead of strictly a word definition. If the semantics can't be used or even considered to really mean anything, then can they even be considered semantic?

Who will read your Semantic HTML?

Web developers

Search engine spiders and other bots

Web browsers, screen readers and other clients

About the author