Semantic meaning: Using Styles in editing, print design, and ebook formatting

Say what you mean. It’s so simple in books, right? If it’s an important idea, bold it. Or maybe italicize it. If it’s a section header, make it big. Chapter title, make it really big. That’s all there is to it, right?

Right?

Not quite. Not anymore. Life is digital. Books arrive on the page only after journeying through several computers running different software.

And even after that, books diverge as they approach the reader, going to hardcover, paperback, e-readers, screen readers for the visually impaired—and who can say what other medium? Books no longer die. They can be expected to live long lives and in many forms. Who knows on what devices or in what formats books will be read in years to come?

Here's the thing: All those manually bolded, italicized, and embiggened words? They won't necessarily survive through the transmogrification from existence to existence. Simple visually-defined clues to meaning can get lost in the process.

No longer can books get away with saying what they mean. They also need to, in their very structure, indicate how they mean it.

Talking about semantics

Consider the definition of semantics offered by W3Schools:

Semantics is the study of the meanings of words and phrases in a language.
Semantic elements = elements with a meaning.

When we read a print book, we pick up on the meaning from how the text has been styled visually—boldface, italics, enlarged text, all caps, small caps, etc.

But formats alone do not define semantic meaning

[Image: dolphin-shark-appearance.jpg]

Web design taught us that simple text sizing and bolding weren’t enough. For search engines and screen readers to understand what was on the page and how each element related to the others, we needed semantic markup, and it had to extend beyond mere <p> tags and heading tags like <h1>, <h2>, etc.

Over several iterations, leaders in web and accessibility standards, working with browser developers, established markup elements (invisible to the reader) that define the semantic meaning of the text on the page. At the most basic level, we’re talking about what is a paragraph, what is a header, what text is emphasized by the author, and so on. Moving out from there, this semantic markup describes what is an article, what is an aside, what is a section of a larger piece, etc. Layers on top of this semantic architecture define additional identification metadata (microformats, RDFa, ARIA roles, Schema.org markup, Dublin Core) for the purposes of accurate indexing and providing accessibility for people with disabilities.

On the printed page, we see and identify the basic semantic structure through the visual cues in the typography and layout. Emphasized text becomes italicized or bolded. Headers are enlarged, styled, and set on their own lines. Chapter titles start on new pages.

On the digital screen, the appearance of these visual cues reflects what is actually defined in semantic markup (and styled in CSS stylesheets). We as readers don’t see that underlying architecture; we just interpret the architecture from what we see.

As readers, we identify this paragraph here as a new paragraph because we see the break before it. But semantically, this is a new paragraph because it has been tagged with a <p> tag that identifies it as a paragraph element. The break we see is a result of the styling applied to that semantic tag. In other words...

Because this is a new paragraph, the web browser (or reading device) presents the text as a new block of text—in this case, with a break before it.
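To make that difference concrete, here is a minimal sketch in Python that tries to recover a chapter outline the way a screen reader or indexing bot might. Both HTML fragments are made up for illustration, and the names OutlineParser and extract_outline are my own, not part of any standard tool. The two fragments would look much the same in a browser, but only the first carries structure that software can read.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineParser(HTMLParser):
    """Collects a (level, text) pair for every heading element it sees."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        # Only text inside a heading element counts toward the outline.
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def extract_outline(html):
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


# Hypothetical sample: headings marked up semantically.
semantic = (
    "<h1>Chapter One</h1><p>It was a dark night.</p>"
    "<h2>The Storm</h2><p>Rain fell.</p>"
)

# Hypothetical sample: the same text, with headings faked by bold and size.
visual_only = (
    '<p><b><span style="font-size: 2em">Chapter One</span></b></p>'
    "<p>It was a dark night.</p>"
    "<p><b>The Storm</b></p><p>Rain fell.</p>"
)

print(extract_outline(semantic))     # the outline is recoverable
print(extract_outline(visual_only))  # nothing to find: every block is just a <p>
```

The parser has no way to guess that a big, bold paragraph was meant as a chapter title. It can only report what the markup declares.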

It’s not just for websites

Let’s look at the creation of a book. The author writes it and (hopefully) rewrites it. At some point, the author uses computer software. Maybe it’s Microsoft Word. Maybe it’s Scrivener. Or Google Docs. At some point, the author is done and hands off the manuscript to the editor (either directly or after sale of the book to a publisher).

The editor almost always works in Microsoft Word.

So does any other editor touching the manuscript.

For better or worse, Word is unavoidable in book publishing. Yes, there are workarounds, but those workarounds do not avoid the need for defining and retaining the semantic meaning of the manuscript content.

Books are born from a Styled process

That is, not styled as in uniform, but Styled for consistency as the manuscript moves through the process from author to reader.

In Microsoft Word, semantic meaning is indicated by application of Styles.

Microsoft describes Styles as:

Built-in styles are combinations of formatting characteristics that you can apply to text to quickly change its appearance.

What they don’t say is the most important thing—that Word’s Styles define semantic structure. Styles designate what is a paragraph, what is a header—and what kind of header—what text is emphasized, and so on.

The application of Word’s Styles represents the author’s best method for defining the semantic structure of her text. Without Styles, the semantic structure of the manuscript is left to visual cues only, and that can lead to problems.

     In this screenshot from Word 365 (for Mac), the manuscript text is on the left in “page layout” mode. The plain numbers are just line numbers I turned on for easy reference. The colored numbers running down the left column indicate the Style that has been applied to that text. The panel on the right lists the Styles that are in use.

    Direct formatting is a terribly fragile way to go in the imperfect process of transitioning the book from application to application, from format to format.

       Look what happens when the inline formatting is removed from the text samples. Only the semantic meaning remains.

      Not only can a subheading or chapter title indicated only by visual formatting end up getting missed or misinterpreted by an editor or book designer, but such visual indicators can also be lost along the way simply through a technical glitch in the software.

      • A technical glitch might remove a bolded passage.
      • A software hiccup might remove formatting on a heading.
      • A migration from one format to another might interpret chapter titles as regular text.
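Here is a rough sketch of that fragility in Python. It is a toy model, not how Word or any converter actually stores a document: the Paragraph class, its fields, and both functions are assumptions of mine. The idea is that a named Style is a label attached to the paragraph, while direct formatting is only appearance, so a conversion that drops appearance leaves the label intact.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Paragraph:
    text: str
    style: str = "Normal"  # named Style: the semantic label
    bold: bool = False     # direct formatting: appearance only
    size_pt: int = 12      # direct formatting: appearance only


def strip_direct_formatting(p):
    """Simulate a conversion that loses direct formatting but keeps the Style."""
    return replace(p, bold=False, size_pt=12)


def looks_like_heading(p):
    """A heading can be recognized by its Style or, failing that, by its look."""
    return p.style.startswith("Heading") or (p.bold and p.size_pt > 12)


# Two ways to make the same chapter title.
styled = Paragraph("Chapter One", style="Heading 1")
formatted = Paragraph("Chapter One", bold=True, size_pt=24)

for label, p in [("Styled", styled), ("Directly formatted", formatted)]:
    before = looks_like_heading(p)
    after = looks_like_heading(strip_direct_formatting(p))
    print(f"{label}: heading before = {before}, after = {after}")
```

Both titles pass for headings at first. After the simulated glitch, only the one tagged with a Style still does.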

      Styles provide the most robust and specific means to define and sustain semantic intent as the manuscript goes through the process of becoming a book.

      • For the editor(s) using Word.
      • For the book designer using InDesign or another layout program that imports Word Styles as part of the creation process of the print master.
      • For the ebook designer/formatter who translates Word Styles into HTML as part of the process of creating the ePub and/or Kindle master.
      • For the Meatgrinder, Kindle upload converter, or other automated conversion system that applies an algorithm to the Styles in the Word document to define the HTML structure of the resulting ebook.
      • For the ebook reading device that relies upon clearly marked semantic structure in order to indicate chapters, provide a Table of Contents, mark paragraphs, format italics, and so on.
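As a rough illustration of that Styles-to-HTML step, here is a small Python sketch. It does not reproduce any real converter's algorithm. The mapping table, the to_html function, and the sample manuscript are all hypothetical, and the manuscript is represented as simple (Style name, text) pairs rather than read from an actual Word file.

```python
from html import escape

# Hypothetical mapping from Word Style names to HTML elements.
STYLE_TO_TAG = {
    "Heading 1": "h1",
    "Heading 2": "h2",
    "Quote": "blockquote",
    "Normal": "p",
}


def to_html(paragraphs):
    """Turn (style, text) pairs into HTML, one element per paragraph."""
    lines = []
    for style, text in paragraphs:
        # Anything with an unrecognized Style falls back to a plain paragraph.
        tag = STYLE_TO_TAG.get(style, "p")
        lines.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(lines)


# Made-up manuscript fragment.
manuscript = [
    ("Heading 1", "Chapter One"),
    ("Normal", "Rock & roll was dead."),
    ("Quote", "Say what you mean."),
]

print(to_html(manuscript))
```

The fallback is the telling part. A chapter title left in the Normal Style and merely bolded would come out of this as an ordinary paragraph, because the Style name is the only thing the converter has to go on.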

      Style me now or Style me later

      For Authors

      Let's be real now. If Word’s Styles seem like too much to deal with, forget about it. The author’s first priority is the content itself. Editors are accustomed to receiving from authors Word manuscripts that have few or no Styles applied. If the author has applied Styles to her text, the editors will be able to do their work more efficiently. But not having Styles is not the end of the world.

      For Editors

      I would argue that while it would be very helpful for editors to mark everything with Styles, it is not essential per se. Editors must focus first on clarity and consistency and the content of the words. Book designers are accustomed to receiving from editors Word manuscripts that have few or no Styles applied. But again, the more Styles have been cleanly applied to the manuscript, the easier it is for book designers to lay out the books.

      For Print Book Designers

      Honestly, Styles are not required. It’s perfectly fine for a book designer to format a book word by word, paragraph by paragraph.

      But if they aren’t using Styles, they certainly are not working efficiently. For Styles not only mark semantic meaning, they also provide quick and easy ways to modify formatting throughout the document with just a few clicks. (Do you want to change the chapter title font one chapter at a time, or once with a quick alteration of the relevant Paragraph Style?)
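The "change it once" point can be sketched in a few lines of Python. This is only a toy model of the idea, not how InDesign or Word work internally: the style names, fonts, and the fonts_in_use function are invented for the example.

```python
# Hypothetical style sheet: each named Style holds its formatting in one place.
styles = {
    "Chapter Title": {"font": "Garamond", "size_pt": 24},
    "Body": {"font": "Garamond", "size_pt": 11},
}

# Paragraphs carry only a Style name and their text, never their own formatting.
book = [
    ("Chapter Title", "Chapter One"),
    ("Body", "It was a dark night."),
    ("Chapter Title", "Chapter Two"),
    ("Body", "Morning came."),
]


def fonts_in_use(book, styles):
    """Look up the font for each paragraph through its Style."""
    return [(text, styles[style]["font"]) for style, text in book]


# One edit to the Style changes every chapter title at once.
styles["Chapter Title"]["font"] = "Futura"

for text, font in fonts_in_use(book, styles):
    print(f"{font:10} {text}")
```

Because no paragraph stores its own font, there is nothing to hunt down chapter by chapter. The body text is untouched, and every chapter title follows the Style.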

      For Ebook Designers and Formatters

      Here is where semantic structure becomes absolutely imperative. Without proper markup, an ebook would be just blobs of text, unreadable, illegible, broken. Yes, the technically capable (if not savvy) ebook designer can make everything a basic paragraph and code inline formatting into every chapter title, every subheading, every blockquote. And it might even look okay to visual readers.

      But such a document would embody no semantic meaning. Screen readers would miss those inline nuances. Search bots and indexing systems would have no indication as to the relative importance of such strings of text. And down the line, when the text is, for example, projected within someone's virtual reality glasses? Who knows what they would see?

      Consistency Matters

      On the other hand, if all goes well in the application of Styles across the board, the semantic structure of the text will be consistent across all formats of the book, including those yet to be invented.

      • The print book, which incorporates visual formatting to indicate semantic meaning to the reader.
      • The ebook, whose content is marked up with HTML to define how to visually format the text for the reader.
      • The screen reader, which reads the structure of the content to indicate to the visually impaired book consumer what each blob of text means in terms of structure.

      Convinced? Share your comments below.