Accessibility tagging in Metanorma PDFs

Ronald Tse and Alexander Dyuzhev on 03 Sep 2021

Need for accessibility tagging

Assistive technologies (AT), and screen readers in particular, are important tools that help visually impaired people read documents, and this applies equally to standardization documents prepared using Metanorma.

Many organizations that use the Metanorma suite are legally required to provide “accessible” output, in other words, output containing additional information that makes the content usable by AT tools.

Metanorma is committed to supporting people who use assistive technologies. In this post we introduce the PDF accessibility features that are built into every PDF generated by Metanorma.

Note
Common legal requirements include the US Federal Government’s Section 508 and the European Accessibility Act.

Introducing the PDF tag tree

Those who have read the previous post about PDF math accessibility will know that the PDF format provides two kinds of information hierarchies, namely:

  • the content tree is the representation of the layout of content, providing a hierarchy of data elements that reflect the selectable text of a PDF file;

  • the tag tree is the representation of the logical structure of the document and its content, providing a hierarchy of data elements intended for accessibility applications.

Accessibility features in PDF commonly rely on information embedded in the tag tree.

The PDF tag tree is implemented as a hierarchy of tags, each of which relates to a visual element on the page and carries metadata about it, in effect "tagging" that content element with additional information. Each tag defines the structural role of its content element, such as a section title or a list label.

In fact, a correctly populated and structured tag tree is a key requirement for screen readers and other assistive technologies to work properly with a PDF document.
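The tag tree described above can be pictured as an ordinary tree of nodes, each with a structure type and, for leaf tags, the content it labels. The Python sketch below is illustrative only: the `Tag` class and `outline` function are hypothetical and say nothing about how a PDF file or Metanorma actually stores structure elements.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tag:
    """Hypothetical tag-tree node: a structure type plus optional content."""
    role: str                   # structure type, e.g. "Part", "H1", "P"
    text: Optional[str] = None  # content the tag labels, if it is a leaf
    children: List["Tag"] = field(default_factory=list)

    def add(self, child: "Tag") -> "Tag":
        """Append a child tag and return it so calls can be chained."""
        self.children.append(child)
        return child


def outline(tag: Tag, depth: int = 0) -> List[str]:
    """Render the tree as indented lines, one line per tag."""
    label = tag.role if tag.text is None else f"{tag.role}: {tag.text}"
    lines = ["  " * depth + label]
    for child in tag.children:
        lines.extend(outline(child, depth + 1))
    return lines


# A tiny made-up document: one part holding a heading and a paragraph.
doc = Tag("Document")
part = doc.add(Tag("Part"))
part.add(Tag("H1", "Scope"))
part.add(Tag("P", "Example paragraph text."))
print("\n".join(outline(doc)))
```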

Tags are immensely useful for accessibility. They:

  • allow the identification of document elements and their roles, such as headings, paragraphs, lists and external elements (outside of the PDF file) on a PDF page, in effect making the content accessible;

  • provide a meaningful reading order for screen readers, text-to-speech tools and other assistive technology tools;

  • facilitate resizing and reflowing the document for viewing at a non-default font size or on smaller screens.
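Reading order comes from the logical order of the tag tree rather than from where text happens to sit on the page. As a minimal sketch, assuming a hypothetical nested-tuple representation of tags (not a real PDF API), a pre-order depth-first walk yields the sequence a screen reader would follow:

```python
from typing import List, Tuple, Union

# Hypothetical representation: (role, text) for a leaf, (role, [children]) otherwise.
Node = Tuple[str, Union[str, list]]


def reading_order(node: Node) -> List[Tuple[str, str]]:
    """Return (role, text) pairs in logical order via pre-order traversal."""
    role, content = node
    if isinstance(content, str):
        return [(role, content)]
    result: List[Tuple[str, str]] = []
    for child in content:
        result.extend(reading_order(child))
    return result


tree: Node = ("Document", [
    ("H1", "Scope"),
    ("P", "First paragraph."),
    ("L", [
        ("LI", "First item"),
        ("LI", "Second item"),
    ]),
])

for role, text in reading_order(tree):
    print(f"[{role}] {text}")
```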

Note
PDF tags are specified in the PDF 1.7 standard, ISO 32000-1:2008.

Basic structural tagging

As described in the previous post, Metanorma generates PDFs through mn2pdf, a Java PDF processor based on the open-source Apache FOP (Formatting Objects Processor), a print formatter driven by XSL formatting objects (XSL-FO) technology.

While Apache FOP provides a default mapping from formatting objects (FO) to PDF tags, that mapping is basic and does not fully meet the needs of modern assistive technologies.

In the following sections we illustrate how Metanorma performs tagging.

Table 1. Metanorma formatting object mapping to PDF tags (identical to Apache FOP)
Meaning                          Formatting object element   PDF tag value
Major division, clause/section   fo:page-sequence            Part
Block                            fo:block-container          Div
Paragraph                        fo:block                    P
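Table 1 is essentially a lookup from FO element to PDF tag. The sketch below applies it to a small, made-up XSL-FO fragment using Python's standard `xml.etree.ElementTree`. The `BASIC_MAPPING` table and `tag_outline` function are hypothetical; elements the table does not list (here `fo:root` and `fo:flow`) are simply skipped, which is a simplification and not a description of Apache FOP's complete default mapping.

```python
import xml.etree.ElementTree as ET
from typing import List

FO_NS = "{http://www.w3.org/1999/XSL/Format}"

# Table 1 as a lookup: FO element (local name) -> PDF tag value.
BASIC_MAPPING = {
    "page-sequence": "Part",
    "block-container": "Div",
    "block": "P",
}


def tag_outline(elem: ET.Element, depth: int = 0) -> List[str]:
    """List PDF tags for mapped FO elements, indented by nesting depth.

    Unmapped elements produce no tag in this sketch; their children are
    attached to the nearest mapped ancestor instead.
    """
    local = elem.tag.replace(FO_NS, "")
    tag = BASIC_MAPPING.get(local)
    lines: List[str] = []
    child_depth = depth
    if tag is not None:
        lines.append("  " * depth + tag)
        child_depth = depth + 1
    for child in elem:
        lines.extend(tag_outline(child, child_depth))
    return lines


FO_SOURCE = """
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:page-sequence master-reference="body">
    <fo:flow flow-name="xsl-region-body">
      <fo:block-container>
        <fo:block>First paragraph.</fo:block>
        <fo:block>Second paragraph.</fo:block>
      </fo:block-container>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
"""

print("\n".join(tag_outline(ET.fromstring(FO_SOURCE.strip()))))
```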

Figure 1. The ISO Rice document with an accurately populated tag tree

Detailed structural tagging

Lists and list items

The PDF standard also provides list and list item tags to identify these roles within rendered content, and Metanorma extends the mapping to cover them.

Table 2. List-related mapping to PDF tags
Meaning     Formatting object element   PDF tag value
List        fo:list-block               L
List item   fo:list-item                LI
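In the PDF standard, a list item tag (LI) can itself contain a label (Lbl, such as a bullet or item number) and a body (LBody) holding the item content; a nested list sits inside the body of its parent item. The sketch below outlines that structure for a hypothetical `Item` input type. Whether Metanorma emits the Lbl and LBody level in exactly this form is not covered in this post, so treat it as an illustration of the standard's list structure only.

```python
from typing import List, NamedTuple, Tuple


class Item(NamedTuple):
    """Hypothetical list item: visible label, text and optional nested items."""
    label: str
    text: str
    sublist: Tuple["Item", ...] = ()


def list_tags(items: List[Item], depth: int = 0) -> List[str]:
    """Outline the L / LI / Lbl / LBody tags for a possibly nested list."""
    pad = "  " * depth
    lines = [pad + "L"]
    for item in items:
        lines.append(pad + "  LI")
        lines.append(pad + f"    Lbl: {item.label}")
        lines.append(pad + f"    LBody: {item.text}")
        if item.sublist:
            # A nested list is placed inside the body (LBody) of its parent item.
            lines.extend(list_tags(list(item.sublist), depth + 3))
    return lines


items = [
    Item("a)", "First item"),
    Item("b)", "Second item", (Item("1)", "Nested item"),)),
]
print("\n".join(list_tags(items)))
```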

The following example demonstrates the tagged list and list items in a generated PDF document.

Figure 2. Tags with L and LI for list and list items in the ISO Rice document

We have further customized the mapping to make the tagging more accurate:

Headings, sub-headings and more

The PDF standard provides a series of heading tags that distinguish the relative importance of headings, and the Metanorma PDF generation engine supports them automatically.

These tags are not mapped from formatting objects; instead, the generation engine sets them directly in the output.

Table 3. Heading mapping to PDF tags
Meaning                                      PDF tag value
Heading 1, clause heading                    H1
Heading 2, sub-clause heading                H2
Heading 3, second-level sub-clause heading   H3
Heading 4, third-level sub-clause heading    H4
Heading 5, fourth-level sub-clause heading   H5
Heading 6, fifth-level sub-clause heading    H6
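Table 3 follows a simple rule: a heading at nesting depth n, counting a top-level clause as depth 1, receives the tag Hn. Below is a minimal sketch of that rule, assuming depth is derived from a dotted clause number such as 5.2.1 (an assumption made for illustration; the `heading_tag` function is hypothetical). The table stops at H6, and since this post does not say how deeper clauses are handled, the sketch raises an error for them instead of guessing.

```python
def heading_tag(depth: int) -> str:
    """Map a clause depth (1 = top-level clause) to a PDF heading tag.

    Only the six levels listed in Table 3 are covered; anything else is
    rejected rather than mapped to a guessed fallback.
    """
    if not 1 <= depth <= 6:
        raise ValueError(f"no heading tag defined here for depth {depth}")
    return f"H{depth}"


# Hypothetical clause numbers: one dotted component per level of nesting.
for number in ["5", "5.2", "5.2.1"]:
    print(number, "->", heading_tag(len(number.split("."))))
```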

Figure 3. Tags H1 to H6 for clause and sub-clause headings

Table of contents

The PDF standard provides the TOC and TOCI tags for the "Table of Contents" section and each individual entry within the table of contents.

Table 4. Table of contents mapping to PDF tags
Meaning                              PDF tag value
Table of contents section            TOC
Table of contents individual entry   TOCI
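A table of contents built along the lines of Table 4 places one TOCI tag per entry inside a single TOC tag. The sketch below assumes a hypothetical list of (clause number, title) pairs as input, and `toc_tags` is an illustrative name. It is deliberately flat: the PDF standard also allows a TOC to contain nested TOC elements for sub-entries, which are omitted here.

```python
from typing import List, Tuple


def toc_tags(headings: List[Tuple[str, str]]) -> List[str]:
    """Wrap each (clause number, title) pair in a TOCI tag inside one TOC tag.

    Flat on purpose: nested TOC elements for sub-entries are not modelled.
    """
    lines = ["TOC"]
    for number, title in headings:
        lines.append(f"  TOCI: {number} {title}")
    return lines


headings = [("1", "Scope"), ("2", "Normative references"), ("3", "Terms and definitions")]
print("\n".join(toc_tags(headings)))
```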

Figure 4. Tags TOC and TOCI for the Table of Contents

Block quotes

The BlockQuote tag is provided by the PDF standard to tag quotations in block form.

Table 5. Block quote mapping to PDF tags
Meaning       PDF tag value
Block quote   BlockQuote

Figure 5. Tag BlockQuote for block quotations

Index

Not every document contains an index, but for those that do, the PDF standard provides a dedicated Index tag to identify the index content.

Table 6. Index section mapping to PDF tags
Meaning                  PDF tag value
Index section            Index
Index individual entry   P
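Following Table 6, an index can be represented as one Index tag with a P tag for each entry. The sketch below assumes a hypothetical input that maps each term to the pages it appears on, and sorts entries case-insensitively; both the input format and the `index_tags` function are illustrative only.

```python
from typing import Dict, List


def index_tags(entries: Dict[str, List[int]]) -> List[str]:
    """Emit one Index tag containing a P tag per entry, sorted by term."""
    lines = ["Index"]
    for term in sorted(entries, key=str.casefold):
        pages = ", ".join(str(p) for p in sorted(entries[term]))
        lines.append(f"  P: {term}, {pages}")
    return lines


entries = {"tag tree": [5, 3], "Accessibility": [1]}
print("\n".join(index_tags(entries)))
```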

Figure 6. Tag Index for the document’s Index

Source code

The PDF standard provides the Code tag to indicate that the tagged content is software source code.

Table 7. Source code mapping to PDF tags
Meaning                       PDF tag value
Source code inline or block   Code
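Because the Code tag covers both inline and block source code, the main question is where it sits in the tree. The sketch below shows one plausible arrangement, assuming a hypothetical input format: inline code becomes a Code tag inside its paragraph's P, while a code block gets a Code tag of its own. This post does not describe Metanorma's exact nesting, so the layout is an assumption.

```python
from typing import List, Tuple, Union

# Hypothetical input format, for illustration only:
#   ("para", [("text", ...), ("code", ...), ...])  a paragraph with inline code
#   ("codeblock", "...")                            a standalone code block
Segment = Tuple[str, str]
Block = Tuple[str, Union[str, List[Segment]]]


def code_tags(blocks: List[Block]) -> List[str]:
    """Outline where Code tags fall for inline code versus code blocks."""
    lines: List[str] = []
    for kind, payload in blocks:
        if kind == "codeblock":
            # A standalone code block gets its own Code tag.
            lines.append(f"Code: {payload}")
        else:
            # Inline code is tagged Code inside the enclosing paragraph's P;
            # ordinary text is shown quoted, as untagged content of the P.
            lines.append("P")
            for seg_kind, text in payload:
                if seg_kind == "code":
                    lines.append(f"  Code: {text}")
                else:
                    lines.append(f"  {text!r}")
    return lines


doc = [
    ("para", [("text", "Call "), ("code", "main()"), ("text", " to start.")]),
    ("codeblock", "int main(void) { return 0; }"),
]
print("\n".join(code_tags(doc)))
```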

Figure 7. Tag Code to indicate source code

Summary

Metanorma provides excellent support for PDF accessibility features out of the box; in particular, it generates PDFs with an accurate, fully structured tag tree that allows assistive technologies to work with them.

If you have any further accessibility needs with Metanorma, please do not hesitate to contact us!