PDF/A instructions

This page briefly summarizes recommendations for writing a thesis in LaTeX so that it conforms to the PDF/A format. The recommendations are based on several web pages (see below), including the official guide of MFF UK, which also covers PDF/A for text editors other than LaTeX: https://www.cuni.cz/UK-7987.html

Recommendations:

  1. Please avoid online PDF/A converters: they often fail, or they generate PDFs in which problematic pages are converted into screenshot-like images, which makes the PDF even less machine-readable.
  2. Use pdflatex version 3.14159265-2.6-1.40.17 (TeX Live 2016) or newer.
  3. To validate a generated PDF, use one of the following tools:
  4. Stick to the recommended MFF LaTeX template as much as possible:
    • Bc. / Mgr. / PhD.
    • Following the instructions in the README of the templates, download the latest version of pdfx and unzip it into the sub-directory tex of the template (the directory structure should then be vzor-bc/tex/pdfx/…).
    • Keep the PDF/A-2u format set in the template; it allows more PDF features than the older PDF/A-1.
    • Test that the unmodified template generates a valid PDF/A document (make runs without errors and the resulting PDF passes validation).
  5. Figures are often a source of PDF/A incompatibility:
    • Test each figure on its own by including it in an otherwise unmodified MFF LaTeX template.
    • If a figure causes PDF/A incompatibility, try to fix the figure (not the whole thesis!) with the following Ghostscript command:
      • gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dPDFA=2 -sColorConversionStrategy=RGB -dOverrideICC=true -sOutputFile=output.pdf input.pdf
      • (or in older Ghostscript versions: gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -sColorConversionStrategy=RGB -dOverrideICC=true -sOutputFile=output.pdf input.pdf)
      • or use the online converter at https://kam.mff.cuni.cz/pdfix/, which can additionally convert text into vector outlines for problematic fonts (the -dNoOutputFonts parameter of Ghostscript).
    • Figures typically lack embedded fonts. The Ghostscript command above embeds them (the embedded fonts can be checked with the command pdffonts figure.pdf).
    • If your original figure is a bitmap (PNG, JPG), do not convert it to PDF; keep the original bitmap format.
    • If everything else fails, convert the original figure to a high-DPI bitmap.
  6. Another common problem comes from special mathematical symbols, for which the validator reports an unknown Unicode character or a missing ToUnicode key. The only solution is to replace the symbol with a similar one that passes validation. It is recommended to validate the thesis repeatedly while it is being written, which makes the problematic symbols easier to identify (PS: the validation is not sensitive to missing references, so one can test, e.g., each chapter alone).
  7. If your chapter or section titles contain special or mathematical symbols, help LaTeX find a proper plain-text substitute for the bookmarks (the interactive table of contents) of the resulting PDF, e.g.:
    • \section{Analysis of \texorpdfstring{$B^0_s \to \mu^+\mu^-$}{B0s->mu+mu-} decay}
  8. If you intend to include a PDF article as part of the thesis but the article does not conform to the PDF/A format, upload it to SIS as an external attachment, which will be attached to the thesis (PS: external attachments need manual approval, which can take some time).
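
The font check and Ghostscript fix from recommendation 5 can be combined into a few shell helpers. This is only a sketch: the function names unembedded_fonts, check_figure and fix_figure are hypothetical, and the parsing assumes the usual pdffonts table layout (two header lines, then one row per font ending in the columns emb, sub, uni and a two-number object ID) and font names without spaces.

```shell
#!/bin/sh
# Sketch only: helper functions for checking and fixing a single figure.

# Read `pdffonts` output on stdin and print the names of fonts that are
# not embedded. ASSUMPTION: the usual pdffonts table layout, i.e. two header
# lines, then one row per font ending in "emb sub uni <object> <generation>",
# and font names that contain no spaces.
unembedded_fonts() {
    awk 'NR > 2 && NF >= 6 && $(NF - 4) == "no" { print $1 }'
}

# Report whether a figure has non-embedded fonts (hypothetical helper).
# Returns 0 if no non-embedded font was found, 1 otherwise.
check_figure() {
    missing=$(pdffonts "$1" | unembedded_fonts)
    if [ -n "$missing" ]; then
        printf '%s: fonts not embedded:\n%s\n' "$1" "$missing"
        return 1
    fi
    return 0
}

# Rewrite one figure with the Ghostscript options from recommendation 5
# (hypothetical helper; $1 = input PDF, $2 = output PDF).
fix_figure() {
    gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dPDFA=2 \
       -sColorConversionStrategy=RGB -dOverrideICC=true \
       -sOutputFile="$2" "$1"
}
```

With these definitions loaded, a call such as check_figure figure.pdf || fix_figure figure.pdf figure-fixed.pdf would rewrite only a figure that reports non-embedded fonts. The fixed figure should still be validated inside the template, since embedding fonts does not rule out other PDF/A problems.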

External sources: