When I first started working for the Congressional Budget Office back in 2009, I quickly developed a reputation as the guy who hates PDFs. It’s followed me pretty much ever since.
There’s a bit of me that resents that reputation. I routinely share domain and ecosystem models as PDFs, and generally rely on a PDF when I want someone to comment on the substance of a document rather than wordsmith individual sentences.
What I dislike are PDFs as web content.
PDFs are designed for printing. There’s nothing particularly wrong with printing out a PDF. I spent a not-inconsiderable percentage of my graduate school career (and stipend!) standing at a photocopier, making copies two pages and $0.10 at a time.
The PDF is a decided improvement.
The problem with the PDF isn’t the form. It’s the way it presumes a particular way of thinking.
Writing is a product of the available technology. When you’re limited to a quill and parchment, you write down only the really important stuff. When you can buy a Bic and a Mead notebook for $0.75, you tend to doodle Batman symbols everywhere.
But everything from the quill to the Bic is linear. How could it be otherwise? Something has to be first. Something else has to be second. It’s how paper works.
But when it comes to putting research online, our systems privilege outcomes over process. We concentrate on documenting new information. Indeed, in many cases, we take “document” literally, locking our findings into PDFs that are optimized for printing.
An all-too-common view in the research and policy sector is that the IMRAD report (usually in PDF) Just Is the way that such materials are presented, that the writing of such things is timeless, independent of technology. Any attempt to change the report is met with accusations of “dumbing things down.” The job of designers and technologists is simply to transfer the thing into the appropriate distribution channels.
And, of course, any (or all) of those things could exist as HTML, or they could be contained inside a PDF.
That “extra stuff” exists to “hype, promote, introduce, and discuss” the main text, which is probably a big PDF file.
More confusingly, items with different rhetorical functions can have identical sets of attributes. This is most evident for legacy content that is posted as PDF files. A brief and a working paper have very different rhetorical functions. But the rhetoric is entirely contained in the PDF. The structural bits stored in the CMS are usually identical.
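To make that concrete, here is a rough sketch of what two such records might look like inside a CMS. The field names, titles, and URLs are all invented for illustration; a real system will have its own schema. The point is only that nothing in the stored structure distinguishes the brief from the working paper.

```python
# Two hypothetical CMS records. Every field name and value here is an
# assumption made for illustration, not taken from any real system.
brief = {
    "title": "Example brief",
    "publish_date": "2015-01-01",
    "topic": "Health",
    "pdf_url": "https://example.org/files/brief.pdf",
}

working_paper = {
    "title": "Example working paper",
    "publish_date": "2015-01-01",
    "topic": "Health",
    "pdf_url": "https://example.org/files/working-paper.pdf",
}

# The two items expose exactly the same set of attributes.
same_structure = set(brief) == set(working_paper)

# Nothing in either record says what the item is for; that information
# lives only inside the PDF the record points to.
records_rhetorical_function = "rhetorical_function" in brief
```

From the CMS's point of view, these are the same kind of thing with different file attachments.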
The document is the fundamental unit of information exchange on the web. Research findings are (usually) collected into IMRAD reports that are (usually) formatted as PDFs. Those reports—whether PDF or HTML—are referenced by a single URL and are (usually) categorized via a hierarchical taxonomy.
A CMS built around content-as-documents faces the same problems as content in literal documents—you need human intervention and a lot of cutting-and-pasting to reuse anything.
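A small sketch may help show the difference. The two classes below are hypothetical content models, not drawn from any particular CMS, and the `teaser` helper is likewise an assumption. In the first model, the findings are locked inside a file. In the second, the pieces are stored separately, so a program can reuse them without anyone opening the PDF.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DocumentItem:
    """Content-as-document: the CMS knows the title and where the file is."""
    title: str
    pdf_url: str  # summary, findings, and methods are all inside this file


@dataclass
class StructuredReport:
    """Content-as-data: each part is stored as its own field (hypothetical)."""
    title: str
    summary: str
    key_findings: List[str]
    methods: str


def teaser(report: StructuredReport, max_findings: int = 2) -> str:
    """Assemble a short teaser from fields that are already stored separately."""
    bullets = "\n".join(
        f"- {finding}" for finding in report.key_findings[:max_findings]
    )
    return f"{report.title}\n{report.summary}\n{bullets}"


# Placeholder content, invented for the example.
report = StructuredReport(
    title="Example report",
    summary="A one-sentence summary.",
    key_findings=["Finding A", "Finding B", "Finding C"],
    methods="Placeholder description of methods.",
)

print(teaser(report))
```

There is no equivalent of `teaser` for `DocumentItem`. To get the same result, someone has to open the file, find the right passages, and paste them somewhere else.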