Screens, Research and Hypertext


Triple Store Database

Getting into the full details of a triple store is far beyond my level of expertise. Think of this as the not very technically savvy tl;dr of this book, which is worth a read if you really want to dig in on web semantics.

Something like our old friend "Quotation publishedIn Source" consists of three parts: a subject (Quotation), a predicate (publishedIn), and an object (Source).

A standard database (MySQL or its sexier descendants) purports to store relationships; they are called relational databases, after all. But in practice, so-called relational databases store only the subject and the object, along with a key that allows for referencing across different tables. (For you Excel fans, it's like a pivot table on steroids.) What a relational database does not store as data is the actual relationship: the predicate that explains how the subject and object are connected. At best, that meaning is implied by table and column names.

A triple store (sometimes also called a graph database or an RDF database) stores subject, predicate and object together as triples (hence the name). That added bit of complexity makes triple stores far more powerful than a typical database. Indeed, because it stores the relationships between data, a well-designed triple store can surface new information by drawing inferences that aren't themselves explicitly stored as separate pieces of data.
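To make the "subject, predicate, object" idea a little more concrete, here is a minimal sketch in Python of what a toy triple store might look like. It is not how any real RDF database is implemented. The class names (`Triple`, `TripleStore`), the `match` method and the identifiers like `quotation:1` and `source:A` are all made up for illustration.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """A toy in-memory triple store (illustrative only, hypothetical API)."""

    def __init__(self) -> None:
        self._triples = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        # The predicate is stored as data, right next to subject and object.
        self._triples.add(Triple(subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> list:
        # None acts as a wildcard for that position in the triple.
        return sorted(
            t
            for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


# Hypothetical data: two quotations published in the same source.
store = TripleStore()
store.add("quotation:1", "publishedIn", "source:A")
store.add("quotation:2", "publishedIn", "source:A")
store.add("quotation:1", "hasAuthor", "person:X")

# "Which quotations were published in source A?"
published_in_a = store.match(predicate="publishedIn", obj="source:A")

# "What do we know about quotation 1?"
about_q1 = store.match(subject="quotation:1")
```

Because the predicate sits in the data itself, you can ask questions about the relationship ("everything that is publishedIn something") without knowing anything about tables or columns in advance.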

For example, imagine a triple store database that contains the following data:

bio:John b:hasGrandchild b:Matt

bio:Joe b:hasChild b:Matt

bio:John b:hasChild b:Josh

bio:Joe b:hasGender b:cishet

Such a system could deduce that Josh is Matt's uncle, even though it does not directly store any information linking Josh and Matt. (The first three lines entail one of three relationships between Joe and Josh: partners, brothers or brothers-in-law. The fourth line rules out partners. Either of the other two relationships leaves Josh as Matt's uncle.)
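The deduction above can be sketched as a few lines of Python. This is a hand-rolled toy rule, not a real reasoner: the function name `infer_uncles` and the predicate `b:isUncleOf` are invented for this example, and the rule bakes in the same assumptions the prose does (everyone involved is male, and the `b:cishet` fact is what rules out "partners").

```python
# The four facts from the example, as (subject, predicate, object) tuples.
FACTS = {
    ("bio:John", "b:hasGrandchild", "b:Matt"),
    ("bio:Joe", "b:hasChild", "b:Matt"),
    ("bio:John", "b:hasChild", "b:Josh"),
    ("bio:Joe", "b:hasGender", "b:cishet"),
}


def infer_uncles(facts):
    """Return inferred (uncle, 'b:isUncleOf', child) triples.

    Hypothetical rule: if G has grandchild C, P has child C, and G has
    another child S, then P and S are partners, brothers or
    brothers-in-law. S is C's uncle only once 'partners' is ruled out.
    """
    inferred = set()
    for grandparent, p1, grandchild in facts:
        if p1 != "b:hasGrandchild":
            continue
        for parent, p2, child in facts:
            if p2 != "b:hasChild" or child != grandchild:
                continue
            for gp, p3, sibling in facts:
                if p3 != "b:hasChild" or gp != grandparent or sibling == parent:
                    continue
                candidates = {"partners", "brothers", "brothers-in-law"}
                # The fourth fact is what eliminates the 'partners' case.
                if (parent, "b:hasGender", "b:cishet") in facts:
                    candidates.discard("partners")
                # Both remaining relationships make the sibling an uncle.
                if candidates <= {"brothers", "brothers-in-law"}:
                    inferred.add((sibling, "b:isUncleOf", grandchild))
    return inferred


new_facts = infer_uncles(FACTS)
```

Note that the uncle triple is never stored in `FACTS`; it only exists because the rule derives it. Drop the fourth fact and the sketch refuses to conclude anything, since "partners" can no longer be ruled out.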

This sort of thing is (at least part of) what people mean when they talk about machines "reasoning" over data, though strictly speaking it is rule-based inference rather than "machine learning." Facebook's Social Graph, Twitter's Interest Graph and Google's Knowledge Graph are all proprietary cousins of the triple store, built on the same idea of storing relationships as data. It's their ability to draw new inferences from stored data that makes them so valuable to advertisers (and so creepy to privacy advocates).