Alex Kreidler

FoundationDB: The Universal Database

Apr 14, 2020

If I had to use any one currently available database for every project for the rest of my life[1], I would pick FoundationDB.

Why? This is that story.

Data Models

In the beginning, there were relational databases. Everything was in a table, with each row as a new instance of a given type, and each column a property of that type. To represent the connections between items, there were foreign key columns, which referenced the ID of another item in a different table. Representing relationships more complex than 1:1 required “join tables” or other mapping schemes.

Then along came the document-based and graph data models. With the advent of JSON technologies on the web, people decided that it would be nice to store “unstructured” documents of data, with various nested key-value objects. Graph databases presented an appealing option for data that naturally fit that format or was inherently complex, like social, financial, or traffic networks.

Each of these three core data models can be stored in a variety of ways. For example, all of them can be stored in a key-value store. The tabular data model can be stored in either a columnar or a row-based format, which refers to the direction in which the table is laid out on disk or in memory.

Columnar structures store data chunked by column first, with every row's value for a given column colocated. This is generally a better fit for workloads that compute statistics over a few columns across many rows.

The traditional row-based approach stores each row together, allowing for easy access to a specific row, common in business applications, e.g. fetching one user or product.
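To make the difference concrete, here is a minimal sketch of how the same table could be flattened into sorted key-value pairs in either direction. The table, its column names, and the two helper functions are made up for illustration; this is not a layout prescribed by any particular database.

```python
# Hypothetical three-row table; names and values are illustrative only.
rows = [
    {"id": 1, "name": "ada", "age": 36},
    {"id": 2, "name": "bob", "age": 41},
    {"id": 3, "name": "cy", "age": 29},
]
COLUMNS = ("name", "age")

def row_layout(rows):
    """Key = (row id, column): all fields of one row sort next to each other."""
    return sorted(((r["id"], col), r[col]) for r in rows for col in COLUMNS)

def column_layout(rows):
    """Key = (column, row id): all values of one column sort next to each other."""
    return sorted(((col, r["id"]), r[col]) for r in rows for col in COLUMNS)

# Fetching one user touches one contiguous run of the row layout.
user_2 = [(k, v) for k, v in row_layout(rows) if k[0] == 2]

# Aggregating one column touches one contiguous run of the column layout.
ages = [v for (col, _), v in column_layout(rows) if col == "age"]
```

The data is identical in both cases; only the order of the key components changes, and with it which kind of query reads a single contiguous range.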

The Substructure

The ordered key-value (OKV) store is a powerful, flexible substrate for any kind of data structure. FoundationDB provides performant Get, Set, and GetRange operations with transactional guarantees. This lets the user design a custom data model for their application in a performant and simple way.
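As a rough illustration of those three operations, here is a toy in-memory ordered store. The class name `OrderedKV` and its method names are assumptions made for this sketch; it is not FoundationDB's client API, and it has no transactions, persistence, or distribution.

```python
import bisect

class OrderedKV:
    """Toy in-memory ordered key-value store (not FoundationDB's API)."""

    def __init__(self):
        self._keys = []  # byte-string keys, kept sorted
        self._vals = {}  # key -> value

    def set(self, key: bytes, value: bytes) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)  # keep keys in sorted order
        self._vals[key] = value

    def get(self, key: bytes):
        return self._vals.get(key)

    def get_range(self, begin: bytes, end: bytes):
        """Return (key, value) pairs with begin <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, begin)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._vals[k]) for k in self._keys[lo:hi]]

db = OrderedKV()
db.set(b"user/1", b"ada")
db.set(b"user/3", b"cy")
db.set(b"user/2", b"bob")
db.set(b"order/9", b"book")

# b"0" is the byte right after b"/", so this range covers every b"user/..." key.
users = db.get_range(b"user/", b"user0")
```

Because keys are kept in order, everything that shares a prefix can be read with a single range scan, and that property is what richer data models can be built on.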

Most developers are fine living within one of the three main data models available to them in a variety of commercial or open-source databases. However, some need a combination of them, so they pick up a multi-model database.

The OKV model lets a developer implement almost any data model they could implement in memory with a programming language, such as:

  • tabular
  • matrix/tensor
  • graph
  • document-based
  • linked list
  • trees/tries
  • set
  • stack/queue
  • geographic (e.g. hexagonal)
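As a sketch of how two of the models in this list might map onto ordered keys, the snippet below stores a graph and a queue side by side in one key space. The store is a plain dict that is sorted on read to imitate ordering, and the key shapes `("edge", src, dst)` and `("queue", seq)` are illustrative choices, not a standard.

```python
store = {}  # tuple key -> value; sorted on read to imitate an ordered store

def scan(prefix):
    """All (key, value) pairs whose key starts with `prefix`, in key order."""
    n = len(prefix)
    return sorted((k, v) for k, v in store.items() if k[:n] == prefix)

# Graph: one key per edge, so a node's neighbours form one contiguous range.
def add_edge(src, dst):
    store[("edge", src, dst)] = ""

def neighbors(src):
    return [key[2] for key, _ in scan(("edge", src))]

# Queue: keys carry an increasing sequence number, so the head sorts first.
def push(item):
    items = scan(("queue",))
    seq = items[-1][0][1] + 1 if items else 0
    store[("queue", seq)] = item

def pop():
    items = scan(("queue",))
    if not items:
        return None
    key, item = items[0]
    del store[key]
    return item

add_edge("a", "c")
add_edge("a", "b")
add_edge("b", "c")
push("first")
push("second")
```

In a real database, `push` and `pop` would each need to run inside a transaction so that two clients cannot claim the same sequence number or the same item.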

A simple question is: why doesn’t everyone do this if it’s so much better? One might also ask: Why doesn’t everyone code in C instead of Python?

The answer is: yes, there is more flexibility, but it is also more complex, time-consuming, and difficult to get right.

So the next issue is: how do we simplify access to an ordered key-value database so that it offers powerful abstractions that are simple to use, while preserving access to the metal?

That’s where another feature of FoundationDB comes in handy.

Layers

FoundationDB is built on the concept of layers, which are libraries that build on the base FDB API to add functionality.

There are a few basic layers built into the default FDB client libraries.
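To show the shape of the idea, here is a minimal sketch of a layer as a thin library over a base key-value API. Both classes are hypothetical stand-ins written for this example: `KeyValueStore` imitates a base API with only get, set, and an ordered prefix scan, and `TableLayer` is not one of the layers that ship with FoundationDB.

```python
class KeyValueStore:
    """Stand-in for the base API: only get, set, and an ordered prefix scan."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def get_range(self, prefix):
        n = len(prefix)
        return sorted((k, v) for k, v in self._data.items() if k[:n] == prefix)

class TableLayer:
    """Hypothetical layer: exposes rows and columns, stores plain keys underneath."""

    def __init__(self, store, name):
        self.store = store
        self.name = name

    def insert(self, row_id, **columns):
        # One key per cell: (table name, row id, column name).
        for col, value in columns.items():
            self.store.set((self.name, row_id, col), value)

    def fetch(self, row_id):
        return {key[2]: value for key, value in self.store.get_range((self.name, row_id))}

store = KeyValueStore()
users = TableLayer(store, "users")
users.insert(1, name="ada", age=36)
users.insert(2, name="bob", age=41)
```

The layer adds a friendlier interface, but the underlying keys remain reachable through the base API, for example `store.get(("users", 1, "name"))`, as long as the key layout is known.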

Layers are a great concept and make a lot of things easier. However, there are a few problems.

  • Layers must be implemented as client-side libraries, which ties each one to the FDB SDK of the language it is written in. To support every language that the regular FDB SDK supports, a layer must be reimplemented in each of them.
    • This also means that clients who want to access the database through a traditional API layer like HTTP/REST or GraphQL need to write another server to translate those API calls into FDB client API calls.
  • Layers may have undocumented or complex internals, which makes it harder for the programmer to understand the data the layer stores and to access it through the regular API.

A solution:

  • Allow for Server-Side Layers, i.e. abstractions on a server that provide some of the higher-level APIs
  • Publish the internals of the Data Model Layers as a Specification which can be implemented either on the server or client side, and allow direct access to the structure of the data.
  • Create API Layers, which serve an HTTP, GraphQL, or similar API on top of a Data Model Layer. These are deployed as separate servers and should allow for load balancing across the actual DB instances.
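A rough sketch of what the core of such an API Layer might look like is below. Everything in it is an assumption made for illustration: the `/<table>/<id>` URL scheme, the `handle` function, and the `rows` dict that stands in for a Data Model Layer. A real deployment would put this logic behind an actual HTTP server and a real database client.

```python
import json

rows = {}  # stand-in for a Data Model Layer: (table, row id) -> dict of columns

def handle(method, path, body=""):
    """Translate one HTTP-style request into calls on the data model layer."""
    parts = path.strip("/").split("/")
    if len(parts) != 2:
        return 404, json.dumps({"error": "expected /<table>/<id>"})
    key = (parts[0], parts[1])
    if method == "PUT":
        rows[key] = json.loads(body)
        return 200, json.dumps(rows[key])
    if method == "GET":
        if key not in rows:
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(rows[key])
    return 405, json.dumps({"error": "method not allowed"})

put_status, put_body = handle("PUT", "/users/1", '{"name": "ada"}')
get_status, get_body = handle("GET", "/users/1")
```

Because the handler only translates requests and keeps no state of its own beyond the layer it calls, several copies of it could in principle run behind a load balancer.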

Footnotes

  1. Assuming some of those projects would be complex, and that I don’t mind writing a good chunk of code.