Choosing a database is one of the most important decisions you will make before starting a project. There are several database paradigms and each one has its own applications.
Let’s start with the simplest ones.
Popular examples include Redis and Memcached. The database itself works like a JavaScript object or a Python dictionary: every key is unique and has a value associated with it. In Redis, for example, we use the following commands to set and fetch values.
redis> SET user:23:bio "i like turtles"
OK
redis> GET user:23:bio
"i like turtles"
These databases usually store values in main memory (RAM), unlike other paradigms that persist data to disk. This makes key-value databases very fast, but it limits how much you can store in them, and no joins or relations are possible. Hence, they are mostly used for caching, often as a layer in front of a different, persistent database.
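The caching layer described above is often called the cache-aside pattern. Here is a minimal sketch of it, with a plain dictionary standing in for Redis and a made-up `slow_db_lookup` function standing in for the persistent database (with the real redis-py client you would call `r.get(...)` and `r.set(...)` instead):

```python
cache = {}  # in-memory stand-in for Redis

def slow_db_lookup(user_id):
    # Placeholder for a query against the persistent database.
    return f"bio for user {user_id}"

def get_bio(user_id):
    key = f"user:{user_id}:bio"
    if key in cache:                     # cache hit: served from memory
        return cache[key]
    value = slow_db_lookup(user_id)      # cache miss: fall back to the slow store
    cache[key] = value                   # populate the cache for next time
    return value

print(get_bio(23))  # bio for user 23
```

The first call pays the cost of the database lookup; every call after that is answered straight from memory.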
A “wide-column” store can be understood as a key-value database with an extra dimension added to its values: each key points to a group of columns. Cassandra and HBase are well-known members of this family.
For example, the key “bob” might point to a column group containing “name”, “age”, and “eyes”, while “alice” points to a group containing only “name” and “eyes”. This makes grouping data possible, but unlike relational databases, no predefined schema is required. Since no schema is required you can scale out very easily, but joins are not available.
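The bob/alice example can be pictured as a dictionary of dictionaries, where each row key maps to its own set of columns and rows need not share the same columns (this is only a mental model, not how a wide-column store is implemented):

```python
# Each row key points to its own column group; "alice" has no "age" column.
table = {
    "bob":   {"name": "Bob",   "age": 31, "eyes": "brown"},
    "alice": {"name": "Alice", "eyes": "blue"},
}

print(table["bob"]["age"])            # 31
print(sorted(table["alice"]))         # ['eyes', 'name']
```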
Users interact with Cassandra through a query language called CQL. Wide-column stores are mostly used when writes are frequent but reads are infrequent; Netflix, for example, uses Cassandra to store the history of shows you watched. Like key-value stores, they are usually not used as a primary database.
You have most likely worked with, or are at least familiar with, this database paradigm. Popular examples include MongoDB and Firestore. Here we have documents, where each document is a container for key-value pairs. Documents are unstructured and don’t require a schema.
These documents are grouped together in collections. Fields within these collections can be indexed and the collections can be arranged in a logical hierarchy.
Although these are schema-less, you can perform relational-ish queries to some degree; full joins, however, are not supported. Since joins are not supported, instead of normalizing data across tables we embed related data into a single document. Document databases are really easy to get started with and are good for any application that doesn’t rely heavily on joins. They are not ideal for apps whose data forms a dense graph of relationships, like Facebook.
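Embedding instead of normalizing looks something like this: rather than putting posts and comments in separate tables and joining them, the comments live inside the post document itself (the field names here are illustrative, not any particular database's schema):

```python
# One document carries its related data along with it.
post = {
    "_id": "post123",
    "title": "Why turtles are great",
    "author": {"name": "bob"},                    # embedded sub-document
    "comments": [                                 # embedded array, no join needed
        {"user": "alice", "text": "agreed!"},
        {"user": "carol", "text": "nice post"},
    ],
}

# Fetching the post brings its comments along in the same read.
print(len(post["comments"]))  # 2
```

The trade-off is duplication: if “alice” changes her name, every document embedding it must be updated.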
This is an almost fifty-year-old database paradigm, but it is still the most popular one. Examples include MySQL and PostgreSQL. Most probably all of you are familiar with it. A few years after the relational model's mathematical conception, SQL was developed, and it is still in use to this day.
The data is arranged in tables, which are called relations. Each column is an attribute, and each row is an entity for which those attributes hold values. Each entity is identified by a primary key. Joins are executed with the help of foreign keys, and tables are normalized to make the database more efficient and less error-prone. The main drawback is that a schema is required beforehand, and it is really tough to change once you’ve started working with it. Relational databases also support ACID transactions, which means the data’s integrity is guaranteed even if there is some hardware failure, which is important for banks. They are ideal for most apps, but not good for unstructured data.
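The pieces above (primary keys, foreign keys, normalization, joins) fit together in a few lines of SQL. A minimal sketch using Python's built-in SQLite, with two made-up tables for illustration:

```python
import sqlite3

# Two normalized tables linked by a foreign key, recombined with a JOIN.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),  -- foreign key back to users
    item TEXT)""")
con.execute("INSERT INTO users VALUES (1, 'bob')")
con.execute("INSERT INTO orders VALUES (10, 1, 'turtle food')")

# The JOIN reunites the normalized data at query time.
row = con.execute("""
    SELECT users.name, orders.item
    FROM orders JOIN users ON orders.user_id = users.id
""").fetchone()
print(row)  # ('bob', 'turtle food')
```

Because user data lives in exactly one row, updating bob’s name touches one place, unlike the embedded-document approach.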
Here the data are represented as nodes and the relationships between them as directed edges. Popular options are Neo4j and Dgraph. Say we want to model a many-to-many relationship in SQL, which you can read about here LINK . We would need a third table to act as a junction, but with graph databases we can cut out the middleman and connect the nodes directly.
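The junction-table-versus-edges idea can be sketched with a plain adjacency dictionary (a toy model of a graph store, not any real graph database's API), here for a hypothetical “follows” relationship:

```python
# Directed edges: each user maps to the users they follow.
follows = {
    "alice": ["bob", "carol"],
    "bob":   ["carol"],
}

def followers_of(user):
    # Traverse edges directly instead of joining through a junction table.
    return sorted(u for u, outgoing in follows.items() if user in outgoing)

print(followers_of("carol"))  # ['alice', 'bob']
```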
These can be an alternative to SQL databases. They are popular in recommendation engines, like the one used at Airbnb.
If you want your database to return the most relevant results from a huge amount of data in a very short time, you’ll want a full-text search engine; Algolia is a very popular one. Under the hood, it analyzes the data and builds an index, which is what gives it such fast retrieval performance. It’s similar to the index at the back of a textbook.
So whenever a user searches for something, instead of scanning every document the engine searches only the index. This is expensive to run at scale, but it adds a lot of value to the UX if you are building, say, a search bar.
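The textbook-index analogy maps onto a structure called an inverted index: each word points to the documents containing it. A toy sketch (real engines add tokenization, ranking, and typo tolerance on top):

```python
docs = {
    1: "i like turtles",
    2: "turtles are slow",
    3: "redis is fast",
}

# Build the inverted index: word -> set of document ids containing it.
index = {}
for doc_id, text in docs.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)

def search(word):
    # Look up the word in the index instead of scanning every document.
    return sorted(index.get(word, set()))

print(search("turtles"))  # [1, 2]
```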
Here, we only specify the types of data the frontend wants to consume, and the GraphQL query is made accordingly. A popular option is Fauna. If we upload our GraphQL schema to Fauna, it automatically generates collections where we can store and index data. Under the hood, it takes advantage of the different paradigms we have studied so far. It’s also very fast and ACID-compliant, like a relational database. It’s something of a magic bullet, but you should understand the underlying concepts to get good with this one.