Relational vs. Non-Relational Databases
What are the key differences between these two main categories of databases, and how do you select the right type of database for different use cases?
When you walk into a grocery store, food items are not strewn about meaninglessly. The store’s products are arranged in aisles that contain similar types of items, and the aisles are organized in such a way to create an efficient and logical flow to a shopper’s visit. Have you ever parked your car at a stadium or large shopping mall? The parking spaces are often arranged in sections identified with a letter, and subsections that are each given a number. This helps visitors remember where they parked, and find their car when it is time to leave.
Similarly, information that companies and organizations maintain is not stored in a manner devoid of structure and organization. Quite the contrary––organizational data nearly always conforms to a very specific set of rules that determine how the information is categorized, how new data entered into the system will relate to existing data, and how information will be retrieved when it is needed in some external system. Any tool that handles these tasks of storing information and serving it up when needed is called a database, and databases are an essential aspect of any information ecosystem.
There are many different kinds of databases, however, and the type of database an organization uses depends on the type and amount of information it will contain, as well as the needs of its end users. Here, we will look at relational and non-relational databases. While sub-categories and different types of databases do exist, most database solutions that organizations use on a daily basis fall into one of these two main families of database architecture.
What are relational databases?
A relational database, which is sometimes referred to as a Relational Database Management System (RDBMS), stores data in the form of rows and columns that comprise tables, and tables that relate to one another through “keys,” or information shared between both relations. Within a relational database, a row in a table is referred to as a “record,” and a table is referred to as a “relation.” For those interested in computing history, relational databases were first defined by E.F. Codd in 1970 who, while working at IBM, wrote the paper “A Relational Model of Data for Large Shared Data Banks”. Codd later received the ACM Turing Award for this enduring contribution to the way data is stored and retrieved.
Tables or “relations” within a relational database are connected to one another through ”keys”–– unique identifiers given to each record in one table that correspond to the same information in another table. When two tables are related in this manner, the column of unique identifiers in the first table is referred to as a “primary key,” and the same information in the adjoining table is referred to as a “foreign key.” Let’s take a look at a simple example of two tables within a relational database to illustrate how this works:
These two tables represent how a primary/foreign key relationship works in a relational database. In the customers table, customer_id is the primary key, since these identifiers are unique to each customer. Tables in a relational database should not have duplicate rows, since this would complicate the process of looking up and retrieving records. Therefore, each table should have a column(s) containing primary keys to give each record a unique identifier. In the orders table, customer_id is the foreign key, since it corresponds to the primary key on customers, and therefore lets us know which customer made this particular order.
We don’t know the orders each customer has placed by looking at the customers table. Similarly, we don’t have any meaningful information about the customers who placed orders just by looking at the orders table. The ideas that each table should only describe one thing, each record should contain a unique identifier, and every table should have a list of constraints dictating what types of data it can and cannot include are foundational to relational databases. These basic principles that help relational databases maintain “referential integrity,” or the accuracy and consistency of data––a key advantage of this type of database architecture.
Querying relational databases
For the sake of comparing querying in relational vs. non-relational databases, let’s take a look at some very simple commands to retrieve data from table(s).
This command would return all of the records in the customer table:
SELECT * FROM customers;
Here, we’re selecting only the orders where product_name is “Donut”:
SELECT * FROM orders WHERE product_name=’Donut’;
The JOIN keyword lets us combine data from multiple tables using the the column that serves as a primary/foreign key between the two relations:
SELECT * FROM customers INNER JOIN orders ON customers.customer_id=orders.customer_id;
In the statement above, the clause `ON customers.customer_id=orders.customer_id` indicates that customer_id is a primary key in the customers table, and a foreign key in the orders table. With this information, we can align the records in orders with those in customers, and retrieve a table that tells us both about our customers themselves as well as the orders they have placed.
When should you use a relational database?
Relational databases are best suited to house data that contains a relatively strong structure that lends itself to rows and columns. Data points that will have a consistent meaning, can be easily placed into categories, and with relationships that can be easily defined are prime candidates for storage in a relational database.
Examples of data models well suited for relational databases include:
- Employee data and internal records
- Purchase histories
- Website and app analytics
- Financial records
- Individual user events
While there is certainly overlap in the use cases that relational and non-relational databases can handle, an RDBMS is more than likely the better choice in situations where repeated data analysis will create a need to constantly query specific data cross sections. Also, running basic SQL queries on structured data stored in relational tables is a relatively straightforward skillset that a variety of internal stakeholders can adopt. Running queries on less structured data data in a non-relational database, on the other hand, may likely require the deeper technical expertise of a programmer or data scientist.
Common relational database solutions include (but are certainly not limited to) Oracle DB, Amazon Redshift, and BigQuery from Google Cloud. While these vendors differ in the specific functionality and services they provide, their solutions are all based on the foundational concepts of relational databases.
What are non-relational databases?
Non-relational databases store data in a non-tabular format and are often based on data structures like objects or documents. Unlike a relational database which depends on relations between individual tables to provide a complete picture of your data, a single document within a non-relational database typically contains all of the data that has to do with whatever that document describes. While relational databases typically store a constrained number of traditional data types, non-relational databases can store a wide variety of information formats side-by-side.
There are a few main subtypes within the category of non-relational or (NoSQL) databases, two of which include:
- Document-oriented databases Named string fields are associated with object data values in a data structure known as a “document” that can be encoded with a technology like JSON, XML, YAML, or BSON. Documents to not need to maintain identical structures, which delivers very high flexibility.
- Key-value stores: A very simple form of NoSQL database which, as the name suggests, uses a structure in which full documents are stored as values corresponding to a unique key which is usually a hashed string. In this type of system, documents can only be accessed via their unique key, which allows for very fast lookups.
Querying a non-relational database
In the world of relational databases, SQL––whether in its purest form or some type of specialized dialect––is used to query practically every RDBMS in existence. There is no equivalent mainstay when it comes to non-relational databases, however, and the way that querying works largely depends on the provider. For instance, many vendors will provide developers with a REST API and libraries to perform read and write operations on the data. Other vendors take this a step further by delivering fully fleshed-out query languages, a shell environment in which to run those queries, and specifications to integrate querying operations into common backend and server-side languages like Node and Python.
MongoDB, for instance, is an example of a database vendor that has developed this type of query syntax. The query below demonstrates how to retrieve an object from a collection using that object’s unique ID:
When should you use a non-relational database?
Non-relational databases are generally considered to be more flexible than relational databases, since they are capable of containing a wide variety of data types including large documents. Owing to their flexibility, they are well suited to use cases in which you have a large quantity of data relating to a single topic that you only expect to grow in size and complexity. These may include:
- Unified customer profiles
- Audience segments
- Application databases
- Large collections of text, images and other data
- Industry-wide trend data
In analytics use cases, using a non-relational database can be beneficial if uncovering patterns and meaning in disparate types of information is your ultimate goal. NoSQL databases support more flexible expansion than relational databases, since new data does not have to conform to the data types of preexisting information. This benefit is also why querying non-relational databases often comes with a steeper learning curve and often requires a deeper investment of technical resources than relational databases, however. Having a very large internal semi- or non-structured NoSQL database on hand will likely require an internal team of data scientists and to manage, query, and analyze.
Outside of analytics, non-relational databases are also a popular choice to support application development, since the types of data that modern mobile and web applications collect is only becoming more diverse. In a large-scale consumer-facing application with thousands or even millions of users, for instance, a tabular data structure may likely inhibit the features and functionality that developers are able to create, whereas a flexible NoSQL database would likely pose less of a roadblock to innovation.
One area in which non-relational databases excel is storing data sets that are most meaningful in their totality, such as user profiles or customer segments. mParticle User Profiles, which contain unified user identities, attributes, audience memberships and other data collected across platforms and can be queried via the Profile API, are stored in a NoSQL format on Aerospike, for instance. Additionally, the mParticle Audience Manager, which allows you to define audiences and connect them to integrations for the purpose of engaging with your users, leverages ScyllaDB.