
When building a web app (or any app) that needs to store data, one of the biggest decisions to make is which database to use. In the world of start-ups, where MVPs and agile development reign, NoSQL databases have grown in popularity, due largely to their flexibility and ease of use. Among them, MongoDB stands out by far as the most popular. Mongoose, a fantastic ODM for MongoDB in Node.js, has also seen a rise in popularity, with its npm downloads almost doubling from 2015 to 2016.
For web apps using Node.js/JavaScript, MongoDB is particularly nice since data is stored as JSON-like documents, making reading and writing data fluid and natural. However, despite all these great advantages, MongoDB still lacks one of the most useful features found in relational databases, namely…relationships.

For almost any app that stores data, it’s natural for different data entities to have relationships with each other. Users have roles, shopping carts have items, books have categories…you get the idea. These relationships generally take one of three forms: one-to-one, one-to-many, and many-to-many. While MongoDB is not a relational database, there are two recommended approaches to representing relationships between entities. Let’s take a look at each and see how they pan out.
Note: if you feel like you are already familiar with this topic, or just want a solution, feel free to scroll down to the grumpy cat…
Method 1: Embedding Documents
The most natural method for relating data (represented as documents) in MongoDB is to embed them in one another. The advantage of this approach is that all the related data can be retrieved with a single query. This is very efficient and seems good at first. Sometimes it is an appropriate approach, but relying on embedded relationships can come at a high cost down the road. Consider an example with users and roles, where each user belongs to one role and each role can have many users. In this case we can either embed user documents inside role documents, or vice versa. If we decide that users are the most important entity and that roles should be embedded within them, we could end up with data that looks like this:
Users: [
  { _id: 1234, email: 'bob@admin.com', role: { name: 'Admin', description: 'A user with awesome powers.' } },
  { _id: 4321, email: 'bill@admin.com', role: { name: 'Admin', description: 'A user with awesome powers.' } }
]

The problem with this approach arises when we want to change the details of a role. If we update, say, the description of the role for the first user, the role description for the second user stays the same. This is a problem if we want our data to stay consistent, which we almost always do. It may not seem like a big issue with only two users, but what about when there are hundreds of users? Finding and updating every embedded copy of the role does not sound fun.
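To make the consistency problem concrete, here is a minimal in-memory sketch in plain Node.js. The arrays stand in for MongoDB collections, and `updateEmbeddedRole` is a hypothetical helper, not part of MongoDB or Mongoose. In MongoDB itself the loop would roughly correspond to an `updateMany` filtered on `role.name`, but either way every document that embeds the role gets rewritten.

```javascript
// In-memory stand-in for a "users" collection in which each user embeds a
// full copy of its role. Plain arrays only; no MongoDB driver or Mongoose.
const users = [
  { _id: 1234, email: 'bob@admin.com', role: { name: 'Admin', description: 'A user with awesome powers.' } },
  { _id: 4321, email: 'bill@admin.com', role: { name: 'Admin', description: 'A user with awesome powers.' } },
];

// Naive update: change the role description on the first user only.
users[0].role.description = 'A user with slightly fewer powers.';

// The second user's embedded copy is now stale: the "same" role has two descriptions.
const descriptions = new Set(users.map((u) => u.role.description));
console.log(descriptions.size); // 2

// Hypothetical helper: keeping the copies consistent means rewriting every
// document that embeds the role. Returns how many documents were touched.
function updateEmbeddedRole(userDocs, roleName, newDescription) {
  let touched = 0;
  for (const user of userDocs) {
    if (user.role && user.role.name === roleName) {
      user.role.description = newDescription;
      touched += 1;
    }
  }
  return touched;
}

console.log(updateEmbeddedRole(users, 'Admin', 'A user with awesome powers.')); // 2
```

With two users that is two writes; with hundreds of users it is hundreds, for what is conceptually a single change to one role.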
Realizing this, we may choose to embed users inside roles instead. This could result in data like so:
Roles: [
  { _id: 5678, name: 'Admin', description: 'A user with awesome powers.', users: [ { email: 'bob@admin.com' }, { email: 'bill@admin.com' } ] }
]

Problem solved, right? Sure, if no other entities in our database have relationships with users…which is unlikely. If we have another entity, “teams”, that can relate to multiple users, then we’re back at square one: updating a user’s email would have to be reflected in every role and team that the user belongs to. As the number of entities and relationships grows, the consistency issues grow even faster, which usually results in code that is a nightmare of confusion. The efficiency that we had in the beginning is gone as well, since a single change can now mean rewriting many documents. So maybe there’s a better way? Let’s check out the next approach.
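The same problem from the other direction, as a rough in-memory sketch: plain arrays stand in for the `roles` and `teams` collections, and `renameEmbeddedUser` is a hypothetical helper. It shows that one email change has to be hunted down and patched in every document that embeds the user.

```javascript
// In-memory stand-ins for "roles" and "teams" collections that both embed
// copies of user data. Plain arrays only; no MongoDB driver or Mongoose.
const roles = [
  {
    _id: 5678,
    name: 'Admin',
    description: 'A user with awesome powers.',
    users: [{ email: 'bob@admin.com' }, { email: 'bill@admin.com' }],
  },
];
const teams = [
  { _id: 1357, name: 'Managers', users: [{ email: 'bob@admin.com' }] },
  { _id: 9753, name: 'Editors', users: [{ email: 'bill@admin.com' }] },
];

// Hypothetical helper: a single email change must be found and patched in
// every collection that embeds users. Returns the number of copies changed.
function renameEmbeddedUser(collections, oldEmail, newEmail) {
  let patched = 0;
  for (const collection of collections) {
    for (const doc of collection) {
      for (const user of doc.users) {
        if (user.email === oldEmail) {
          user.email = newEmail;
          patched += 1;
        }
      }
    }
  }
  return patched;
}

// Bob is embedded in one role and one team, so two copies must change.
console.log(renameEmbeddedUser([roles, teams], 'bob@admin.com', 'robert@admin.com')); // 2
```

Every additional entity that embeds users adds another collection to that loop, which is how the consistency issues outpace the number of entities.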
Method 2: Document References
Rather than embedding documents inside one another, we can store the ids of related documents instead of the documents themselves (every MongoDB document already has a unique _id). Let’s look at the data from the previous example represented this way:
Users: [
  { _id: 1234, email: 'bob@admin.com', role: 5678 },
  { _id: 4321, email: 'bill@admin.com', role: 5678 }
]
Roles: [
  { _id: 5678, name: 'Admin', description: 'A user with awesome powers.', users: [ 1234, 4321 ] }
]

Now whenever a document’s details are updated, no other documents need to change, because the reference ids stay the same. Consistency problem solved! However, there is a tradeoff. In Method 1, only one query was needed to grab all the data related to a user (or a role). Now, to get the data for a role AND its associated users, we must first query for the role and then perform a second query to look up the users by their ids. This may not seem that bad, but it can quickly get out of hand as well. Let’s bring in the teams entity and look at our data again:
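A minimal in-memory sketch of the reference approach. The `db` object, the `find` function, and `getRoleWithUsers` are hypothetical stand-ins, and each `find` call is counted as one database round trip. Against a real MongoDB collection the second lookup would typically be a query like `{ _id: { $in: role.users } }`; Mongoose’s `populate()` can issue that second query for you, but it is still a separate query.

```javascript
// In-memory stand-ins for two collections that link to each other by id.
// Plain objects only; no MongoDB driver or Mongoose.
const db = {
  users: [
    { _id: 1234, email: 'bob@admin.com', role: 5678 },
    { _id: 4321, email: 'bill@admin.com', role: 5678 },
  ],
  roles: [
    { _id: 5678, name: 'Admin', description: 'A user with awesome powers.', users: [1234, 4321] },
  ],
};

// Each call to find() stands in for one round trip to the database.
let queryCount = 0;
function find(collection, predicate) {
  queryCount += 1;
  return db[collection].filter(predicate);
}

// Updating the role touches exactly one document; users only hold its id.
const [adminRole] = find('roles', (r) => r._id === 5678);
adminRole.description = 'A user with even more powers.';

// Reading a role together with its users now takes two lookups:
// one for the role, then one for the users whose ids it references.
function getRoleWithUsers(roleId) {
  const [role] = find('roles', (r) => r._id === roleId);
  const members = find('users', (u) => role.users.includes(u._id));
  return { ...role, users: members };
}

queryCount = 0;
const result = getRoleWithUsers(5678);
console.log(result.users.map((u) => u.email)); // [ 'bob@admin.com', 'bill@admin.com' ]
console.log(queryCount); // 2
```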
Users: [
  { _id: 1234, email: 'bob@admin.com', role: 5678 },
  { _id: 4321, email: 'bill@admin.com', role: 5678 }
]
Roles: [
  { _id: 5678, name: 'Admin', description: 'A user with awesome powers.', users: [ 1234, 4321 ] }
]
Teams: [
  { _id: 1357, name: 'Managers', description: 'They manage things.', users: [ 1234 ] },
  { _id: 9753, name: 'Editors', description: 'They edit things.', users: [ 4321 ] }
]

Now if we want to retrieve all the data contained in a role, we have to make the same two queries as before, plus a third, more complex query to populate the team data for each user. As a developer, you are now left writing complex query handlers for EVERY situation in which related data needs to be retrieved.
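Extending the same kind of hypothetical in-memory setup with teams shows how the hand-written lookups pile up. `getRoleWithUsersAndTeams` is an illustrative function for exactly one shape (role, then its users, then each user’s teams); because teams hold user ids rather than the other way around, the third lookup has to search in the reverse direction and the results are stitched together in application code.

```javascript
// In-memory stand-ins for the users, roles, and teams collections, each
// linking to the others by id. Plain objects only; no MongoDB driver.
const db = {
  users: [
    { _id: 1234, email: 'bob@admin.com', role: 5678 },
    { _id: 4321, email: 'bill@admin.com', role: 5678 },
  ],
  roles: [
    { _id: 5678, name: 'Admin', description: 'A user with awesome powers.', users: [1234, 4321] },
  ],
  teams: [
    { _id: 1357, name: 'Managers', description: 'They manage things.', users: [1234] },
    { _id: 9753, name: 'Editors', description: 'They edit things.', users: [4321] },
  ],
};

// Each find() call stands in for one database round trip.
let queryCount = 0;
function find(collection, predicate) {
  queryCount += 1;
  return db[collection].filter(predicate);
}

// Hypothetical hand-written handler for exactly one shape:
// role -> its users -> each user's team names.
function getRoleWithUsersAndTeams(roleId) {
  // Query 1: the role itself.
  const [role] = find('roles', (r) => r._id === roleId);
  // Query 2: the users referenced by the role.
  const members = find('users', (u) => role.users.includes(u._id));
  const memberIds = members.map((u) => u._id);
  // Query 3: teams store user ids, so search teams for any of those ids.
  const memberTeams = find('teams', (t) => t.users.some((id) => memberIds.includes(id)));
  // Stitch the three result sets together in application code.
  return {
    ...role,
    users: members.map((u) => ({
      ...u,
      teams: memberTeams.filter((t) => t.users.includes(u._id)).map((t) => t.name),
    })),
  };
}

const admin = getRoleWithUsersAndTeams(5678);
console.log(admin.users.map((u) => `${u.email}: ${u.teams.join(', ')}`));
// [ 'bob@admin.com: Managers', 'bill@admin.com: Editors' ]
console.log(queryCount); // 3
```

A different shape, such as a team with its users and their roles, would need yet another function like this one.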
This becomes even more difficult if you are trying to retrieve this data through a web API. The API would have to accept some sort of “embed” query parameter, and each endpoint would need logic to populate nested data. On top of this, more logic would be needed to support filtering queries based on referenced objects. This issue usually results in many overly specific endpoints with custom logic for different “types” of queries, each tuned to an individual entity structure. This approach can also quickly lead to hard to scale, hard to maintai