One of the key tenets of MongoDB schema design is to account for the absence of server-side joins. Data is joined all the time inside application code, of course, but traditionally there’s been no way to perform joins within the server itself.
This changed in 3.2 with the introduction of the $lookup operator within the aggregation framework. $lookup performs the equivalent of a left outer join, i.e. it retrieves matching data from another collection and still returns the source document, with an empty array in place of the joined data, if no match is found.
Here’s an example using the MongoDB version of the Sakila dataset that I converted from MySQL back in this post:
    db.films.aggregate([
        {$match: {"Actors.First name" : "CHRISTIAN",
                  "Actors.Last name" : "GABLE"}},
        {$lookup: {
            from: "customers",
            as: "customerData",
            localField: "_id",
            foreignField: "Rentals.filmId"
        }},
        {$unwind: "$customerData"},
        {$project: {"Title": 1,
                    "FirstName": "$customerData.First Name",
                    "LastName" : "$customerData.Last Name"}},
    ])

What we’re doing here is finding all customers who have ever hired a film starring “Christian Gable”. We start by finding those films in the films collection (lines 2-3), then use $lookup to retrieve customer data (lines 4-9). The films collection embeds actors in the “Actors” array; the customers collection embeds the films that have been hired in the "Rentals" array.
The result of the join contains all the customers who have hired the movie, returned as an array, so we use the $unwind operator to “flatten” them out (line 10). The resulting output looks like this:
{ "_id" : 1, "Title" : "ACADEMY DINOSAUR", "FirstName" : "SUSAN", "LastName" : "WILSON" }
{ "_id" : 1, "Title" : "ACADEMY DINOSAUR", "FirstName" : "REBECCA", "LastName" : "SCOTT" }
{ "_id" : 1, "Title" : "ACADEMY DINOSAUR", "FirstName" : "DEBRA", "LastName" : "NELSON" }
{ "_id" : 1, "Title" : "ACADEMY DINOSAUR", "FirstName" : "MARIE", "LastName" : "TURNER" }
{ "_id" : 1, "Title" : "ACADEMY DINOSAUR", "FirstName" : "TINA", "LastName" : "SIMMONS" }
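To make the join semantics concrete, here is a minimal in-memory sketch in plain JavaScript of what $lookup followed by $unwind does to the documents. It is not how the server implements the stages. The helpers `valuesAt`, `lookup` and `unwind` are hypothetical names, and the data is a toy set made up for illustration (including the "NEVER RENTED" film), not the real Sakila contents.

```javascript
// Resolve a dotted path such as "Rentals.filmId". When a step crosses an
// array, the value is collected from every element of that array.
function valuesAt(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const item of current) {
      const targets = Array.isArray(item) ? item : [item];
      for (const t of targets) {
        if (t !== null && typeof t === "object" && key in t) next.push(t[key]);
      }
    }
    current = next;
  }
  return current.flat();
}

// Left outer join: every local document is kept, and the matches from the
// foreign collection are attached as an array (empty when nothing matches).
function lookup(localDocs, foreignDocs, { localField, foreignField, as }) {
  return localDocs.map((local) => {
    const localValues = valuesAt(local, localField);
    const matches = foreignDocs.filter((foreign) =>
      valuesAt(foreign, foreignField).some((v) => localValues.includes(v))
    );
    return { ...local, [as]: matches };
  });
}

// Default $unwind behaviour: one output document per array element, and
// documents whose array is empty are dropped.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    doc[field].map((element) => ({ ...doc, [field]: element }))
  );
}

// Toy data (hypothetical).
const films = [
  { _id: 1, Title: "ACADEMY DINOSAUR" },
  { _id: 2, Title: "NEVER RENTED" },
];
const customers = [
  { "First Name": "SUSAN", "Last Name": "WILSON", Rentals: [{ filmId: 1 }] },
  { "First Name": "REBECCA", "Last Name": "SCOTT", Rentals: [{ filmId: 1 }, { filmId: 3 }] },
];

const joined = lookup(films, customers, {
  localField: "_id",
  foreignField: "Rentals.filmId",
  as: "customerData",
});
const flat = unwind(joined, "customerData");

console.log(joined[1].customerData.length); // 0: unmatched film is still returned
console.log(flat.length);                   // 2: one row per (film, customer) pair
```

Note that the film with no rentals survives the join with an empty array but disappears after the unwind, so the $lookup plus default $unwind combination behaves more like an inner join.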
One thing we need to be careful about here is join performance. The $lookup function is going to be executed once for each document returned by our $match condition. There is - AFAIK - no equivalent of a hash or sort merge join operation possible here, so we need to make sure that we've used an index. Unfortunately, the explain() command doesn’t help us. It tells us only whether we have used an index to perform the initial $match, but doesn't show us whether we used an index within the $lookup.
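A rough way to see why the index matters is to count how many customer documents are touched in each case. The sketch below is a toy model in plain JavaScript, not the server's actual execution path: a `Map` stands in for an index on "Rentals.filmId", and all names, sizes and data are made up for illustration.

```javascript
// 1,000 hypothetical customers, each with a single rental of one of 100 films.
const customers = Array.from({ length: 1000 }, (_, i) => ({
  _id: i,
  Rentals: [{ filmId: i % 100 }],
}));
// Pretend the $match stage returned 10 films.
const filmIds = Array.from({ length: 10 }, (_, i) => i);

// No index: every film triggers a full scan of the customers collection.
function joinByScanning(filmIds, customers) {
  let examined = 0;
  let matched = 0;
  for (const filmId of filmIds) {
    for (const customer of customers) {
      examined++;
      if (customer.Rentals.some((r) => r.filmId === filmId)) matched++;
    }
  }
  return { examined, matched };
}

// Stand-in for an index on "Rentals.filmId": filmId -> customers.
function buildFilmIndex(customers) {
  const index = new Map();
  for (const customer of customers) {
    for (const rental of customer.Rentals) {
      if (!index.has(rental.filmId)) index.set(rental.filmId, []);
      index.get(rental.filmId).push(customer);
    }
  }
  return index;
}

// With the index: only the matching customers are touched for each film.
function joinByIndex(filmIds, index) {
  let examined = 0;
  for (const filmId of filmIds) {
    examined += (index.get(filmId) || []).length;
  }
  return { examined, matched: examined };
}

const scan = joinByScanning(filmIds, customers);
const indexed = joinByIndex(filmIds, buildFilmIndex(customers));
console.log(scan.examined, indexed.examined); // prints: 10000 100
```

In MongoDB itself, the corresponding step would be to index the foreign field, for example with `db.customers.createIndex({"Rentals.filmId": 1})` in the shell.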
Here's the explain output from the operation above (TL;DR):
    > db.films.explain().aggregate([
    ...     {$match:{"Actors.First name" : "CHRISTIAN",
    ...              "Actors.Last name" : "GABLE"}},
    ...     {$lookup: {
    ...         from: "customers",
    ...         as: "customerData",
    ...         localField: "_id",
    ...         foreignField: "Rentals.filmId"
    ...     }},
    ...     {$unwind: "$customerData"},
    ...     {$project:{"Title":1,
    ...                "FirstName":"$customerData.First Name",
    ...                "LastName" :"$customerData.Last Name"}},
    ...
    ... ])
    {
        "waitedMS" : NumberLong(0),
        "stages" : [
            {
                "$cursor" : {
                    "query" : {
                        "Actors.First name" : "CHRISTIAN",
                        "Actors.Last name" : "GABLE"
                    },
                    "fields" : {
                        "Title" : 1,
                        "customerData.First Name" : 1,
                        "customerData.Last Name" : 1,
                        "_id" : 1
                    },
                    "queryPlanner" : {
                        "plannerVersion" : 1,
                        "namespace" : "sakila.films",
                        "indexFilterSet" : false,
                        "parsedQuery" : {
                            "$and" : [
                                { "Actors.First name" : { "$eq" : "CHRISTIAN" } },
                                { "Actors.Last name" : { "$eq" : "GABLE" } }
                            ]
                        },
                        "winningPlan" : {
                            "stage" : "COLLSCAN",
                            "filter" : {
                                "$and" : [
                                    { "Actors.First name" : { "$eq" : "CHRISTIAN" } },
                                    { "Actors.Last name" : { "$eq" : "GABLE" } }
                                ]
                            },
                            "direction" : "forward"
                        },
                        "rejectedPlans" : [ ]
                    }
                }
            },
            {
                "$lookup" : {
                    "from" : "customers",
                    "as" : "customerData",
                    "localField" : "_id",
                    "foreignField" : "Rentals.filmId",
                    "unwinding" : {
                        "preserveNullAndEmptyArrays" : false
                    }
                }
            },
            {
                "$project" : {
                    "Title" : true,
                    "FirstName" : "$customerData.First Name",
                    "LastName" : "$customerData.Last Name"
                }
            }
        ],
        "ok" : 1
    }

However, we can see the queries created by the $lookup function if we enable profiling. For instance, if we turn profiling on, we can see that a full collection scan of customers has been generated for every film document that has been joined.
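One possible way to capture those queries is sketched below. It wraps the shell's real profiling calls (`db.setProfilingLevel`, the `system.profile` collection) in a helper. The helper name `inspectLookupQueries` and its `runPipeline` callback are hypothetical, and the exact fields recorded for $lookup sub-queries, or whether they are recorded separately at all, may differ between server versions.

```javascript
// Sketch for the mongo shell: pass in the shell's `db` handle and a function
// that runs the aggregation under investigation.
// inspectLookupQueries is a hypothetical helper name, not a MongoDB API.
function inspectLookupQueries(db, runPipeline) {
  db.setProfilingLevel(2); // level 2 = profile all operations
  try {
    runPipeline();
  } finally {
    db.setProfilingLevel(0); // always switch profiling back off
  }
  // Most recent profiled operations against the joined collection
  // ("customers" is the collection named in the $lookup above).
  return db
    .getCollection("system.profile")
    .find({ ns: db.getName() + ".customers" })
    .sort({ ts: -1 })
    .limit(5)
    .toArray();
}
```

In the shell this could be called as `inspectLookupQueries(db, function () { db.films.aggregate(pipeline).toArray(); })`, where `pipeline` is the array of stages shown earlier. In the returned entries, a `docsExamined` value close to the size of the customers collection alongside a `keysExamined` of 0 would point to a collection scan.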
These “nested” collection scans are bad news. Below are the results of a benchmark in which I joined two collections using $lookup with and