Map-reduce is perhaps the most versatile of the aggregation operations that MongoDB supports.
Map-Reduce is a popular programming model that originated at Google for processing and aggregating large volumes of data in parallel. A detailed discussion of Map-Reduce is beyond the scope of this article, but in essence it is a multi-step aggregation process. The two most important steps are the map stage (process each document and emit results) and the reduce stage (collate the results emitted during the map stage).
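To make the two stages concrete, here is a minimal in-memory sketch (plain Node.js, not MongoDB code) using the classic word-count example. The `mapReduce` harness, the sample documents, and the emit-as-parameter convention are all illustrative assumptions; in MongoDB the map function calls a global `emit` instead.

```javascript
// Illustrative in-memory sketch of the two Map-Reduce stages (not MongoDB's API).
// In MongoDB, map functions call a global emit(); here emit is passed in explicitly.
function mapReduce(docs, mapFn, reduceFn) {
  const emitted = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  docs.forEach(doc => mapFn.call(doc, emit)); // map stage: process each document
  const results = [];
  for (const [key, values] of emitted) {      // reduce stage: collate per key
    results.push({
      _id: key,
      // Like MongoDB, skip reduce when a key was emitted only once.
      value: values.length > 1 ? reduceFn(key, values) : values[0],
    });
  }
  return results;
}

// Classic word count: map emits (word, 1); reduce sums the ones.
const docs = [{ words: ["map", "reduce"] }, { words: ["map"] }];
const counts = mapReduce(
  docs,
  function (emit) { this.words.forEach(w => emit(w, 1)); },
  (key, values) => values.reduce((a, b) => a + b, 0)
);
console.log(counts); // [ { _id: 'map', value: 2 }, { _id: 'reduce', value: 1 } ]
```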
MongoDB supports three kinds of aggregation operations: Map-Reduce, the aggregation pipeline, and single-purpose aggregation commands. You can use this MongoDB comparison document to see which fits your needs.
In my last post, we saw, with examples, how to run aggregation pipelines on secondaries. In this post, we will walk through running Map-Reduce jobs on MongoDB secondary replicas.
MongoDB Map-Reduce
MongoDB supports running Map-Reduce jobs on the database servers. This offers the flexibility to write complex aggregation tasks that aren’t as easily done via aggregation pipelines. MongoDB lets you write custom map and reduce functions in JavaScript that can be passed to the database via the mongo shell or any other client. On large and constantly growing data sets, one can even consider running incremental Map-Reduce jobs to avoid processing older data every time.
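Incremental Map-Reduce hinges on the reduce function being "re-reducible": re-applying it to previously reduced output plus newly reduced values must give the same answer as reducing everything at once. Summation has this property. A plain-JavaScript sketch (the key name and values are made up for illustration):

```javascript
// Sketch: why incremental Map-Reduce works. The reduce function must satisfy
// reduce(key, [reduce(key, A), reduce(key, B)]) === reduce(key, A.concat(B)).
const reduceFunc = (key, values) => values.reduce((a, b) => a + b, 0);

const oldValues = [100, 50, 25]; // txn values already processed in a prior run
const newValues = [60, 40];      // txn values from new documents only

// Full recomputation over all the data:
const full = reduceFunc("cust_1", oldValues.concat(newValues)); // 275

// Incremental: reduce only the new data, then re-reduce with the stored result:
const stored = reduceFunc("cust_1", oldValues); // kept from the prior run
const incremental = reduceFunc("cust_1", [stored, reduceFunc("cust_1", newValues)]);

console.log(full === incremental); // true: same answer without reprocessing old docs
```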
Historically, the map and the reduce methods used to be executed in a single-threaded context. However, that limitation was removed in version 2.4.
Why run Map-Reduce jobs on the secondary?
Like other aggregation jobs, Map-Reduce is a resource-intensive ‘batch’ job, so it is a good fit for running on read-only replicas. The caveats in doing so are:
1) It should be acceptable to use slightly stale data. Alternatively, you can tweak the write concern to ensure replicas are always in sync with the primary; this second option assumes that taking a hit on write performance is acceptable.
2) The output of the Map-Reduce job shouldn’t be written to another collection within the database but rather be returned to the application (i.e., no writes to the database).
Let’s look at how to do this via examples, both from the mongo shell and the Java driver.
Map-Reduce on Replica Sets
Data Set
For illustration, we will use a rather simple data set: a daily transaction record dump from a retailer. A sample entry looks like:
RS-replica-0:PRIMARY> use test
switched to db test
RS-replica-0:PRIMARY> show tables
txns
RS-replica-0:PRIMARY> db.txns.findOne()
{
"_id" : ObjectId("584a3b71cdc1cb061957289b"),
"custid" : "cust_66",
"txnval" : 100,
"items" : [{"sku": sku1", "qty": 1, "pr": 100}, ...],
...
}
In our examples, we will calculate the total expenditure of a given customer on that day. Thus, given our schema, the map and reduce methods will look like:
var mapFunction = function() { emit(this.custid, this.txnval); } // Emit the custid and txn value from each record
var reduceFunction = function(key, values) { return Array.sum(values); } // Sum all the txn values for a given custid
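The pair can be sanity-checked outside the database with plain Node.js. Note that `Array.sum` is a mongo-shell helper, so a plain-JS equivalent is substituted below; the sample documents and the grouping harness are illustrative assumptions, standing in for what the server does between the map and reduce stages.

```javascript
// Sanity-check the map/reduce pair outside MongoDB (plain Node.js).
// Array.sum exists only in the mongo shell; use a reduce() equivalent here.
const mapFunction = function (emit) { emit(this.custid, this.txnval); };
const reduceFunction = (key, values) => values.reduce((a, b) => a + b, 0);

const txns = [
  { custid: "cust_66", txnval: 100 },
  { custid: "cust_66", txnval: 40 },
  { custid: "cust_7",  txnval: 25 },
];

// Group emitted values by key, as the server does between map and reduce.
const groups = {};
txns.forEach(doc =>
  mapFunction.call(doc, (k, v) => (groups[k] = groups[k] || []).push(v))
);
const results = Object.entries(groups).map(([k, vals]) =>
  ({ _id: k, value: reduceFunction(k, vals) })
);
console.log(results);
// [ { _id: 'cust_66', value: 140 }, { _id: 'cust_7', value: 25 } ]
```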
With our schema established, let’s look at Map-Reduce in action.
MongoDB Shell
To ensure that a Map-Reduce job is executed on the secondary, the read preference should be set to secondary. As we said above, for a Map-Reduce job to run on a secondary, the output of the result must be inline (in fact, that is the only out value allowed on secondaries). Let’s see how it works.
$ mongo -u admin -p pwd --authenticationDatabase admin --host RS-replica-0/server-1.servers.example.com:27017,server-2.servers.example.com:27017
MongoDB shell version: 3.2.10
connecting to: RS-replica-0/server-1.servers.example.com:27017,server-2.servers.example.com:27017/test
2016-12-09T08:15:19.347+0000 I NETWORK [thread1] Starting new replica set monitor for server-1.servers.example.com:27017,server-2.servers.example.com:27017
2016-12-09T08:15:19.349+0000 I NETWORK [ReplicaSetMonitorWatcher] starting
RS-replica-0:PRIMARY> db.setSlaveOk()
RS-replica-0:PRIMARY> db.getMongo().setReadPref('secondary')
RS-replica-0:PRIMARY> db.getMongo().getReadPrefMode()
secondary
RS-replica-0:PRIMARY> var mapFunc = function() { emit(this.custid, this.txnval); }
RS-replica-0:PRIMARY> var reduceFunc = function(key, values) { return Array.sum(values); }
RS-replica-0:PRIMARY> db.txns.mapReduce(mapFunc, reduceFunc, {out: { inline: 1 }})
{
"results" : [
{
"_id" : "cust_0",
"value" : 72734
},
{
"_id" : "cust_1",
"value" : 67737
},
...
],
"timeMillis" : 215,
"counts" : {
"input" : 10000,
"emit" : 10000,
"reduce" : 909,
"output" : 101
},
"ok" : 1
}
A peek at the logs on the secondary confirms that the job indeed ran on the secondary.
...
2016-12-09T08:17:24.842+0000 D COMMAND [conn344] mr ns: test.txns
2016-12-09T08:17:24.843+0000 I COMMAND [conn344] command test.$cmd command: listCollections { listCollections: 1, filter: { name: "txns" }, cursor: {} } keyUpdates:0 writeConflicts:0 numYields:0 reslen:150 locks:{ Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 1, R: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 0ms
2016-12-09T08:17:24.865+0000 I COMMAND [conn344] query test.system.js planSummary: EOF ntoreturn:0 ntoskip:0 keysExamined:0 docsExamined:0 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:0 reslen:20 locks:{ Global: { acquireCount: { r: 6 } }, Database: { acquireCount: { r: 2, R: 1 } }, Collec