
Fastest way to un-cap a MongoDB capped collection

A couple of weeks ago I converted a MongoDB collection to a capped collection. This is for an event archive in which we only need to keep the last month of events stored locally. The issue is that this data grows unbounded for a long time until we manually free up disk space. Enter capped collections, which maintain a fixed-size collection by automatically removing the oldest documents. Unfortunately, I didn’t realize that our application would also update existing documents, and updates which cause a document to grow will fail on a capped collection. Bummer.
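Here’s a minimal mongo-shell sketch of that failure mode (the “events” collection and the sizes are made up for illustration; this needs a live mongod to run against):

```javascript
// Minimal repro (mongo shell). Collection name and cap size are
// made up for illustration.
db.createCollection("events", { capped: true, size: 1048576 })
db.events.insert({ _id: 1, msg: "short" })

// A same-size, in-place update is fine:
db.events.update({ _id: 1 }, { $set: { msg: "shrt2" } })

// But an update that grows the document fails on a capped collection:
db.events.update({ _id: 1 }, { $set: { msg: "a much longer message" } })
```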

Now we need to roll back to an uncapped collection and find another way to manage the database size (ahem, a cron job). The recommended approach, copyTo(), is deprecated in newer versions and agonizingly slow for a large 100GB+ data set in older versions. (This database was still running MongoDB 2.4. I know.)
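For the record, the cron-job route can be sketched in a few lines of mongo shell. This is just a sketch: the 30-day window, the collection name, and the createdAt field are assumptions.

```javascript
// prune.js -- delete events older than 30 days.
// Run nightly, e.g. from cron:
//   0 3 * * * mongo mydb /path/to/prune.js
// The "createdAt" field and 30-day window are assumptions.
var cutoff = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000);
db.mycoll.remove({ createdAt: { $lt: cutoff } });
```

An index on the date field keeps the nightly delete from scanning the whole collection.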

For a basic benchmark, using copyTo() took about 10 hours to copy ~80% of a 100GB capped collection. The catch is that there’s no progress indicator at all: the database is locked during the copy, so you can’t even look at the size of the new collection, and nothing useful is printed in the logs. I only know it got that far because I halted the copy and then looked. I probably should’ve let it finish, but I didn’t know if it was making any progress, much less that it was that close.

Following the advice of a kind stranger in IRC (#mongodb), I decided to try mongodump and mongorestore.

The dump was fast and showed progress the whole time. (It took 22 minutes total.)

PROD root@myhost:/data/uncap # mongodump -d mydb -c mycoll
connected to: 127.0.0.1:27017
Sun Feb 5 00:02:28.879 DATABASE: mydb to dump/mydb
Sun Feb 5 00:02:28.880 mydb.mycoll to dump/mydb/mycoll.bson
Sun Feb 5 00:02:31.004 Collection File Writing Progress: 868400/67569879 1% (objects)
...
Sun Feb 5 00:24:13.004 Collection File Writing Progress: 67480900/67569879 99% (objects)
Sun Feb 5 00:24:14.203 67569879 objects
Sun Feb 5 00:24:14.203 Metadata for mydb.mycoll to dump/mydb/mycoll.metadata.json

But then the restore created a new capped collection. Dratz!

Fortunately, the dump includes a metadata file in JSON format.

cat dump/mydb/mycoll.metadata.json
{
  "options": {
    "capped": true,
    "size": 107374182400
  },
  "indexes": [
    {
      "v": 1,
      "key": {
        "_id": 1
      },
      "ns": "mydb.mycoll",
      "name": "_id_"
    }
  ]
}

Go ahead and remove that “options” section, which marks the collection as capped and sets its maximum size. Now restore into a temporary collection.

PROD root@myhost:/data/uncap # mongorestore -d mydb -c mycoll_tmp dump/mydb/mycoll.bson
connected to: 127.0.0.1:27018
Sun Feb 5 00:29:41.473 dump/mydb/mycoll.bson
Sun Feb 5 00:29:41.473 going into namespace [mydb.mycoll_tmp]
Sun Feb 5 00:29:44.060 Progress: 51202216/106169934834 0% (bytes)
Sun Feb 5 00:29:47.007 Progress: 106497873/106169934834 0% (bytes)
Sun Feb 5 01:57:19.065 Progress: 106159626025/106169934834 99% (bytes)
67569879 objects found
Sun Feb 5 01:57:19.637 Creating index: { key: { _id: 1 }, ns: "mydb.mycoll_tmp", name: "_id_" }

So mongorestore automatically recreates the indexes from the metadata file. (If you have a lot of indexes, this can take a long time.)

Check it out now.

rs-prod:PRIMARY> db.mycoll.isCapped()
true
rs-prod:PRIMARY> db.mycoll_tmp.isCapped()
false
rs-prod:PRIMARY> db.mycoll.count()
9876543210
rs-prod:PRIMARY> db.mycoll_tmp.count()
9876543210

Perfect! Now we can just drop the old collection and rename the new collection.

db.mycoll.drop()
db.mycoll_tmp.renameCollection('mycoll')
db.mycoll.count()
9876543210

Lessons Learned:

- Read the fine print on capped collections before deciding they’re perfect.
- Things that use eval internally (ahem, copyTo) can be painfully slow.
- Sometimes doing a dump, manually tweaking a config, and then restoring is fastest.
