Adventures in Performance Debugging

As we’ve built CockroachDB, correctness has been our primary concern. But as we’ve drawn closer to our beta launch, we’ve had to start paying significantly more attention to performance. The design of CockroachDB always kept performance and scalability in mind, but when you start measuring performance, there are inevitably surprises. This is the story of the detection, investigation, and fix of just one performance bug.

First, a little context about CockroachDB for those new to the project. CockroachDB is a distributed SQL database which employs RocksDB to store data locally at each node. A basic performance test is to write random data into a table and measure how many operations per second can be achieved. This is exactly what the block_writer example program does. It uses a simple table schema:

CREATE TABLE IF NOT EXISTS blocks (
block_id BIGINT NOT NULL,
writer_id STRING NOT NULL,
block_num BIGINT NOT NULL,
raw_bytes BYTES NOT NULL,
PRIMARY KEY (block_id, writer_id, block_num)
)

And then spawns a number of workers to insert data into the table:

INSERT INTO blocks (block_id, writer_id, block_num, raw_bytes)
VALUES ($1, $2, $3, $4)

The block_id is randomly chosen and writer_id is uniquely assigned to each worker. The block_num field is monotonically increasing and ensures that there will never be duplicate rows inserted into the table. The effect is that we’re inserting random rows into the table and never experiencing contention. What could go wrong?
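
For concreteness, here is a minimal sketch, in Go (the language block_writer is written in), of what one such worker loop might look like. This is not the actual block_writer source; the connection URL, the payload size, and names like writeBlocks are illustrative only.

package main

import (
	"database/sql"
	"log"
	"math/rand"

	_ "github.com/lib/pq" // Postgres-wire driver; CockroachDB speaks the Postgres wire protocol
)

// writeBlocks sketches a single block_writer-style worker: it inserts rows
// with a random block_id, its own writer_id, and a monotonically increasing
// block_num, so no two workers ever write the same primary key.
func writeBlocks(db *sql.DB, writerID string) error {
	const stmt = `INSERT INTO blocks (block_id, writer_id, block_num, raw_bytes)
	              VALUES ($1, $2, $3, $4)`
	payload := make([]byte, 256) // illustrative payload size
	for blockNum := int64(0); ; blockNum++ {
		rand.Read(payload)
		if _, err := db.Exec(stmt, rand.Int63(), writerID, blockNum, payload); err != nil {
			return err
		}
	}
}

func main() {
	// Assumes a local test cluster; adjust the URL for your own setup.
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/test?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	// A real run spawns many such workers; one is enough to show the shape.
	log.Fatal(writeBlocks(db, "worker-0"))
}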

1. The Bug: Rapid performance deterioration

A few weeks ago my colleague Matt Tracy ran the block_writer and discovered rapid performance deterioration:

1s: 1944.7/sec
2s: 1067.3/sec
3s: 788.8/sec
4s: 632.8/sec
5s: 551.5/sec
…
1m0s: 105.2/sec

Oh my, that isn’t good. Performance starts out at a reasonable 2000 ops/sec, but quickly falls by a factor of 20x. Matt suspected there was some scalability limitation with tables. He noted that once a table fell into this bad performance regime it stayed there. But if he dropped the blocks table and created a new one, performance reset only to degrade rapidly again.

Like any good engineer, Matt turned to CPU profiling to try to determine what was going on. Was there some loop with horrible algorithmic complexity based on the table size? Unfortunately, the profiles didn’t reveal any culprits. Most of the CPU time was being spent inside RocksDB, both during the good performance regime and the bad performance regime. The builtin Go profiling tools are quite good, but they are unable to cross the cgo boundary (RocksDB is written in C++).
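
For readers unfamiliar with Go’s tooling, a CPU profile like the one Matt collected can be captured with the standard runtime/pprof package. The sketch below is generic rather than the exact instrumentation used in CockroachDB, and runWorkload is a placeholder for the code under test.

package main

import (
	"log"
	"os"
	"runtime/pprof"
)

func main() {
	// Write a CPU profile covering the duration of the workload. Samples taken
	// while execution is inside cgo (e.g. RocksDB's C++ code) are attributed to
	// the cgo call site; the profiler cannot see the C++ stack frames themselves.
	f, err := os.Create("cpu.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	runWorkload()
}

// runWorkload stands in for the block_writer-style load being profiled.
func runWorkload() {}

Inspecting the result with go tool pprof cpu.prof shows where the samples landed; in this case most of the time sat inside the cgo calls into RocksDB, with no further C++ breakdown available.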

2. Snowball tracing to the rescue

Matt was a bit stumped for how to proceed at this point. Conveniently, another engineer, Tobias Schottdorf, was experimenting with adding “snowball” tracing to SQL queries. Unlike sampling-based profilers which periodically stop a program and determine what code is running, a tracing system records implicit or explicit events associated with a specific request. The support Tobias was adding was a new EXPLAIN (TRACE) mode. After the addition of some more tracing events, here is what Matt saw:

EXPLAIN (TRACE) INSERT INTO
blocks (block_id, writer_id, block_num, raw_bytes)
VALUES (1, 100, 1, '')

  92.947µs |  9 | node | execute
   3.653µs | 10 | node | executing
   3.129µs | 11 | node | BeginTransaction
   2.088µs | 12 | node | Transaction
4.573606ms | 13 | node | Transaction was not present
   9.721µs | 14 | node | checksum
     417ns | 15 | node | Got put buffer
   7.501µs | 16 | node | mvccGetMetadata
2.847048ms | 17 | node | mvccGetMetadata
     660ns | 18 | node | getMetadata
  12.128µs | 19 | node | Put internal
     352ns | 20 | node | Wrote transaction
   2.207µs | 21 | node | command complete
   5.902µs | 22 | node | executing
  30.517µs | 23 | node | Got put buffer

I’ve edited the output for clarity; the two millisecond-scale lines (Transaction was not present and the second mvccGetMetadata) are the ones that shed light on the problem. It should be clear that you can’t achieve 2000 ops/sec, or 1 op every 0.5 ms, if part of each operation takes >7ms. It is interesting that this time is being consumed in writing the transaction record at the start of a transaction.

Matt continued to add more instrumentation until the problem was narrowed down to a single RocksDB operation. At this point Matt tagged me in since I’ve had the most experience with our usage of RocksDB. I came onto the field swinging and wrote a micro-benchmark that duplicated the behavior of BeginTransaction and utterly failed to find any performance degradation. Hmm, curious. I decided to verify I could reproduce the block_writer performance degradation (trust, but verify) and, of course, the problem reproduced immediately. I also verified that checking to see if the transaction record was present at the start of a transaction was the time-consuming operation.
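
A Go micro-benchmark of this kind is typically built on the standard testing package. The sketch below shows the general shape only: it is not the benchmark I actually wrote, a hypothetical in-memory engine type stands in for our RocksDB wrapper, and the BeginTransaction-style work is reduced to a get-then-put of the transaction record.

package storage

import (
	"fmt"
	"testing"
)

// engine is a stand-in for the RocksDB-backed storage engine exercised by the
// real benchmark; it exists only to keep this sketch self-contained.
type engine map[string][]byte

func (e engine) get(key string) ([]byte, bool) {
	v, ok := e[key]
	return v, ok
}

func (e engine) put(key string, val []byte) {
	e[key] = val
}

// BenchmarkBeginTransaction mimics the shape of BeginTransaction: look up the
// transaction record and write it if it is not already present.
func BenchmarkBeginTransaction(b *testing.B) {
	e := engine{}
	for i := 0; i < b.N; i++ {
		key := fmt.Sprintf("txn-%d", i)
		if _, ok := e.get(key); !ok {
			e.put(key, []byte("txn-record"))
		}
	}
}

Benchmarks like this are run with go test -bench. The RocksDB-backed version showed no per-operation slowdown as records accumulated, which is what made the degradation in the full system so puzzling.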

3. RocksDB, CockroachDB, and MVCC

Now to provide a bit of background on CockroachDB’s MVCC (multi-version concurrency control)…
