Background: Snapdeal acquired Reduce Data, a Silicon Valley company, in September 2015. Asif Ali, currently Associate Vice President at Snapdeal and formerly the founder of Reduce Data, shares in this blog how they built a system to scale. This blog was previously published on Medium; Snapdeal uses this platform as the basis for Snapdeal Ads.

****
Building a web-scale system is hard, especially when you have a really small team (<10 in engineering). But building good software is more about using the right technology for each use case than about having hundreds of people on your team.
To showcase what we’ve done, I am sharing our experience in building our ad serving stack (Snapdeal Ads) and how we scaled it from almost nothing to a peak of 5 billion requests a day.
Key decisions

- We designed for (large) scale. We knew that an ad serving platform’s scaling pains would not be years away; traffic growth could be phenomenal right from the start. So we architected the system to scale out from the get-go, both horizontally and vertically.
- We chose Availability and Partition tolerance (AP) over Consistency and Availability (CA) because our primary need was a low latency, high performance ad auction and serving platform. Data consistency wasn’t much of an issue (for example, ads could start serving a few minutes late and no one would care). Learn more about Brewer’s theorem (also known as the CAP theorem) here.
- No vendor lock-in / limited use of proprietary tech: open source software has reached unquestionable maturity across a vast variety of use cases, and in order to keep costs low we decided to have no vendor lock-in with proprietary software.
- We built the system around mechanical sympathy: the software was built with an understanding of how the hardware works, so it could leverage the hardware as fully as possible.
- Limited use of cloud technology: we decided early on to limit our use of cloud tech because a) EC2 and its counterparts tend to be very expensive compared to bare-metal machines for ad serving use cases, and b) network jitter, disk virtualization, etc. showed increased latencies on EC2 in our early tests.
- Latency exists; cope with it and try to eliminate it. All lookups should happen in under 1 ms. We used RocksDB and a variety of other solutions as primary caches / embedded databases (a RocksDB lookup sketch follows this section).
- We used SSDs where possible, again to reduce latencies.
- We did not virtualize hardware, and took advantage of large machines (256 GB RAM, 24 cores) to parallelize a lot of the computation.
- Disk writes, if any, were timed and flushed every N seconds in chunks of data (a sketch of this pattern also follows this section).
- Nginx was tuned to support keep-alive connections, and Netty was optimized to support a large concurrent load.
- Key data was always available instantly (in microseconds) to the ad server. All of this data was stored in in-memory libraries / data structures.
- The architecture should be shared-nothing, at least for the ad servers that interfaced with the external bidders, and those should be extremely resilient. We should be able to unplug ad servers and the system should not even blink.
- All key data and results need to be replicated. Keep a copy of raw logs for a few days.
- It was okay if the data was a bit stale and the system inconsistent.
- Messaging systems must be fault tolerant. They can crash, but they must not lose data.

Current infrastructure

- 40-50 nodes across 3 data centres.
- 30 of them high compute (128-256 GB RAM, 24 cores, top of the line CPUs and, where possible, SSDs).
- The rest much smaller: 32 GB RAM, quad-core machines.
- 10G private network + 10G public network.
- Small Cassandra, HBase and Spark clusters.

Our key requirements were

- The system should be able to support one or more bidders which send RTB 2.0 requests over HTTP (REST endpoints).
- The system should be able to participate in an auction, responding with a yes or a no during the auction, and with a price and an ad for a yes.
- The system should be able to process billions of events each day, peaking at several hundred thousand QPS, in order to choose a small subset of users from a large set of users who will be sent to your platform. The larger the pool of users you have visibility into, the better it is for the advertisers.
- Data should be processed as soon as possible, at least for key metrics.
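To make the sub-millisecond lookup goal concrete, here is a minimal sketch of an embedded RocksDB store opened and queried in-process from Java. The post does not show its actual code, so the class name, database path and key/value layout here are assumptions for illustration only.

```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

import java.nio.charset.StandardCharsets;

public class ProfileStore {
    static { RocksDB.loadLibrary(); }   // load the native RocksDB library once per JVM

    private final RocksDB db;

    public ProfileStore(String path) throws RocksDBException {
        // Open (or create) the embedded database. There is no network hop, so hot
        // point lookups are served from the memtable/block cache in well under 1 ms.
        Options options = new Options().setCreateIfMissing(true);
        this.db = RocksDB.open(options, path);
    }

    public void putProfile(String userId, byte[] serializedProfile) throws RocksDBException {
        db.put(userId.getBytes(StandardCharsets.UTF_8), serializedProfile);
    }

    public byte[] getProfile(String userId) throws RocksDBException {
        // Returns null for unknown users; callers treat that as "no targeting data".
        return db.get(userId.getBytes(StandardCharsets.UTF_8));
    }

    public void close() {
        db.close();
    }
}
```

Because the store lives inside the bidder process, lookup latency is bounded by memory and local disk rather than by a remote cache.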
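The "time and flush disk writes every N seconds" decision can be sketched with a plain in-memory buffer plus a scheduled flush. The class and field names below are illustrative, not taken from the original system.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative only: events are appended to an in-memory queue and written to disk
// in one chunk every N seconds, so the request path never blocks on a disk write.
public class TimedLogFlusher {
    private final ConcurrentLinkedQueue<byte[]> buffer = new ConcurrentLinkedQueue<>();
    private final BufferedOutputStream out;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public TimedLogFlusher(String file, long flushEverySeconds) throws IOException {
        this.out = new BufferedOutputStream(new FileOutputStream(file, true));
        scheduler.scheduleAtFixedRate(this::flushChunk, flushEverySeconds, flushEverySeconds, TimeUnit.SECONDS);
    }

    // Called from the hot path: O(1), no I/O.
    public void append(byte[] event) {
        buffer.add(event);
    }

    // Called every N seconds from the scheduler thread.
    private void flushChunk() {
        try {
            byte[] event;
            while ((event = buffer.poll()) != null) {
                out.write(event);
                out.write('\n');
            }
            out.flush();
        } catch (IOException e) {
            e.printStackTrace(); // a real system would surface this to monitoring
        }
    }
}
```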
Key technologies used were:

- HBase and Cassandra for counter aggregation and for the traditional datasets used to manage users, accounts, etc. HBase was chosen for its high write performance and its ability to handle counters fairly well, which suits near real-time analytics (see the counter sketch after this list).
- The primary language for the backend was Java. Although I’ve experimented with C++ and Erlang in the past, Java takes the cake as far as availability of skills goes, and the JVM has matured into its own over the last few years.
- Google Protobuf for data transfers.
- Netty was chosen as the primary backend server, thanks to its simplicity and high performance characteristics (a minimal bootstrap sketch follows this list).
- RocksDB was chosen for writes of user profiles as well as reads during ad serving. It is the embedded database within each bidder. User profiles were synced across RocksDB instances using Apache Kafka.
- Kafka was used as the primary messaging queue to stream data for processing (see the producer sketch after this list).
- CQEngine was used as the primary in-memory, fast querying system, while certain data was stored using atomic objects (a query sketch also follows this list).
- Nginx was the primary reverse proxy. Much has been said about it, so we’ll leave it at that.
- Apache Spark was used for quick data processing for ML.
- Jenkins for CI.
- Nagios and New Relic for monitoring servers.
- ZooKeeper for distributed synchronization.
- Dozens of third parties for audience segments, etc.
- BitTorrent Sync was used to sync key data across nodes and data centres.
- A custom-built quota manager, based on a Yahoo white paper, for budget control. See the presentation below for more details: Distributed Quota Management in an Ad Serving Environment.
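As a sketch of the counter aggregation use case above, HBase exposes atomic increments, which is what makes it convenient for near real-time metrics. The table, row key, column family and qualifier names below are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ImpressionCounter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("campaign_metrics"))) {
            // Atomically add 1 to the impressions counter for this campaign/hour bucket.
            table.incrementColumnValue(
                    Bytes.toBytes("campaign-42#2015093011"),  // illustrative row key
                    Bytes.toBytes("m"),                        // column family
                    Bytes.toBytes("impressions"),              // qualifier
                    1L);
        }
    }
}
```

Because the increment is server-side and atomic, many ad servers can bump the same counter concurrently without a read-modify-write race.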
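To show what "Netty as the primary backend server" looks like in practice, here is a stripped-down HTTP server bootstrap; the handler, port and pipeline are placeholders, not the production configuration described in the post.

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.*;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.handler.codec.http.*;

public class BidServer {
    public static void main(String[] args) throws InterruptedException {
        EventLoopGroup boss = new NioEventLoopGroup(1);     // accepts connections
        EventLoopGroup workers = new NioEventLoopGroup();   // handles I/O on the event loops
        try {
            ServerBootstrap bootstrap = new ServerBootstrap()
                    .group(boss, workers)
                    .channel(NioServerSocketChannel.class)
                    .option(ChannelOption.SO_BACKLOG, 1024)
                    .childOption(ChannelOption.SO_KEEPALIVE, true)
                    .childHandler(new ChannelInitializer<SocketChannel>() {
                        @Override
                        protected void initChannel(SocketChannel ch) {
                            ch.pipeline().addLast(
                                    new HttpServerCodec(),
                                    new HttpObjectAggregator(64 * 1024),
                                    new SimpleChannelInboundHandler<FullHttpRequest>() {
                                        @Override
                                        protected void channelRead0(ChannelHandlerContext ctx, FullHttpRequest req) {
                                            // A real bidder would parse the RTB 2.0 request here and
                                            // answer with a bid or a no-bid; this stub returns 204.
                                            FullHttpResponse res = new DefaultFullHttpResponse(
                                                    HttpVersion.HTTP_1_1, HttpResponseStatus.NO_CONTENT);
                                            ctx.writeAndFlush(res);
                                        }
                                    });
                        }
                    });
            bootstrap.bind(8080).sync().channel().closeFuture().sync();
        } finally {
            boss.shutdownGracefully();
            workers.shutdownGracefully();
        }
    }
}
```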
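A minimal sketch of streaming user-profile updates through Kafka, the pattern described above for syncing RocksDB across bidders; the broker addresses, topic name and serialized payload are assumptions.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class ProfileUpdatePublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka1:9092,kafka2:9092");  // illustrative brokers
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            byte[] serializedProfile = new byte[0];  // e.g. a Protobuf-encoded user profile
            // Keying by user id keeps all updates for a user in one partition, preserving order,
            // so every bidder applies them to its local RocksDB in the same sequence.
            producer.send(new ProducerRecord<>("user-profile-updates", "user-123", serializedProfile));
            producer.flush();
        }
    }
}
```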
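Finally, a small sketch of the in-memory CQEngine querying mentioned above. The Ad class and its attribute are invented purely to illustrate the indexed-collection pattern, and the attribute signature assumes a recent CQEngine version.

```java
import com.googlecode.cqengine.ConcurrentIndexedCollection;
import com.googlecode.cqengine.IndexedCollection;
import com.googlecode.cqengine.attribute.Attribute;
import com.googlecode.cqengine.attribute.SimpleAttribute;
import com.googlecode.cqengine.index.hash.HashIndex;
import com.googlecode.cqengine.query.Query;
import com.googlecode.cqengine.query.option.QueryOptions;

import static com.googlecode.cqengine.query.QueryFactory.equal;

public class AdIndex {
    // Hypothetical value object for an ad eligible to serve.
    public static class Ad {
        final String id;
        final String country;
        Ad(String id, String country) { this.id = id; this.country = country; }
    }

    // Attribute exposing Ad.country to CQEngine's query engine.
    static final Attribute<Ad, String> COUNTRY = new SimpleAttribute<Ad, String>("country") {
        @Override
        public String getValue(Ad ad, QueryOptions queryOptions) { return ad.country; }
    };

    public static void main(String[] args) {
        IndexedCollection<Ad> ads = new ConcurrentIndexedCollection<>();
        ads.addIndex(HashIndex.onAttribute(COUNTRY));   // constant-time lookups on country
        ads.add(new Ad("ad-1", "IN"));
        ads.add(new Ad("ad-2", "US"));

        Query<Ad> query = equal(COUNTRY, "IN");
        ads.retrieve(query).forEach(ad -> System.out.println(ad.id));
    }
}
```

Everything here stays on the heap, which is what keeps this class of lookup in the microsecond range during ad selection.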