If you code for the web, you probably work with databases, which means you have to integrate them, set them up and, most importantly, understand how they work.
Let us say that at some point your project starts to draw attention, with many users connecting and consuming data at the same time. You will have to scale!
One of the secrets behind a scalable web application architecture is knowing which database to use, as well as when and how to use it. For that, you must understand it.
To acquire that kind of knowledge of the challenges and paradigms behind a project, it is interesting to look at the code and make sense of it.
Most of the time, this process of diving into the code is a demystifying adventure. The experience can give you a clear picture of its functionality.
Therefore, always remember:
I hear and I forget. I see and I remember. I do and I understand.
By creating your own, even if it is just for fun, you can understand the concepts and paradigms behind such a project.
Sound interesting? Keep scrolling!
What is a NoSQL database?
Before anything else, let us refresh our memory with some concepts.
Relational, or SQL, databases emerged in the 80s and they are still widely used.
By establishing a common query language and providing persistence, reporting, and support for transactions, relational databases grounded their success and became a reliable base for applications.
It is important to highlight that this success was closely tied to the application requirements of the time.
However, since their conception, relational databases have had several problems, and maybe the most important one is the difficulty of mapping real-world entities to a structured form.
In the 90s, object-oriented databases unsuccessfully attempted to solve this mapping problem by storing entire objects. Their failure is usually attributed to the fact that, at the time, relational databases were used as an integration interface between different applications.
If you have ever worked with multiple applications connecting to a single database, you probably know that the effort of changing the integration database is almost the same as completely rewriting the applications.
Later on, in the late 2000s, the exponential growth of the internet had a direct effect on the requirements for web applications, exposing design flaws in the architectures of the time. That drove big players, companies like Google and Amazon, to come up with their own solutions to scalability issues; Bigtable and Dynamo, for example, emerged at that time.
Those solutions had several factors in common: they were non-relational, cluster-friendly, schema-less, and most of them were open source.
The foundation of what is known today as NoSQL rests on those initiatives. The term “NoSQL” itself only appeared around 2009/2010, and it was meant to be more of a joke than anything else. The fact that it became such a buzzword for these modern kinds of databases was purely accidental, because it is supposed to mean “not only SQL”, not “no SQL”.
Unlike the usual SQL databases, which store data in a structured relational schema, NoSQL databases have no relational support for storing data whatsoever. You can emulate that behavior, but let us leave it for another time.
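To make the schema-less idea concrete, here is a minimal sketch (the collection and field names are invented for illustration): while a relational table forces every row to match a predefined schema, a document store happily accepts records with different shapes.

```python
# Toy illustration of a schema-less "collection": just a bag of documents keyed by id.
# Unlike a relational table, nothing forces every record to have the same fields.
users = {}

users["u1"] = {"name": "Alice", "email": "alice@example.com"}
users["u2"] = {"name": "Bob", "favorite_genres": ["sci-fi", "drama"]}  # different fields, still fine

# Reading a document back requires no schema knowledge, only the key.
print(users["u2"]["favorite_genres"])  # ['sci-fi', 'drama']
```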
The catch here is to understand that scalability for these modern databases relies upon distribution. It is well known that relational databases were not initially designed to be distributed: they could not handle consistency at the cluster level and were supposed to be scaled vertically only, meaning that if you wanted to handle more demand you would increase the server's processing power or add more memory.
That was not much of an option for companies like Google and Amazon, mostly because a single database server could not handle Google’s billions of requests. The solution for scaling those applications was to change the disposition of the servers: instead of increasing memory and processors on one machine, they started creating small, distributed clusters, what we know today as horizontal scaling.
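As a rough sketch of what distributing data across a cluster means (the node names and routing scheme below are invented for illustration, not how any particular database works), each key can be mapped to one of the cluster's nodes by hashing it:

```python
import hashlib

# Toy key-distribution sketch: route each key to one node in the cluster by hashing it.
nodes = ["node-a", "node-b", "node-c"]

def node_for(key: str) -> str:
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[digest % len(nodes)]

print(node_for("user:42"))    # the same key always lands on the same node
print(node_for("user:1337"))  # different keys spread across the cluster
```

Real systems use smarter schemes, such as consistent hashing, so that adding or removing a node does not reshuffle most of the keys, but the core idea of spreading the data over many small servers is the same.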
It is well known that horizontal scaling has several benefits: it makes it possible to easily increase or decrease the number of servers as demand changes, which has a huge impact on the cost of hosting such applications.
Imagine Netflix, for instance: let us say they usually have 100,000 people online simultaneously on a regular day, but when a new season is released that number can reach the millions.
Maybe they could build a huge server and keep it running all the time, but how much would it cost to maintain such an infrastructure? By scaling horizontally, they can grow the number of servers on demand to handle this temporary flood, and then shrink back to the normal baseline after the surge. They actually do this several times a day.
Since horizontal scaling relies upon several small units/servers, it is crucial to understand some concepts of distributed systems, such as the CAP theorem.
CAP
Eric Brewer, a well-known computer scientist, once stated that it is impossible for a distributed system to simultaneously provide the following three guarantees:

Consistency ― all servers see the same data at the same time
Availability ― every request receives a response, whether it succeeded or failed
Partition tolerance ― the system continues to operate despite arbitrary partitioning due to network failures
Usually, people will say that you have to choose two of these aspects. In the real world it is not a binary choice, because a database can favor consistency over availability for some operations and the other way around for others.
Even so, it is possible to group the well-known databases by their focus on two of the three guarantees, calling them CA, CP, or AP.
Note that this is not a strict rule; it is more of a way to understand each database's focus.
CA ― Consistency & Availability ― MySQL, PostgreSQL
These databases will always ensure consistency, so the data will always be reliable, and you can be sure of getting a response for every request.
However, since they do not account for partition failures, they will not perform well on clusters. It is also worth noting that they choose consistency over response time, which may be a cost you are not willing to pay.
CP ― Consistency & Partition Tolerance ― MongoDB, Redis, MemcacheDB, HBase
Such databases favor consistency and partition tolerance over availability. Since it is important to keep the data consistent across all nodes, if for some reason a node is not available, the system will not operate.
Again, response time can be directly affected by the consistency check across all the nodes.
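As a toy sketch of that CP behavior (purely illustrative; the class and method names are invented and do not reflect any real database's API), a write is only acknowledged when every replica confirms it, and it is refused outright if any replica is unreachable:

```python
class ReplicaUnavailable(Exception):
    pass

class CPStore:
    """Toy CP-style store: rejects writes unless every replica can acknowledge them."""

    def __init__(self, replicas):
        self.replicas = replicas  # each replica is a plain dict standing in for a node

    def put(self, key, value, reachable):
        # Sacrifice availability to preserve consistency: refuse the write
        # as soon as any node cannot be reached.
        if not all(reachable):
            raise ReplicaUnavailable("cannot reach every replica; write rejected")
        for replica in self.replicas:
            replica[key] = value

store = CPStore([{}, {}, {}])
store.put("user:1", "Alice", reachable=[True, True, True])   # accepted, all nodes agree
# store.put("user:2", "Bob", reachable=[True, False, True])  # would raise ReplicaUnavailable
```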
AP ― Availability & Partition Tolerance ― Cassandra, CouchDB, Voldemort, Riak
Finally, this kind of database focuses on availability: it will always be operating and responding to all requests. To afford that, it may relax consistency between the nodes; if a node is not reachable, the other nodes will skip the consistency check and perform the operation anyway.
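For contrast, here is the same toy exercise for the AP side (again with invented names, not a real API): the store writes to whichever replicas it can reach and acknowledges the request anyway, accepting that the unreachable node will be stale for a while.

```python
class APStore:
    """Toy AP-style store: always answers, even when some replicas are unreachable."""

    def __init__(self, replicas):
        self.replicas = replicas  # each replica is a plain dict standing in for a node

    def put(self, key, value, reachable):
        # Write to every node we can reach; the others fall temporarily out of date.
        written = 0
        for replica, up in zip(self.replicas, reachable):
            if up:
                replica[key] = value
                written += 1
        return written  # the request is acknowledged no matter how many nodes got it

store = APStore([{}, {}, {}])
acks = store.put("user:2", "Bob", reachable=[True, False, True])
print(acks)  # 2 -- the write went through even though one node missed it
```

Real AP databases eventually reconcile the stale replica, for example through mechanisms such as read repair or hinted handoff, which is why they are often described as eventually consistent.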