Git’s man pages describe it as “the stupid content tracker”. It’s probably the most used version control system in the world, which is odd given that it doesn’t describe itself as a source control system. In fact, you can use git to track any type of content. You could, for example, build a NoSQL database on top of it.
The reason why it says stupid in the man-pages is that it makes no assumptions about what content you store in it. The underlying git model is rather basic. In this post I want to explore the possibilities of using git as a NoSQL database (a key-value store). You could use the file system as a data store and then use git add and git commit to save your files:
# saving a document
echo '{"id": 1, "name": "kenneth"}' > 1.json
git add 1.json
git commit -m "added a file"
# reading a document
git show master:1.json
=> {"id": 1, "name": "kenneth"}
That works, but you’re now using the file system as a database: paths are the keys, values are whatever you store in them. There are a few disadvantages:
- We need to write all our data to disk before we can save it into git
- We’re saving data multiple times
- File storage is not deduplicated, so we lose the automatic data deduplication git provides
- If we want to work on multiple branches at the same time, we need multiple checked-out directories

What we want instead is a bare repository, one where none of the files exist in the file system, only in the git database. Let’s have a look at git’s data model and the plumbing commands to make this work.
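As a minimal sketch of what "bare" means here, the commands below create a repository without a working tree and ask git to confirm it. The temporary directory and the name MyStore.git are hypothetical, chosen only for illustration.

```shell
# Create a throwaway bare repository (path and name are hypothetical examples)
repo="$(mktemp -d)/MyStore.git"
git init --quiet --bare "$repo"

# A bare repository has no working tree, only the git database itself
is_bare="$(git -C "$repo" rev-parse --is-bare-repository)"
echo "$is_bare"
```

The rest of this walkthrough uses a regular repository created with git init, but never touches the working tree.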
Git as a NoSQL database
Git is a content-addressable file system. This means that it’s a simple key-value store. Whenever you insert content into it, it will give you back a key to retrieve that content later.
Let’s create some content:
# Initialize a repository
mkdir MyRepo
cd MyRepo
git init
# Save some content
echo {"id": 1, "name": "kenneth"} | git hash-object -w --stdin
da95f8264a0ffe3df10e94eed6371ea83aee9a4d
hash-object is a git plumbing command which takes content, stores it in the database and returns the key.
The -w switch tells it to store the content; otherwise it would just calculate the hash. The --stdin switch tells git to read the content from standard input instead of from a file.
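The key it returns is a SHA-1 hash derived from the content. Because the key depends only on the bytes you store, running the above commands on your machine returns the exact same SHA-1, provided your shell passes identical bytes to git (quoting and line endings count). Now that we have some content in the database, we can read it back: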
git cat-file -p da95f8264a0ffe3df10e94eed6371ea83aee9a4d
{"id": 1, "name": "kenneth"}
Git Blobs
We now have a key-value store with one object, a blob:
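One way to check what is in the object database is to ask git to list every object it holds. The sketch below assumes a fresh throwaway repository and a git version that supports --batch-all-objects (2.6 or later); the hash it prints may differ from the one shown earlier if the stored bytes differ.

```shell
# Start from an empty throwaway repository
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"

# Store one document as a blob
key="$(printf '%s\n' '{"id": 1, "name": "kenneth"}' | git hash-object -w --stdin)"

# List every object in the database as "<key> <type> <size>"
objects="$(git cat-file --batch-check --batch-all-objects)"
echo "$objects"
```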

There’s only one problem: we can’t update this, because if we update the content, the key will change. That would mean that for every version of our file, we’d have to remember a different key. What we want instead is to specify our own key, which we can use to track the versions.
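To see why the key changes with the content, it helps to know how git derives it: the SHA-1 is computed over a small header ("blob", the size in bytes, a NUL byte) followed by the content. The sketch below compares git's answer with a hand-computed one. It assumes the sha1sum utility is available (on macOS, shasum would be the equivalent), and the sample strings are arbitrary.

```shell
# Work inside a throwaway repository; without -w nothing is stored
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"

# Identical content always yields the identical key
key_a="$(printf '%s' 'hello' | git hash-object --stdin)"
key_b="$(printf '%s' 'hello' | git hash-object --stdin)"

# Changing a single byte yields a different key
key_c="$(printf '%s' 'hello!' | git hash-object --stdin)"

# The same key derived by hand: SHA-1 over "blob <size in bytes>\0<content>"
manual="$(printf 'blob 5\0hello' | sha1sum | cut -d' ' -f1)"
echo "$key_a"
echo "$manual"
```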
Git Trees
Trees solve two problems:
- the need to remember the hash of each object and of each of its versions
- the ability to store groups of files

The best way to think about a tree is like a folder in the file system. To create a tree you have to follow two steps:
# Create and populate a staging area
git update-index --add --cacheinfo 100644 da95f8264a0ffe3df10e94eed6371ea83aee9a4d 1.json
# write the tree
git write-tree
d6916d3e27baa9ef2742c2ba09696f22e41011a1
This also gives you back a sha. Now we can read back that tree:
git cat-file -p d6916d3e27baa9ef2742c2ba09696f22e41011a1
100644 blob da95f8264a0ffe3df10e94eed6371ea83aee9a4d 1.json
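With a tree in place, the path becomes a key we choose ourselves. As a small sketch (run in a throwaway repository, so the hashes are whatever git computes there), git's tree:path syntax resolves a path inside a tree to the blob it names:

```shell
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"

# Same steps as above: store a blob, stage it under a name, write the tree
blob="$(printf '%s\n' '{"id": 1, "name": "kenneth"}' | git hash-object -w --stdin)"
git update-index --add --cacheinfo 100644 "$blob" 1.json
tree="$(git write-tree)"

# Look the document up by our own key (the path) instead of by the blob hash
value="$(git cat-file -p "$tree:1.json")"
echo "$value"
```

We still need the tree's hash to get started, which is the problem the next levels address.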
At this point our object database looks as follows:

To modify the file, we follow the same steps:
# Add a blob
echo {"id": 1, "name": "kenneth truyers"} | git hash-object -w --stdin
42d0d209ecf70a96666f5a4c8ed97f3fd2b75dda
# Create and populate a staging area
git update-index --add --cacheinfo 100644 42d0d209ecf70a96666f5a4c8ed97f3fd2b75dda 1.json
# Write the tree
git write-tree
2c59068b29c38db26eda42def74b7142de392212
That leaves us with the following situation:

We now have two trees that represent the different states of our files. That doesn’t help much, since we still need to remember the sha-1 values of the trees to get to our content.
Git Commits
One level up, we get to commits. A commit holds five key pieces of information:
- Author of the commit
- Date it was created
- Why it was created (message)
- A single tree object it points to
- Zero or more parent commits (the very first commit has none; for now we’ll only consider commits with at most one parent, since commits with multiple parents are merge commits)

Let’s commit the above trees:
# Commit the first tree (without a parent)
echo "Commit 1st version" | git commit-tree d6916d3
05c1cec5685bbb84e806886dba0de5e2f120ab2a
# Commit the second tree with the first commit as a parent
echo "Commit 2nd version" | git commit-tree 2c59068 -p 05c1cec5
9918e46dfc4241f0782265285970a7c16bf499e4
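To check what a commit actually stores, you can print the raw commit object. The sketch below rebuilds a two-commit history in a throwaway repository. The author name and email are placeholder values set through git's identity environment variables, so the commit hashes will not match the ones shown above (they also depend on author and timestamp).

```shell
# Placeholder identity; commit-tree needs an author and a committer
export GIT_AUTHOR_NAME='Example Author' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME='Example Author' GIT_COMMITTER_EMAIL='author@example.com'

repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"

# First version: blob -> tree -> commit without a parent
blob1="$(printf '%s\n' '{"id": 1, "name": "kenneth"}' | git hash-object -w --stdin)"
git update-index --add --cacheinfo 100644 "$blob1" 1.json
tree1="$(git write-tree)"
first="$(echo 'Commit 1st version' | git commit-tree "$tree1")"

# Second version: new blob -> new tree -> commit with the first as parent
blob2="$(printf '%s\n' '{"id": 1, "name": "kenneth truyers"}' | git hash-object -w --stdin)"
git update-index --add --cacheinfo 100644 "$blob2" 1.json
tree2="$(git write-tree)"
second="$(echo 'Commit 2nd version' | git commit-tree "$tree2" -p "$first")"

# Print the raw commit: tree, parent, author, committer, blank line, message
commit_text="$(git cat-file -p "$second")"
echo "$commit_text"
```

The author and committer lines carry both the identity and the timestamp, which covers the first two items in the list above.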
This leaves us with the following state:

Now we have built up a complete history of our file. You could open the repository with any git client and you’ll see how 1.json is being tracked correctly. To demonstrate that, this is the output of running git log:
git log --stat 9918e46
9918e46dfc4241f0782265285970a7c16bf499e4 "Commit 2nd version"
1.json | 1 +
1 file changed, 1 insertions(+)
05c1cec5685bbb84e806886dba0de5e2f120ab2a "Commit 1st version"
1.json | 1 +
1 file changed, 1 insertion(+)
And to get the content of the file at the last commit:
git show 9918e46:1.json
{"id": 1, "name": "kenneth truyers"}
We’re still not there though, because we have to remember the hash of the last commit. Up until now, all objects we have created are part of git’s object database. One characteristic of that database is that it stores only immutable objects. Once you write a blob, a tree or a commit, you can never modify it without changing the key. You also cannot delete them, at least not directly: the git gc command does eventually delete objects that are dangling (no longer reachable from any reference).
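A rough sketch of that behaviour, in a throwaway repository: write a blob that nothing points to, let git fsck report it as dangling, then run git gc with --prune=now so the cleanup happens immediately rather than after the default grace period. The content string is arbitrary.

```shell
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"

# Write a blob that no tree, commit or reference points to
key="$(printf '%s\n' 'orphaned content' | git hash-object -w --stdin)"

# fsck reports objects that nothing refers to
report="$(git fsck 2>/dev/null || true)"
echo "$report"

# Prune unreachable objects right away instead of waiting for the grace period
git gc --quiet --prune=now

# cat-file -e exits non-zero when the object no longer exists
if git cat-file -e "$key" 2>/dev/null; then still_there=yes; else still_there=no; fi
echo "$still_there"
```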
Git References
Yet another level up are git references. References are not part of the object database; they are part of the reference database and are mutable. There are different types of references, such as branches, tags and remotes. They are similar in nature, with a few minor differences. For the moment, let’s just consider branches. A branch is a pointer to a commit. To create a branch we can write the hash of the commit to the file system:
echo 05c1cec5685bbb84e806886dba0de5e2f120ab2a > .git/refs/heads/master
We now have a branch, master, pointing at our first commit. To move the branch, we issue the following command:
git update-ref refs/heads/master 9918e46
This leaves us with the following graph:
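Putting the plumbing commands from this walkthrough together, a key-value store can be sketched as two small shell functions. The names kv_put and kv_get are hypothetical helpers, not git commands, and the author identity is a placeholder. This is an illustration under those assumptions (single writer, one branch), not a hardened implementation.

```shell
# Placeholder identity; commit-tree needs an author and a committer
export GIT_AUTHOR_NAME='Example Author' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME='Example Author' GIT_COMMITTER_EMAIL='author@example.com'

repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"

# kv_put KEY VALUE: store VALUE under KEY and advance the master branch
kv_put() {
  blob="$(printf '%s\n' "$2" | git hash-object -w --stdin)"
  git update-index --add --cacheinfo 100644 "$blob" "$1"
  tree="$(git write-tree)"
  if parent="$(git rev-parse --quiet --verify refs/heads/master)"; then
    commit="$(echo "put $1" | git commit-tree "$tree" -p "$parent")"
  else
    # The very first commit has no parent
    commit="$(echo "put $1" | git commit-tree "$tree")"
  fi
  git update-ref refs/heads/master "$commit"
}

# kv_get KEY: print the latest value stored under KEY
kv_get() {
  git show "master:$1"
}

kv_put 1.json '{"id": 1, "name": "kenneth"}'
kv_put 1.json '{"id": 1, "name": "kenneth truyers"}'

latest="$(kv_get 1.json)"
previous="$(git show 'master~1:1.json')"
echo "$latest"
echo "$previous"
```

Because every put creates a commit, older values stay reachable through the branch history, as the master~1 lookup shows.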