Inside DSE Graph: What Powers the Best Enterprise Graph Database

DSE Graph is a scalable, real-time graph database that was released at the end of June as a new addition to the DSE platform. Now that we have recovered from the turbulence of a major release, the time has come to peel back the curtain and look into the engine room: What are the major features and innovations that make DSE Graph an enterprise-grade graph database?

A graph database is a database system purpose-built for managing highly connected data. Unlike other database systems, including RDBMS and NoSQL, graph databases make it easy to model and query for relationships.

DSE Graph uses the property graph data model and the Gremlin query language of the Apache TinkerPop project, the open-source, vendor-neutral graph database standard governed by the Apache Software Foundation.

The property graph data model can express complex data models as-is without a logical mapping, a characteristic that’s often described as “whiteboard friendly”. The Gremlin query language can succinctly express query paths and subgraph patterns without the need for cumbersome JOINs or custom application code, making it easy to retrieve entities connected via complex relationships from a big graph of data. Apache TinkerPop is a central ingredient of DSE Graph and, in many ways, its primary interface.

Implementing the property graph data model and supporting a graph query language is sufficient to expose a database system as a graph database but, like putting lipstick on a pig, this often results in slow performance and unexpected system behavior. What makes a good graph database is a balanced combination of efficient property graph data representation, fast graph-centric index structures, and smart query optimization. DSE Graph achieves this combination in a distributed, scale-out environment with no single point of failure and continuous availability using the following technologies.

Index-Free Adjacency

DSE Graph stores graphs in their adjacency list representation. All properties and edges that touch a particular vertex are stored in a consecutive, sorted list on the node in the cluster to which the vertex is assigned. This representation allows us to navigate through the graph from vertex to vertex without having to call into an index structure. By contrast, storing edges in a large table, which would be the normal approach for RDBMS or NoSQL stores, requires an expensive, global index to locate vertex data.

Adjacency list sort order facilitates efficient retrieval of subsets of the adjacency list. As graphs grow in size, queries often only require small subsets of the entire vertex data. In those cases, we exploit the sort order to limit the data retrieval and speed up query processing.
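To make the role of the sort order concrete, here is a minimal in-memory sketch in Python. It is not how DSE Graph lays out data in Cassandra; the `AdjacencyStore` class, its methods, and the choice to sort by (edge label, neighbor) are all assumptions made purely for illustration. The point is only that a sorted per-vertex list lets a query jump straight to the relevant slice instead of scanning every edge.

```python
import bisect
from collections import defaultdict


class AdjacencyStore:
    """Toy stand-in for per-vertex, sorted adjacency lists (hypothetical)."""

    def __init__(self):
        # vertex id -> list of (edge_label, neighbor_id), kept sorted
        self._adjacency = defaultdict(list)

    def add_edge(self, source, label, target):
        # insort keeps each vertex's list ordered by (label, target)
        bisect.insort(self._adjacency[source], (label, target))

    def neighbors(self, source, label):
        """Return neighbors reached via `label`, reading only that slice."""
        entries = self._adjacency.get(source, [])
        # (label,) sorts before every (label, target), so this is the
        # position of the first edge carrying the requested label.
        start = bisect.bisect_left(entries, (label,))
        result = []
        for edge_label, target in entries[start:]:
            if edge_label != label:
                break  # sorted order: no later entry can match
            result.append(target)
        return result


store = AdjacencyStore()
store.add_edge("alice", "knows", "bob")
store.add_edge("alice", "wroteMessage", "m1")
store.add_edge("alice", "knows", "carol")
print(store.neighbors("alice", "knows"))
```

Because the list is sorted, the lookup touches only the `knows` edges and stops at the first entry with a different label, however many other edges the vertex has.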

A key innovation of DSE Graph is an efficient mapping of the adjacency list representation onto the tabular storage format of Cassandra. Its implementation required changes to Cassandra’s storage engine in 3.0 and changes throughout the entire DSE stack to propagate a graph-optimized data representation.

This innovation allows DSE Graph to stand on top of the powerful distributed database foundation provided by Apache Cassandra without having to sacrifice storage efficiency or query performance.

Furthermore, DSE Graph can plug directly into the enterprise features of the DSE platform: OpsCenter management, data encryption, authentication, secure communication, multi-instance support, and auditing.

Vertex-Centric Indexes

For large graphs, it is not unusual for a single vertex adjacency list to grow to thousands of edges. Iterating over all those edges can be very time consuming for certain access patterns.

For instance, suppose we want to retrieve a customer’s ten most recent messages. If that customer has written thousands of messages, finding those ten can take a significant amount of time and requires retrieving a lot of data.

Vertex-centric indexes are access-specific index structures built and maintained per vertex to speed up such queries. For the example above, we would install a vertex-centric index for `wroteMessage` edges by timestamp.

Unlike index structures in conventional database systems, whose cost scales logarithmically with the size of the entire dataset, vertex-centric indexes are maintained per vertex, and hence the cost of maintenance is logarithmic in the size of the individual vertex's adjacency list. In other words, maintaining and querying vertex-centric indexes remains inexpensive even as the overall graph grows huge. For that reason, vertex-centric indexes are essential for maintaining fast traversal query performance on very large graphs.
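The "ten most recent messages" example can be sketched in a few lines of Python. This is an illustrative toy, not DSE Graph's implementation: the `VertexCentricIndex` class and its method names are hypothetical, and a plain sorted list stands in for whatever structure the database actually uses (list insertion shifts elements, so a tree-like structure would be needed for truly logarithmic writes).

```python
import bisect
from collections import defaultdict


class VertexCentricIndex:
    """Toy per-vertex index over edges of one label, ordered by timestamp."""

    def __init__(self):
        # (vertex id, edge label) -> sorted list of (timestamp, target)
        self._index = defaultdict(list)

    def add_edge(self, source, label, timestamp, target):
        # Binary search finds the slot; cost depends only on this
        # vertex's edges, not on the size of the whole graph.
        bisect.insort(self._index[(source, label)], (timestamp, target))

    def most_recent(self, source, label, limit):
        """Return up to `limit` targets, newest first."""
        if limit <= 0:
            return []
        entries = self._index.get((source, label), [])
        # The newest edges sit at the tail of the sorted list.
        return [target for _, target in reversed(entries[-limit:])]


index = VertexCentricIndex()
for ts in range(1, 1001):
    index.add_edge("customer-1", "wroteMessage", ts, f"message-{ts}")

# Reads ten entries from the tail instead of scanning 1,000 edges.
print(index.most_recent("customer-1", "wroteMessage", 10))
```

The lookup cost is driven by the number of `wroteMessage` edges on that one customer vertex; edges on other vertices never enter the picture.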

Vertex Partitioning

A vertex and its adjacency list are assigned to and stored on a single machine in the database cluster as their primary replica. This assignment determines data locality, and DSE Graph aims to place vertices such that frequently co-traversed vertices end up on the same machine, which improves traversal performance.

DSE Graph’s partitioning techniques will be covered in future posts.

Edge Partitioning

Most natural graphs have a scale-free degree distribution, which means that some vertices are highly connected and have very large adjacency lists. Storing those vertices and their adjacency lists on a single machine would create hotspots and may even be infeasible for huge graphs.

DSE Graph supports edge partitioning, by which fragments of the adjacency list are partitioned across all machines in the cluster. This performance-enhancing technique supports co-processing with locally stored vertices without intra-cluster communication.
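The text above does not spell out how fragments are assigned to machines, so the following Python sketch shows just one plausible scheme, stated as an assumption: each edge of a highly connected vertex is placed on the machine that owns the neighbor at the other end. The `owner` placement rule, `NUM_MACHINES`, and `partition_edges` are hypothetical names, not DSE Graph APIs.

```python
from collections import defaultdict

NUM_MACHINES = 4  # hypothetical cluster size


def owner(vertex_id):
    """Hypothetical placement rule: which machine stores a vertex."""
    return vertex_id % NUM_MACHINES


def partition_edges(hub, neighbor_ids):
    """Split the adjacency list of `hub` into per-machine fragments.

    Assumption for illustration: an edge lives on the machine that owns
    its neighbor, so each fragment can be processed together with
    locally stored vertices.
    """
    fragments = defaultdict(list)
    for neighbor in neighbor_ids:
        fragments[owner(neighbor)].append((hub, neighbor))
    return dict(fragments)


fragments = partition_edges(hub=0, neighbor_ids=range(1, 9))
for machine, edges in sorted(fragments.items()):
    print(machine, edges)
```

Under this assumed scheme, no single machine holds the whole adjacency list of the hub vertex, and every fragment sits next to the vertices it points to.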

Query Optimizer

In addition to the index structures and partitioning techniques outlined above, DSE Graph also supports materialized view indexes in Cassandra, secondary indexes and the full indexing power of Solr via tight integration through
