Graph DB + Data Virtualization = Live dashboard for fraud analysis

The scenario

Retail banking: your graph-based fraud detection system, powered by Neo4j, is used as part of the controls run when line of credit applications are processed or when accounts are provisioned. Its job is to block, or at least to flag, potentially fraudulent submissions as they come into your systems. It also sends alarms to fraud operations analysts whenever unusual patterns are detected in the graph, so they can be investigated individually ASAP.

This is all working great, but you want other analysts in your organisation to benefit from the rich insights that your graph database can deliver: people whose job is not to react on the spot to individual fraud threats but rather to understand the bigger picture. They are probably more strategic business analysts, maybe some data scientists doing predictive analysis too. They will typically want to look at fraud patterns globally rather than individually, and to combine the information in your fraud detection graph with other data sources (external to the graph) for reporting purposes, to get new insights, or even to ‘learn’ new patterns by running algorithms or applying ML techniques.

In this post I’ll describe through an example how Data Virtualization can be used to integrate your Neo4j graph with other data sources, providing a single unified view that is easy to consume with standard analytical/BI tools.

Don’t get confused by the name: DV is about data integration and has nothing to do with hardware or infrastructure virtualization.

The objective

I thought a good example for this scenario could be the creation of an integrated dashboard on your fraud detection platform, aggregating data from a couple of different sources.

Nine out of ten times, integration will be synonymous with ETL-ing your data into a centralised store or data warehouse and then running your analytics/BI from there. Fine. This is of course a valid approach, but it also has its shortcomings, especially regarding agility, time to solution and cost of evolution, just to name a few. As I said in the intro, I wanted to explore an alternative approach, more modern and agile, called data virtualization or, as it’s called these days, a logical data warehouse. That is what I’ll be building.

The “logical” in the name comes from the fact that data is not necessarily replicated (materialised) into a store but rather “wrapped” logically at the source and exposed as a set of virtual views that are run on demand. This is what makes this federated approach essentially different from the ETL-based one.
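To make the difference between a materialised copy and a virtual view concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a single in-memory SQLite database standing in for a source system, and the table and column names (accounts, flagged_copy, flagged_view) are hypothetical. A real data virtualization platform federates across separate systems, which this toy example does not attempt to show.

```python
import sqlite3

# One in-memory database stands in for the source system (an assumption
# made purely for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, holder TEXT, flagged INTEGER)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, "Alice", 0), (2, "Bob", 1)],
)

# ETL-style: the query result is copied (materialised) at load time.
conn.execute(
    "CREATE TABLE flagged_copy AS SELECT id, holder FROM accounts WHERE flagged = 1"
)

# Virtual: only the query definition is stored; it runs each time the view is read.
conn.execute(
    "CREATE VIEW flagged_view AS SELECT id, holder FROM accounts WHERE flagged = 1"
)

# The source changes after both objects were created.
conn.execute("UPDATE accounts SET flagged = 1 WHERE id = 1")

copy_rows = conn.execute("SELECT holder FROM flagged_copy ORDER BY id").fetchall()
view_rows = conn.execute("SELECT holder FROM flagged_view ORDER BY id").fetchall()

print(copy_rows)  # [('Bob',)] : the copy is stale until the next load
print(view_rows)  # [('Alice',), ('Bob',)] : the view reflects the source
```

The materialised copy only catches up when the load is re-run, whereas the view is evaluated against the source every time it is queried.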


The architecture of my experiment is not too ambitious but rich enough to prove the point. It uses an off-the-shelf commercial data virtualization platform (Data Virtuality) abstracting and integrating two data sources (one relational, one graph) and offering a unified view to a BI tool.

Before I go into the details, a quick note of gratitude: when I decided to go ahead with this experiment, I reached out to Data Virtuality, and they very kindly gave me access to a VM with their data virtualization platform preinstalled and supported me along the way. So here is a big thank you to them, especially to Niklas Schmidtmer, a top solutions engineer who has been super helpful and answered all my technical questions on DV.

The data sources

Neo4j for fraud detection

In this post I’m focusing on the integration aspects, so I will not go into the details of what a graph-based fraud detection solution built on Neo4j looks like. I’ll just say that Neo4j is capable of keeping a real time view of your account holders’ information and detecting potentially fraudulent patterns as they appear. By “real time” here, I mean as accounts are provisioned or updated in your system, or as transactions arrive; in other words, as suspicious patterns are formed in your graph.

In our example, say we have a Cypher query returning the list of potential fraudsters. A potential fraudster in our example is an individual account holder involved in a suspicious ring pattern like the one in the Neo4j browser capture below. The query also returns some additional information derived from the graph, like the size of the fraud ring and the financial risk associated with it. The list of fraudsters returned by this query will drive my dashboard, but we will want to enrich it first with some additional information from the CRM.

For a detailed description of what first party bank fraud is and how graph databases can fight it, read this post.

RDBMS-backed CRM system

The second data source is any CRM system backed by a relational database. You can put here the name of your preferred one, or whichever in-house solution your organisation is currently using.

The data in the CRM is updated less frequently than the graph and contains additional information about our account holders.

Data Virtualization

As I said before, data virtualization is a modern approach to data integration based on the idea of data on demand. A data virtualization platform wraps different types of data sources (relational, NoSQL, APIs, etc.) and makes them all look like relational views. These views can then be combined through standard relational algebra operations to produce rich derived (integrated) views that will ultimately be consumed by all sorts of BI, analytics and reporting tools or environments as if they came from a single relational database.
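As a rough illustration of combining base views into a derived one, the sketch below uses Python's sqlite3 module, with two plain tables standing in for the base views (one over the graph, one over the CRM). All table names, column names and sample rows are hypothetical; in a real setup the two base views would live in the data virtualization layer and point at different systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the base view over the graph (hypothetical columns).
conn.execute(
    "CREATE TABLE fraudsters (customer_id TEXT, ring_size INTEGER, financial_risk REAL)"
)
conn.executemany(
    "INSERT INTO fraudsters VALUES (?, ?, ?)",
    [("C1", 3, 15000.0), ("C2", 4, 42000.0)],
)

# Stand-in for the base view over the CRM database (hypothetical columns).
conn.execute(
    "CREATE TABLE crm_customers (customer_id TEXT, full_name TEXT, segment TEXT)"
)
conn.executemany(
    "INSERT INTO crm_customers VALUES (?, ?, ?)",
    [
        ("C1", "Jane Doe", "retail"),
        ("C2", "John Roe", "premium"),
        ("C3", "Ann Poe", "retail"),
    ],
)

# Derived view: a relational join that enriches each fraudster with CRM data.
conn.execute(
    """
    CREATE VIEW fraudster_profile AS
    SELECT f.customer_id, c.full_name, c.segment, f.ring_size, f.financial_risk
    FROM fraudsters AS f
    JOIN crm_customers AS c ON c.customer_id = f.customer_id
    """
)

# A BI tool would query the derived view as if it were an ordinary table.
rows = conn.execute(
    "SELECT full_name, ring_size FROM fraudster_profile ORDER BY financial_risk DESC"
).fetchall()
print(rows)  # [('John Roe', 4), ('Jane Doe', 3)]
```

The consumer only sees fraudster_profile; where each column physically comes from is hidden behind the view.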

The process of creating a virtual integrated view of a number of data sources can be broken down into three parts: 1) connecting to the sources and virtualizing the relevant elements in them to create base views, 2) combining the base views to create richer derived ones, and 3) publishing them for consumption by analytical and BI applications. Let’s describe each step in a bit more detail.

Connecting to the sources from the data virtualization layer and creating base views

The easiest way to interact with your Neo4j instance from a data virtualization platform is through the JDBC driver. The connection string and authentication details are pretty much all that’s needed, as we can see in the following screen capture.

Once the data source is created, we can easily define a virtual view on it based on our Cypher query with the standard CREATE VIEW… expression in SQL. Notice the usage of the ARRAYTABLE function to take the array structure returned by the request and produce a tabular output.
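Conceptually, the array-to-table step amounts to giving names to the positions of each returned array. The Python sketch below illustrates that idea only: it is not the platform's implementation of ARRAYTABLE, and the function name array_table, the column names and the sample values are all hypothetical.

```python
from typing import Any, Dict, Iterable, List, Sequence


def array_table(
    arrays: Iterable[Sequence[Any]], columns: Sequence[str]
) -> List[Dict[str, Any]]:
    """Turn positional arrays into rows with named columns."""
    table = []
    for values in arrays:
        if len(values) != len(columns):
            raise ValueError(
                f"expected {len(columns)} values per row, got {len(values)}"
            )
        # Pair each column name with the value at the same position.
        table.append(dict(zip(columns, values)))
    return table


# Hypothetical arrays, shaped like rows a graph query might hand back.
result_arrays = [["C1", 3, 15000.0], ["C2", 4, 42000.0]]

fraudsters = array_table(
    result_arrays, ["customer_id", "ring_size", "financial_risk"]
)
print(fraudsters[0])
# {'customer_id': 'C1', 'ring_size': 3, 'financial_risk': 15000.0}
```

Once every row has named, typed columns, the result can be treated like any other relational view.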

Once our fraudsters view is created, it can be queried just as if it were a relational one. The data virtualization layer will take care of the “translation”, because obviously Neo4j actually talks Cypher and not SQL.

If for whatever reason you want to hit Neo4j’s HTTP REST API directly, you can do that by creati
