
Graph DB + Data Virtualization = Live dashboard for fraud analysis

The scenario

Retail banking: Your graph-based fraud detection system powered by Neo4j is being used as part of the controls run when processing line of credit applications or when accounts are provisioned. Its job is to block (or at least to flag) potentially fraudulent submissions as they come into your systems. It’s also sending alarms to fraud operations analysts whenever unusual patterns are detected in the graph, so they can be individually investigated ASAP.

This is all working great, but you want other analysts in your organisation to benefit from the super rich insights that your graph database can deliver: people whose job is not to react on the spot to individual fraud threats but rather to understand the bigger picture. They are probably more strategic business analysts, maybe some data scientists doing predictive analysis too, and they will typically want to look at fraud patterns globally rather than individually, combine the information in your fraud detection graph with other data sources (external to the graph) for reporting purposes, get new insights, or even ‘learn’ new patterns by running algorithms or applying ML techniques.

In this post I’ll describe, through an example, how Data Virtualization can be used to integrate your Neo4j graph with other data sources, providing a single unified view that is easy to consume by standard analytical/BI tools.

Don’t get confused by the name: DV is about data integration and has nothing to do with hardware or infrastructure virtualization.

The objective

I thought a good example for this scenario could be the creation of an integrated dashboard on your fraud detection platform, aggregating data from a couple of different sources.

Nine times out of ten, integration will be synonymous with ETL-ing your data into a centralised store or data warehouse and then running your analytics/BI from there. Fine. This is of course a valid approach, but it also has its shortcomings, especially regarding agility, time to solution and cost of evolution, just to name a few. As I said in the intro, I wanted to explore an alternative, more modern and agile approach called data virtualization. Or, as it’s called these days, I’ll be building a logical data warehouse.

The “logical” in the name comes from the fact that data is not necessarily replicated (materialised) into a store but rather “wrapped” logically at the source and exposed as a set of virtual views that are run on demand. This is what makes this federated approach essentially different from the ETL-based one.
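To make the difference concrete, here is a minimal Python sketch (not how Data Virtuality works internally) that contrasts a materialised copy with a view evaluated on demand. The `crm_source` list and the `etl_snapshot` and `virtual_view` functions are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical source system whose contents keep changing.
crm_source: List[Row] = [{"id": 1, "name": "Alice"}]

def etl_snapshot(source: List[Row]) -> List[Row]:
    """ETL style: copy (materialise) the rows into a separate store now."""
    return [dict(row) for row in source]

def virtual_view(source: List[Row]) -> Callable[[], List[Row]]:
    """DV style: keep only *how* to fetch the rows; run it on demand."""
    return lambda: [dict(row) for row in source]

warehouse = etl_snapshot(crm_source)  # loaded once, at ETL time
view = virtual_view(crm_source)       # nothing copied yet

crm_source.append({"id": 2, "name": "Bob"})  # the source changes after the load

print(len(warehouse))  # 1: stale until the next ETL run
print(len(view()))     # 2: the view reads the source at query time
```

The flip side is that every call to `view()` goes back to the source, so freshness is paid for with query-time load on the source systems.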



The architecture of my experiment is not too ambitious but rich enough to prove the point. It uses an off-the-shelf commercial data virtualization platform (Data Virtuality) that abstracts and integrates two data sources (one relational, one graph) and offers a unified view to a BI tool.

Before I go into the details, a quick note of gratitude: when I decided to go ahead with this experiment, I reached out to Data Virtuality, and they very kindly gave me access to a VM with their data virtualization platform preinstalled and supported me along the way. So here is a big thank you to them, especially to Niklas Schmidtmer, a top solutions engineer who has been super helpful and answered all my technical questions on DV.

The data sources

Neo4j for fraud detection

In this post I’m focusing on the integration aspects, so I will not go into the details of what a graph-based fraud detection solution built on Neo4j looks like. I’ll just say that Neo4j is capable of keeping a real-time view of your account holders’ information and detecting potentially fraudulent patterns as they appear. By “real time” here, I mean as accounts are provisioned or updated in your system, or as transactions arrive; in other words, as suspicious patterns are formed in your graph.

In our example, say we have a Cypher query returning the list of potential fraudsters. A potential fraudster in our example is an individual account holder involved in a suspicious ring pattern like the one in the Neo4j browser capture below. The query also returns some additional information derived from the graph, like the size of the fraud ring and the financial risk associated with it. The list of fraudsters returned by this query will drive my dashboard, but we will want to enrich it first with some additional information from the CRM.
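The Cypher query itself isn’t reproduced here, so as a rough illustration only, here is a plain-Python sketch of the kind of pattern such a query looks for. The data model (account holders linked to the contact details they registered with), the `exposure` figures, and the definition of a ring as a connected group of holders sharing any contact detail are all assumptions, not the model used in this post.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Contact = Tuple[str, str]  # (kind, value), e.g. ("phone", "555-0100")

# Hypothetical data: account holder -> contact details they registered with.
holders: Dict[str, List[Contact]] = {
    "AH1": [("phone", "555-0100"), ("ssn", "123-45-6789")],
    "AH2": [("phone", "555-0100"), ("address", "1 Main St")],
    "AH3": [("address", "1 Main St")],
    "AH4": [("phone", "555-0199")],
}
# Hypothetical exposure per holder (e.g. credit limits plus unsecured loans).
exposure: Dict[str, float] = {"AH1": 5000.0, "AH2": 3000.0, "AH3": 2000.0, "AH4": 1000.0}

def find_rings(holders: Dict[str, List[Contact]]) -> List[List[str]]:
    """Group holders linked, directly or transitively, by shared contact details."""
    parent = {h: h for h in holders}  # union-find forest over holders

    def find(h: str) -> str:
        while parent[h] != h:
            parent[h] = parent[parent[h]]  # path halving
            h = parent[h]
        return h

    by_contact: Dict[Contact, List[str]] = defaultdict(list)
    for holder, contacts in holders.items():
        for contact in contacts:
            by_contact[contact].append(holder)
    for sharers in by_contact.values():
        for other in sharers[1:]:
            parent[find(other)] = find(sharers[0])  # merge holders sharing this detail

    groups: Dict[str, List[str]] = defaultdict(list)
    for h in holders:
        groups[find(h)].append(h)
    # A single holder is not a ring; keep only groups of two or more.
    return [sorted(g) for g in groups.values() if len(g) > 1]

def potential_fraudsters(holders, exposure) -> List[Dict[str, object]]:
    """One row per ring member, with ring size and the ring's total exposure."""
    rows: List[Dict[str, object]] = []
    for ring in find_rings(holders):
        risk = sum(exposure[h] for h in ring)
        rows.extend({"holder_id": h, "ring_size": len(ring), "financial_risk": risk}
                    for h in ring)
    return rows

for row in potential_fraudsters(holders, exposure):
    print(row)
```

In the graph itself this grouping is a traversal rather than a union-find pass, which is exactly why it is cheap to keep up to date in Neo4j as new accounts arrive.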

For a detailed description of what first-party bank fraud is and how graph databases can fight it, read this post.


RDBMS-backed CRM system

The second data source is any CRM system backed by a relational database. You can put here the name of your preferred one, or whichever in-house solution your organisation is currently using.

The data in a CRM is less frequently updated and contains additional information about our account holders.

Data Virtualization

As I said before, data virtualization is a modern approach to data integration based on the idea of data on demand. A data virtualization platform wraps different types of data sources (relational, NoSQL, APIs, etc.) and makes them all look like relational views. These views can then be combined through standard relational algebra operations to produce rich derived (integrated) views that will ultimately be consumed by all sorts of BI, analytics and reporting tools or environments as if they came from a single relational database.

The process of creating a virtual integrated view of a number of data sources can be broken down into three steps: 1) connecting to the sources and virtualizing the relevant elements in them to create base views, 2) combining the base views to create richer derived ones, and 3) publishing them for consumption by analytical and BI applications. Let’s describe each step in a bit more detail.
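The three steps can be mimicked in miniature with Python’s built-in `sqlite3` module. This is only a sketch: the two tables are hypothetical stand-ins for the wrapped graph and CRM sources (a real DV platform would not copy the data into tables), and the schemas and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1) Base views. In a DV platform these would wrap the live sources; here two
#    tables with hypothetical schemas stand in for them.
conn.executescript("""
    CREATE TABLE fraudsters (holder_id TEXT, ring_size INTEGER, financial_risk REAL);
    CREATE TABLE crm_customers (holder_id TEXT, name TEXT, region TEXT);
""")
conn.executemany("INSERT INTO fraudsters VALUES (?, ?, ?)",
                 [("AH1", 3, 10000.0), ("AH2", 3, 10000.0), ("AH3", 3, 10000.0)])
conn.executemany("INSERT INTO crm_customers VALUES (?, ?, ?)",
                 [("AH1", "Alice", "North"), ("AH2", "Bob", "South"),
                  ("AH3", "Carol", "North"), ("AH4", "Dan", "South")])

# 2) Derived view: a join (plain relational algebra) enriching graph results with CRM data.
conn.execute("""
    CREATE VIEW fraudsters_enriched AS
    SELECT f.holder_id, c.name, c.region, f.ring_size, f.financial_risk
    FROM fraudsters AS f
    JOIN crm_customers AS c ON c.holder_id = f.holder_id
""")

# 3) Publishing: a BI tool only ever sees SQL over the derived view.
dashboard = conn.execute("""
    SELECT region, COUNT(*) AS fraudsters, MAX(financial_risk) AS max_ring_risk
    FROM fraudsters_enriched
    GROUP BY region
    ORDER BY region
""").fetchall()
print(dashboard)  # [('North', 2, 10000.0), ('South', 1, 10000.0)]
```

The point of step 3 is that the consumer never needs to know that `fraudsters` came from a graph and `crm_customers` from an RDBMS.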

Connecting to the sources from the data virtualization layer and creating base views

The easiest way to interact with your Neo4j instance from a data virtualization platform is through the JDBC driver. The connection string and authentication details are pretty much all that’s needed, as we can see in the following screen capture.
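For reference, a data source definition boils down to a handful of fields. The sketch below is hypothetical (the `Neo4jJdbcSource` class is not part of any product) and assumes the Neo4j JDBC driver accepts URLs of the form `jdbc:neo4j:bolt://host:port`; check your driver version’s documentation for the exact URL syntax.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple
from urllib.parse import urlparse

JDBC_PREFIX = "jdbc:neo4j:"  # assumed URL prefix for the Neo4j JDBC driver

@dataclass(frozen=True)
class Neo4jJdbcSource:
    """Hypothetical record of what a DV data source definition needs."""
    name: str
    url: str                           # e.g. "jdbc:neo4j:bolt://localhost:7687"
    user: str
    password: str = field(repr=False)  # keep credentials out of logs/reprs

    def endpoint(self) -> Tuple[str, Optional[str], Optional[int]]:
        """Return (protocol, host, port) parsed from the JDBC URL."""
        if not self.url.startswith(JDBC_PREFIX):
            raise ValueError(f"not a Neo4j JDBC URL: {self.url}")
        parsed = urlparse(self.url[len(JDBC_PREFIX):])
        return parsed.scheme, parsed.hostname, parsed.port

source = Neo4jJdbcSource("fraud_graph", "jdbc:neo4j:bolt://localhost:7687", "neo4j", "secret")
print(source.endpoint())  # ('bolt', 'localhost', 7687)
print(source)             # password is omitted from the repr
```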



Once the data source is created, we can easily define a virtual view on it based on our Cypher query with the standard CREATE VIEW… expression in SQL. Notice the usage of the ARRAYTABLE function to take the array structure returned by the request and produce a tabular output.
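As a hedged illustration of what ARRAYTABLE contributes, the Python sketch below mimics its core idea: each element position of an input array is mapped to a named, typed column, giving one row per array. The raw arrays, the column names and the `array_table` helper are hypothetical; this is not Data Virtuality’s implementation.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Column = Tuple[str, Callable[[Any], Any]]  # (column name, type conversion)

def array_table(arrays: Sequence[Sequence[Any]],
                columns: Sequence[Column]) -> List[Dict[str, Any]]:
    """Mimic the idea behind ARRAYTABLE: element i of each array becomes
    column i, cast to that column's type; one output row per input array."""
    rows = []
    for arr in arrays:
        if len(arr) < len(columns):
            raise ValueError(f"{arr!r} has fewer than {len(columns)} elements")
        rows.append({name: cast(arr[i]) for i, (name, cast) in enumerate(columns)})
    return rows

# Hypothetical raw result: one untyped array per potential fraudster.
raw = [["AH1", "3", "10000.0"], ["AH2", "3", "10000.0"]]
fraudsters = array_table(raw, [("holder_id", str), ("ring_size", int),
                               ("financial_risk", float)])
print(fraudsters[0])  # {'holder_id': 'AH1', 'ring_size': 3, 'financial_risk': 10000.0}
```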



Once our fraudsters view is created, it can be queried just as if it were a relational one. The data virtualization layer will take care of the “translation”, because Neo4j actually talks Cypher, not SQL.
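How the DV layer actually translates or pushes SQL down to Neo4j isn’t covered here, so the following is only a toy: a hypothetical `to_cypher` function that turns a projection plus simple filter predicates on the fraudsters view into a parameterised Cypher query. The base Cypher pattern (`AccountHolder` nodes linked through `HAS_CONTACT` relationships) and the column mapping are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical base Cypher behind the 'fraudsters' view: per account holder,
# count the other holders sharing any contact detail (ring size = that + 1).
BASE_CYPHER = (
    "MATCH (holder:AccountHolder)-[:HAS_CONTACT]->(contact)"
    "<-[:HAS_CONTACT]-(other:AccountHolder) "
    "WITH holder, count(DISTINCT other) + 1 AS ringSize"
)

# Hypothetical mapping from view columns to Cypher expressions.
COLUMN_MAP: Dict[str, str] = {
    "holder_id": "holder.UniqueId",
    "ring_size": "ringSize",
}

ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}

def to_cypher(columns: Sequence[str],
              predicates: Sequence[Tuple[str, str, object]]) -> Tuple[str, Dict[str, object]]:
    """Translate SELECT <columns> FROM fraudsters WHERE <predicates> into Cypher."""
    params: Dict[str, object] = {}
    conditions: List[str] = []
    for i, (column, op, value) in enumerate(predicates):
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        name = f"p{i}"
        conditions.append(f"{COLUMN_MAP[column]} {op} ${name}")
        params[name] = value  # values travel as parameters, never spliced into the text
    query = BASE_CYPHER
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    query += " RETURN " + ", ".join(f"{COLUMN_MAP[c]} AS {c}" for c in columns)
    return query, params

# SQL a BI tool might send:  SELECT holder_id, ring_size FROM fraudsters WHERE ring_size > 2
cypher, params = to_cypher(["holder_id", "ring_size"], [("ring_size", ">", 2)])
print(cypher)
print(params)  # {'p0': 2}
```

Passing filter values as Cypher parameters rather than concatenating them keeps the generated query safe from injection and lets Neo4j reuse the query plan.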


If for whatever reason you want to hit Neo4j’s HTTP REST API directly, you can do that by creati
