This article is the second installment in a three-part series that covers one of the most critical issues facing the financial industry: Investor & Market Integrity Protection via Global Market Surveillance. While the first (and previous) post discussed the global scope of the problem across multiple jurisdictions, this post will discuss a candidate Big Data & Cloud Computing architecture that can help market participants (especially the front-line regulators, the Stock Exchanges themselves) and SROs (Self-Regulatory Organizations) implement these capabilities in their applications & platforms.
Business Background
The first article in this three-part series laid out the five business trends that are driving a rethink of existing Global & Cross-Asset Surveillance systems.
To recap them:
1. The rise of trade lifecycle automation across the Capital Markets value chain, and the increasing use of technology across that lifecycle, contributes to an environment where speeds and feeds drive a huge number of securities changing hands (in huge quantities) in milliseconds across 25+ global venues of trading. Automation leads to increased trading volumes, which adds substantially to the risk of fraud.
2. The presence of multiple avenues of trading (ATFs, alternative trading facilities, and MTFs, multilateral trading facilities) creates opportunities for information and price arbitrage that were never a huge problem before, spanning multiple markets and multiple products across multiple geographies with different regulatory requirements. This has been covered in a previous post on this blog at http://www.vamsitalkstech.com/?p=412.
3. As a natural consequence of the above (the globalization of trading, where market participants are spread across multiple geographies), it becomes all the more difficult to provide a consolidated audit trail (CAT) that views all activity under a single source of truth, as well as traceability of orders across those venues. This is extremely important as fraud is becoming increasingly sophisticated, e.g. the rise of insider trading rings.
4. Existing application architectures (e.g. ticker plants, surveillance systems, DevOps) are becoming brittle and underperforming as data and transaction volumes continue to go up and data storage requirements keep rising every year. This leads to massive gaps in compliance data. Another significant gap appears when performing a range of post-trade analytics, many of which go beyond the simple business rules being leveraged right now and increasingly need to move into the machine learning & predictive domain. Surveillance now needs to include non-traditional sources of data, e.g. trader email, chat, and link analysis, that can point to under-the-radar rogue trading activity before it causes the financial system huge losses (e.g. the London Whale, the LIBOR fixing scandal).
5. Again as a consequence of increased automation, backtesting of data has become a challenge, as has being able to replay data across historical intervals. This is key in mining for patterns of suspicious activity, such as bursty spikes in trading, as well as certain patterns that could indicate illegal insider selling.

The key issue becomes: how do antiquated surveillance systems move into the era of Cloud & Big Data enabled innovation as a way of overcoming these business challenges?
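As a concrete illustration of the pattern mining mentioned in point 5, the sketch below flags bursty volume spikes in a replayed historical interval using a simple rolling z-score. The function name, window size, and threshold are illustrative assumptions, not part of any specific surveillance product:

```python
from statistics import mean, stdev

def flag_volume_spikes(volumes, window=20, z_threshold=4.0):
    """Flag indices where traded volume spikes far above its recent baseline.

    A burst is flagged when the volume exceeds the rolling mean of the
    preceding `window` intervals by more than `z_threshold` standard
    deviations -- a crude stand-in for a real burst-detection model.
    """
    flagged = []
    for i in range(window, len(volumes)):
        baseline = volumes[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and (volumes[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged

# Replaying a quiet historical interval that ends in one abnormal burst:
history = [100, 105, 98, 102, 99, 101, 103, 97, 100, 104,
           99, 102, 101, 98, 103, 100, 97, 105, 102, 99, 1000]
print(flag_volume_spikes(history))  # only the final burst is flagged
```

In a real platform the same scan would run over replayed tick data per instrument and venue rather than a single in-memory list.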
Technology Requirements
An intelligent surveillance system needs to store trade data, reference data, order data, and market data, as well as all of the relevant communications from all the disparate systems, both internally and externally, and then match these things appropriately. The system needs to account for multiple levels of detection capabilities, starting with a) configurable business rules (that describe a fraud pattern) as well as b) dynamic capabilities based on machine learning models (typically thought of as being more predictive). Such a system also needs to parallelize execution at scale to be able to meet the demanding latency requirements of a market surveillance platform.
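To make tier (a) concrete, here is a minimal sketch of business rules expressed declaratively and evaluated against trade events. The rule names, event fields, and thresholds are hypothetical examples, not drawn from any real rule engine:

```python
# Each rule pairs a name with a predicate over a trade event.
# Field names ("quantity", "hour") and thresholds are illustrative only.
RULES = [
    {"name": "oversized_order",
     "predicate": lambda t: t["quantity"] > 1_000_000},
    {"name": "off_hours_trade",
     "predicate": lambda t: not (9 <= t["hour"] < 17)},
]

def evaluate(trade, rules=RULES):
    """Return the names of all rules a trade event violates."""
    return [r["name"] for r in rules if r["predicate"](trade)]

suspect = {"quantity": 5_000_000, "hour": 3}
print(evaluate(suspect))  # ['oversized_order', 'off_hours_trade']
```

Because rules are data rather than code, compliance staff can add or tune patterns without redeploying the platform; tier (b) would layer learned models on top of the same event stream.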
The most important technical requirements for such a system are:
- Support end-to-end monitoring across a variety of financial instruments and multiple venues of trading.
- Support a wide variety of analytics that enable the discovery of interrelationships between customers, traders & trades, as the next major advance in surveillance technology.
- Provide a platform that can ingest from tens of millions to billions of market events (spanning a range of financial instruments: Equities, Bonds, Forex, Commodities, and Derivatives) on a daily basis from thousands of institutional market participants.
- The ability to add new business rules (via either a business rules engine and/or a model-based system that supports machine learning) is a key requirement. As we saw in the first post, market manipulation is an activity that constantly pushes the boundaries in new and unforeseen ways.
- Provide advanced visualization techniques, thus helping Compliance and Surveillance officers manage the information overload.
- The ability to perform deep cross-market analysis, i.e. to be able to look at financial instruments & securities trading across multiple geographies and exchanges.
- The ability to create views and correlate data that are both wide and deep. A wide view will look at related securities across multiple venues; a deep view will look for a range of illegal behaviors that threaten market integrity, such as market manipulation, insider trading, watch/restricted list trading, and unusual pricing.
- The ability to provide in-memory caches of data for rapid pre-trade compliance checks.
- The ability to create prebuilt analytical models and algorithms that pertain to trading strategy (pre-trade models, e.g. best execution and analysis).
- Provide Data Scientists and Quants with development interfaces using tools like SAS and R. The most popular way to link R and Hadoop is to use HDFS as the long-term store for all data, and to use MapReduce jobs (potentially submitted from Hive or Pig) to encode, enrich, and sample data sets from HDFS into R.
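As one illustration of the in-memory pre-trade compliance requirement above, the sketch below checks incoming orders against a restricted list held in memory. The symbols and order fields are hypothetical:

```python
# Sketch: in-memory cache for rapid pre-trade compliance checks.
# The restricted list and order fields are hypothetical examples.
RESTRICTED_LIST = {"ACME", "GLOBEX"}  # symbols on the watch/restricted list

def pre_trade_check(order, restricted=RESTRICTED_LIST):
    """Reject an order before execution if its symbol is restricted.

    A set gives O(1) lookups, which is what makes an in-memory cache
    fast enough to sit on the latency-sensitive pre-trade path.
    """
    if order["symbol"] in restricted:
        return (False, f"{order['symbol']} is on the restricted list")
    return (True, "ok")

ok, reason = pre_trade_check({"symbol": "ACME", "qty": 100})
print(ok, reason)  # False ACME is on the restricted list
```

In production the cache would be refreshed from the compliance system of record rather than hard-coded, but the lookup path stays this simple.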
- The results of the processing and queries need to be exportable in various data formats: a simple CSV/txt format, more optimized binary formats, JSON formats, or even custom formats. The results will be in the form of standard relational DB data types (e.g. String, Date, Numeric, Boolean).
- Based on backtesting and simulation, analysts should be able to tweak the models, and subscribers to the platform (typically compliance personnel) should be able to customize their execution models.
- A wide range of analytical tools need to be integrated to allow the best dashboards and visualizations.

Application & Data Architecture
The dramatic technology advances in Big Data & Cloud Computing enable the realization of the above requirements. Big Data is dramatically changing the traditional approach, with advanced analytic solutions that are powerful and fast enough to detect fraud in real time, while also building models based on historical data (and deep learning) to proactively identify risks.
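As a minimal sketch of that dual approach (real-time scoring plus models built from historical data), the example below "trains" a trivial baseline on historical activity and scores new events by their deviation from it. The class and its statistics are illustrative stand-ins; a production platform would substitute a genuine machine learning model:

```python
from statistics import mean, pstdev

class FraudScorer:
    """Toy stand-in for a model fit offline on historical data."""

    def __init__(self, historical_amounts):
        # "Training": learn a baseline for normal activity from history.
        self.mu = mean(historical_amounts)
        self.sigma = pstdev(historical_amounts) or 1.0

    def score(self, amount):
        # Real-time scoring: higher = further from historical norms.
        return abs(amount - self.mu) / self.sigma

scorer = FraudScorer([100, 120, 95, 110, 105, 98, 115])
print(scorer.score(104))    # near the baseline -> low score
print(scorer.score(5_000))  # far outside the baseline -> high score
```

The split mirrors the architecture: model fitting runs as a batch job over the historical store, while scoring runs inline on the event stream.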
To enumerate the various advantages of using Big Data:
a) Real-time insights: Generate insights at a latency of a few milliseconds
b) A Single View of Customer/Trade/Transaction