Handling the Extremes: Scaling and Streaming in Finance

Editor’s Note: At Strata+Hadoop World 2016 in New York, MapR Director of Enterprise Strategy & Architecture Jim Scott gave a presentation on “Handling the Extremes: Scaling and Streaming in Finance.”

As Jim explains, agility is king in the world of finance, and a message-driven architecture is a mechanism for building and managing discrete business functionality to enable agility. In order to accommodate rapid innovation, data pipelines must evolve. However, implementing microservices can create management problems, like the number of instances running in an environment.

Microservices can be leveraged on a message-driven architecture, but the concept must be thoughtfully implemented to show its true value. In this blog post, Jim outlines the core tenets of a message-driven architecture and explains its importance in real-time, big data-enabled distributed systems within the realm of finance. Along the way, Jim covers financial use cases dealing with securities management and fraud, from the ingestion of data from potentially hundreds of data sources to the required fan-out of that data without sacrificing performance, and discusses the pros and cons of the operational capabilities involved and of using the same data pipeline to support development and quality assurance practices.
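To make the fan-out idea concrete, here is a minimal sketch using the Kafka consumer API (which MapR Streams also exposes); the broker address, topic name, and consumer group names are hypothetical, and a production pipeline would add its own serialization, error handling, and offset management. The key point is that each consumer group receives its own complete copy of the ingested stream, so fraud detection, securities analytics, and QA replay can all read the same data without interfering with one another.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class FanOutConsumer {
    // Each consumer group gets its own full copy of the stream, so several
    // downstream services can fan out from the same ingested events.
    public static void main(String[] args) {
        String groupId = args.length > 0 ? args[0] : "fraud-detection"; // hypothetical group name

        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");                  // hypothetical broker address
        props.put("group.id", groupId);
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("transactions")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s consumed offset %d: %s%n",
                            groupId, record.offset(), record.value());
                }
            }
        }
    }
}
```

Running the same program under two different group IDs (for example, "fraud-detection" and "qa-replay") is all it takes for both to receive every event independently.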

You can watch Jim Scott’s presentation here or read his blog post below to learn more:

Legacy Business Platforms

To set the stage, let’s start off with a story. One day, my boss comes up to me and says, “We need to build out some new software for this new use case to meet our customer needs,” and asks me what I can do. In this situation, I look at the operational side, because I have to build a transaction-based system, and that means I need storage, and I need a relational database, since that is the tool everybody has been using for the last 20 years to solve every single business problem that exists. In addition, we’re Java-based, so we’re going to use a J2EE app server, plus I’ll want to implement a message-driven architecture.
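On a stack like that, the message-driven piece typically shows up as something like the following Java EE message-driven bean. This is only a rough sketch; the queue name, bean name, and message payload are placeholders.

```java
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

// A classic Java EE message-driven bean: the container delivers each message
// from the "jms/OrderQueue" destination (a placeholder name) to onMessage().
@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType",
                              propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destinationLookup",
                              propertyValue = "jms/OrderQueue")
})
public class OrderMessageBean implements MessageListener {

    @Override
    public void onMessage(Message message) {
        try {
            if (message instanceof TextMessage) {
                String payload = ((TextMessage) message).getText();
                // Persist the transaction to the relational database here.
                System.out.println("Received order event: " + payload);
            }
        } catch (JMSException e) {
            throw new RuntimeException("Failed to read JMS message", e);
        }
    }
}
```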

We get this standard set-up in place; it works well, scales wonderfully, and meets all of our needs. But then, a handful of months later, my boss has a new request: “How can we cross-sell our products to our customers? Who are our top 10 users?” Now we need an analytical platform. So what do we do? We build out another set of specialized storage and we get another database. And since it’s going to be our data warehouse, we have to put an ETL tool in place, because we’ve got to get our data out of our operational database, and then of course, we need to have some BI tools there.
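That ETL hop, reduced to its bare bones, might look roughly like the JDBC sketch below. The connection URLs, credentials, table names, and the nightly rollup query are all invented for illustration; a real pipeline would add incremental loads, retries, and data quality checks.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class NightlyEtl {
    // Pulls a per-customer sales rollup out of the operational database and
    // loads it into the warehouse. All names and URLs are placeholders.
    public static void main(String[] args) throws Exception {
        try (Connection oltp = DriverManager.getConnection(
                     "jdbc:postgresql://oltp-host/orders", "etl", "secret");
             Connection dw = DriverManager.getConnection(
                     "jdbc:postgresql://dw-host/warehouse", "etl", "secret");
             Statement extract = oltp.createStatement();
             PreparedStatement load = dw.prepareStatement(
                     "INSERT INTO daily_customer_sales (customer_id, total) VALUES (?, ?)")) {

            ResultSet rs = extract.executeQuery(
                    "SELECT customer_id, SUM(amount) FROM orders " +
                    "WHERE order_date = CURRENT_DATE GROUP BY customer_id");
            while (rs.next()) {
                load.setLong(1, rs.getLong(1));
                load.setBigDecimal(2, rs.getBigDecimal(2));
                load.addBatch();
            }
            load.executeBatch();
        }
    }
}
```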


After I build this out, my boss is pleased with the results and compliments me on my work. Then he says, “But,” and here’s the big but: “We need to have full depth of history, because we have new features we want to put back into this user interface. We want to customize the experience on the fly. We want to be able to make better live recommendations.”

Can you productionize that data warehouse to meet the same service level that you have on the operational side? Well, technically, you can, if you throw enough money at it. But you will still have to keep your fingers crossed, hoping that it does not fall over on its face under peak load. The problem here is that the IT team has to go through a lot of extra integration work. And there is a significant cost to scaling up in the enterprise data warehouse world; it is drastically expensive.

Converged Data Platform

We’re accustomed to having two sets of completely separate, specialized storage and platforms with completely different data models: one for analytical applications and one for operational applications. The data model looks different for operations versus the data warehouse because of the speed and availability you need to get the data to the user in time. Denormalizing the data is another concept in place for the same reason: we’ve had to make trade-offs in order to use a relational database to solve every problem we’ve had. While we’ve done it this way for a long time, we can make changes to our infrastructure to become a more nimble, more agile organization. The foundational piece of building a next-generation platform is to converge these separate spheres.

When you’re dealing with two different data models, you often hear: “I want to pick the right tool for the job.” The RDBMS is the answer because the IT team supports it to get the software into production and has already figured out backup strategies. When you work on the analytics side, you’re creating BI reports or KPIs on the data warehouse, or doing data science modeling, and so you need history. When you work on the developer side, you are creating databases to support persistence and have a need for event-based applications. You have APIs to work against, you need to persist your data to a database, and you want to have a message-driven architecture, all of which support the operational side of your company.

The converged data platform plays a pivotal role in changing this dichotomy. The beauty of this model is that when you operate in a converged data platform, you have everything in one place. You don’t have to move your data from one schema to another. You don’t have to go through a transformation to be able to query the data. Both sides can work on this information at the same time. With converged applications, you have complete access to real-time and historical data in one platform.

We have customers that operate in this model, especially in the financial industry, and they typically ask us this important question: “I need to be able to ask the same question two times in a row and get the same answer, and if I’m doing analytics on my operational platform, how do I do that?” That, in fact, is where MapR comes into the picture, because for this type of platform, you can take a consistent point-in-time snapshot of all of the data, whether it’s an event stream, the database, or files in the system.

As a result, you can build your models against a consistent point-in-time view. When you’ve got all your analytics in place, you point at the real-time data, and you’re done. There’s no extra work, and it’s a copy-on-write filesystem, so you don’t have to figure out how to operationalize it or how to enable a team to support it.

Application Development and Deployment

Why is this important? Well, number one, with a document database you can reduce the total amount of time it takes to persist your data, because every single language has JSON serializers and deserializers. One line of code lets you persist your data structure to the database, and one line of code lets you pull it back out of the database and into your data structure.
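As a minimal sketch of that point, the example below uses Jackson, one of many JSON libraries in the Java ecosystem. The Trade class and its field values are hypothetical, and in practice the JSON document would be handed to whatever document database client you use rather than printed.

```java
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonPersistenceExample {
    // A plain data class standing in for whatever business object you persist.
    public static class Trade {
        public String symbol;
        public long quantity;
        public double price;
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        Trade trade = new Trade();
        trade.symbol = "ACME";   // placeholder values
        trade.quantity = 100;
        trade.price = 42.5;

        // One line to turn the data structure into a JSON document...
        String json = mapper.writeValueAsString(trade);

        // ...and one line to turn a stored document back into the data structure.
        Trade restored = mapper.readValue(json, Trade.class);

        System.out.println(json + " -> " + restored.symbol);
    }
}
```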

A decade ago, building a software stack didn’t require being ultra-creative. It was a simple design. There weren’t all of these different technologies to spend crazy amounts of money on; the open source movement, with all of the options we now see in the ecosystem, hadn’t yet flourished. But today, this is what it looks like:

You have different data stores for different use cases. As soon as you have more than one database, and pretty much every company has more than one database, you start creating data silos. Once your data is siloed, you have to figure out how to integrate and how to bring everything together to deliver your capabilities. However, with MapR, you can actually do all of these things on the same platform.


That doesn’t necessarily mean that you do it all in one cluster, but you have all of the enterprise capabilities for any of those features. You can perform analytics across any or all of them simultaneously.

MapR Platform
