Riak TS for time series analysis at scale

Until recently, doing time series analysis at scale was expensive and almost exclusively the domain of large enterprises. What made time series a hard and expensive problem to tackle? Until the advent of the NoSQL database, scaling up to meet increasing velocity and volumes of data generally meant scaling hardware vertically by adding CPUs, memory, or additional hard drives. When combined with database licensing models that charged per processor core, the cost of scaling was simply out of reach for most.

Fortunately, the open source community is democratising large scale data analysis rapidly, and I am lucky enough to work at a company making contributions in this space. In my talk at All Things Open this year, I'll introduce Riak TS, a key-value database optimized to store and retrieve time series data for massive data sets, and demonstrate how to use it in conjunction with three other open source tools (Python, Pandas, and Jupyter) to build a completely open source time series analysis platform. And it doesn't take all that long.

The basics you need to know to get started with Riak TS:

Installation: where to get Riak TS, how to install it, and how to scale it up as the size of your data problem grows
How to get started interacting with Riak TS using the built-in riak-shell and Python using the Riak Python Client
How to create a new table in Riak TS and verify that it was created
How to query Riak TS using both the riak-shell and Python
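To make the table-creation and query steps a little more concrete, here is a minimal sketch in Python. The table name `bike_status`, its columns, and the one-day quantum are illustrative assumptions rather than the schema used in the talk, and the helper names are hypothetical. The sketch assumes that Riak TS expects time bounds as epoch milliseconds and that the WHERE clause must cover the partition key; check both against the documentation for your Riak TS version.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical table name and schema, for illustration only.
TABLE = "bike_status"

CREATE_TABLE_SQL = """
CREATE TABLE bike_status (
    station_id      VARCHAR   NOT NULL,
    time            TIMESTAMP NOT NULL,
    bikes_available SINT64,
    docks_available SINT64,
    PRIMARY KEY (
        (station_id, QUANTUM(time, 1, 'd')),
        station_id, time
    )
)
"""

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(dt):
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def build_range_query(station_id, start, end):
    """Build a SELECT over the half-open time range [start, end) for one station."""
    if "'" in station_id:
        # Crude guard for this sketch; not a substitute for real input validation.
        raise ValueError("station_id must not contain a single quote")
    return (
        "SELECT * FROM {table} WHERE station_id = '{station}' "
        "AND time >= {start} AND time < {end}"
    ).format(
        table=TABLE,
        station=station_id,
        start=to_epoch_ms(start),
        end=to_epoch_ms(end),
    )


def run_query(sql, host="127.0.0.1", pb_port=8087):
    """Send the query to a node (assumes the `riak` package and a reachable Riak TS node)."""
    from riak import RiakClient  # imported lazily so the helpers above work without it

    client = RiakClient(host=host, pb_port=pb_port)
    return client.ts_query(TABLE, sql)


query = build_range_query(
    "70",
    datetime(2015, 1, 1, tzinfo=timezone.utc),
    datetime(2015, 1, 2, tzinfo=timezone.utc),
)
```

The same SELECT string can be pasted into riak-shell, which is a quick way to confirm that the table exists and that the time bounds are what you expect before wiring the query into Python.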

During my talk, I'll load over 350,000 records from the Bay Area Bike Share open data set to demonstrate how fast Riak TS is at both reading and writing data. I'll use the Python Data Analysis Library (Pandas) and Jupyter, two open source tools that every Python programmer should know, to:

Query Riak TS
Convert a Riak TS resultset into a Pandas DataFrame
Demonstrate some of the built-in data analysis features of Pandas
Use the matplotlib library to demonstrate how to create data visualizations
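As a rough sketch of the conversion step, the snippet below turns a list of result rows into a time-indexed DataFrame and computes an hourly average. The sample rows, column names, and the `rows_to_dataframe` helper are made up for illustration. It assumes timestamps come back as epoch milliseconds; if your client returns datetime objects instead, the conversion line can be dropped. Column names are passed in explicitly because how a result object exposes them depends on the client version.

```python
import pandas as pd


def rows_to_dataframe(rows, column_names, time_column="time"):
    """Build a DataFrame from result rows and index it by a UTC timestamp column."""
    df = pd.DataFrame(rows, columns=column_names)
    # Assumption: the time column holds integer milliseconds since the Unix epoch.
    df[time_column] = pd.to_datetime(df[time_column], unit="ms", utc=True)
    return df.set_index(time_column).sort_index()


# Made-up sample rows standing in for a query result.
sample_rows = [
    ["70", 1420070400000, 10],  # 2015-01-01 00:00 UTC
    ["70", 1420072200000, 6],   # 2015-01-01 00:30 UTC
    ["70", 1420074000000, 8],   # 2015-01-01 01:00 UTC
]

df = rows_to_dataframe(sample_rows, ["station_id", "time", "bikes_available"])

# Average number of available bikes per hour.
hourly = df["bikes_available"].resample("60min").mean()
```

In a Jupyter notebook with matplotlib installed, calling `hourly.plot()` on the resulting Series is usually enough to get a first line chart.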

Riak TS is a particularly exciting addition to the open source world of databases for a couple of reasons. One, you'd be hard pressed to find a time series database that can scale from one to over 100 nodes on commodity hardware with so little effort in the ops department. And two, Riak TS automatically handles the distribution of data around your cluster of nodes, replicates your data three times to ensure high availability, and has a host of automated features that are designed specifically to maximize uptime.

For developing applications on top of Riak TS using Java, Python, Ruby, Go, Node.js, PHP, .NET, or Erlang, one of the coolest features is Riak TS's use of ANSI-compliant SQL. Using SQL makes Riak TS accessible to a wide range of developers and, importantly, business data analysts.

If you are feeling particularly motivated to start analyzing time series data, you can grab all of my example code from GitHub.

