Ruby for the soul of BigData Nerds

Who Am I?
● Engineering Team Lead, Analytics & Data Platforms @ Viki.com
● Founder of http://BigData.SG
● Contributor to fluentd, pfeed, cartographer, watir

BigData & Its Challenges
"big data" is when the size of the data itself becomes part of the problem
- Mike Loukides

● Twitter produces over 230 million tweets per day
● Wal-Mart logs one million transactions per hour
● Facebook creates over 30 billion pieces of content, ranging from web links, news, and blogs to photos

Everyone has a big data problem

Evolving Trends

Batch Processing
Hadoop, HPCC, Google BigQuery

Stream Processing
STORM (Twitter) & S4 (Yahoo)

Common Engineering Challenges

● Data Collection
● Filtering / Segmentation
● Data Storage
● Analysis
● Visualization
● Prediction / Extrapolation

Data Collection + Filtering / Segmentation

http://fluentd.org/

You send events as:
http://domain:8080/namespace?key1=value1&key2=value2

Fluentd forwards the data as:
<timestamp> <namespace> {key1:value1,key2:value2}

Screencast:
http://www.bigdata.sg/videos/fluentd/
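
The event flow on this slide (an HTTP request in, a timestamped and namespaced record out) can be sketched in plain Ruby. This is only an illustration of the transformation, not Fluentd's own code: `format_event` is a hypothetical helper, and the ISO 8601 timestamp format is an assumption.

```ruby
require "uri"
require "time"

# Hypothetical helper (not part of Fluentd): turns an event URL into a
# "<timestamp> <namespace> {key:value,...}" line.
def format_event(url, now = Time.now.utc)
  uri = URI.parse(url)
  namespace = uri.path.sub(%r{\A/}, "")                   # "/namespace" -> "namespace"
  record = URI.decode_www_form(uri.query.to_s).to_h       # query string -> Hash
  pairs = record.map { |key, value| "#{key}:#{value}" }.join(",")
  "#{now.iso8601} #{namespace} {#{pairs}}"                # timestamp format is an assumption
end

puts format_event("http://domain:8080/namespace?key1=value1&key2=value2")
```

The namespace comes from the URL path and the record from the query string, so filtering and segmentation downstream can key on the namespace alone.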

Storage

Hadoop HDFS

OpenTSDB
(http://opentsdb.net)

SciDB (DMAS)

Analysis

Hadoop Streaming (Ruby)
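
Hadoop Streaming runs any executable that reads lines from STDIN and writes tab-separated key/value lines to STDOUT, so a mapper and reducer can be plain Ruby. A minimal sketch, assuming input lines in the `<timestamp> <namespace> {record}` form shown earlier; the names `map_lines` and `reduce_lines` and the "count events per namespace" job are illustrative choices, not something from the talk.

```ruby
# Mapper: emit "namespace\t1" for every event line.
# Assumes the namespace is the second whitespace-separated field.
def map_lines(lines)
  lines.map { |line| line.split[1] }
       .compact
       .map { |namespace| "#{namespace}\t1" }
end

# Reducer: sum the counts for each key from the mapper output.
def reduce_lines(lines)
  counts = Hash.new(0)
  lines.each do |line|
    key, value = line.chomp.split("\t", 2)
    counts[key] += value.to_i
  end
  counts.map { |key, total| "#{key}\t#{total}" }
end

# Run as "ruby count.rb map" or "ruby count.rb reduce"; does nothing otherwise.
case ARGV.first
when "map"    then puts map_lines(STDIN.each_line)
when "reduce" then puts reduce_lines(STDIN.each_line)
end
```

The script (here hypothetically named count.rb) would be handed to the streaming jar through its -mapper and -reducer options. It can also be tried locally with a shell pipeline: cat the input, pipe through the map step, sort, then the reduce step.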

Hadoop Hive (Using rbhive)

Visualization
Custom Dashboard
(Rails + Google Charts / d3.js)

Some Hosted Services: tableaupublic.com, geckoboard.com, splunkstorm.com

STORM terminology
● Streams
● Spouts
● Bolts
● Topologies
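
To make the four terms concrete, here is a toy model in plain Ruby. It is not the Storm or RedStorm API: every class name below is hypothetical, and everything runs in one process. A spout is a source of tuples, a bolt consumes tuples and emits new ones, the tuples flowing between them form a stream, and a topology wires spouts and bolts together.

```ruby
# Spout: a source of tuples (a fixed list stands in for a live feed).
class ListSpout
  def initialize(items)
    @items = items
  end

  def each_tuple(&block)
    @items.each(&block)
  end
end

# Bolt: takes one tuple, emits zero or more tuples downstream.
class SplitBolt
  def process(sentence)
    sentence.split
  end
end

# Bolt with state: keeps a running count per word.
class CountBolt
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  def process(word)
    @counts[word] += 1
    [[word, @counts[word]]] # emit one (word, count) tuple
  end
end

# Topology: connects a spout to a chain of bolts.
class Topology
  def initialize(spout, bolts)
    @spout = spout
    @bolts = bolts
  end

  def run
    @spout.each_tuple do |tuple|
      # Push the tuple through each bolt in turn; each stage's output
      # is the stream consumed by the next stage.
      @bolts.reduce([tuple]) do |stream, bolt|
        stream.flat_map { |t| bolt.process(t) }
      end
    end
  end
end

counter = CountBolt.new
Topology.new(ListSpout.new(["big data", "big ruby"]), [SplitBolt.new, counter]).run
p counter.counts
```

A real topology is a graph rather than a single chain, and Storm distributes spouts and bolts across a cluster; the sketch only shows how the pieces relate.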

RedStorm
(https://github.com/colinsurprenant/redstorm)

$ rvm use jruby-1.6.3
$ gem install redstorm
$ redstorm install

Visualizing average bandwidth experienced by users while watching videos on viki.com across the globe.
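
The aggregation behind such a visualization can be sketched as a running average per country. This is an assumption about the shape of the computation, not the code used at Viki: `BandwidthAverager`, the country codes, and the kbps samples are all made up for illustration.

```ruby
# Keeps a running average per country without storing every sample.
class BandwidthAverager
  def initialize
    @sums = Hash.new(0.0)
    @counts = Hash.new(0)
  end

  # Record one bandwidth sample (in kbps) for a country.
  def add(country, kbps)
    @sums[country] += kbps
    @counts[country] += 1
  end

  # Current average bandwidth per country.
  def averages
    @sums.map { |country, sum| [country, sum / @counts[country]] }.to_h
  end
end

averager = BandwidthAverager.new
[["SG", 1000], ["SG", 2000], ["US", 1500]].each do |country, kbps|
  averager.add(country, kbps) # hypothetical samples
end
p averager.averages
```

Because only a sum and a count are kept per country, the same logic fits a stream processor, where samples arrive one at a time and the averages are read whenever the map is redrawn.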

Thank you!

Let's stay in touch :)
● Sign up for my newsletter at http://parolkar.com
● Visit the BigData.SG Meetup in Singapore.
