Ruby for the soul of
  BigData Nerds
Who Am I?
●
    Engineering Team Lead
    Analytics & Data Platforms @ Viki.com




●
    Founder of http://BigData.SG




●
    Contributor to fluentd, pfeed, cartographer, watir
BigData & Its Challenges
"big data" is when the size of the data itself becomes part of the problem
                                                        - Mike Loukides



●
  Twitter produces over 230 million tweets per day
●
  Wal-Mart is logging one million transactions per hour
●
  Facebook creates over 30 billion pieces of content
ranging from web links, news, blogs, photos
Everyone has a big data problem
Evolving Trends

       Batch Processing
        Hadoop, HPCC, Google BigQuery



      Stream Processing
         STORM (Twitter) & S4 (Yahoo)
Common Engineering Challenges

●
    Data Collection
●
    Filtering / Segmentation
●
    Data Storage
●
    Analysis
●
    Visualization
●
    Prediction / Extrapolation
Data Collection + Filtering /
Segmentation




           http://fluentd.org/
Data Collection + Filtering /
Segmentation
                You send events as:
                http://domain:8080/namespace?key1=value1&key2=value2



                Fluent forwards the data as:
                <timestamp> <namespace> {key1:value1,key2:value2}




           http://fluentd.org/
Screencast:
http://www.bigdata.sg/videos/fluentd/
Storage

          Hadoop HDFS

           OpenTSDB
           (http://opentsdb.net)



          SciDB (DMAS)
Analysis



   Hadoop Streaming (Ruby)

  Hadoop Hive (Using rbhive)
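Hadoop Streaming lets any program that reads lines from standard input and writes tab-separated key/value lines to standard output act as a mapper or reducer. A minimal sketch of that contract in Ruby, assuming a simple word-count job; the function names `map_lines` and `reduce_lines` are hypothetical and the job itself is not from the talk:

```ruby
# Mapper step: one "word<TAB>1" line per word in the input.
def map_lines(lines)
  lines.flat_map do |line|
    line.split.map { |word| "#{word.downcase}\t1" }
  end
end

# Reducer step: Hadoop delivers mapper output sorted by key,
# so summing per key is enough.
def reduce_lines(lines)
  counts = Hash.new(0)
  lines.each do |line|
    key, value = line.chomp.split("\t", 2)
    counts[key] += value.to_i
  end
  counts.map { |key, total| "#{key}\t#{total}" }
end

# In a real job the mapper script would end with
#   puts map_lines(STDIN.each_line)
# and the reducer script with
#   puts reduce_lines(STDIN.each_line)
mapped = map_lines(["big data", "Big Ruby"])
puts reduce_lines(mapped.sort)
```

The same two functions can be checked locally with a shell pipeline of the form mapper, sort, reducer before submitting anything to a cluster.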
Visualization
          Custom Dashboard
                   (Rails + Google Charts / d3.js)




   Some Hosted Services: tableaupublic.com, geckoboard.com, splunkstorm.com
Stream Computing
What is STORM?
STORM terminology
●
 Streams
●
 Spouts
●
 Bolts
●
 Topologies
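The four terms can be illustrated with a toy model in plain Ruby. This is not the Storm or RedStorm API: the classes `ArraySpout`, `AverageBandwidthBolt` and `Topology` are hypothetical, everything runs in one process, and the sample tuples are made up. A stream is the sequence of tuples, a spout is its source, a bolt processes each tuple, and a topology wires them together.

```ruby
# Spout: a source of tuples. Here it replays a fixed array.
class ArraySpout
  def initialize(tuples)
    @tuples = tuples
  end

  def each_tuple(&block)
    @tuples.each(&block)
  end
end

# Bolt: consumes a [country, kbps] tuple and emits
# [country, running average kbps for that country].
class AverageBandwidthBolt
  def initialize
    @sum = Hash.new(0.0)
    @count = Hash.new(0)
  end

  def process(tuple)
    country, kbps = tuple
    @sum[country] += kbps
    @count[country] += 1
    [country, @sum[country] / @count[country]]
  end
end

# Topology: wires one spout to a chain of bolts and
# collects what the last bolt emits.
class Topology
  def initialize(spout, bolts)
    @spout = spout
    @bolts = bolts
  end

  def run
    results = []
    @spout.each_tuple do |tuple|
      results << @bolts.reduce(tuple) { |t, bolt| bolt.process(t) }
    end
    results
  end
end

spout = ArraySpout.new([["SG", 1000], ["US", 2000], ["SG", 3000]])
p Topology.new(spout, [AverageBandwidthBolt.new]).run
```

In a real Storm cluster the spout never ends and the bolts run in parallel across machines; the sketch only shows how the pieces relate.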
RedStorm
        (https://github.com/colinsurprenant/redstorm)


$ rvm use jruby-1.6.3
$ bundle install redstorm
$ bundle exec redstorm install
Visualizing average bandwidth
experienced by users while
watching videos on viki.com across
the globe.
Thank you!




    Let's stay in touch :)
●
    Sign up for my newsletter at http://parolkar.com
●
    Visit the BigData.SG Meetup in Singapore.
