Ruby for the soul of BigData Nerds
Published in: Technology, Business
  1. Ruby for the soul of BigData Nerds
  2. Who Am I?
     ● Engineering Team Lead, Analytics & Data Platforms @ Viki.com
     ● Founder of http://BigData.SG
     ● Contributor to fluentd, pfeed, cartographer, watir
  3. BigData & Its Challenges
     "big data" is when the size of the data itself becomes part of the problem - Mike Loukides
     ● Twitter produces over 230 million tweets per day
     ● Wal-Mart is logging one million transactions per hour
     ● Facebook creates over 30 billion pieces of content ranging from web links, news, blogs, photo
  4. Everyone has a big data problem
  5. Evolving Trends
     Batch Processing: Hadoop, HPCC, Google BigQuery
     Stream Processing: STORM (Twitter) & S4 (Yahoo)
  6. Common Engineering Challenges
     ● Data Collection
     ● Filtering / Segmentation
     ● Data Storage
     ● Analysis
     ● Visualization
     ● Prediction / Extrapolation
  7. Data Collection + Filtering / Segmentation
     http://fluentd.org/
  8. Data Collection + Filtering / Segmentation
     You send events as:
       http://domain:8080/namespace?key1=value1&key2=value2
     Fluent forwards the data as:
       <timestamp> <namespace> {key1:value1,key2:value2}
     http://fluentd.org/
  9. Screencast: http://www.bigdata.sg/videos/fluentd/
  10. Storage
      Hadoop HDFS
      OpenTSDB (http://opentsdb.net)
      SciDB (DMAS)
  11. Analysis
      Hadoop Streaming (Ruby)
      Hadoop Hive (using rbhive)
  12. Visualization
      Custom Dashboard (Rails + Google Charts / d3.js)
      Some Hosted Services: tableaupublic.com, geckoboard.com, splunkstorm.com
  13. Stream Computing
  14. What is STORM?
  15. STORM terminology
      ● Streams
      ● Spouts
      ● Bolts
      ● Topologies
  16. RedStorm (https://github.com/colinsurprenant/redstorm)
      $ rvm use jruby-1.6.3
      $ bundle install redstorm
      $ bundle exec redstorm install
  17. Visualizing average bandwidth experienced by users while watching videos on viki.com across the globe.
  18. Thank you! Let's stay in touch :)
      ● Sign up for my newsletter at http://parolkar.com
      ● Visit BigData.SG Meetup in Singapore.
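
Slide 8 describes an event sent as an HTTP URL and forwarded as a `<timestamp> <namespace> {record}` line. The sketch below illustrates that mapping in plain Ruby using only the standard library. It is not fluentd's implementation: the helper names `event_from_url` and `format_event`, and the choice to print the record as JSON, are assumptions made for illustration.

```ruby
require "uri"
require "json"

# Hypothetical helper: split an event URL into [timestamp, namespace, record].
# The path (minus the leading slash) is treated as the namespace and the
# query string as the key/value record.
def event_from_url(url, now: Time.now)
  uri = URI.parse(url)
  namespace = uri.path.sub(%r{\A/}, "")
  record = URI.decode_www_form(uri.query.to_s).to_h
  [now.to_i, namespace, record]
end

# Hypothetical helper: render the event as "<timestamp> <namespace> {record}".
def format_event(timestamp, namespace, record)
  "#{timestamp} #{namespace} #{JSON.generate(record)}"
end

ts, ns, rec = event_from_url(
  "http://domain:8080/namespace?key1=value1&key2=value2",
  now: Time.at(0)
)
puts format_event(ts, ns, rec)
# prints: 0 namespace {"key1":"value1","key2":"value2"}
```

Passing `now:` explicitly keeps the example deterministic; a real collector would stamp the event on arrival.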
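
Slide 11 lists Hadoop Streaming with Ruby as an analysis option. Hadoop Streaming runs any executable as mapper or reducer, feeding it lines on standard input and reading tab-separated key/value lines from standard output, with reducer input sorted by key. The sketch below counts events per namespace, assuming input lines in the `<timestamp> <namespace> {record}` shape from slide 8. The function names and the `map`/`reduce` command-line argument are hypothetical choices for this example, not part of the talk.

```ruby
# Mapper step: one input line in, zero or one "namespace\t1" lines out.
# Lines without a namespace field are skipped.
def map_line(line)
  _timestamp, namespace, _record = line.chomp.split(" ", 3)
  namespace ? ["#{namespace}\t1"] : []
end

# Reducer step: input arrives sorted by key, so all lines for one
# namespace are adjacent and can be summed group by group.
def reduce_lines(sorted_lines)
  sorted_lines
    .map { |line| line.chomp.split("\t", 2) }
    .chunk_while { |a, b| a[0] == b[0] }
    .map { |group| "#{group[0][0]}\t#{group.sum { |_, n| n.to_i }}" }
end

# Run as a script: "map" or "reduce" selects the role (an assumed convention).
if __FILE__ == $PROGRAM_NAME
  case ARGV.first
  when "map"
    STDIN.each_line { |line| map_line(line).each { |out| puts out } }
  when "reduce"
    reduce_lines(STDIN.each_line).each { |out| puts out }
  end
end
```

Saved under a hypothetical name such as `count.rb`, the pair can be checked locally without a cluster: `cat events.log | ruby count.rb map | sort | ruby count.rb reduce`.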

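
Slides 15 and 17 name the Storm building blocks and the use case of averaging bandwidth across users. The toy model below shows how those pieces relate: a spout is a source of tuples, a bolt consumes tuples and keeps state, and a topology wires them together. It is plain Ruby and deliberately does not use the Storm or RedStorm APIs; every class name, the `Sample` struct, and the sample figures are invented for illustration.

```ruby
# Toy model only: none of these classes are Storm or RedStorm APIs.
Sample = Struct.new(:country, :kbps)

# "Spout": a source that emits one tuple at a time.
class SampleSpout
  def initialize(samples)
    @samples = samples
  end

  def each_tuple(&block)
    @samples.each(&block)
  end
end

# "Bolt": consumes tuples and keeps a running sum and count per country.
class AverageBandwidthBolt
  def initialize
    @sum = Hash.new(0.0)
    @count = Hash.new(0)
  end

  def execute(sample)
    @sum[sample.country] += sample.kbps
    @count[sample.country] += 1
  end

  def averages
    @sum.map { |country, total| [country, total / @count[country]] }.to_h
  end
end

# "Topology": wire the spout's stream into the bolt and return the result.
def run_topology(spout, bolt)
  spout.each_tuple { |tuple| bolt.execute(tuple) }
  bolt.averages
end

# Made-up sample data for the demonstration.
samples = [
  Sample.new("SG", 1200),
  Sample.new("US", 800),
  Sample.new("SG", 1800)
]
p run_topology(SampleSpout.new(samples), AverageBandwidthBolt.new)
```

In a real stream the spout never finishes, so a bolt would publish its running averages periodically rather than return them once at the end.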