Ruby for the soul of BigData Nerds


Transcript

  • 1. Ruby for the soul of BigData Nerds
  • 2. Who Am I?
      ● Engineering Team Lead, Analytics & Data Platforms @ Viki.com
      ● Founder of http://BigData.SG
      ● Contributor to fluentd, pfeed, cartographer, watir
  • 3. BigData & Its Challenges
      "big data" is when the size of the data itself becomes part of the problem - Mike Loukides
      ● Twitter produces over 230 million tweets per day
      ● Wal-Mart is logging one million transactions per hour
      ● Facebook creates over 30 billion pieces of content ranging from web links, news, blogs, photo
  • 4. Everyone has a big data problem
  • 5. Evolving Trends
      ● Batch Processing: Hadoop, HPCC, Google BigQuery
      ● Stream Processing: STORM (Twitter) & S4 (Yahoo)
  • 6. Common Engineering Challenges
      ● Data Collection
      ● Filtering / Segmentation
      ● Data Storage
      ● Analysis
      ● Visualization
      ● Prediction / Extrapolation
  • 7. Data Collection + Filtering / Segmentation: http://fluentd.org/
  • 8. Data Collection + Filtering / Segmentation
      You send events as: http://domain:8080/namespace?key1=value1&key2=value2
      Fluentd forwards the data as: <timestamp> <namespace> {key1:value1,key2:value2}
      http://fluentd.org/
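The event shape on slide 8 can be sketched in plain Ruby. This is only an illustration of the two formats shown on the slide, not Fluentd's client API: `event_url` and `forwarded_line` are hypothetical helpers, and the ISO 8601 timestamp and JSON rendering of the record are assumptions, since the slide does not specify either.

```ruby
require "uri"
require "json"

# Hypothetical helper: build an event URL in the shape shown on the slide,
# http://<host>:<port>/<namespace>?key1=value1&key2=value2
def event_url(host, port, namespace, record)
  query = URI.encode_www_form(record)
  "http://#{host}:#{port}/#{namespace}?#{query}"
end

# Hypothetical helper: render the forwarded event as
# "<timestamp> <namespace> <record>". The timestamp format and the JSON
# encoding of the record are assumptions made for this sketch.
def forwarded_line(time, namespace, record)
  stamp = time.getutc.strftime("%Y-%m-%dT%H:%M:%SZ")
  "#{stamp} #{namespace} #{JSON.generate(record)}"
end

record = { "key1" => "value1", "key2" => "value2" }
puts event_url("domain", 8080, "namespace", record)
puts forwarded_line(Time.now, "namespace", record)
```

Building the query string with `URI.encode_www_form` keeps keys and values correctly escaped if they contain spaces or ampersands.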
  • 9. Screencast: http://www.bigdata.sg/videos/fluentd/
  • 10. Storage: Hadoop HDFS, OpenTSDB (http://opentsdb.net), SciDB (DMAS)
  • 11. Analysis: Hadoop Streaming (Ruby), Hadoop Hive (using rbhive)
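Hadoop Streaming runs any executable that reads lines from standard input and writes tab-separated key/value lines to standard output, which is why Ruby fits. A minimal sketch follows, assuming input lines shaped like "<timestamp> <namespace> <record>" as on slide 8; the job (counting events per namespace) and the names `map_lines` and `reduce_lines` are illustrative choices, not taken from the talk.

```ruby
# Map step: for each input line "<timestamp> <namespace> <record>",
# emit "<namespace>\t1". Lines without a namespace field are skipped.
def map_lines(lines)
  lines.each_with_object([]) do |line, out|
    _timestamp, namespace, _record = line.chomp.split(" ", 3)
    out << "#{namespace}\t1" if namespace
  end
end

# Reduce step: sum the counts for each key. Totals are kept in a hash,
# so this sketch assumes the number of distinct keys fits in memory.
def reduce_lines(lines)
  totals = Hash.new(0)
  lines.each do |line|
    key, count = line.chomp.split("\t", 2)
    totals[key] += count.to_i if key
  end
  totals.map { |key, total| "#{key}\t#{total}" }
end

# Run as "ruby job.rb map" or "ruby job.rb reduce"; with no argument
# the script only defines the functions.
case ARGV.first
when "map"    then map_lines(STDIN.each_line).each { |l| puts l }
when "reduce" then reduce_lines(STDIN.each_line).each { |l| puts l }
end
```

The same script can be checked locally without a cluster by emulating the shuffle with sort, for example `cat events.log | ruby job.rb map | sort | ruby job.rb reduce`, where `events.log` and `job.rb` are placeholder file names.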
  • 12. Visualization
      ● Custom Dashboard (Rails + Google Charts / d3.js)
      ● Some Hosted Services: tableaupublic.com, geckoboard.com, splunkstorm.com
  • 13. Stream Computing
  • 14. What is STORM?
  • 15. STORM terminology
      ● Streams
      ● Spouts
      ● Bolts
      ● Topologies
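The four terms can be illustrated with a toy model in plain Ruby. This is not Storm's or RedStorm's API: the classes `ArraySpout`, `AverageBandwidthBolt`, and `Topology` are hypothetical, everything runs in a single process, and the sample tuples are made-up values. The stream is the sequence of tuples, the spout is its source, a bolt processes one tuple at a time, and the topology wires them together.

```ruby
# Spout: the source of a stream. This one replays a fixed array;
# a real spout would read from a queue or a log.
class ArraySpout
  def initialize(tuples)
    @tuples = tuples
  end

  def each_tuple(&block)
    @tuples.each(&block)
  end
end

# Bolt: consumes one tuple at a time and emits a new tuple.
# This one keeps a running average of bandwidth per country.
class AverageBandwidthBolt
  def initialize
    @sums = Hash.new(0.0)
    @counts = Hash.new(0)
  end

  def process(tuple)
    country = tuple[:country]
    @sums[country] += tuple[:kbps]
    @counts[country] += 1
    { country: country, avg_kbps: @sums[country] / @counts[country] }
  end
end

# Topology: wires a spout to a chain of bolts and runs the stream through it.
class Topology
  def initialize(spout, bolts)
    @spout = spout
    @bolts = bolts
  end

  def run
    results = []
    @spout.each_tuple do |tuple|
      results << @bolts.reduce(tuple) { |t, bolt| bolt.process(t) }
    end
    results
  end
end

# Made-up sample stream, for illustration only.
spout = ArraySpout.new([
  { country: "SG", kbps: 1000 },
  { country: "US", kbps: 3000 },
  { country: "SG", kbps: 2000 }
])
Topology.new(spout, [AverageBandwidthBolt.new]).run.each { |t| puts t.inspect }
```

In Storm itself the spouts and bolts run in parallel across a cluster and the stream is unbounded; the sketch only shows how the pieces relate.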
  • 16. RedStorm (https://github.com/colinsurprenant/redstorm)
      $ rvm use jruby-1.6.3
      $ bundle install redstorm
      $ bundle exec redstorm install
  • 17. Visualizing average bandwidth experienced by users while watching videos on viki.com across the globe.
  • 18. Thank you! Let's stay in touch :)
      ● Sign up for my newsletter at http://parolkar.com
      ● Visit BigData.SG Meetup in Singapore.