To Infinity and Beyond - OSDConf2014

The story of how solving one problem the OpenSource way opened doors to so much more. Talk presented by Pranav Prakash and Hari Prasanna at OSDConf 2014, New Delhi.

Published in: Technology, Education

Transcript

  • 1. TO INFINITY AND BEYOND: The story of how solving one problem the OpenSource way opened doors to so much more. Pranav Prakash (in.linkedin.com/in/prakashpranav), Search @LinkedIn; Hari Prasanna (in.linkedin.com/in/mostlycached), BigData @LinkedIn
  • 2. OpenSource Chain Reaction: How “it” begins
  • 3. OpenSource Chain Reaction: How “it” begins; how “it” grows
  • 4. OpenSource Chain Reaction: How “it” begins; how “it” grows; how “it” contributes
  • 5. LUCENE: Information retrieval library. Started in 1999 as a SourceForge.net project, joined Apache in 2001 as part of the Jakarta family, and became a top-level project in 2005. Users include LinkedIn, Twitter and Comcast
  • 6. LUCENE: IR requirements. What would you do next? Be better at searching; crawl the web
  • 7. Web wrapper around Lucene: full-text search, near-real-time (NRT) indexing, faceted search, clustering
  • 8. NUTCH: Web crawler. Billions of pages on the internet. Alternative to commercial engines
  • 9. From a single tool to an ecosystem
      • Breaking away from the initial problem statement
      • The Google factor - GFS (2003), BigTable (2006), Pregel (2009) leading to HDFS, HBase and Giraph
      • The thrill and chaos of working with alpha software - from dealing with compatibility issues to being a part of active development
      • Interoperability between various systems
      • Ever-widening scope of the project and leveraging other tools in the ecosystem
  • 10. Ecosystem
  • 11. Hadoop
      • Features
          • Distributed storage - HDFS
          • Distributed processing - MapReduce
          • Fault tolerance
          • Horizontal scalability
      • Comparisons
          • RDBMS
          • Grid computing
      • Use cases
          • Analytics (trends, predictions, summaries, etc.)
          • Searching and indexing
  • 12. HBase
      • Features
          • Column-based storage
          • Horizontal scalability
          • Low-latency reads
          • MapReduce support
          • SQL support with Phoenix
          • Coprocessors and secondary indexes
      • RDBMS vs HBase
      • Use cases
          • Facebook messages
          • Monitoring with OpenTSDB
  • 13. Data Processing
      • Vanilla MapReduce. The slide shows a book excerpt headed “Java MapReduce”, next to Figure 2-1, “MapReduce logical data flow”: “Having run through how the MapReduce program works, the next step is to express it in code. We need three things: a map function, a reduce function, and some code to run the job. The map function is represented by the Mapper class, which declares an abstract map() method. Example 2-3 shows the implementation of our map method.” The excerpt's Example 2-3, “Mapper for maximum temperature example”, is cut off after its first line, import java.io.IOException;
      • Higher abstractions
          • Pig - data flow language
          • Hive - SQL to MapReduce adapter
          • Cascading - pipeline primitives and other powerful abstractions
          • Even higher abstractions with Cascalog (Cascading + Prolog), PigPen (Clojure for Pig) and Pig libraries like DataFu
  • 14.
      • Data collection, aggregation and forwarding with Kafka, Flume and Scribe
      • Real-time stream processing with Storm, enabling online machine learning and real-time analytics at Twitter and Groupon
      • Graph processing of a trillion edges at Facebook with Apache Giraph
  • 15. Leveraging “Big Data”
      • Quickstarting with the Cloudera distribution
      • Getting one step through the door - SlideShare’s journey
      • Can your app survive without it? - Raising your bar
      • Programmer, Administrator, DBA, Data Scientist - what hat are you wearing today?
      • The road ahead
      • Keeping track of the developments and giving back
  • 16. In the Wild
      • Scientific research - SciHadoop, decoding DNA
      • Finance - fraud detection, algorithmic trading, risk management
      • Web - network analysis, recommendation engines, personalization
      • Government - election campaigns, intelligence systems
      • Supply chain optimization, weather forecasting
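
Slide 13 refers to a "maximum temperature" MapReduce job that needs three things: a map function, a reduce function, and some code to run the job. The following is a minimal sketch of that flow in plain Python, not the Java Mapper from the slide and not Hadoop's API. The "year,temperature" input format, the sample records, and the names map_record, shuffle, reduce_max and run_job are all assumptions made for illustration.

```python
from collections import defaultdict


def map_record(line):
    """Map step: turn one 'year,temperature' record (assumed format) into a (year, temp) pair."""
    year, temp = line.split(",")
    yield year.strip(), int(temp)


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle/sort phase."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_max(year, temps):
    """Reduce step: keep the highest temperature seen for a year."""
    return year, max(temps)


def run_job(lines):
    """Driver: run map, shuffle and reduce in one process."""
    pairs = (pair for line in lines for pair in map_record(line))
    grouped = shuffle(pairs)
    return dict(reduce_max(year, temps) for year, temps in sorted(grouped.items()))


# Hypothetical sample input, one reading per line.
records = ["1949,111", "1949,78", "1950,0", "1950,22", "1950,-11"]
print(run_job(records))  # {'1949': 111, '1950': 22}
```

In a real cluster the map and reduce calls would run on different machines and the shuffle would move data across the network; here everything runs in a single process so that the data flow is easy to follow.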
