JAZOON'13 - Benoit Perroud - Realtime Queries


http://guide13.jazoon.com/#/submissions/133



Presentation Transcript

  • Enabling Real-time Queries to End Users (Benoit Perroud)
  • About me
      • Benoit Perroud
      • Software Engineer @Verisign
      • Leading Hadoop Team
      • Apache Committer
      • @killerwhile
  • Agenda
      • What’s going on
      • Batch and Realtime
      • Hadoop Deployments
      • Next steps
  • What’s going on
      • Mainframes are obsolete, replaced by clusters of commodity hardware
      • TenG (10 Gb/s) links are the new standard
      • RESTful APIs are everywhere
      • Everybody wants to visit Paxos island
      • Firehoses do not only carry water
      • Asynchronous non-blocking functional programming is taught at primary school
      • NoSQL is the new way to store data at scale
      • API management startups are rising (and raising)
      • Hadoop keywords boost your LinkedIn profile by 2000%
      • Public clouds are responsible for more than 50% of the global Internet traffic
      • … and counting …
  • A Possible Deployment
      [Diagram. Source: http://dev.datasift.com/blog/high-scalability]
      Note: the diagram dates from 2009, so it is probably partially or even completely outdated today.
  • Batch and Realtime
  • Batch Processing
      [Timeline diagram; the time axis is marked t1 to t5. Labels: Batch 1, Batch 2 and Batch 3 start processing; Batch 1 and Batch 2 ready to be served; query data from t1; query data from t3; data gap (shown twice).]
  • Batch Processing in Detail
      [Timeline diagram along a time axis. Labels:]
      • New batch granularity period
      • Batch with data from yesterday
      • Allow some time for the data to finish uploading
      • Processing time
      • Load results into a data store
      • Notify the retrieval system that a new batch is ready to be served
      • Query data from the day before yesterday?
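The staleness hinted at by "Query data from the day before yesterday?" can be estimated with a little arithmetic. The following is a minimal sketch, not taken from the talk: the class name, field names and example durations are all hypothetical, and it assumes a batch becomes queryable only after its window closes, the upload grace time elapses, and processing plus loading completes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchSchedule:
    """Hypothetical batch parameters (illustrative names, all in seconds)."""
    period_s: float        # batch granularity period, e.g. one day
    upload_grace_s: float  # time allowed for the data to finish uploading
    processing_s: float    # processing time plus loading results into the store

    def ready_delay_s(self) -> float:
        """Delay between the end of a batch window and the batch being served."""
        return self.upload_grace_s + self.processing_s

    def best_case_staleness_s(self) -> float:
        """Age of the newest served record right after a batch goes live."""
        return self.ready_delay_s()

    def worst_case_staleness_s(self) -> float:
        """Age of the newest served record just before the next batch goes live."""
        return self.period_s + self.ready_delay_s()


# Assumed example: daily batches, 1 h upload grace, 3 h processing and loading.
daily = BatchSchedule(period_s=24 * 3600, upload_grace_s=3600, processing_s=3 * 3600)
print(daily.best_case_staleness_s() / 3600)   # 4.0 hours
print(daily.worst_case_staleness_s() / 3600)  # 28.0 hours
```

Under these assumptions the newest data a user can query is between 4 and 28 hours old, which is why a query issued while yesterday's batch is still processing is answered from the day before yesterday.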
  • Realtime Query
      • Interactive query
      • REST-like request/response query type
      And
      • Query the latest version of the data
      • "Latest" meaning n seconds ago, with n known and fixed
  • Hybrid Approach
      [Timeline diagram; the time axis is marked t1 to t4. Labels: Batch 1 and Batch 2 start processing; Batch 1 and Batch 2 ready to be served; complementary data for batch 1; complementary data for batch 2.]
      • Query data from the t1 snapshot AND complementary data
      • Query data from the t2 snapshot AND complementary data
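One way to read the hybrid slide is that every query combines the most recent batch snapshot with the complementary data that arrived after that snapshot. The sketch below illustrates the idea for a simple per-key event count. It is an assumption-laden illustration, not the speaker's implementation: the `HybridView` class, its method names, and the convention that a snapshot covers all events with a timestamp strictly below `snapshot_time` are all hypothetical.

```python
from collections import Counter


class HybridView:
    """Hypothetical serving layer: batch snapshot plus complementary events."""

    def __init__(self):
        self.snapshot = Counter()  # per-key counts computed by the last batch
        self.snapshot_time = 0.0   # the snapshot covers timestamps < this value
        self.complement = []       # (timestamp, key) events not yet in a batch

    def record(self, timestamp, key):
        """Store an incoming event in the complementary (realtime) data."""
        self.complement.append((timestamp, key))

    def load_batch(self, snapshot, snapshot_time):
        """Swap in a new batch and drop the events that it already covers."""
        self.snapshot = Counter(snapshot)
        self.snapshot_time = snapshot_time
        self.complement = [(t, k) for t, k in self.complement if t >= snapshot_time]

    def query(self, key):
        """Count for key = batch snapshot AND complementary data."""
        recent = sum(1 for _, k in self.complement if k == key)
        return self.snapshot[key] + recent


view = HybridView()
view.record(1.0, "a")
view.record(2.0, "a")
view.record(3.0, "b")
print(view.query("a"))  # 2, served from complementary data only

# A batch covering everything before t=2.5 becomes ready to be served.
view.load_batch({"a": 2}, snapshot_time=2.5)
view.record(4.0, "a")
print(view.query("a"))  # 3, two from the snapshot plus one recent event
print(view.query("b"))  # 1, still only in the complementary data
```

Dropping the covered events when a batch is loaded keeps the complementary store small and avoids counting the same event twice.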
  • Hadoop Deployments
  • Naïve Hadoop Deployment
      [Diagram. Components: Gateway (hdfs dfs -put, mapred job …jar, hdfs dfs -get); NameNode; JobTracker; DataNodes (Processing).]
  • Industry Hadoop Deployment
      [Diagram. Components: Gateway; Data In GW; Data Out GW; NameNodes; JobTrackers; DataNodes (Processing); Monitoring; a second cluster of NameNodes and DataNodes for Research and Data Science (partly cut off in the transcript); Metadata Store.]
  • Realtime Hadoop Deployment
      [Diagram. Components: Gateway; Data In GW; NameNodes; JobTrackers; DataNodes (Processing); RT processing; RT Data Out GW.]
  • Realtime Search with Hadoop
      [Diagram. Components: Gateway; Data In GW; NameNodes; JobTrackers; DataNodes; Generate indexes; Update indexes; Coordinator; RT Data Out GW.]
  • Next Steps
  • Hadoop Ecosystem … is moving … really fast
      • Interactive queries: Cloudera Impala, Apache Drill, Tez, …
      • Search: SolrCloud, ElasticSearch, Cloudera Search
      • Hybrid layer: Twitter Summingbird
      • … and counting …
  • Thanks for the attention!
      • Follow @killerwhile
      • bperroud@verisign.com
      “Copyright © 2013 VeriSign, Inc. All rights reserved. The VERISIGN word mark, the Verisign logo, and other Verisign trademarks, service marks, and designs that may appear herein are registered or unregistered trademarks or service marks of VeriSign, Inc., and its subsidiaries in the United States and foreign countries. All other trademarks, service marks, and designs are property of their respective owners. Verisign has made efforts to ensure the accuracy and completeness of the information in this document. However, Verisign makes no warranties of any kind (whether express, implied or statutory) with respect to the information contained herein. Verisign assumes no liability to any party for any loss or damage (whether direct or indirect) caused by any errors, omissions, or statements of any kind contained in this document. Further, Verisign assumes no liability arising from the application or use of the products, services, or materials described or referenced herein and specifically disclaims any representation that any such products, services, or materials do not infringe upon any existing or future intellectual property rights.”