Enabling Real-time
Queries to End Users
Benoit Perroud
JAZOON'13 - Benoit Perroud - Realtime Queries

Published on

http://guide13.jazoon.com/#/submissions/133

Published in: Technology, Business

Transcript of "JAZOON'13 - Benoit Perroud - Realtime Queries"

  1. Enabling Real-time Queries to End Users
     Benoit Perroud
  2. About me
     • Benoit Perroud
     • Software Engineer @Verisign
     • Leading Hadoop Team
     • Apache Committer
     • @killerwhile
  3. Agenda
     • What’s going on
     • Batch and Realtime
     • Hadoop Deployments
     • Next steps
  4. What’s going on
     • Mainframes are obsolete, replaced by clusters of commodity hardware
     • TenG (10 Gb/s) links are the new standard
     • RESTful APIs are everywhere
     • Everybody wants to visit Paxos island
     • Firehoses do not only carry water
     • Asynchronous non-blocking functional programming is taught at primary school
     • NoSQL is the new way to store data at scale
     • API management startups are rising (and raising)
     • Hadoop keywords boost your LinkedIn profile by 2000%
     • Public clouds are responsible for more than 50% of the global Internet traffic
     • … and counting …
  5. A Possible Deployment
     [Architecture diagram. Source: http://dev.datasift.com/blog/high-scalability]
     Note: the diagram dates from 2009; it is probably partially or even completely outdated today.
  6. Batch and Realtime
  7. Batch Processing
     [Timeline diagram. Time axis marked t1 to t5, with Batch 1, Batch 2 and Batch 3 drawn along it. Event labels: "Batch 1 starts processing", "Batch 1 ready to be served", "Batch 2 starts processing", "Batch 2 ready to be served", "Batch 3 starts processing". Query labels: "Query data from t1", "Query data from t3". Two intervals are marked "Data gap".]
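The batch timeline above can be illustrated with a minimal sketch. It assumes each batch is described by the time up to which it contains data and the time at which its results become servable; the `Batch` type, the function names and the sample times are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Batch:
    name: str
    data_until: float  # the batch contains data up to this time (assumed model)
    ready_at: float    # time at which the batch results can be served


def served_batch(batches: Sequence[Batch], query_time: float) -> Optional[Batch]:
    """Return the freshest batch that is already servable at query_time."""
    ready = [b for b in batches if b.ready_at <= query_time]
    return max(ready, key=lambda b: b.data_until, default=None)


def data_gap(batches: Sequence[Batch], query_time: float) -> Optional[float]:
    """Age of the newest data a query can see, or None if nothing is ready."""
    batch = served_batch(batches, query_time)
    return None if batch is None else query_time - batch.data_until


# Hypothetical timeline: each batch needs 2 time units to become servable.
batches = [
    Batch("batch-1", data_until=0.0, ready_at=2.0),
    Batch("batch-2", data_until=2.0, ready_at=4.0),
]
```

With these sample values, a query at time 3.0 is still answered from `batch-1`, so everything that arrived after time 0.0 is invisible to it. That invisible interval is the "Data gap" in the diagram.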
  8. Batch Processing in Detail
     [Timeline diagram of one batch cycle along a time axis. Labels:]
     • New batch granularity period
     • Batch with data from yesterday
     • Allow some time for the data to finish uploading
     • Processing time
     • Load results in a data store
     • Notify the retrieval system that a new batch is ready to be served
     • Query data from the day before yesterday?
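The question on this slide ("Query data from the day before yesterday?") comes down to simple arithmetic, sketched below. The sketch assumes the delays are constant from one batch to the next; the function name and all durations are hypothetical values chosen for illustration, not figures from the talk.

```python
from datetime import timedelta


def freshness_bounds(period, upload_grace, processing, load):
    """Best and worst age of the newest record a query can see.

    Assumes constant delays: a batch covering one granularity period
    becomes servable upload_grace + processing + load after the period ends.
    """
    pipeline_delay = upload_grace + processing + load
    best = pipeline_delay            # right after a batch becomes servable
    worst = period + pipeline_delay  # just before the next batch is servable
    return best, worst


# Hypothetical daily batch: 1 h upload grace, 4 h processing, 1 h load.
best, worst = freshness_bounds(
    period=timedelta(hours=24),
    upload_grace=timedelta(hours=1),
    processing=timedelta(hours=4),
    load=timedelta(hours=1),
)
```

Under these assumed durations, the newest visible data is between 6 and 30 hours old, so a query issued early in the day is indeed answered from the batch of the day before yesterday.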
  9. Realtime Query
     • Interactive query
       • REST-like request/response query type
     And
     • Query the latest version of the data
       • "Latest" meaning n seconds ago, with n known and fixed
  10. Hybrid Approach
     [Timeline diagram. Time axis marked t1 to t4, with Batch 1 and Batch 2 drawn along it. Event labels: "Batch 1 starts processing", "Batch 1 ready to be served", "Batch 2 starts processing", "Batch 2 ready to be served". Additional tracks: "Complementary data for batch 1", "Complementary data for batch 2". Query labels: "Query data from t1 snapshot AND complementary data", "Query data from t2 snapshot AND complementary data".]
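A minimal sketch of the hybrid query described above, assuming the served metric is an additive count per key so that a batch snapshot and the complementary data can simply be summed. The `HybridView` class, its method names and the sample keys are hypothetical; the slides do not prescribe an implementation.

```python
from collections import Counter
from typing import List, Mapping, Tuple


class HybridView:
    """Answers queries from a batch snapshot plus complementary recent data."""

    def __init__(self) -> None:
        self.snapshot: Counter = Counter()
        self.snapshot_until = 0.0  # the snapshot covers data up to this time
        self.recent: List[Tuple[float, str, int]] = []  # (timestamp, key, count)

    def record(self, timestamp: float, key: str, count: int = 1) -> None:
        """Store an event in the complementary (realtime) layer."""
        self.recent.append((timestamp, key, count))

    def install_snapshot(self, snapshot: Mapping[str, int], until: float) -> None:
        """Swap in a new batch snapshot and drop the events it already covers."""
        self.snapshot = Counter(snapshot)
        self.snapshot_until = until
        self.recent = [e for e in self.recent if e[0] > until]

    def query(self, key: str) -> int:
        """Snapshot value AND complementary data, as on the slide."""
        return self.snapshot[key] + sum(c for _, k, c in self.recent if k == key)


view = HybridView()
view.record(1.0, "example.com")
view.record(2.0, "example.com")
view.record(3.0, "example.org")
before = view.query("example.com")

# A batch covering everything up to t=3.0 becomes ready to be served.
view.install_snapshot({"example.com": 2, "example.org": 1}, until=3.0)
view.record(4.0, "example.com")
after = view.query("example.com")
```

The sum only works because counts are additive. Aggregations that are not (for example distinct counts) would need a different merge step.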
  11. Hadoop Deployments
  12. Naïve Hadoop Deployment
     [Architecture diagram. Labels: Gateway, NameNode, JobTracker, and a group of DataNodes marked "Processing". Commands shown at the gateway: hdfs dfs -put, mapred job …jar, hdfs dfs -get.]
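The three gateway steps in the naïve deployment (upload, run a job, download) can be sketched as a small helper that only builds the command lines. `hdfs dfs -put` and `hdfs dfs -get` are taken from the slide; the job command is elided there, so it is supplied by the caller here. The helper name, the paths and the example job command are hypothetical.

```python
from typing import List, Sequence


def gateway_workflow(
    local_input: str,
    hdfs_input: str,
    job_argv: Sequence[str],
    hdfs_output: str,
    local_output: str,
) -> List[List[str]]:
    """Build the upload / process / download commands run on the gateway."""
    return [
        ["hdfs", "dfs", "-put", local_input, hdfs_input],    # push data in
        list(job_argv),                                      # caller-supplied job
        ["hdfs", "dfs", "-get", hdfs_output, local_output],  # pull results out
    ]


# Hypothetical paths and job command, for illustration only.
steps = gateway_workflow(
    local_input="events.log",
    hdfs_input="/data/in/events.log",
    job_argv=["hadoop", "jar", "example-job.jar", "/data/in", "/data/out"],
    hdfs_output="/data/out",
    local_output="results",
)
```

Each entry in `steps` is an argument list that could be passed to `subprocess.run(step, check=True)` on a machine with a configured Hadoop client. Running everything through one gateway in sequence is what makes this layout batch-only.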
  13. Industry Hadoop Deployment
     [Architecture diagram; several labels are truncated in the transcript. Recoverable labels: Gateway, Data In GW, Data Out GW, NameNode (shown in pairs), JobTracker (shown in pairs), DataNode groups marked "Processing", "Monitoring" and "Research, Data Science", and a Metadata Store.]
  14. Realtime Hadoop Deployment
     [Architecture diagram. Labels: Gateway, Data In GW, NameNode (pair), JobTracker (pair), a DataNode group marked "Processing", a DataNode group marked "RT processing", and RT Data Out GW.]
  15. Realtime Search with Hadoop
     [Architecture diagram. Labels: Gateway, Data In GW, NameNode (pair), JobTracker (pair), a DataNode group marked "Generate Indexes", a DataNode group marked "Update indexes", Coordinator, and RT Data Out GW.]
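The "Generate Indexes" and "Update indexes" steps can be illustrated with a toy inverted index: a full build from a batch of documents, followed by incremental additions as new documents arrive. This is a sketch under simplifying assumptions (whitespace tokenization, in-memory index, no sharding or coordinator); all function names and documents are hypothetical and unrelated to any specific search engine.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Set


def generate_index(docs: Mapping[str, str]) -> Dict[str, Set[str]]:
    """Batch step: build a term -> document ids index from scratch."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def update_index(index: Dict[str, Set[str]], doc_id: str, text: str) -> None:
    """Realtime step: add one new document to an existing index."""
    for term in text.lower().split():
        index.setdefault(term, set()).add(doc_id)


def search(index: Mapping[str, Set[str]], term: str) -> List[str]:
    """Return the ids of documents containing the term, sorted."""
    return sorted(index.get(term.lower(), set()))


index = generate_index({"d1": "real time queries", "d2": "batch queries"})
update_index(index, "d3", "Real time search")
```

After the update, `search(index, "real")` finds both the batch-indexed `d1` and the newly added `d3`. The sketch only adds documents; changing or deleting an already indexed document would also require removing its old terms.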
  16. Next Steps
  17. Hadoop Ecosystem
     … is moving … really fast
     • Interactive Queries: Cloudera Impala, Apache Drill, Tez, …
     • Search: SolrCloud, ElasticSearch, Cloudera Search
     • Hybrid layer: Twitter Summingbird
     • … and counting …
  18. Thanks for your attention!
     Follow @killerwhile
     bperroud@verisign.com
     “Copyright © 2013 VeriSign, Inc. All rights reserved. The VERISIGN word mark, the Verisign logo, and other Verisign trademarks, service marks, and designs that may appear herein are registered or unregistered trademarks or service marks of VeriSign, Inc., and its subsidiaries in the United States and foreign countries. All other trademarks, service marks, and designs are property of their respective owners. Verisign has made efforts to ensure the accuracy and completeness of the information in this document. However, Verisign makes no warranties of any kind (whether express, implied or statutory) with respect to the information contained herein. Verisign assumes no liability to any party for any loss or damage (whether direct or indirect) caused by any errors, omissions, or statements of any kind contained in this document. Further, Verisign assumes no liability arising from the application or use of the products, services, or materials described or referenced herein and specifically disclaims any representation that any such products, services, or materials do not infringe upon any existing or future intellectual property rights.”