Enabling Real-time Queries to End Users
Benoit Perroud

About me
• Benoit Perroud
• Software Engineer @Verisign
• Leading Hadoop Team
• Apache Committer
• @killerwhile

Agenda
• What’s going on
• Batch and Realtime
• Hadoop Deployments
• Next steps

What’s going on
• Mainframes are obsolete, replaced by clusters of commodity hardware
• TenG (10 Gb/s) links are the new standard
• RESTful APIs are everywhere
• Everybody wants to visit Paxos island
• Firehoses carry more than just water
• Asynchronous non-blocking functional programming is taught at primary school
• NoSQL is the new way to store data at scale
• API management startups are rising (and raising)
• Hadoop keywords boost your LinkedIn profile by 2000%
• Public clouds are responsible for more than 50% of the global Internet traffic
• … and counting …

A Possible Deployment
Source: http://dev.datasift.com/blog/high-scalability
Note: the diagram dates from 2009; it is probably partially, or even completely, outdated today.

Batch and Realtime

Batch Processing
[Diagram: timeline of successive batches (Batch 1, 2, 3; time points t1 to t5). Each batch starts processing and only later becomes ready to be served. Meanwhile, queries return data from the last served batch (data from t1, later data from t3), leaving a data gap between that point and the present.]

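A minimal Python sketch can make the data gap concrete. It assumes that each batch is described by two hypothetical timestamps: cutoff, the newest data the batch contains (when its processing starts), and ready, when it can be served. For a query at time now, it computes which batch is served and how much recent data the query misses.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Batch:
    name: str
    cutoff: float  # hypothetical: newest data in the batch (processing starts here)
    ready: float   # hypothetical: when the batch can be served


def served_batch(batches: List[Batch], now: float) -> Optional[Batch]:
    """The batch a query at `now` sees: the most recently ready one, if any."""
    ready = [b for b in batches if b.ready <= now]
    return max(ready, key=lambda b: b.ready, default=None)


def data_gap(batches: List[Batch], now: float) -> Optional[Tuple[str, float]]:
    """Name of the served batch and how much recent data the query misses."""
    batch = served_batch(batches, now)
    if batch is None:
        return None  # no batch has been served yet
    return batch.name, now - batch.cutoff


# Illustrative numbers (hours), not taken from the slide.
batches = [Batch("batch 1", cutoff=0, ready=4), Batch("batch 2", cutoff=10, ready=14)]
print(data_gap(batches, now=12))  # ('batch 1', 12)
print(data_gap(batches, now=15))  # ('batch 2', 5)
```

The gap keeps growing until the next batch is served. It then drops back to that batch's own processing delay.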
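Batch Processing in Detail
[Diagram: timeline of a single batch. At the end of each new batch granularity period, some time is left for the data to finish uploading. Then comes the processing time. The results are loaded into a data store, and the retrieval system is notified that a new batch is ready to be served. Until then, the batch with data from yesterday is not available, so queries may still return data from the day before yesterday.]
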
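The timeline above implies a simple bound on staleness. Here is a back-of-the-envelope sketch, assuming the upload grace period, processing time and loading time are fixed; the parameter names and figures are hypothetical. Just before a new batch goes live, the previous batch is still being served. Its records are therefore between one and two granularity periods old, plus the pipeline delay.

```python
from datetime import timedelta
from typing import Tuple


def served_data_age(period: timedelta, upload_grace: timedelta,
                    processing: timedelta, loading: timedelta) -> Tuple[timedelta, timedelta]:
    """Age of the (newest, oldest) served records just before a new batch goes live.

    The batch covering [T - period, T) only becomes servable at
    T + upload_grace + processing + loading; until then the previous batch,
    covering [T - 2 * period, T - period), is still the one being served.
    """
    delay = upload_grace + processing + loading
    newest = period + delay       # age of the end of the previous window
    oldest = 2 * period + delay   # age of the start of the previous window
    return newest, oldest


# Hypothetical daily batches: 2 h upload grace, 6 h processing, 1 h loading.
newest, oldest = served_data_age(timedelta(days=1), timedelta(hours=2),
                                 timedelta(hours=6), timedelta(hours=1))
print(newest, "|", oldest)  # 1 day, 9:00:00 | 2 days, 9:00:00
```

With these figures, the freshest served record can be up to 33 hours old, and the whole served batch is from the day before yesterday.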
Realtime Query
• Interactive queries
• REST-like request/response query type
and
• Query the latest version of the data
• “Latest” meaning n seconds ago, with n known and fixed

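One way to read "latest meaning n seconds ago, with n known and fixed" is as a freshness contract attached to every response. The sketch below assumes a hypothetical in-memory view fed by an ingestion path. That path reports a watermark, the time up to which all events have been applied. A query either answers with its lag or refuses when the lag exceeds n.

```python
import time
from typing import Callable, Dict, Optional


class StaleDataError(Exception):
    """Raised when the view cannot honor the freshness bound n."""


class RealtimeView:
    """Hypothetical realtime view with a fixed, advertised freshness bound."""

    def __init__(self, max_lag_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.max_lag = max_lag_seconds          # n: known and fixed
        self.clock = clock
        self.values: Dict[str, int] = {}
        self.watermark: Optional[float] = None  # all events up to here are applied

    def apply(self, key: str, value: int) -> None:
        """Ingestion path: apply one update to the view."""
        self.values[key] = value

    def advance_watermark(self, event_time: float) -> None:
        """Ingestion path: every event up to event_time has been applied."""
        if self.watermark is None or event_time > self.watermark:
            self.watermark = event_time

    def query(self, key: str) -> dict:
        """REST-like request/response: the value plus how stale it may be."""
        now = self.clock()
        if self.watermark is None or now - self.watermark > self.max_lag:
            raise StaleDataError(f"data older than {self.max_lag} seconds")
        return {"key": key, "value": self.values.get(key), "lag": now - self.watermark}


# Usage with a fake clock so the example is deterministic.
fake_now = [100.0]
view = RealtimeView(max_lag_seconds=5, clock=lambda: fake_now[0])
view.apply("clicks:/home", 42)
view.advance_watermark(98.0)
print(view.query("clicks:/home"))  # {'key': 'clicks:/home', 'value': 42, 'lag': 2.0}
```

Raising an error is one possible policy. A service could instead return the stale answer with its lag attached and let the client decide.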
Hybrid Approach
[Diagram: the same batch timeline (Batch 1 at t1, Batch 2 at t2; time points t1 to t4), with complementary data covering the period after each batch. Queries are answered from the t1 snapshot AND complementary data, then from the t2 snapshot AND complementary data, instead of leaving a data gap.]

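The hybrid approach closely resembles what is commonly called the Lambda Architecture: each query combines a precomputed batch view with the complementary data that arrived after its snapshot. Below is a minimal sketch for counters. It assumes that events are (timestamp, key) pairs and that a batch with snapshot time s has counted exactly the events before s. The class and method names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[float, str]  # (timestamp, key)


class HybridCounter:
    """Hypothetical counter served from a batch snapshot AND complementary data."""

    def __init__(self) -> None:
        self.snapshot_time = float("-inf")
        self.batch_view: Counter = Counter()   # counts of events before snapshot_time
        self.complementary: List[Event] = []   # events at or after snapshot_time

    def on_event(self, event: Event) -> None:
        """Realtime path: keep events the current snapshot does not cover."""
        if event[0] >= self.snapshot_time:
            self.complementary.append(event)

    def load_batch(self, snapshot_time: float, counts: Counter) -> None:
        """A new batch is ready: serve it and drop the data it now covers."""
        self.snapshot_time = snapshot_time
        self.batch_view = Counter(counts)
        self.complementary = [e for e in self.complementary if e[0] >= snapshot_time]

    def query(self, key: str) -> int:
        recent = sum(1 for _, k in self.complementary if k == key)
        return self.batch_view[key] + recent


events = [(1, "a"), (2, "b"), (3, "a"), (6, "a"), (7, "b")]
hc = HybridCounter()
for e in events:
    hc.on_event(e)
print(hc.query("a"))  # 3, all from complementary data
# The batch with snapshot t=5 finishes: it counted every event before t=5.
hc.load_batch(5, Counter(k for ts, k in events if ts < 5))
print(hc.query("a"))  # 3 = 2 from the batch view + 1 complementary event
```

The answer stays the same when the batch is swapped in; only the source of each count changes. Late events older than the current snapshot are simply ignored here. A real system has to decide how to handle them.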
Hadoop Deployments

Naïve Hadoop Deployment
[Diagram: a single cluster (one NameNode, one JobTracker, DataNodes doing the processing) used directly from a Gateway with hdfs dfs -put, mapred job …jar and hdfs dfs -get.]
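In the naive deployment, every step is run by hand from the gateway. The Python sketch below builds the three command lines and, by default, only prints them (dry run). The paths, jar and main class are hypothetical. Because the slide's job command is elided, the job is submitted with hadoop jar, the standard launcher for MapReduce jars. Passing input and output paths as arguments to the main class is a per-job convention assumed here.

```python
import subprocess
from typing import List


def naive_workflow(local_in: str, hdfs_in: str, jar: str, main_class: str,
                   hdfs_out: str, local_out: str) -> List[List[str]]:
    """The three manual steps run from the gateway (all arguments hypothetical)."""
    return [
        ["hdfs", "dfs", "-put", local_in, hdfs_in],             # upload input to HDFS
        ["hadoop", "jar", jar, main_class, hdfs_in, hdfs_out],  # run the MapReduce job
        ["hdfs", "dfs", "-get", hdfs_out, local_out],           # download the results
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each step; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


run(naive_workflow("logs/day.tsv", "/user/alice/in", "report.jar",
                   "com.example.Report", "/user/alice/out", "out/"))
```

Data in, processing and data out all go through the same gateway. The industry deployment, by contrast, has dedicated Data In and Data Out gateways.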

Industry Hadoop Deployment
[Diagram: a production cluster (two NameNodes, two JobTrackers, many DataNodes doing the processing) behind a Gateway with a dedicated Data In GW and Data Out GW; a second cluster for research and data science; Monitoring; and a Metadata Store.]

Realtime Hadoop Deployment
[Diagram: a Hadoop cluster (two NameNodes, two JobTrackers, DataNodes doing the processing) behind a Gateway and a Data In GW, complemented by an RT processing layer and an RT Data Out GW.]
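One plausible reading of this deployment is that incoming data must reach both the Hadoop cluster and the RT processing layer. The sketch below shows that fan-out at the Data In gateway. It assumes a hypothetical callback standing in for RT processing and an in-memory list standing in for files uploaded to HDFS.

```python
from typing import Callable, List


class DataInGateway:
    """Hypothetical Data In gateway feeding both the batch and the RT paths."""

    def __init__(self, rt_sink: Callable[[dict], None], events_per_file: int) -> None:
        self.rt_sink = rt_sink
        self.events_per_file = events_per_file
        self.buffer: List[dict] = []
        self.closed_files: List[List[dict]] = []  # stand-in for files put into HDFS

    def ingest(self, event: dict) -> None:
        self.rt_sink(event)         # RT path: handed over immediately
        self.buffer.append(event)   # batch path: accumulated into files
        if len(self.buffer) >= self.events_per_file:
            self.flush()

    def flush(self) -> None:
        """Close the current file; in practice it would then be uploaded to HDFS."""
        if self.buffer:
            self.closed_files.append(self.buffer)
            self.buffer = []


seen: List[dict] = []
gw = DataInGateway(rt_sink=seen.append, events_per_file=2)
for i in range(5):
    gw.ingest({"id": i})
print(len(seen), len(gw.closed_files), len(gw.buffer))  # 5 2 1
```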

Realtime Search with Hadoop
[Diagram: a Gateway and a Data In GW; a Hadoop cluster (two NameNodes, two JobTrackers, DataNodes) that generates indexes; an index-update path and a Coordinator; an RT Data Out GW for queries.]

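One plausible reading of this diagram is that Hadoop periodically generates complete indexes, while new documents are indexed incrementally in between. A coordinator then swaps in each new batch index. The sketch below uses a toy inverted index; all names are hypothetical, and the coordinator is reduced to a single method.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Doc = Tuple[int, float, str]  # (doc id, timestamp, text)


def build_index(docs: Iterable[Doc]) -> Dict[str, Set[int]]:
    """Batch step ("Generate Indexes"): a toy inverted index, term -> doc ids."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, _, text in docs:
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


class SearchService:
    """Hypothetical search layer: batch index plus an incremental delta index."""

    def __init__(self) -> None:
        self.cutoff = float("-inf")                 # batch index covers docs before this
        self.batch_index: Dict[str, Set[int]] = {}
        self.recent: List[Doc] = []                 # docs the batch index lacks
        self.delta: Dict[str, Set[int]] = defaultdict(set)

    def update(self, doc: Doc) -> None:
        """Realtime step ("Update indexes"): index a new document right away."""
        if doc[1] >= self.cutoff:
            self.recent.append(doc)
            for term in doc[2].lower().split():
                self.delta[term].add(doc[0])

    def swap_batch_index(self, cutoff: float, index: Dict[str, Set[int]]) -> None:
        """Coordinator role: publish a new batch index, shrink the delta."""
        self.cutoff, self.batch_index = cutoff, index
        self.recent = [d for d in self.recent if d[1] >= cutoff]
        self.delta = build_index(self.recent)

    def search(self, term: str) -> Set[int]:
        term = term.lower()
        return self.batch_index.get(term, set()) | self.delta.get(term, set())


docs = [(1, 1.0, "hadoop batch"), (2, 2.0, "realtime query"), (3, 6.0, "hadoop realtime")]
svc = SearchService()
for d in docs:
    svc.update(d)
print(sorted(svc.search("hadoop")))    # [1, 3], from incremental updates only
svc.swap_batch_index(5.0, build_index(d for d in docs if d[1] < 5.0))
print(sorted(svc.search("realtime")))  # [2, 3]: doc 2 from the batch index, doc 3 from the delta
```

This is the same snapshot-plus-complement idea as the hybrid approach, applied to search indexes.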
Next Steps

Hadoop Ecosystem
… is moving really fast
• Interactive queries: Cloudera Impala, Apache Drill, Apache Tez, …
• Search: SolrCloud, Elasticsearch, Cloudera Search
• Hybrid layer: Twitter Summingbird
• … and counting …

Thanks for your attention!
Follow @killerwhile
bperroud@verisign.com

“Copyright © 2013 VeriSign, Inc. All rights reserved. The VERISIGN word mark, the Verisign logo, and other Verisign trademarks,
service marks, and designs that may appear herein are registered or unregistered trademarks or service marks of VeriSign, Inc.,
and its subsidiaries in the United States and foreign countries. All other trademarks, service marks, and designs are property of their
respective owners. Verisign has made efforts to ensure the accuracy and completeness of the information in this document.
However, Verisign makes no warranties of any kind (whether express, implied or statutory) with respect to the information contained
herein. Verisign assumes no liability to any party for any loss or damage (whether direct or indirect) caused by any errors, omissions,
or statements of any kind contained in this document. Further, Verisign assumes no liability arising from the application or use of the
products, services, or materials described or referenced herein and specifically disclaims any representation that any such products,
services, or materials do not infringe upon any existing or future intellectual property rights.”

JAZOON'13 - Benoit Perroud - Realtime Queries
