SlideShare a Scribd company logo
1 of 24
Enabling Real-time
Queries to End
Users
Benoit Perroud
SoftShake, Geneva, October 24, 2013
About Me
•
•
•
•
•

Benoit Perroud
Software Engineer @ Verisign
Leading Hadoop Infrastructure Team
Apache Committer
@killerwhile

Verisign Public

2
Agenda
•
•
•
•
•

What’s going on
Data lifecycle
Batch and Realtime
Hadoop Deployments
Next Steps

Verisign Public

3
What’s going on
• Mainframes are obsolete, replaced by commodity hardware’s
cluster
• TenG (10Gb/s) links are the new standard
• RESTful APIs are everywhere
• Everybody wants to visit Paxos Island
• Firehoses do not only carry water
• Asynchronous non-blocking functional programming is taught
at primary school
• NoSQL is the new way to store data at scale
• API management startups are rising (and raising)
• Hadoop keywords boost your LinkedIn profile by 2000%
• Public clouds are responsible for more than 50% of the global
Internet traffic
• … and counting …
Verisign Public

4
A Possible Deployment

Verisign Public

Source: http://dev.datasift.com/blog/high-scalability
Note: the diagram is stamped from 2009, it is probably
partially or even completely outdated today
5
Data Lifecycle

Verisign Public

6
Data Lifecycle

Data Storage
Data Retrieval

Data Ingestion

Consumers

Producers

Data Processing

Verisign Public

7
• Copying internal and external sources of data into the
cluster
• Pre-processing: data cleanup, proper format, …
• Time vs. block-size tradeoff

• Targeted property: Availability
Source of
Data
Ingesting the
flow

Uploading to
HDFS

HDFS

Local
buffering

Verisign Public

8
• Hadoop HDFS is a well established distributed file
system
• File system is the central component of every datadriven approach
• Space vs. network tradeoff
• Targeted property: Reliability

DataNode1

DataNode2

File1

Upload to
HDFS

Verisign Public

DataNode3

DataNode4

9
• Hadoop MapReduce
• Higher level tools (Hive, Pig, Impala) help
• Data catalog needs to be maintained
Targeted property: parallelism

Verisign Public

10
•
•
•
•

Only way to make use of the data
Business driven need
At scale, data needs to be stored as they are queried.
DPI: Data Programmable Interfaces

Targeted property: user friendliness, reliability

Verisign Public

11
Batch and Realtime

Verisign Public

12
Batch Processing

Batch 1 starts
processing

Batch 2 starts
processing

Batch 2 ready
to be served

Batch 1 ready
to be served

Batch 1

Batch 2
t2

t1

Batch 3 starts
processing

t4

t3

Query data from batch 1
Data gap

Verisign Public

Batch 3
t5

Time

Query data from batch 2

Data gap

13
Batch Processing in details

Let some time
for data to finish
upload

Load results
in a data store

Batch with data from
yesterday
Time
New batch
granularity
period

Processing time

Query data from
the day before yesterday?

Verisign Public

Notify the retrieval system
a new batch is ready
to be served

14
Realtime Query
• Interactive query
• REST like request/response queries
• With SLA

And

• Query the latest version of the data
• Latest means n seconds ago with n predictible

Verisign Public

15
Hadoop Deployments

Verisign Public

16
Naïve Hadoop Deployment

hdfs dfs -put
Gateway
mapred job …jar

hdfs dfs -get

Verisign Public

NameNode

JobTracker

DataNode
DataNode
DataNode
DataNode
DataNode
Processing
DataNode
DataNode
DataNode
DataNode
DataNode

17
Industry Hadoop Deployment

Gateway

Data In GW

Data Out GW

NameNode
NameNode

JobTracker
JobTracker

DataNode
DataNode
DataNode
DataNode
DataNode
DataNode
DataNode
DataNode
Processing
DataNode

DataNode
DataNode
DataNode
DataNode
DataNode
DataNode
DataNode
DataNode
DataNode

Monitoring

Verisign Public

NameNode
NameNode

J

DataNode
DataNode

DataN
Dat
D
DataNode
Research,
DataNode
DataNode Data Science
DataNode
DataNode
DataNode
DataNode
DataNode
DataNode
Metadata Store

18
Realtime Hadoop Deployment

Gateway

NameNode
NameNode

JobTracker
JobTracker

DataNode
DataNode
DataNode
DataNode
Processing
Data In GW

DataNode
DataNode
DataNode
DataNode

RT Data Out GW

RT
processing

Verisign Public

19
Hybrid Approach

Batch 1 starts
processing

Batch 2 starts
processing
Batch 2 ready
to be served

Batch 1 ready
to be served

Batch 1
t1

Batch 2
t2

t3

t4

Time

Complementary data for batch 1
Complementary data for batch 2

Verisign Public

20
Realtime Search with Hadoop

Gateway

Data In GW

NameNode
NameNode

Generate
Indexes
DataNode
DataNode
DataNode
DataNode

JobTracker
JobTracker

DataNode
DataNode
DataNode
DataNode
Coordinator

RT Data Out GW

Update
indexes

Verisign Public

21
Next Steps

Verisign Public

22
Hadoop Ecosystem
… is moving … really fast

• Interactive Queries: Cloudera Impala, Apache Drills,
Tez, …
• Search: SolrCloud, ElasticSearch, Cloudera Search
• Hybrid layer: Twitter SummingBird
• … and counting…

Verisign Public

23
Thanks for the
attention!
Follow @killewhile
bperroud@verisign.com

“Copyright © 2013 VeriSign, Inc. All rights reserved. The VERISIGN word mark, the Verisign logo, and other Verisign trademarks,
service marks, and designs that may appear herein are registered or unregistered trademarks or service marks of VeriSign, Inc.,
and its subsidiaries in the United States and foreign countries. All other trademarks, service marks, and designs are property of their
respective owners. Verisign has made efforts to ensure the accuracy and completeness of the information in this document.
However, Verisign makes no warranties of any kind (whether express, implied or statutory) with respect to the information contained
herein. Verisign assumes no liability to any party for any loss or damage (whether direct or indirect) caused by any errors, omissions,
or statements of any kind contained in this document. Further, Verisign assumes no liability arising from the application or use of the
products, services, or materials described or referenced herein and specifically disclaims any representation that any such products,
services, or materials do not infringe upon any existing or future intellectual property rights.”

More Related Content

What's hot

Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryCloudera, Inc.
 
Hadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big DataHadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big DataWANdisco Plc
 
A Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at FacebookA Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at FacebookBigDataCloud
 
Hadoop 2 - Beyond MapReduce
Hadoop 2 - Beyond MapReduceHadoop 2 - Beyond MapReduce
Hadoop 2 - Beyond MapReduceUwe Printz
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Uwe Printz
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceChris Nauroth
 
Hadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduceHadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduceUwe Printz
 
HadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated Hadoop
HadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated HadoopHadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated Hadoop
HadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated HadoopYafang Chang
 
Design, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for HadoopDesign, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for Hadoopmcsrivas
 
Realtime Analytics with Hadoop and HBase
Realtime Analytics with Hadoop and HBaseRealtime Analytics with Hadoop and HBase
Realtime Analytics with Hadoop and HBaselarsgeorge
 
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera, Inc.
 
Design Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational DatabasesDesign Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational Databasesguestdfd1ec
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld
 
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv larsgeorge
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Kathleen Ting
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldDataWorks Summit
 
Hadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopHadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopYifeng Jiang
 
Intro to hadoop tutorial
Intro to hadoop tutorialIntro to hadoop tutorial
Intro to hadoop tutorialmarkgrover
 
Fraud Detection using Hadoop
Fraud Detection using HadoopFraud Detection using Hadoop
Fraud Detection using Hadoophadooparchbook
 

What's hot (20)

Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster Recovery
 
Hadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big DataHadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big Data
 
A Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at FacebookA Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
 
Hadoop 2 - Beyond MapReduce
Hadoop 2 - Beyond MapReduceHadoop 2 - Beyond MapReduce
Hadoop 2 - Beyond MapReduce
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduce
 
Hadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduceHadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduce
 
Big data Hadoop
Big data  Hadoop   Big data  Hadoop
Big data Hadoop
 
HadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated Hadoop
HadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated HadoopHadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated Hadoop
HadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated Hadoop
 
Design, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for HadoopDesign, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for Hadoop
 
Realtime Analytics with Hadoop and HBase
Realtime Analytics with Hadoop and HBaseRealtime Analytics with Hadoop and HBase
Realtime Analytics with Hadoop and HBase
 
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for Hadoop
 
Design Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational DatabasesDesign Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational Databases
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
 
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
 
Hadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopHadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise Hadoop
 
Intro to hadoop tutorial
Intro to hadoop tutorialIntro to hadoop tutorial
Intro to hadoop tutorial
 
Fraud Detection using Hadoop
Fraud Detection using HadoopFraud Detection using Hadoop
Fraud Detection using Hadoop
 

Viewers also liked (8)

Ruby on hadoop
Ruby on hadoopRuby on hadoop
Ruby on hadoop
 
Hadoop 130419075715-phpapp02(1)
Hadoop 130419075715-phpapp02(1)Hadoop 130419075715-phpapp02(1)
Hadoop 130419075715-phpapp02(1)
 
Hadoop
HadoopHadoop
Hadoop
 
Do notdelete
Do notdeleteDo notdelete
Do notdelete
 
Fredrick Ishengoma - HDFS+- Erasure Coding Based Hadoop Distributed File System
Fredrick Ishengoma -  HDFS+- Erasure Coding Based Hadoop Distributed File SystemFredrick Ishengoma -  HDFS+- Erasure Coding Based Hadoop Distributed File System
Fredrick Ishengoma - HDFS+- Erasure Coding Based Hadoop Distributed File System
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop distributed computing framework for big data
Hadoop distributed computing framework for big dataHadoop distributed computing framework for big data
Hadoop distributed computing framework for big data
 
CBSE XII Boolean Algebra
CBSE XII Boolean AlgebraCBSE XII Boolean Algebra
CBSE XII Boolean Algebra
 

Similar to Soft-Shake 2013 : Enabling Realtime Queries to End Users

Talk at the Boston Cloud Foundry Meetup June 2015
Talk at the Boston Cloud Foundry Meetup June 2015Talk at the Boston Cloud Foundry Meetup June 2015
Talk at the Boston Cloud Foundry Meetup June 2015Chip Childers
 
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...StampedeCon
 
DBTA Data Summit : Eliminating the data constraint in Application Development
DBTA Data Summit : Eliminating the data constraint in Application DevelopmentDBTA Data Summit : Eliminating the data constraint in Application Development
DBTA Data Summit : Eliminating the data constraint in Application DevelopmentKyle Hailey
 
Virtual Data : Eliminating the data constraint in Application Development
Virtual Data :  Eliminating the data constraint in Application DevelopmentVirtual Data :  Eliminating the data constraint in Application Development
Virtual Data : Eliminating the data constraint in Application DevelopmentKyle Hailey
 
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorialStrata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorialhadooparchbook
 
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017AWS Chicago
 
Flintstones or Jetsons? Jump Start Your Virtual Test Lab
Flintstones or Jetsons? Jump Start Your Virtual Test LabFlintstones or Jetsons? Jump Start Your Virtual Test Lab
Flintstones or Jetsons? Jump Start Your Virtual Test LabTechWell
 
Cloud computing and Hadoop introduction
Cloud computing and Hadoop introductionCloud computing and Hadoop introduction
Cloud computing and Hadoop introductionchristian.perez
 
DevoxxUK 2016: "DevOps: Microservices, containers, platforms, tooling... Oh y...
DevoxxUK 2016: "DevOps: Microservices, containers, platforms, tooling... Oh y...DevoxxUK 2016: "DevOps: Microservices, containers, platforms, tooling... Oh y...
DevoxxUK 2016: "DevOps: Microservices, containers, platforms, tooling... Oh y...Daniel Bryant
 
Databases: The Neglected Technology in DevOps
Databases: The Neglected Technology in DevOpsDatabases: The Neglected Technology in DevOps
Databases: The Neglected Technology in DevOpsDevOps.com
 
data-mesh-101.pptx
data-mesh-101.pptxdata-mesh-101.pptx
data-mesh-101.pptxTarekHamdi8
 
451 Research: Data Is the Key to Friction in DevOps
451 Research: Data Is the Key to Friction in DevOps451 Research: Data Is the Key to Friction in DevOps
451 Research: Data Is the Key to Friction in DevOpsDelphix
 
A proven path for migrating from clearcase to git and or subversion
A proven path for migrating from clearcase to git and or subversionA proven path for migrating from clearcase to git and or subversion
A proven path for migrating from clearcase to git and or subversionCollabNet
 
PCM Vision 2019 Breakout: Quest Software
PCM Vision 2019 Breakout: Quest SoftwarePCM Vision 2019 Breakout: Quest Software
PCM Vision 2019 Breakout: Quest SoftwarePCM
 
Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015hadooparchbook
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Hortonworks
 

Similar to Soft-Shake 2013 : Enabling Realtime Queries to End Users (20)

Talk at the Boston Cloud Foundry Meetup June 2015
Talk at the Boston Cloud Foundry Meetup June 2015Talk at the Boston Cloud Foundry Meetup June 2015
Talk at the Boston Cloud Foundry Meetup June 2015
 
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...
 
Data Science
Data ScienceData Science
Data Science
 
Dev Ops without the Ops
Dev Ops without the OpsDev Ops without the Ops
Dev Ops without the Ops
 
DBTA Data Summit : Eliminating the data constraint in Application Development
DBTA Data Summit : Eliminating the data constraint in Application DevelopmentDBTA Data Summit : Eliminating the data constraint in Application Development
DBTA Data Summit : Eliminating the data constraint in Application Development
 
Virtual Data : Eliminating the data constraint in Application Development
Virtual Data :  Eliminating the data constraint in Application DevelopmentVirtual Data :  Eliminating the data constraint in Application Development
Virtual Data : Eliminating the data constraint in Application Development
 
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorialStrata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
 
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
 
Flintstones or Jetsons? Jump Start Your Virtual Test Lab
Flintstones or Jetsons? Jump Start Your Virtual Test LabFlintstones or Jetsons? Jump Start Your Virtual Test Lab
Flintstones or Jetsons? Jump Start Your Virtual Test Lab
 
Cloud computing and Hadoop introduction
Cloud computing and Hadoop introductionCloud computing and Hadoop introduction
Cloud computing and Hadoop introduction
 
DevoxxUK 2016: "DevOps: Microservices, containers, platforms, tooling... Oh y...
DevoxxUK 2016: "DevOps: Microservices, containers, platforms, tooling... Oh y...DevoxxUK 2016: "DevOps: Microservices, containers, platforms, tooling... Oh y...
DevoxxUK 2016: "DevOps: Microservices, containers, platforms, tooling... Oh y...
 
Databases: The Neglected Technology in DevOps
Databases: The Neglected Technology in DevOpsDatabases: The Neglected Technology in DevOps
Databases: The Neglected Technology in DevOps
 
data-mesh-101.pptx
data-mesh-101.pptxdata-mesh-101.pptx
data-mesh-101.pptx
 
451 Research: Data Is the Key to Friction in DevOps
451 Research: Data Is the Key to Friction in DevOps451 Research: Data Is the Key to Friction in DevOps
451 Research: Data Is the Key to Friction in DevOps
 
A proven path for migrating from clearcase to git and or subversion
A proven path for migrating from clearcase to git and or subversionA proven path for migrating from clearcase to git and or subversion
A proven path for migrating from clearcase to git and or subversion
 
PCM Vision 2019 Breakout: Quest Software
PCM Vision 2019 Breakout: Quest SoftwarePCM Vision 2019 Breakout: Quest Software
PCM Vision 2019 Breakout: Quest Software
 
Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 

Recently uploaded

Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGDSC PJATK
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераMark Opanasiuk
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityScyllaDB
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfFIDO Alliance
 
The UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, OcadoThe UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, OcadoUXDXConf
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastUXDXConf
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FIDO Alliance
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftshyamraj55
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaCzechDreamin
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfFIDO Alliance
 
Strategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsStrategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsUXDXConf
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoTAnalytics
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Julian Hyde
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfSrushith Repakula
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekCzechDreamin
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyJohn Staveley
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...FIDO Alliance
 
Connecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKConnecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKUXDXConf
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Patrick Viafore
 

Recently uploaded (20)

Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
The UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, OcadoThe UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, Ocado
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at Comcast
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
Strategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsStrategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering Teams
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
 
Connecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKConnecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAK
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 

Soft-Shake 2013 : Enabling Realtime Queries to End Users

  • 1. Enabling Real-time Queries to End Users Benoit Perroud SoftShake, Geneva, October 24, 2013
  • 2. About Me • • • • • Benoit Perroud Software Engineer @ Verisign Leading Hadoop Infrastructure Team Apache Committer @killerwhile Verisign Public 2
  • 3. Agenda • • • • • What’s going on Data lifecycle Batch and Realtime Hadoop Deployments Next Steps Verisign Public 3
  • 4. What’s going on • Mainframes are obsolete, replaced by commodity hardware’s cluster • TenG (10Gb/s) links are the new standard • RESTful APIs are everywhere • Everybody wants to visit Paxos Island • Firehoses do not only carry water • Asynchronous non-blocking functional programming is taught at primary school • NoSQL is the new way to store data at scale • API management startups are rising (and raising) • Hadoop keywords boost your LinkedIn profile by 2000% • Public clouds are responsible for more than 50% of the global Internet traffic • … and counting … Verisign Public 4
  • 5. A Possible Deployment Verisign Public Source: http://dev.datasift.com/blog/high-scalability Note: the diagram is stamped from 2009, it is probably partially or even completely outdated today 5
  • 7. Data Lifecycle Data Storage Data Retrieval Data Ingestion Consumers Producers Data Processing Verisign Public 7
  • 8. • Copying internal and external sources of data into the cluster • Pre-processing: data cleanup, proper format, … • Time vs. block-size tradeoff • Targeted property: Availability Source of Data Ingesting the flow Uploading to HDFS HDFS Local buffering Verisign Public 8
  • 9. • Hadoop HDFS is a well established distributed file system • File system is the central component of every datadriven approach • Space vs. network tradeoff • Targeted property: Reliability DataNode1 DataNode2 File1 Upload to HDFS Verisign Public DataNode3 DataNode4 9
  • 10. • Hadoop MapReduce • Higher level tools (Hive, Pig, Impala) help • Data catalog needs to be maintained Targeted property: parallelism Verisign Public 10
  • 11. • • • • Only way to make use of the data Business driven need At scale, data needs to be stored as they are queried. DPI: Data Programmable Interfaces Targeted property: user friendliness, reliability Verisign Public 11
  • 13. Batch Processing Batch 1 starts processing Batch 2 starts processing Batch 2 ready to be served Batch 1 ready to be served Batch 1 Batch 2 t2 t1 Batch 3 starts processing t4 t3 Query data from batch 1 Data gap Verisign Public Batch 3 t5 Time Query data from batch 2 Data gap 13
  • 14. Batch Processing in details Let some time for data to finish upload Load results in a data store Batch with data from yesterday Time New batch granularity period Processing time Query data from the day before yesterday? Verisign Public Notify the retrieval system a new batch is ready to be served 14
  • 15. Realtime Query • Interactive query • REST like request/response queries • With SLA And • Query the latest version of the data • Latest means n seconds ago with n predictible Verisign Public 15
  • 17. Naïve Hadoop Deployment hdfs dfs -put Gateway mapred job …jar hdfs dfs -get Verisign Public NameNode JobTracker DataNode DataNode DataNode DataNode DataNode Processing DataNode DataNode DataNode DataNode DataNode 17
  • 18. Industry Hadoop Deployment Gateway Data In GW Data Out GW NameNode NameNode JobTracker JobTracker DataNode DataNode DataNode DataNode DataNode DataNode DataNode DataNode Processing DataNode DataNode DataNode DataNode DataNode DataNode DataNode DataNode DataNode DataNode Monitoring Verisign Public NameNode NameNode J DataNode DataNode DataN Dat D DataNode Research, DataNode DataNode Data Science DataNode DataNode DataNode DataNode DataNode DataNode Metadata Store 18
  • 19. Realtime Hadoop Deployment Gateway NameNode NameNode JobTracker JobTracker DataNode DataNode DataNode DataNode Processing Data In GW DataNode DataNode DataNode DataNode RT Data Out GW RT processing Verisign Public 19
  • 20. Hybrid Approach Batch 1 starts processing Batch 2 starts processing Batch 2 ready to be served Batch 1 ready to be served Batch 1 t1 Batch 2 t2 t3 t4 Time Complementary data for batch 1 Complementary data for batch 2 Verisign Public 20
  • 21. Realtime Search with Hadoop Gateway Data In GW NameNode NameNode Generate Indexes DataNode DataNode DataNode DataNode JobTracker JobTracker DataNode DataNode DataNode DataNode Coordinator RT Data Out GW Update indexes Verisign Public 21
  • 23. Hadoop Ecosystem … is moving … really fast • Interactive Queries: Cloudera Impala, Apache Drills, Tez, … • Search: SolrCloud, ElasticSearch, Cloudera Search • Hybrid layer: Twitter SummingBird • … and counting… Verisign Public 23
  • 24. Thanks for the attention! Follow @killewhile bperroud@verisign.com “Copyright © 2013 VeriSign, Inc. All rights reserved. The VERISIGN word mark, the Verisign logo, and other Verisign trademarks, service marks, and designs that may appear herein are registered or unregistered trademarks or service marks of VeriSign, Inc., and its subsidiaries in the United States and foreign countries. All other trademarks, service marks, and designs are property of their respective owners. Verisign has made efforts to ensure the accuracy and completeness of the information in this document. However, Verisign makes no warranties of any kind (whether express, implied or statutory) with respect to the information contained herein. Verisign assumes no liability to any party for any loss or damage (whether direct or indirect) caused by any errors, omissions, or statements of any kind contained in this document. Further, Verisign assumes no liability arising from the application or use of the products, services, or materials described or referenced herein and specifically disclaims any representation that any such products, services, or materials do not infringe upon any existing or future intellectual property rights.”

Editor's Notes

  1. The remainder of this presentation focuses on the prevention of DNS or DDoS attacks and global server load balancing to add resiliency to your eCommerce architecture