POLYGLOT PROCESSING – AN INTRODUCTION
Dr. Mohan K. Bavirisetty
Chief Scientist
Modern Renaissance
Agenda
1. Big Data Landscape
2. Lambda vs. Kappa Architecture
3. Spark vs. Storm vs. Flink
4. Demo 1 – Apache Spark
5. Demo 2 – Storm, Kafka and Redis
6. Demo 3 – Flink with the DataStream API
7. Summary
8. Questions
“The purpose of computing is insight, not numbers.” – Richard Hamming
BIG DATA LANDSCAPE
What is Big Data?
Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
Source: Gartner Research
What is a Real-time Analytics Platform?
3 Common Kinds of Workloads
• Batch Operations
• Micro-batch Operations
• Real-time Streaming
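The three workload kinds differ mainly in when results become available. The minimal, library-free sketch below computes the same running sum over a hypothetical list of readings in all three styles; the function names and data are illustrative only and are not any engine's API.

```python
from typing import Iterable, Iterator, List

# Hypothetical readings arriving over time (illustrative data only).
EVENTS = [3, 1, 4, 1, 5, 9, 2, 6]


def batch(events: List[int]) -> int:
    """Batch: process the complete, bounded dataset once; emit one result."""
    return sum(events)


def micro_batch(events: Iterable[int], batch_size: int) -> Iterator[int]:
    """Micro-batch: buffer events into small chunks and emit an updated
    running total after each chunk (latency is roughly one batch interval)."""
    total, chunk = 0, []
    for event in events:
        chunk.append(event)
        if len(chunk) == batch_size:
            total += sum(chunk)
            chunk = []
            yield total
    if chunk:  # flush the final, partially filled chunk
        total += sum(chunk)
        yield total


def streaming(events: Iterable[int]) -> Iterator[int]:
    """Real-time streaming: update state and emit a result for every event."""
    total = 0
    for event in events:
        total += event
        yield total


batch_result = batch(EVENTS)
micro_batch_results = list(micro_batch(EVENTS, batch_size=3))
streaming_results = list(streaming(EVENTS))
```

All three styles agree on the final total (31); they differ in how often, and how soon, an intermediate answer is produced.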
“Evidence-based decision-making (aka Big Data) is not just the latest fad, it's the future of how we are going to guide and grow business.”
– Kristian Hammond, CTO, Narrative Science
8 Requirements of Real-time Stream Processing
Keep Data Moving
Allow SQL Queries
Handle Stream Imperfections
Generate Predictable Outcomes
Integrate Streaming Data and Stored Data
Guarantee Data Safety and Availability
Partition and Scale Applications Automatically
Process and Respond Instantaneously
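"Handle Stream Imperfections" covers delayed, missing and out-of-order data. One common technique, used for example by Flink's event-time processing, is a watermark: a moving threshold below which no more events are expected. The sketch below is a simplified, single-process illustration under the assumption that the watermark trails the largest event time seen by a fixed `max_delay`; the function and data are hypothetical.

```python
import heapq
from typing import List, Optional, Tuple

Event = Tuple[int, str]  # (event_time, payload)


def reorder_with_watermark(
    events: List[Event], max_delay: int
) -> Tuple[List[Event], List[Event]]:
    """Re-emit out-of-order events in event-time order.

    The watermark trails the largest event time seen so far by `max_delay`;
    buffered events at or below it are released in order, and any event that
    arrives with a time below the watermark is treated as late.
    """
    buffer: List[Event] = []  # min-heap ordered by event time
    emitted: List[Event] = []
    late: List[Event] = []
    watermark: Optional[int] = None
    for event in events:
        if watermark is not None and event[0] < watermark:
            late.append(event)  # can no longer be placed in order
            continue
        heapq.heappush(buffer, event)
        candidate = event[0] - max_delay
        watermark = candidate if watermark is None else max(watermark, candidate)
        while buffer and buffer[0][0] <= watermark:
            emitted.append(heapq.heappop(buffer))
    # End of (bounded) input: flush whatever is still buffered, in order.
    while buffer:
        emitted.append(heapq.heappop(buffer))
    return emitted, late


arrivals = [(1, "a"), (3, "c"), (2, "b"), (6, "f"), (4, "d"), (9, "i"), (5, "e")]
in_order, late_events = reorder_with_watermark(arrivals, max_delay=2)
```

Here events up to two time units out of order are repaired, while `(5, "e")` arrives after the watermark has reached 7 and is routed to the late list. A larger `max_delay` tolerates more disorder at the cost of added latency, which is the trade-off behind "Process and Respond Instantaneously".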
How do major data engines compare?
Real-time Streaming Architecture
Berkeley Data Analytics Stack
Polyglot …
• Polyglot: one who is versed in many languages
Polyglot Programming
• Different languages, frameworks and services
• Example: Java with Scala, Clojure inside Trident
Polyglot Persistence
• Capacity to store data in multiple formats
• Structured, document, log, GPS
Polyglot Processing
• Capability to process any kind of data, any kind of workload, any kind of workflow
LAMBDA VS. KAPPA ARCHITECTURES
Lambda Architecture
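The Lambda Architecture splits work into a batch layer (periodically recomputes views from an immutable master dataset), a speed layer (incrementally covers data the last batch run has not absorbed) and a serving layer (merges both at query time). The sketch below is a single-process illustration of that split using hypothetical page-view events and a hypothetical `cutoff` marking the last batch run; in a real deployment each layer would be a separate distributed system.

```python
from collections import Counter
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # hypothetical page-view event: (timestamp, page)


def batch_layer(master_dataset: List[Event], cutoff: int) -> Dict[str, int]:
    """Batch layer: recompute the view from scratch over the immutable master
    dataset, up to the time of the last batch run (`cutoff`). Slow but exact."""
    return dict(Counter(page for ts, page in master_dataset if ts <= cutoff))


class SpeedLayer:
    """Speed layer: incrementally maintains a real-time view covering only the
    events that the latest batch run has not yet absorbed."""

    def __init__(self, cutoff: int) -> None:
        self.cutoff = cutoff
        self.view: Counter = Counter()

    def on_event(self, event: Event) -> None:
        ts, page = event
        if ts > self.cutoff:
            self.view[page] += 1


def query(page: str, batch_view: Dict[str, int], speed: SpeedLayer) -> int:
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch_view.get(page, 0) + speed.view[page]


events = [(1, "home"), (2, "cart"), (3, "home"), (4, "home"), (5, "cart")]
batch_view = batch_layer(events, cutoff=3)
speed = SpeedLayer(cutoff=3)
for e in events:
    speed.on_event(e)
home_views = query("home", batch_view, speed)
cart_views = query("cart", batch_view, speed)
```

When the next batch run moves the cutoff forward, the speed layer's older state can be discarded; this is why the speed layer may use approximate or simpler logic without permanently affecting results.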
What is Apache Storm?
Apache Storm is a free and open-source distributed real-time computation system. It makes it easy to reliably process unbounded streams of data.
Why Apache Storm?
Storm is fast, horizontally scalable, fault-tolerant, easy to set up and operate, and programming-language agnostic.
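A Storm application is a topology: spouts emit streams of tuples, and bolts consume tuples and emit new ones. The sketch below imitates that dataflow with plain Python generators for a word count; it is a conceptual simulation only, not the Storm API, and all names in it are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterator, List, Tuple

# Plain-Python simulation of Storm's dataflow vocabulary; NOT the Storm API.


def sentence_spout(sentences: List[str]) -> Iterator[Tuple[str]]:
    """Spout: a stream source (in a real topology, often a Kafka topic)."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream: Iterator[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Bolt: consumes sentence tuples and emits one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.lower().split():
            yield (word,)


class CountBolt:
    """Stateful bolt: keeps a running count per word. In Storm, a fields
    grouping on "word" would send every occurrence of a word to the same task."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def process(self, stream: Iterator[Tuple[str]]) -> Dict[str, int]:
        for (word,) in stream:
            self.counts[word] += 1
        return dict(self.counts)


# Wire the "topology": spout -> split bolt -> count bolt.
count_bolt = CountBolt()
word_counts = count_bolt.process(
    split_bolt(sentence_spout(["The cow jumped", "the moon"]))
)
```

In an actual cluster each spout and bolt runs as many parallel tasks across machines, and the grouping chosen between them decides which task receives each tuple.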
Apache Storm
Apache Storm can be used to implement an application performance monitoring (APM) use case
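The slide does not detail the APM use case, so the following is only one plausible example of the kind of logic a Storm bolt might run: a sliding time window over request measurements that raises an alert when the error rate gets too high. The record layout `(timestamp_s, latency_ms, is_error)`, the class and the thresholds are all assumptions for illustration, written in plain Python rather than against Storm.

```python
from collections import deque
from typing import Deque, Tuple

# Hypothetical APM measurement: (timestamp_s, latency_ms, is_error)
Request = Tuple[int, float, bool]


class SlidingWindowMonitor:
    """Keeps the requests seen in the last `window_s` seconds and reports
    whether the window's error rate exceeds `max_error_rate`."""

    def __init__(self, window_s: int, max_error_rate: float) -> None:
        self.window_s = window_s  # assumed > 0, so the newest request is never evicted
        self.max_error_rate = max_error_rate
        self.window: Deque[Request] = deque()
        self.errors = 0
        self.latency_sum = 0.0

    def observe(self, req: Request) -> bool:
        """Add one request, evict expired ones, return True if alerting.
        Assumes requests arrive in timestamp order."""
        ts, latency_ms, is_error = req
        self.window.append(req)
        self.errors += int(is_error)
        self.latency_sum += latency_ms
        while self.window[0][0] <= ts - self.window_s:
            _, old_latency, old_error = self.window.popleft()
            self.errors -= int(old_error)
            self.latency_sum -= old_latency
        return self.errors / len(self.window) > self.max_error_rate

    def avg_latency_ms(self) -> float:
        """Mean latency of the requests currently in the window."""
        return self.latency_sum / len(self.window)


monitor = SlidingWindowMonitor(window_s=10, max_error_rate=0.5)
requests = [(0, 120.0, False), (2, 95.0, True), (5, 300.0, True), (14, 80.0, False)]
alerts = [monitor.observe(r) for r in requests]
```

Keeping running sums instead of rescanning the window makes each update cheap, which matters when a monitoring bolt sees a high event rate.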
Apache Spark
Apache Spark is a fast and general engine for large-scale data processing.
• Spark is fast
• Spark is easy
• Spark is extensible
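Much of Spark's ease of use comes from its RDD model: transformations such as `flatMap`, `map` and `reduceByKey` only record a lineage, and nothing runs until an action such as `collect` is called. The toy class below imitates that lazy behavior on a single machine so it can run without Spark; `MiniRDD` and its snake_case methods are hypothetical stand-ins, not PySpark.

```python
from typing import Any, Callable, Iterable, List, Tuple


class MiniRDD:
    """A toy, single-machine imitation of Spark's RDD model (NOT PySpark):
    transformations are recorded lazily and only run when an action is called."""

    def __init__(self, source: Iterable[Any], ops: Tuple[Callable, ...] = ()) -> None:
        self.source = source  # should be re-iterable, e.g. a list
        self.ops = ops        # recorded lineage of transformations

    def _then(self, op: Callable[[Iterable[Any]], Iterable[Any]]) -> "MiniRDD":
        return MiniRDD(self.source, self.ops + (op,))

    # --- transformations (lazy) ---
    def map(self, f: Callable[[Any], Any]) -> "MiniRDD":
        return self._then(lambda it: (f(x) for x in it))

    def flat_map(self, f: Callable[[Any], Iterable[Any]]) -> "MiniRDD":
        return self._then(lambda it: (y for x in it for y in f(x)))

    def reduce_by_key(self, f: Callable[[Any, Any], Any]) -> "MiniRDD":
        def op(it: Iterable[Tuple[Any, Any]]) -> Iterable[Tuple[Any, Any]]:
            acc: dict = {}
            for k, v in it:
                acc[k] = f(acc[k], v) if k in acc else v
            return acc.items()
        return self._then(op)

    # --- action (eager) ---
    def collect(self) -> List[Any]:
        data: Iterable[Any] = self.source
        for op in self.ops:
            data = op(data)
        return list(data)


seen: List[str] = []


def tag(word: str) -> Tuple[str, int]:
    seen.append(word)  # side effect lets us observe when work actually happens
    return (word, 1)


lines = MiniRDD(["spark is fast", "spark is easy"])
counts = lines.flat_map(str.split).map(tag).reduce_by_key(lambda a, b: a + b)
work_before_action = len(seen)  # 0: transformations were only recorded
result = sorted(counts.collect())
```

Deferring execution lets the real engine plan the whole chain at once, pipeline narrow transformations and recompute lost partitions from the recorded lineage.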
Lambda Implementation with Spark
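A common argument for implementing Lambda with Spark is that batch jobs and Spark Streaming's micro-batches share the same transformation style, so business logic can be written once instead of twice. The plain-Python sketch below illustrates only that code-reuse idea; the log format `"LEVEL service message"` and all function names are assumptions, and no Spark API is used.

```python
from collections import Counter
from typing import Iterable, List


def error_counts(log_lines: Iterable[str]) -> Counter:
    """Shared business logic: count ERROR lines per service.
    Assumes hypothetical lines of the form 'LEVEL service message'."""
    return Counter(
        parts[1]
        for parts in (line.split(maxsplit=2) for line in log_lines)
        if len(parts) >= 2 and parts[0] == "ERROR"
    )


def batch_job(all_lines: List[str]) -> Counter:
    # Batch layer: run the shared logic over the full historical dataset.
    return error_counts(all_lines)


def speed_job(micro_batches: Iterable[List[str]]) -> Counter:
    # Speed layer: run the SAME logic on each micro-batch and accumulate.
    total: Counter = Counter()
    for batch in micro_batches:
        total.update(error_counts(batch))
    return total


logs = ["ERROR auth timeout", "INFO auth ok", "ERROR db deadlock", "ERROR auth bad token"]
batch_counts = batch_job(logs)
speed_counts = speed_job([logs[:2], logs[2:]])
```

Because both layers call `error_counts`, a rule change is made in one place, which removes one of the main maintenance costs usually attributed to the Lambda Architecture.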
Kappa Architecture
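The Kappa Architecture drops the separate batch layer: an immutable, replayable log is the source of truth, and reprocessing means running a new version of the stream job from the beginning of the log into a new output table, then switching readers to it. The sketch below shows that flow in memory; the record layout, the revenue jobs and the "ignore refunds" rule are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, float]  # hypothetical order event: (region, amount)
Table = Dict[str, float]    # materialized output (serving) table
Job = Callable[[Table, Record], None]


def run_job(log: List[Record], job: Job) -> Table:
    """Stream processor: replay the log from offset 0 into a fresh output table."""
    table: Table = {}
    for record in log:
        job(table, record)
    return table


def revenue_v1(table: Table, rec: Record) -> None:
    region, amount = rec
    table[region] = table.get(region, 0.0) + amount


def revenue_v2(table: Table, rec: Record) -> None:
    # Hypothetical changed business rule: exclude refunds (negative amounts).
    if rec[1] > 0:
        revenue_v1(table, rec)


# The immutable, replayable log (in practice, e.g., a Kafka topic with long retention).
log: List[Record] = [("east", 10.0), ("west", 5.0), ("east", -3.0)]

v1_table = run_job(log, revenue_v1)  # output of the currently deployed job
v2_table = run_job(log, revenue_v2)  # new version reprocesses the same log
serving_table = v2_table             # cut readers over, then retire the old job
```

The design choice is to trade the batch layer for log retention: reprocessing is limited by how much history the log keeps, but there is only one processing codebase to maintain.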
Apache Flink
Apache Flink has a unified runtime engine for stream and batch processing
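Flink's unified engine treats a batch as a bounded stream, so one streaming operator can serve both kinds of input. The sketch below illustrates that idea with a keyed running sum that keeps per-key state, roughly what a `keyBy` followed by a `reduce` does in Flink's DataStream API; it is plain Python, and the sensor source is hypothetical.

```python
import itertools
from typing import Dict, Iterable, Iterator, Tuple

Record = Tuple[str, int]  # (key, value)


def keyed_running_sum(stream: Iterable[Record]) -> Iterator[Record]:
    """One streaming operator with per-key state. The same code runs whether
    the input is bounded (a 'batch' dataset) or unbounded (a live stream)."""
    state: Dict[str, int] = {}
    for key, value in stream:
        state[key] = state.get(key, 0) + value
        yield key, state[key]


# Bounded input: a batch is just a stream that ends; keep the last value per key.
batch_input = [("a", 1), ("b", 2), ("a", 3)]
batch_result = dict(keyed_running_sum(batch_input))


def sensor() -> Iterator[Record]:
    """Hypothetical unbounded source that never ends."""
    for i in itertools.count():
        yield ("even" if i % 2 == 0 else "odd", i)


# Unbounded input: consume incrementally; here only the first 5 updates.
stream_updates = list(itertools.islice(keyed_running_sum(sensor()), 5))
```

Because the operator never needs to see the end of its input, no separate batch implementation is required; for bounded input, the final emitted value per key is the batch answer.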
DEMONSTRATION
SUMMARY
Summary
• Big Data challenges are being met with new and innovative approaches and architectures.
• Lambda Architecture is a pragmatic near-term solution. Fidelity is already implementing it.
• Kappa Architecture could turn out to be the elegant long-term solution to Polyglot Processing.
• Apache Spark, Storm and Flink each have their strengths and niche areas of applicability.
• Apache SAMOA, Apache Zeppelin and Tachyon add further value by providing additional capabilities.
Working Toward Analytics Mastery
[Figure: analytics maturity vs. time, with stages labeled Descriptive, Predictive and Preventive/Prescriptive]
Next Stage of Data Explosion
QUESTIONS?
“We do not learn by inference and deduction and the application of mathematics to philosophy, but by direct intercourse …”
– Henry David Thoreau
THANK YOU
Appendix: References and Resources
• 8 Requirements of Real-time Stream Processing: http://cs.brown.edu/~ugur/8rulesSigRec.pdf
• Design Patterns for Real-Time Streaming Analytics: http://strataconf.com/big-data-conference-ca-2015/public/schedule/detail/38774
• Big Data: Principles and Best Practices of Scalable Real-time Data Systems: http://bit.ly/1LscB7z
• Real-time Stream Processing: The Next Step for Apache Flink: http://www.confluent.io/blog/2015/05/06/real-time-stream-processing-the-next-step-for-apache-flink/
• SAMOA: Scalable Advanced Massive Online Analysis: http://jmlr.csail.mit.edu/papers/volume16/morales15a/morales15a.pdf
• Lambda Architecture: http://lambda-architecture.net/
• Kappa Architecture: http://www.kappa-architecture.com/
• Apache Spark: http://spark.apache.org/
• Apache Storm: https://storm.apache.org/
• Apache Flink: https://flink.apache.org/
• Apache SAMOA: https://samoa.incubator.apache.org/
• Apache Zeppelin: https://zeppelin.incubator.apache.org/
• Tachyon: http://tachyon-project.org/
