Aljoscha Krettek - Co-Founder & Software Engineer at data Artisans
THE EVOLUTION OF (OPEN SOURCE) DATA PROCESSING
© 2018 data Artisans
ABOUT DATA ARTISANS
Original creators of Apache Flink®
Real-time stream processing, enterprise ready

POWERED BY APACHE FLINK

Disclaimer:
I might forget systems or misrepresent their use or when they were created. This is not intentional. Please come discuss with me afterwards!

How do we process data and what are the systems available to us?
PRE-HISTORIC

Purpose-built programs
Since the beginning of computers.

Programming is kinda hard.
Data analysis is only available to a small circle of programmers/engineers.

(Big) Data Bases
Since the 1970s

SQL is approachable to a wider range of people.
Data analysis is no longer restricted to “programmers”.
There are even tools that create SQL: BI tools and whatnot.

Application services talking to databases, event-driven applications
Since quite a while… 😉
THE ADVENT OF BIG DATA

MapReduce
2004

Apache Hadoop®
2006

Store first, ask questions later*
* we’ll get back to this later

Programming is kinda hard.
Data analysis is only available to a small circle of programmers/engineers.

Apache Hive™ 2009
Apache Pig™ 2008
* it’s tricky with release dates and when they incubated and whatnot

SQL is approachable to a wider range of people.
Data analysis is no longer restricted to “programmers”.
There are even tools that create SQL: BI tools and whatnot.

Apache Spark™
2012? – non-Apache release
2014 – first Apache release
THE RISE OF STREAMING

Apache Storm™
2011 – first non-Apache release
2014 – Storm 0.9.1, first Apache release

Apache Kafka®
2011 – non-Apache release
2013 – first Apache release

Lambda Architecture
At some point in between.
Was a bit of a dead end.

Apache Flink®
2010 – under the name Stratosphere
2014 – Flink 0.6, first Apache release
2015 – Flink 0.9, first release with exactly-once stream processing

Reliable Stream Processing
No more need for the lambda architecture.

Ask questions first, then wait for things to happen*
* i.e., we put in place a program, and get real-time updates when things happen
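
The "ask first, then wait" idea can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not Flink code: the `StandingQuery` class and its `on_event` method are hypothetical names made up for this sketch, and the `rideId` / `psgCnt` fields are borrowed from the taxi-ride schema shown later in the deck. The question is registered before any data exists, and each matching event produces an answer the moment it arrives.

```python
class StandingQuery:
    """Toy continuous query. Hypothetical; not part of any Flink API."""

    def __init__(self, predicate, on_match):
        self.predicate = predicate   # the "question", fixed up front
        self.on_match = on_match     # called as soon as an event answers it

    def on_event(self, event):
        # Evaluate the question against each event the moment it arrives.
        if self.predicate(event):
            self.on_match(event)


alerts = []

# 1. Ask the question first: which rides carry more than 4 passengers?
query = StandingQuery(
    predicate=lambda ride: ride["psgCnt"] > 4,
    on_match=lambda ride: alerts.append(ride["rideId"]),
)

# 2. Then wait for things to happen (a list stands in for an unbounded stream).
for ride in [
    {"rideId": 1, "psgCnt": 2},
    {"rideId": 2, "psgCnt": 6},
    {"rideId": 3, "psgCnt": 5},
]:
    query.on_event(ride)

print(alerts)  # [2, 3]
```

In a "store first" setup the same predicate would only run once someone queries the stored rides; here nothing is stored at all and the answers are pushed out as a side effect of ingestion.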
And of course…
Programming this was hard.
Then we had “SQL” on streams.
APACHE FLINK

The processing landscape
From offline to real-time: batch; streaming analytics & continuous processing; event-driven applications

What’s in a processing system/framework?
1. Engine
2. APIs
3. Connectors

1. Flink Engine
Deployment
• YARN
• Mesos
• Kubernetes
• Resource elasticity
Stateful stream processing
• Network shuffle
• State & timers
• Fault tolerance
• Exactly once
• Savepoints

2. Flink APIs
DataSet API
DataStream API
Table API/SQL
and more …

2. Flink APIs – DataStream API
• Stateful stream processing
• Windowing
• State & timers
• Complete control over what is going on
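
To make "state & timers" concrete, here is a minimal sketch in plain Python. It is not the DataStream API: `KeyedIdleCounter`, `process_element` and `advance_time` are hypothetical names chosen for this illustration. The operator keeps a running count per key (the state) and registers a per-key timer that fires once the key has been idle for a timeout, at which point the count is emitted and the state is cleared.

```python
class KeyedIdleCounter:
    """Toy per-key state plus timers. Hypothetical; illustrates the idea only."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms
        self.state = {}    # key -> running count (the "keyed state")
        self.timers = {}   # key -> timestamp at which the timer fires
        self.output = []   # (key, count) results emitted by fired timers

    def process_element(self, key, timestamp_ms):
        self.state[key] = self.state.get(key, 0) + 1
        # (Re)register the timer: fire once the key has been idle for timeout_ms.
        self.timers[key] = timestamp_ms + self.timeout_ms

    def advance_time(self, now_ms):
        # Fire every timer whose timestamp has been reached, in timestamp order.
        due = sorted((ts, key) for key, ts in self.timers.items() if ts <= now_ms)
        for _, key in due:
            self.output.append((key, self.state.pop(key)))
            del self.timers[key]


op = KeyedIdleCounter(timeout_ms=1000)
op.process_element("taxi-1", 0)
op.process_element("taxi-2", 200)
op.process_element("taxi-1", 500)   # pushes taxi-1's timer to 1500
op.advance_time(1300)               # only taxi-2 (timer at 1200) fires
op.advance_time(2000)               # now taxi-1 fires

print(op.output)  # [('taxi-2', 1), ('taxi-1', 2)]
```

A real engine additionally checkpoints the state and timers so that they survive failures; that part is deliberately left out of this sketch.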
2. Flink APIs – Table API/SQL
• Declarative/relational API
• “No programming required” SQL (ANSI SQL)
• Same SQL for batch and streaming
• Pluggable connectors / data formats
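
One way to read "same SQL for batch and streaming" is that a single query definition yields the same final answer whether the input is bounded or arrives row by row. The plain-Python sketch below illustrates that reading for a count-per-taxi aggregation. It is an assumption-laden toy rather than Flink SQL: both function names are hypothetical, and a list stands in for the stream.

```python
from collections import Counter


def rides_per_taxi_batch(rides):
    """Batch: the whole bounded input is available up front."""
    return dict(Counter(ride["taxiId"] for ride in rides))


def rides_per_taxi_streaming(rides):
    """Streaming: yield an updated result after every incoming row."""
    counts = Counter()
    for ride in rides:
        counts[ride["taxiId"]] += 1
        yield dict(counts)


rides = [{"taxiId": 1}, {"taxiId": 2}, {"taxiId": 1}]

batch_result = rides_per_taxi_batch(rides)
stream_results = list(rides_per_taxi_streaming(rides))

print(batch_result)        # {1: 2, 2: 1}
print(stream_results[-1])  # {1: 2, 2: 1}
```

The streaming variant produces three intermediate results, one per row; once the input has been fully consumed, the last of them equals the batch result.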
https://data-artisans.com/blog/flink-sql-powerful-querying-of-data-streams

3. Flink Connectors
The usual suspects: Kafka, Kinesis, HDFS/S3, Elasticsearch, Cassandra, …
Table API / SQL has a modular library of connectors & formats that can be extended by users.

SQL connector definition

- name: TaxiRides
  type: source
  update-mode: append
  schema:
  - name: rideId
    type: LONG
  - name: rowTime
    type: TIMESTAMP
    rowtime:
      timestamps:
        type: "from-field"
        from: "rideTime"
      watermarks:
        type: "periodic-bounded"
        delay: "60000"
  - name: isStart
    type: BOOLEAN
  - name: lon
    type: FLOAT
  - name: lat
    type: FLOAT
  - name: taxiId
    type: LONG
  - name: driverId
    type: LONG
  - name: psgCnt
    type: INT
  connector:
    property-version: 1
    type: kafka
    version: 0.11
    topic: TaxiRides
    startup-mode: earliest-offset
    properties:
    - key: zookeeper.connect
      value: zookeeper:2181
    - key: bootstrap.servers
      value: kafka:9092
    - key: group.id
      value: testGroup
  format:
    property-version: 1
    type: json
    schema: "ROW(rideId LONG, isStart BOOLEAN, rideTime TIMESTAMP, lon FLOAT, lat FLOAT, psgCnt INT, taxiId LONG, driverId LONG)"
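
The `watermarks` entry in the definition above is worth a small illustration. The sketch below assumes that "periodic-bounded" with `delay: "60000"` means the watermark trails the highest `rideTime` seen so far by 60,000 ms. That is a reading of the configuration, not a copy of Flink's implementation, and the `BoundedWatermark` class is a hypothetical helper written for this example.

```python
class BoundedWatermark:
    """Toy watermark tracker: watermark = highest timestamp seen - delay.

    Hypothetical helper for illustration; not Flink's implementation.
    """

    def __init__(self, delay_ms):
        self.delay_ms = delay_ms
        self.max_timestamp_ms = None  # highest event timestamp seen so far

    def observe(self, timestamp_ms):
        """Record an event timestamp; out-of-order events never lower the max."""
        if self.max_timestamp_ms is None or timestamp_ms > self.max_timestamp_ms:
            self.max_timestamp_ms = timestamp_ms

    def current(self):
        """Return the current watermark, or None before the first event."""
        if self.max_timestamp_ms is None:
            return None
        return self.max_timestamp_ms - self.delay_ms

    def is_late(self, timestamp_ms):
        """Assumption of this sketch: an event at or below the watermark is late."""
        watermark = self.current()
        return watermark is not None and timestamp_ms <= watermark


wm = BoundedWatermark(delay_ms=60_000)   # mirrors delay: "60000" above
for ride_time in (100_000, 160_000, 130_000):
    wm.observe(ride_time)

print(wm.current())          # 100000, i.e. 160000 - 60000
print(wm.is_late(90_000))    # True: older than the watermark
print(wm.is_late(120_000))   # False: still within the 60 s bound
```

The delay is a trade-off: a larger value tolerates more out-of-order rides but postpones every time-based result by the same amount.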

APIs: DataSet API, DataStream API, Table API / SQL
Processing landscape, from offline to real-time: batch; streaming analytics & continuous processing; event-driven applications

What’s the next evolution?

APIs: DataSet API, DataStream API, Table API / SQL
Processing landscape, from offline to real-time: batch; streaming analytics & continuous processing; event-driven applications
* this is where we are now
Different algorithms/data structures optimized for the use case.

Grand Unification
Truly unified runtime that adapts to the workload.
Seamless integration of batch and streaming data sources.

APIs: DataStream API, Table API / SQL
Processing landscape, from offline to real-time: batch; streaming analytics & continuous processing; event-driven applications
* possible future evolution

http://flink.apache.org
THANK YOU!
aljoscha@apache.org
@dataArtisans
@ApacheFlink

WE ARE HIRING
data-artisans.com/careers

FREE TRIAL DOWNLOAD
data-artisans.com/download

DOWNLOAD REPORT
data-artisans.com/download-report-stream-processing-da-platform-apache-flink
Stream processing for real-time businesses powered by Apache Flink®

BACKUP

Akka
2010 – 0.5, first public release

Editor's Notes

  1. data Artisans was founded by the original creators of Apache Flink. We provide dA Platform, a complete stream processing infrastructure with open-source Apache Flink.
  2. These companies are among many users of Apache Flink, and during this conference you’ll meet folks from some of these companies as well as others using Flink. If your company would like to be represented on the “Powered by Apache Flink” page, email me.
  3. Think Oracle, IBM DB2, PostgreSQL, MySQL. Also think data warehouses and BI tools.
  4. There were other stream processing systems before this, but Storm was the most popular and most widely used. It was open-sourced after BackType, the company that developed it, was acquired by Twitter.
  5. Batch is really just batch. Most streaming analytics cases can be solved by running more batches. Event-driven applications need a streaming system.
  6. https://data-artisans.com/blog/flink-sql-powerful-querying-of-data-streams
  7. Also included is the Application Manager, which turns dA Platform into a self-service platform for stateful stream processing applications. dA Platform is generally available, and you can download a free trial today!
  8. (Optional slide – may not be appropriate for advanced audience. Helps us capture leads.)