The Future of Data Pipelines

•

0 likes•96 views

All Things Open

Presented at: All Things Open 2019 Presented by: John Hammink, Aiven.io

Technology

The Future of Data Pipelines | All Things Open 2019 | 2019-10-14
The Future of Data
Pipelines
John Hammink
Developer Advocate, Evangelist @ Aiven
john@aiven.io
All Things Open 2019, Raleigh NC

The Future of Data Pipelines
● Let’s start with where the future looked
from about a year ago
● Let’s assume that pub-sub and particularly
technologies like Kafka have created a pivot
point
● Let’s follow up with a discussion of trends
we’re all seeing
The Future of Data Pipelines | All Things Open 2019 | 2019-10-14

The Future of Data Pipelines
● Background
● Learnings
● Some numbers on data
● Functionality
● Design
● Compliance
● Reliability/Performance/Usability/Other
● Conclusions
● Q & A
The Future of Data Pipelines | All Things Open 2019 | 2019-10-14

An arbitrary timeline of data
The Future of Data Pipelines | All Things Open 2019 | 2019-10-14

What have we learned?
The Future of Data Pipelines | All Things Open 2019 | 2019-10-14
Pipeline data has
grown/transformed
in terms of:

The Future of Data Pipelines | All Things Open 2019 | 2019-10-14

The Future of Data Pipelines | All Things Open 2019 | 2019-10-14
Core-to-edge computing

The Future of Data Pipelines | All Things Open 2019 | 2019-10-14
Functionality
● can autoscale, shard,
tolerate partitions
● can capture, fix and
requeue errored
events
● can be troubleshot
and configured on fly
● data can be used for
AI/ML
● agnostic:
accomodates all
possible formats
● can self-perpetuate
and automate
continuous
improvements

source: kafka.apache.org
Source: Martin Kleppmann’s Kafka Summit 2018 Presentation, Is Kafka a Database? The Future of Data Pipelines | All Things Open 2019 | 2019-10-14

Other design considerations
● Kill switch - when things go wrong; a way to stem the data
ﬂow.
● Query on the streams - like KSQL, Apache Flink, SQL on
Amazon Kinesis, etc. Ultimately: one query interface for all
data.
● Need to accomodate - core-to-edge, growing data volume,
near real-time velocity, bi-directionality
● Distributed ledgers - pipelines can support these as
tunably consistent distributed datastores
The Future of Data Pipelines | All Things Open 2019 | 2019-10-14

Compliance
The Future of Data Pipelines | All Things Open 2019 | 2019-10-14

Reliability/Performance/Usability/Other
The Future of Data Pipelines | All Things Open 2019 | 2019-10-14
● MTTF → ∞
● Components: from interpreted to native?
● Metadata: will continue to transform, to offer even more space
savings.
● Pipelines: from hard-coded to drag and drop designs?
● Which data is left volatile and which is stored?

Wrapping up
The Future of Data Pipelines | All Things Open 2019 | 2019-10-14
Check out The Future of Data Pipelines at
https://aiven.io/blog/the-future-of-data-pipelines/

Discussion
The Future of Data Pipelines | All Things Open 2019 | 2019-10-14

What's hot

Is it harder to find a taxi when it is raining? Wilfried Hoge

Data Warehouse IntroductionAkkal Bahadur Bist

Neo4j GraphTour New York_Thomson Reuters SSNeo4j

Thomas Kaleske | KN(owl)edge – the Linked Data Platform at Kuehne + Nagelsemanticsconference

Leveraging big data to maximize value from rail and power infrastructure assets.Chijioke “CJ” Ejimuda

Nyc web perf-final-july-23Dan Boutin

Data Science at Netflix - Principles for Speed & Scale [Rev 2019 keynote]Michelle Ufford

What's So Unique About a Columnar Database?FlyData Inc.

GraphDB Connectors – Powering Complex SPARQL QueriesMarin Dimitrov

Driving the On-Demand Economy with Predictive AnalyticsSingleStore

Railroad Modeling at Hadoop ScaleDataWorks Summit

"Interactive Deep Analytics" DashboardYaniv Shalev

Idea behind Apache HivemallMakoto Yui

OpenAGRIS: using bibliographical data for linking into the agricultural knowl...AIMS (Agricultural Information Management Standards)

Creating stunning data analytics dashboard using php and flex10n Software, LLC

TechEvent biGenius What's NewTrivadis

Scalability and Autonomous AnalyticsInspirient

EDF2013: Selected Talk, Peter Haase: Optique: Scalable End-User Access to Big...European Data Forum

Collections as Data National Forum (Elings)Mary Elings

Time travel and time series analysis with pandas + statsmodelsAlexander Hendorf

What's hot (20)

Is it harder to find a taxi when it is raining?

Data Warehouse Introduction

Neo4j GraphTour New York_Thomson Reuters SS

Thomas Kaleske | KN(owl)edge – the Linked Data Platform at Kuehne + Nagel

Leveraging big data to maximize value from rail and power infrastructure assets.

Nyc web perf-final-july-23

Data Science at Netflix - Principles for Speed & Scale [Rev 2019 keynote]

What's So Unique About a Columnar Database?

GraphDB Connectors – Powering Complex SPARQL Queries

Driving the On-Demand Economy with Predictive Analytics

Railroad Modeling at Hadoop Scale

"Interactive Deep Analytics" Dashboard

Idea behind Apache Hivemall

OpenAGRIS: using bibliographical data for linking into the agricultural knowl...

Creating stunning data analytics dashboard using php and flex

TechEvent biGenius What's New

Scalability and Autonomous Analytics

EDF2013: Selected Talk, Peter Haase: Optique: Scalable End-User Access to Big...

Collections as Data National Forum (Elings)

Time travel and time series analysis with pandas + statsmodels

Similar to The Future of Data Pipelines

The Future of Data PipelinesAll Things Open

Integrating Semantic Web in the Real World: A Journey between Two Cities Juan Sequeda

Netsoft19 Keynote: Fluid Network PlanesChristian Esteve Rothenberg

Network of Networks Discussion and Charter DocumentLora Cecere

Eecs6893 big dataanalytics-lecture1Aravindharamanan S

Yuan ding resumeYuan Ding

Present and future of unified, portable, and efficient data processing with A...DataWorks Summit

District Level IntegrationJohn Teeter

Measuring, Quantifying, & Predicting the Cost-Accuracy TradeoffHong-Linh Truong

Flink Meetup Septmeber 2017 2018Christos Hadjinikolis

How Comcast Turns Big Data into Real Time Operational Insights: Winter Olympi...Brett Sheppard

Building the Data-Driven OrganizationLora Cecere

Cloud-Scale BGP and NetFlow AnalysisAlex Henthorn-Iwane

Kafka Connect and KSQL: Useful Tools in Migrating from a Legacy System to Kaf...confluent

World of Oracle Eloqua ReportingRon Corbisier

Modelers Datahub: Tool to manage model dataIEA-ETSAP

OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesTimothy Spann

Openbar 2 - Leuven - Faros - Invisible InfrastructureOpenbar

Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkDatabricks

Connect Data and Devices with Apache NiFiData Works MD

Similar to The Future of Data Pipelines (20)

The Future of Data Pipelines

Integrating Semantic Web in the Real World: A Journey between Two Cities

Netsoft19 Keynote: Fluid Network Planes

Network of Networks Discussion and Charter Document

Eecs6893 big dataanalytics-lecture1

Yuan ding resume

Present and future of unified, portable, and efficient data processing with A...

District Level Integration

Measuring, Quantifying, & Predicting the Cost-Accuracy Tradeoff

Flink Meetup Septmeber 2017 2018

How Comcast Turns Big Data into Real Time Operational Insights: Winter Olympi...

Building the Data-Driven Organization

Cloud-Scale BGP and NetFlow Analysis

Kafka Connect and KSQL: Useful Tools in Migrating from a Legacy System to Kaf...

World of Oracle Eloqua Reporting

Modelers Datahub: Tool to manage model data

OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines

Openbar 2 - Leuven - Faros - Invisible Infrastructure

Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark

Connect Data and Devices with Apache NiFi

Recently uploaded

Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021

Tales from a Passkey Provider Progress from Awareness to Implementation.pptxFIDO Alliance

Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxFIDO Alliance

Top 10 CodeIgniter Development CompaniesTopCSSGallery

State of the Smart Building Startup Landscape 2024!Memoori

Generative AI Use Cases and Applications.pdfalexjohnson7307

Introduction to use of FHIR Documents in ABDMKumar Satyam

TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc

How to Check GPS Location with a Live Tracker in Pakistandanishmna97

Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Paige Cruz

Oauth 2.0 Introduction and Flows with MuleSoftshyamraj55

Hyatt driving innovation and exceptional customer experiences with FIDO passw...FIDO Alliance

Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsLeah Henrickson

الأمن السيبراني - ما لا يسع للمستخدم جهلهMohamed Sweelam

Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...panagenda

How we scaled to 80K users by doing nothing!.pdfSrushith Repakula

How to Check CNIC Information Online with Pakdata cfdanishmna97

Google I/O Extended 2024 WarsawGDSC PJATK

Introduction to FIDO Authentication and Passkeys.pptxFIDO Alliance

Intro to Passkeys and the State of Passwordless.pptxFIDO Alliance

Recently uploaded (20)

Six Myths about Ontologies: The Basics of Formal Ontology

Tales from a Passkey Provider Progress from Awareness to Implementation.pptx

Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx

Top 10 CodeIgniter Development Companies

State of the Smart Building Startup Landscape 2024!

Generative AI Use Cases and Applications.pdf

Introduction to use of FHIR Documents in ABDM

TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...

How to Check GPS Location with a Live Tracker in Pakistan

Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)

Oauth 2.0 Introduction and Flows with MuleSoft

Hyatt driving innovation and exceptional customer experiences with FIDO passw...

Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots

الأمن السيبراني - ما لا يسع للمستخدم جهله

Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...

How we scaled to 80K users by doing nothing!.pdf

How to Check CNIC Information Online with Pakdata cf

Google I/O Extended 2024 Warsaw

Introduction to FIDO Authentication and Passkeys.pptx

Intro to Passkeys and the State of Passwordless.pptx

The Future of Data Pipelines

1. The Future of Data Pipelines | All Things Open 2019 | 2019-10-14 The Future of Data Pipelines John Hammink Developer Advocate, Evangelist @ Aiven john@aiven.io All Things Open 2019, Raleigh NC

2. The Future of Data Pipelines ● Let’s start with where the future looked from about a year ago ● Let’s assume that pub-sub and particularly technologies like Kafka have created a pivot point ● Let’s follow up with a discussion of trends we’re all seeing The Future of Data Pipelines | All Things Open 2019 | 2019-10-14

3. The Future of Data Pipelines ● Background ● Learnings ● Some numbers on data ● Functionality ● Design ● Compliance ● Reliability/Performance/Usability/Other ● Conclusions ● Q & A The Future of Data Pipelines | All Things Open 2019 | 2019-10-14

4. An arbitrary timeline of data The Future of Data Pipelines | All Things Open 2019 | 2019-10-14

5. What have we learned? The Future of Data Pipelines | All Things Open 2019 | 2019-10-14 Pipeline data has grown/transformed in terms of:

6. The Future of Data Pipelines | All Things Open 2019 | 2019-10-14

7. The Future of Data Pipelines | All Things Open 2019 | 2019-10-14 Core-to-edge computing

8. The Future of Data Pipelines | All Things Open 2019 | 2019-10-14 Functionality ● can autoscale, shard, tolerate partitions ● can capture, fix and requeue errored events ● can be troubleshot and configured on fly ● data can be used for AI/ML ● agnostic: accomodates all possible formats ● can self-perpetuate and automate continuous improvements

9. source: kafka.apache.org Source: Martin Kleppmann’s Kafka Summit 2018 Presentation, Is Kafka a Database? The Future of Data Pipelines | All Things Open 2019 | 2019-10-14

10. Other design considerations ● Kill switch - when things go wrong; a way to stem the data ﬂow. ● Query on the streams - like KSQL, Apache Flink, SQL on Amazon Kinesis, etc. Ultimately: one query interface for all data. ● Need to accomodate - core-to-edge, growing data volume, near real-time velocity, bi-directionality ● Distributed ledgers - pipelines can support these as tunably consistent distributed datastores The Future of Data Pipelines | All Things Open 2019 | 2019-10-14

11. Compliance The Future of Data Pipelines | All Things Open 2019 | 2019-10-14

12. Reliability/Performance/Usability/Other The Future of Data Pipelines | All Things Open 2019 | 2019-10-14 ● MTTF → ∞ ● Components: from interpreted to native? ● Metadata: will continue to transform, to offer even more space savings. ● Pipelines: from hard-coded to drag and drop designs? ● Which data is left volatile and which is stored?

13. Wrapping up The Future of Data Pipelines | All Things Open 2019 | 2019-10-14 Check out The Future of Data Pipelines at https://aiven.io/blog/the-future-of-data-pipelines/

14. Discussion The Future of Data Pipelines | All Things Open 2019 | 2019-10-14

The Future of Data Pipelines

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to The Future of Data Pipelines

Similar to The Future of Data Pipelines (20)

More from All Things Open

More from All Things Open (20)

Recently uploaded

Recently uploaded (20)

The Future of Data Pipelines