SlideShare a Scribd company logo
The Evolving Landscape of
Data Engineering
Andrei Savu - Event Co-organizer
Staff Engineer @ Twitter
Follow me @andreisavu
Andrei Savu
Staff Engineer @ Twitter:
* MoPub Backend & Data Pipelines
* Mobile App Monetization
Co-organizer of the Data Engineering
Club.
Previously Tech Lead at Cloudera via
the Axemblr acquisition. Started the
Cloud engineering team.
The Past:
● OSS communities
● AWS history
● Google Cloud history
The Present: Patterns
The Future: Wish List
Topics
Weeks of Provisioning
Static Infrastructure
Commodity Hardware
Commodity Networking
Data Locality Important
Running in the Public
Cloud was unusual
CAPEX
The Past - OSS
Visionary Business
Fast iterations
Data Management as a
key platform use case
Incredible Scale
Transition to “serverless”
OPEX & Elastic
The Past - AWS
Visionary Products
Fast iterations
Machine Learning as a key
use case
State of the Art data
platform
Last 3 years on fast
forward
Intelligent Billing
OPEX & Elastic
The Past - Google Cloud
The Present: Patterns
Weeks to Minutes to Seconds
Hadoop/Spark ecosystem is mature.
We have a broad set of options.
Big Data is much Bigger (e.g. x1e.32xlarge: 3TB
mem, 128 vCPUs, 14Gbps network)
Scale continues to be hard.
Cloud economics can be very disruptive
(especially for data workloads)
High-performance networks are common.
Storage can be decoupled from compute.
Cluster locality is important.
Service Endpoints (not clusters, aka serverless,
aka managed etc.).
Sophisticated Auto-scaling (batch & streaming,
spot vs. on-demand, multi-az).
Multi-DC and Multi-Region from Day 1.
Various flavors of containers.
The Future: Wish List
A Data Catalog product as the center of the
universe.
Data Monitoring Systems:
* statistical properties, anomaly detection,
schema changes, consumption patterns etc.
More intelligence at the data infrastructure level:
* data format migrations, intelligent caching
based on access patterns.
Declarative data transformation vs. explicit ETL.
Intelligent data sampling products. Scalability
has a cost.
Thanks!
Join the community on Meetup.com!
www.meetup.com/Data-Engineering-Club
www.dataeng.club
Do you want to present? Get in touch.
Feedback #dataengclub

More Related Content

What's hot

Big Data and ML on Google Cloud
Big Data and ML on Google CloudBig Data and ML on Google Cloud
Big Data and ML on Google Cloud
Wlodek Bielski
 
Hw09 Counting And Clustering And Other Data Tricks
Hw09   Counting And Clustering And Other Data TricksHw09   Counting And Clustering And Other Data Tricks
Hw09 Counting And Clustering And Other Data TricksCloudera, Inc.
 
Hadoop World - Oct 2009
Hadoop World - Oct 2009Hadoop World - Oct 2009
Hadoop World - Oct 2009
Derek Gottfrid
 
Big data groningen
Big data groningenBig data groningen
Big data groningen
Willem Hendriks
 
Momentum v2.0
Momentum v2.0Momentum v2.0
Momentum v2.0
Shamshad Ansari
 
Real-Time Supply Chain Analytics with Machine Learning, Kafka, and Spark
Real-Time Supply Chain Analytics with Machine Learning, Kafka, and SparkReal-Time Supply Chain Analytics with Machine Learning, Kafka, and Spark
Real-Time Supply Chain Analytics with Machine Learning, Kafka, and Spark
SingleStore
 
Atmosphere Conference 2015: Oktawave Horizon Project: the future of real-time...
Atmosphere Conference 2015: Oktawave Horizon Project: the future of real-time...Atmosphere Conference 2015: Oktawave Horizon Project: the future of real-time...
Atmosphere Conference 2015: Oktawave Horizon Project: the future of real-time...
PROIDEA
 
End To End Business Intelligence On Google Cloud
End To End Business Intelligence On Google CloudEnd To End Business Intelligence On Google Cloud
End To End Business Intelligence On Google Cloud
Tu Pham
 
Google cloud
Google cloudGoogle cloud
Google cloud
vipin ojha
 
re:Invent re:Peat
re:Invent re:Peatre:Invent re:Peat
re:Invent re:Peat
Steve Houël
 
PowerStream: Propelling Energy Innovation with Predictive Analytics
PowerStream: Propelling Energy Innovation with Predictive Analytics PowerStream: Propelling Energy Innovation with Predictive Analytics
PowerStream: Propelling Energy Innovation with Predictive Analytics
SingleStore
 
Autonomous analytics on streaming data
Autonomous analytics on streaming dataAutonomous analytics on streaming data
Autonomous analytics on streaming data
Claudiu Barbura
 
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming AnalyticsR, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
Kai Wähner
 
Satwik resume
Satwik resumeSatwik resume
Satwik resume
Satwik Mishra
 
Cloud computer
Cloud computerCloud computer
Cloud computer
Rajesh Kumar
 
Microservices Live
Microservices LiveMicroservices Live
Microservices Live
Data Driven Innovation
 
Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016
Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016
Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016
Coburn Watson
 
Automate scalable micro services with Kubernetes on Google Cloud by Billme , ...
Automate scalable micro services with Kubernetes on Google Cloud by Billme , ...Automate scalable micro services with Kubernetes on Google Cloud by Billme , ...
Automate scalable micro services with Kubernetes on Google Cloud by Billme , ...
TeK Charnsilp Chinprasert
 
Near-real time reporting in Azure | 2017 Cloudbrew Radu Vunvulea
Near-real time reporting in Azure | 2017 Cloudbrew Radu VunvuleaNear-real time reporting in Azure | 2017 Cloudbrew Radu Vunvulea
Near-real time reporting in Azure | 2017 Cloudbrew Radu Vunvulea
Radu Vunvulea
 

What's hot (20)

Big Data and ML on Google Cloud
Big Data and ML on Google CloudBig Data and ML on Google Cloud
Big Data and ML on Google Cloud
 
Hw09 Counting And Clustering And Other Data Tricks
Hw09   Counting And Clustering And Other Data TricksHw09   Counting And Clustering And Other Data Tricks
Hw09 Counting And Clustering And Other Data Tricks
 
Hadoop World - Oct 2009
Hadoop World - Oct 2009Hadoop World - Oct 2009
Hadoop World - Oct 2009
 
Big data groningen
Big data groningenBig data groningen
Big data groningen
 
Momentum v2.0
Momentum v2.0Momentum v2.0
Momentum v2.0
 
Real-Time Supply Chain Analytics with Machine Learning, Kafka, and Spark
Real-Time Supply Chain Analytics with Machine Learning, Kafka, and SparkReal-Time Supply Chain Analytics with Machine Learning, Kafka, and Spark
Real-Time Supply Chain Analytics with Machine Learning, Kafka, and Spark
 
Atmosphere Conference 2015: Oktawave Horizon Project: the future of real-time...
Atmosphere Conference 2015: Oktawave Horizon Project: the future of real-time...Atmosphere Conference 2015: Oktawave Horizon Project: the future of real-time...
Atmosphere Conference 2015: Oktawave Horizon Project: the future of real-time...
 
End To End Business Intelligence On Google Cloud
End To End Business Intelligence On Google CloudEnd To End Business Intelligence On Google Cloud
End To End Business Intelligence On Google Cloud
 
Google cloud
Google cloudGoogle cloud
Google cloud
 
re:Invent re:Peat
re:Invent re:Peatre:Invent re:Peat
re:Invent re:Peat
 
PowerStream: Propelling Energy Innovation with Predictive Analytics
PowerStream: Propelling Energy Innovation with Predictive Analytics PowerStream: Propelling Energy Innovation with Predictive Analytics
PowerStream: Propelling Energy Innovation with Predictive Analytics
 
Autonomous analytics on streaming data
Autonomous analytics on streaming dataAutonomous analytics on streaming data
Autonomous analytics on streaming data
 
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming AnalyticsR, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
 
Satwik resume
Satwik resumeSatwik resume
Satwik resume
 
Cloud computer
Cloud computerCloud computer
Cloud computer
 
Microservices Live
Microservices LiveMicroservices Live
Microservices Live
 
Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016
Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016
Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016
 
Automate scalable micro services with Kubernetes on Google Cloud by Billme , ...
Automate scalable micro services with Kubernetes on Google Cloud by Billme , ...Automate scalable micro services with Kubernetes on Google Cloud by Billme , ...
Automate scalable micro services with Kubernetes on Google Cloud by Billme , ...
 
Resume_Kai Qian
Resume_Kai QianResume_Kai Qian
Resume_Kai Qian
 
Near-real time reporting in Azure | 2017 Cloudbrew Radu Vunvulea
Near-real time reporting in Azure | 2017 Cloudbrew Radu VunvuleaNear-real time reporting in Azure | 2017 Cloudbrew Radu Vunvulea
Near-real time reporting in Azure | 2017 Cloudbrew Radu Vunvulea
 

Similar to The Evolving Landscape of Data Engineering

The Evolving Landscape of Data Engineering
The Evolving Landscape of Data EngineeringThe Evolving Landscape of Data Engineering
The Evolving Landscape of Data Engineering
Andrei Savu
 
云计算及其应用
云计算及其应用云计算及其应用
云计算及其应用
lantianlcdx
 
ESA and the Cloud
ESA and the CloudESA and the Cloud
ESA and the Cloud
Netcetera
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
ElsonPaul2
 
An Introduction to Data Intensive Computing
An Introduction to Data Intensive ComputingAn Introduction to Data Intensive Computing
An Introduction to Data Intensive Computing
Collin Bennett
 
Apache Spark Clusters for Everyone | AWS Public Sector Summit 2016
Apache Spark Clusters for Everyone | AWS Public Sector Summit 2016Apache Spark Clusters for Everyone | AWS Public Sector Summit 2016
Apache Spark Clusters for Everyone | AWS Public Sector Summit 2016
Amazon Web Services
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
eRic Choo
 
Stargate, the gateway for some multi-models data API
Stargate, the gateway for some multi-models data APIStargate, the gateway for some multi-models data API
Stargate, the gateway for some multi-models data API
Data Con LA
 
Cloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs GoogleCloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs Google
Patrick Pierson
 
Migrating Large Scale Data Sets to the Cloud
Migrating Large Scale Data Sets to the CloudMigrating Large Scale Data Sets to the Cloud
Migrating Large Scale Data Sets to the Cloud
Amazon Web Services
 
Big data on google cloud
Big data on google cloudBig data on google cloud
Big data on google cloud
Tu Pham
 
Deep dive into Google Cloud for Big Data
Deep dive into Google Cloud for Big DataDeep dive into Google Cloud for Big Data
Deep dive into Google Cloud for Big Data
Tu Le Dinh
 
AWS Storage State of the Union
AWS Storage State of the UnionAWS Storage State of the Union
AWS Storage State of the Union
Amazon Web Services
 
Cloud Computing Fundamentals
Cloud Computing FundamentalsCloud Computing Fundamentals
Cloud Computing Fundamentals
Aravind Udayashankara
 
Cloud Computing ...changes everything
Cloud Computing ...changes everythingCloud Computing ...changes everything
Cloud Computing ...changes everything
Lew Tucker
 
President Election of Korea in 2017
President Election of Korea in 2017President Election of Korea in 2017
President Election of Korea in 2017
Jongwook Woo
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
webscale
 
Earth on AWS - Next-Generation Open Data Platforms
Earth on AWS - Next-Generation Open Data PlatformsEarth on AWS - Next-Generation Open Data Platforms
Earth on AWS - Next-Generation Open Data Platforms
Amazon Web Services
 

Similar to The Evolving Landscape of Data Engineering (20)

The Evolving Landscape of Data Engineering
The Evolving Landscape of Data EngineeringThe Evolving Landscape of Data Engineering
The Evolving Landscape of Data Engineering
 
云计算及其应用
云计算及其应用云计算及其应用
云计算及其应用
 
ESA and the Cloud
ESA and the CloudESA and the Cloud
ESA and the Cloud
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
AnilKumarT_Resume_latest
AnilKumarT_Resume_latestAnilKumarT_Resume_latest
AnilKumarT_Resume_latest
 
An Introduction to Data Intensive Computing
An Introduction to Data Intensive ComputingAn Introduction to Data Intensive Computing
An Introduction to Data Intensive Computing
 
Apache Spark Clusters for Everyone | AWS Public Sector Summit 2016
Apache Spark Clusters for Everyone | AWS Public Sector Summit 2016Apache Spark Clusters for Everyone | AWS Public Sector Summit 2016
Apache Spark Clusters for Everyone | AWS Public Sector Summit 2016
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
Stargate, the gateway for some multi-models data API
Stargate, the gateway for some multi-models data APIStargate, the gateway for some multi-models data API
Stargate, the gateway for some multi-models data API
 
Cloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs GoogleCloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs Google
 
Migrating Large Scale Data Sets to the Cloud
Migrating Large Scale Data Sets to the CloudMigrating Large Scale Data Sets to the Cloud
Migrating Large Scale Data Sets to the Cloud
 
Big data on google cloud
Big data on google cloudBig data on google cloud
Big data on google cloud
 
Deep dive into Google Cloud for Big Data
Deep dive into Google Cloud for Big DataDeep dive into Google Cloud for Big Data
Deep dive into Google Cloud for Big Data
 
AWS Storage State of the Union
AWS Storage State of the UnionAWS Storage State of the Union
AWS Storage State of the Union
 
Cloud Computing Fundamentals
Cloud Computing FundamentalsCloud Computing Fundamentals
Cloud Computing Fundamentals
 
Cloud Computing ...changes everything
Cloud Computing ...changes everythingCloud Computing ...changes everything
Cloud Computing ...changes everything
 
President Election of Korea in 2017
President Election of Korea in 2017President Election of Korea in 2017
President Election of Korea in 2017
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
Earth on AWS - Next-Generation Open Data Platforms
Earth on AWS - Next-Generation Open Data PlatformsEarth on AWS - Next-Generation Open Data Platforms
Earth on AWS - Next-Generation Open Data Platforms
 

More from Andrei Savu

Recap on AWS Lambda after re:Invent 2015
Recap on AWS Lambda after re:Invent 2015Recap on AWS Lambda after re:Invent 2015
Recap on AWS Lambda after re:Invent 2015
Andrei Savu
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data Meetup
Andrei Savu
 
Introducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashIntroducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data Bash
Andrei Savu
 
APIs & Underlying Protocols #APICraftSF
APIs & Underlying Protocols #APICraftSFAPIs & Underlying Protocols #APICraftSF
APIs & Underlying Protocols #APICraftSF
Andrei Savu
 
Challenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS MeetupChallenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS Meetup
Andrei Savu
 
Cloud as a Data Platform
Cloud as a Data PlatformCloud as a Data Platform
Cloud as a Data Platform
Andrei Savu
 
Apache Provisionr (incubating) - Bucharest JUG 10
Apache Provisionr (incubating) - Bucharest JUG 10Apache Provisionr (incubating) - Bucharest JUG 10
Apache Provisionr (incubating) - Bucharest JUG 10
Andrei Savu
 
Creating pools of Virtual Machines - ApacheCon NA 2013
Creating pools of Virtual Machines - ApacheCon NA 2013Creating pools of Virtual Machines - ApacheCon NA 2013
Creating pools of Virtual Machines - ApacheCon NA 2013
Andrei Savu
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist Toolbox
Andrei Savu
 
Axemblr Provisionr 0.3.x Overview
Axemblr Provisionr 0.3.x OverviewAxemblr Provisionr 0.3.x Overview
Axemblr Provisionr 0.3.x Overview
Andrei Savu
 
2012 in Review - Bucharest JUG
2012 in Review - Bucharest JUG2012 in Review - Bucharest JUG
2012 in Review - Bucharest JUG
Andrei Savu
 
Counters with Riak on Amazon EC2 at Hackover
Counters with Riak on Amazon EC2 at HackoverCounters with Riak on Amazon EC2 at Hackover
Counters with Riak on Amazon EC2 at HackoverAndrei Savu
 
Simple REST with Dropwizard
Simple REST with DropwizardSimple REST with Dropwizard
Simple REST with Dropwizard
Andrei Savu
 
Guava Overview Part 2 Bucharest JUG #2
Guava Overview Part 2 Bucharest JUG #2 Guava Overview Part 2 Bucharest JUG #2
Guava Overview Part 2 Bucharest JUG #2
Andrei Savu
 
Guava Overview. Part 1 @ Bucharest JUG #1
Guava Overview. Part 1 @ Bucharest JUG #1 Guava Overview. Part 1 @ Bucharest JUG #1
Guava Overview. Part 1 @ Bucharest JUG #1
Andrei Savu
 
Polyglot Persistence & Big Data in the Cloud
Polyglot Persistence & Big Data in the CloudPolyglot Persistence & Big Data in the Cloud
Polyglot Persistence & Big Data in the CloudAndrei Savu
 
Building a Great Team in Open Source - Open Agile 2011
Building a Great Team in Open Source - Open Agile 2011Building a Great Team in Open Source - Open Agile 2011
Building a Great Team in Open Source - Open Agile 2011
Andrei Savu
 
Automated Testing for Web Applications - Wurbe #36
Automated Testing for Web Applications - Wurbe #36Automated Testing for Web Applications - Wurbe #36
Automated Testing for Web Applications - Wurbe #36
Andrei Savu
 
Apache ZooKeeper TechTuesday
Apache ZooKeeper TechTuesdayApache ZooKeeper TechTuesday
Apache ZooKeeper TechTuesdayAndrei Savu
 

More from Andrei Savu (20)

Recap on AWS Lambda after re:Invent 2015
Recap on AWS Lambda after re:Invent 2015Recap on AWS Lambda after re:Invent 2015
Recap on AWS Lambda after re:Invent 2015
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data Meetup
 
Introducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashIntroducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data Bash
 
APIs & Underlying Protocols #APICraftSF
APIs & Underlying Protocols #APICraftSFAPIs & Underlying Protocols #APICraftSF
APIs & Underlying Protocols #APICraftSF
 
Challenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS MeetupChallenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS Meetup
 
Cloud as a Data Platform
Cloud as a Data PlatformCloud as a Data Platform
Cloud as a Data Platform
 
Apache Provisionr (incubating) - Bucharest JUG 10
Apache Provisionr (incubating) - Bucharest JUG 10Apache Provisionr (incubating) - Bucharest JUG 10
Apache Provisionr (incubating) - Bucharest JUG 10
 
Creating pools of Virtual Machines - ApacheCon NA 2013
Creating pools of Virtual Machines - ApacheCon NA 2013Creating pools of Virtual Machines - ApacheCon NA 2013
Creating pools of Virtual Machines - ApacheCon NA 2013
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist Toolbox
 
Axemblr Provisionr 0.3.x Overview
Axemblr Provisionr 0.3.x OverviewAxemblr Provisionr 0.3.x Overview
Axemblr Provisionr 0.3.x Overview
 
2012 in Review - Bucharest JUG
2012 in Review - Bucharest JUG2012 in Review - Bucharest JUG
2012 in Review - Bucharest JUG
 
Counters with Riak on Amazon EC2 at Hackover
Counters with Riak on Amazon EC2 at HackoverCounters with Riak on Amazon EC2 at Hackover
Counters with Riak on Amazon EC2 at Hackover
 
Simple REST with Dropwizard
Simple REST with DropwizardSimple REST with Dropwizard
Simple REST with Dropwizard
 
Guava Overview Part 2 Bucharest JUG #2
Guava Overview Part 2 Bucharest JUG #2 Guava Overview Part 2 Bucharest JUG #2
Guava Overview Part 2 Bucharest JUG #2
 
Guava Overview. Part 1 @ Bucharest JUG #1
Guava Overview. Part 1 @ Bucharest JUG #1 Guava Overview. Part 1 @ Bucharest JUG #1
Guava Overview. Part 1 @ Bucharest JUG #1
 
Polyglot Persistence & Big Data in the Cloud
Polyglot Persistence & Big Data in the CloudPolyglot Persistence & Big Data in the Cloud
Polyglot Persistence & Big Data in the Cloud
 
Building a Great Team in Open Source - Open Agile 2011
Building a Great Team in Open Source - Open Agile 2011Building a Great Team in Open Source - Open Agile 2011
Building a Great Team in Open Source - Open Agile 2011
 
Apache Whirr
Apache WhirrApache Whirr
Apache Whirr
 
Automated Testing for Web Applications - Wurbe #36
Automated Testing for Web Applications - Wurbe #36Automated Testing for Web Applications - Wurbe #36
Automated Testing for Web Applications - Wurbe #36
 
Apache ZooKeeper TechTuesday
Apache ZooKeeper TechTuesdayApache ZooKeeper TechTuesday
Apache ZooKeeper TechTuesday
 

Recently uploaded

To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 

Recently uploaded (20)

To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 

The Evolving Landscape of Data Engineering

  • 1. The Evolving Landscape of Data Engineering Andrei Savu - Event Co-organizer Staff Engineer @ Twitter Follow me @andreisavu
  • 2. Andrei Savu Staff Engineer @ Twitter: * MoPub Backend & Data Pipelines * Mobile App Monetization Co-organizer of the Data Engineering Club. Previously Tech Lead at Cloudera via the Axemblr acquisition. Started the Cloud engineering team.
  • 3. The Past: ● OSS communities ● AWS history ● Google Cloud history The Present: Patterns The Future: Wish List Topics
  • 4. Weeks of Provisioning Static Infrastructure Commodity Hardware Commodity Networking Data Locality Important Running in the Public Cloud was unusual CAPEX The Past - OSS
  • 5. Visionary Business Fast iterations Data Management as a key platform use case Incredible Scale Transition to “serverless” OPEX & Elastic The Past - AWS
  • 6. Visionary Products Fast iterations Machine Learning as a key use case State of the Art data platform Last 3 years on fast forward Intelligent Billing OPEX & Elastic The Past - Google Cloud
  • 7. The Present: Patterns Weeks to Minutes to Seconds Hadoop/Spark ecosystem is mature. We have a broad set of options. Big Data is much Bigger (e.g. x1e.32xlarge: 3TB mem, 128 vCPUs, 14Gbps network) Scale continues to be hard. Cloud economics can be very disruptive (especially for data workloads) High-performance networks are common. Storage can be decoupled from compute. Cluster locality is important. Service Endpoints (not clusters, aka serverless, aka managed etc.). Sophisticated Auto-scaling (batch & streaming, spot vs. on-demand, multi-az). Multi-DC and Multi-Region from Day 1. Various flavors of containers.
  • 8. The Future: Wish List A Data Catalog product as the center of the universe. Data Monitoring Systems: * statistical properties, anomaly detection, schema changes, consumption patterns etc. More intelligence at the data infrastructure level: * data format migrations, intelligent caching based on access patterns. Declarative data transformation vs. explicit ETL. Intelligent data sampling products. Scalability has a cost.
  • 9. Thanks! Join the community on Meetup.com! www.meetup.com/Data-Engineering-Club www.dataeng.club Do you want to present? Get in touch. Feedback #dataengclub