The Evolving Landscape of Data Engineering

Andrei Savu - Event Co-organizer
Staff Engineer @ Twitter
Follow me @andreisavu

Andrei Savu
Staff Engineer @ Twitter:
* MoPub Backend & Data Pipelines
* Mobile App Monetization
Co-organizer of the Data Engineering Club.
Previously Tech Lead at Cloudera via the Axemblr acquisition. Started the Cloud engineering team.

Topics
The Past:
● OSS communities
● AWS history
● Google Cloud history
The Present: Patterns
The Future: Wish List

The Past - OSS
Weeks of Provisioning
Static Infrastructure
Commodity Hardware
Commodity Networking
Data Locality Important
Running in the Public Cloud was unusual
CAPEX

The Past - AWS
Visionary Business
Fast iterations
Data Management as a key platform use case
Incredible Scale
Transition to “serverless”
OPEX & Elastic

The Past - Google Cloud
Visionary Products
Fast iterations
Machine Learning as a key use case
State of the Art data platform
Last 3 years on fast forward
Intelligent Billing
OPEX & Elastic

The Present: Patterns
Weeks to Minutes to Seconds
Hadoop/Spark ecosystem is mature.
We have a broad set of options.
Big Data is much Bigger (e.g. x1e.32xlarge: 3TB mem, 128 vCPUs, 14Gbps network)
Scale continues to be hard.
Cloud economics can be very disruptive (especially for data workloads)
High-performance networks are common.
Storage can be decoupled from compute.
Cluster locality is important.
Service Endpoints (not clusters; aka serverless, aka managed, etc.).
Sophisticated Auto-scaling (batch & streaming, spot vs. on-demand, multi-AZ).
Multi-DC and Multi-Region from Day 1.
Various flavors of containers.

The Future: Wish List
A Data Catalog product as the center of the universe.
Data Monitoring Systems:
* statistical properties, anomaly detection, schema changes, consumption patterns etc.
More intelligence at the data infrastructure level:
* data format migrations, intelligent caching based on access patterns.
Declarative data transformation vs. explicit ETL.
Intelligent data sampling products. Scalability has a cost.
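
To make the "Data Monitoring Systems" wish concrete, here is a minimal sketch of two of the checks named above: schema changes and a simple statistical anomaly test. It is an illustration only, not a product or an existing library; the function names (schema_changes, is_anomalous), the {column: type} schema representation, and the 3-standard-deviation threshold are all assumptions made for this example.

```python
from statistics import mean, stdev


def schema_changes(expected, observed):
    """Compare two {column_name: type_name} dicts (hypothetical schema format).

    Returns the columns that were added, removed, or changed type.
    """
    shared = set(expected) & set(observed)
    return {
        "added": sorted(set(observed) - set(expected)),
        "removed": sorted(set(expected) - set(observed)),
        "retyped": sorted(c for c in shared if expected[c] != observed[c]),
    }


def is_anomalous(history, value, threshold=3.0):
    """Flag `value` if it lies more than `threshold` sample standard
    deviations away from the mean of `history` (a plain z-score test).

    The threshold of 3.0 is an arbitrary default chosen for this sketch.
    """
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Example: yesterday's schema vs. today's, and daily row counts.
changes = schema_changes(
    {"id": "int", "ts": "string"},
    {"id": "int", "ts": "timestamp", "country": "string"},
)
row_counts = [1000, 1020, 980, 1010, 990]
drop_detected = is_anomalous(row_counts, 400)
normal_day = is_anomalous(row_counts, 1005)
```

A real monitoring system would track many more properties (null rates, distributions, consumption patterns) and would likely need seasonality-aware baselines; the z-score here only stands in for "anomaly detection" in the simplest possible form.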

Thanks!
Join the community on Meetup.com!
www.meetup.com/Data-Engineering-Club
www.dataeng.club
Do you want to present? Get in touch.
Feedback #dataengclub
