INTERFACE, by apidays - APIs: the next 10 years
June 8, 9 & 10 2022
The Evolution of Data Movement
Michel Tricot, Co-founder and CEO at Airbyte
------------
Check out our conferences at https://www.apidays.global/
Do you want to sponsor or talk at one of our conferences?
https://apidays.typeform.com/to/ILJeAaV8
Learn more on APIscene, the global media made by the community for the community:
https://www.apiscene.io
Explore the API ecosystem with the API Landscape:
https://apilandscape.apiscene.io/
Deep dive into the API industry with our reports:
https://www.apidays.global/industry-reports/
Subscribe to our global newsletter:
https://apidays.typeform.com/to/i1MPEW
2. 2022 SERIES OF EVENTS
New York: JULY (HYBRID)
Australia: SEPTEMBER (HYBRID)
Singapore: APRIL (VIRTUAL)
Helsinki & North: MARCH (VIRTUAL)
Paris: DECEMBER (HYBRID)
London: OCTOBER (HYBRID)
Hong Kong: AUGUST (VIRTUAL)
JUNE (VIRTUAL)
India: MAY (VIRTUAL)
APRIL (VIRTUAL)
Dubai & Middle East: JUNE (VIRTUAL)
3. Airbyte
Open-Source data integration
30,000 Deployments
7,900 Slack members
7,000 GitHub stars
Hello!
I am Michel Tricot
Co-Founder & CEO of Airbyte
@MichelTricot
michel-tricot
/in/micheltricot
5. The rise of the Cloud compute era
1. Exponential growth in the amount of data sources and data
2. Plummeting cost of cloud-based computation and storage
➡ Data consumption model has changed
6. APIs are ubiquitous
➡ Data access model has changed
1. APIs are both a product and a datastore
2. Data is siloed and access has become a key challenge
7. Extract - Load - Transform
A new paradigm for modern teams
ELT is replacing ETL
8. Extract
Source-specific routines to pull selected data from an external system.
Transform
Business logic specific to your organization to serve an analytics or operational use case.
Load
Destination-specific routines to push data where it is going to be consumed.
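The three ETL stages above can be sketched as follows. This is a minimal illustration, not something taken from the talk: the records, the `paying_customers` table, and the "keep paying customers only" rule are all hypothetical, and an in-memory SQLite database stands in for the destination.

```python
import sqlite3

# Hypothetical records standing in for what a source API might return.
SOURCE_RECORDS = [
    {"id": 1, "email": "Ada@Example.com", "plan": "pro", "seats": 5},
    {"id": 2, "email": "bob@example.com", "plan": "free", "seats": 1},
]


def extract():
    """Source-specific routine: pull selected data from the external system."""
    return list(SOURCE_RECORDS)


def transform(records):
    """Business logic applied BEFORE loading: keep paying customers only."""
    return [
        (r["id"], r["email"].lower(), r["seats"])
        for r in records
        if r["plan"] != "free"
    ]


def load(conn, rows):
    """Destination-specific routine: push the already-transformed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS paying_customers "
        "(id INTEGER PRIMARY KEY, email TEXT, seats INTEGER)"
    )
    conn.executemany("INSERT INTO paying_customers VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a real destination
load(conn, transform(extract()))
loaded = conn.execute("SELECT id, email, seats FROM paying_customers").fetchall()
```

Note that in this ordering the free-plan record never reaches the destination, so any later question about free users would require extracting from the source again.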
9. ETL doesn’t work in today’s world
Inflexible
● Friction when changing an existing pipeline.
● Hard to add new data.
● Most issues force data to be re-extracted.
Lack of Autonomy
● Warehouses made data consumers more autonomous.
● Changes require engineering involvement.
Complex
● Custom DSL.
● Forces adoption of a data stack.
● Addresses 70% of the needs; 30% is still built and maintained in-house.
10. Extract
General-purpose routines to pull selected data from a source.
Load
General-purpose routines to push raw data where it is going to be consumed.
Transform
Business logic specific to your organization to serve an analytics or operational use case with SQL / dbt / ...
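A rough sketch of the ELT ordering described above, under stated assumptions: the records, table names, and business rule are hypothetical, an in-memory SQLite database stands in for a warehouse, and a plain SQL view stands in for a transformation that a team might otherwise manage with dbt.

```python
import sqlite3

# Hypothetical records standing in for what a source API might return.
SOURCE_RECORDS = [
    {"id": 1, "email": "Ada@Example.com", "plan": "pro", "seats": 5},
    {"id": 2, "email": "bob@example.com", "plan": "free", "seats": 1},
]


def extract():
    """General-purpose routine: pull the records as-is from the source."""
    return list(SOURCE_RECORDS)


def load_raw(conn, records):
    """General-purpose routine: push raw, untransformed data to the destination."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_customers "
        "(id INTEGER, email TEXT, plan TEXT, seats INTEGER)"
    )
    conn.executemany(
        "INSERT INTO raw_customers VALUES (:id, :email, :plan, :seats)", records
    )
    conn.commit()


def transform(conn):
    """Business logic runs AFTER loading, as SQL inside the destination."""
    conn.execute(
        "CREATE VIEW IF NOT EXISTS paying_customers AS "
        "SELECT id, lower(email) AS email, seats "
        "FROM raw_customers WHERE plan <> 'free'"
    )


conn = sqlite3.connect(":memory:")  # stand-in for a warehouse
load_raw(conn, extract())
transform(conn)

raw_count = conn.execute("SELECT COUNT(*) FROM raw_customers").fetchone()[0]
paying = conn.execute("SELECT id, email, seats FROM paying_customers").fetchall()
```

Here every extracted record lands in `raw_customers`, and the business rule lives only in the view, so a data consumer could write a different query over the same raw table without touching the extract or load steps.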
11. ELT fixes the ETL-related issues
Flexibility
● All the data available on the destination.
● Data consumers are free to use what they need for the insights they want.
Autonomy
● Data consumers can leverage SQL queries to transform the data the way they want.
● No need to involve the engineering team.
Future proof
● Issues during transformation don’t prevent access to the data.
● Easy to update transformation schemas.
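The "future proof" points can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `raw_orders` table, the `revenue` view, and the hypothetical `apply_transform` helper, with SQLite standing in for the destination. A transformation that references a wrong column fails, the raw data stays queryable, and the fix is a new SQL definition rather than a new extraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a warehouse
conn.execute(
    "CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 1250, "paid"), (2, 400, "refunded"), (3, 990, "paid")],
)


def apply_transform(conn, select_sql):
    """(Re)define the hypothetical `revenue` view; return True if it is usable."""
    try:
        conn.execute("DROP VIEW IF EXISTS revenue")
        conn.execute(f"CREATE VIEW revenue AS {select_sql}")
        # Query the view once so that a bad column name surfaces here.
        conn.execute("SELECT * FROM revenue").fetchall()
        return True
    except sqlite3.OperationalError:
        return False


# First attempt references a column that does not exist in the raw table.
broken_ok = apply_transform(
    conn, "SELECT SUM(amount) AS total_cents FROM raw_orders WHERE status = 'paid'"
)

# The raw data is still there and still queryable.
raw_rows = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]

# Fixing the transformation only means changing the SQL, not re-extracting.
fixed_ok = apply_transform(
    conn,
    "SELECT SUM(amount_cents) AS total_cents FROM raw_orders WHERE status = 'paid'",
)
total_cents = conn.execute("SELECT total_cents FROM revenue").fetchone()[0]
```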
12. What about the long-tail of APIs?
1,000's of new apps/APIs emerging every year
➡ Data is more and more fragmented
➡ Rising need to break down data silos
13. Open-source communities solve the long-tail of APIs
1. Don’t reinvent the wheel, leverage existing connectors
2. Share the work of maintenance across a community
OSS is the only way to solve data integration
14. Developer tooling is crucial
We empower people to build good connectors with the Airbyte CDK
1. Offer developers tools
2. Build developer leverage
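One way to picture "developer leverage" is a base class that owns the repetitive parts of a connector so the author writes only what is specific to the source. The sketch below is purely illustrative and is not the Airbyte CDK API: `PaginatedSource`, its method names, and `FakeTicketsSource` are hypothetical, and the "API" is an in-memory dictionary.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List, Optional


class PaginatedSource(ABC):
    """Hypothetical base class that owns the pagination loop."""

    @abstractmethod
    def fetch_page(self, cursor: Optional[str]) -> Dict[str, Any]:
        """Return one page of the source's response for the given cursor."""

    @abstractmethod
    def parse_records(self, page: Dict[str, Any]) -> List[Dict[str, Any]]:
        """Pull the records out of a page."""

    @abstractmethod
    def next_cursor(self, page: Dict[str, Any]) -> Optional[str]:
        """Return the cursor of the next page, or None when done."""

    def read(self) -> Iterator[Dict[str, Any]]:
        """Shared logic: walk every page and yield every record."""
        cursor: Optional[str] = None
        while True:
            page = self.fetch_page(cursor)
            yield from self.parse_records(page)
            cursor = self.next_cursor(page)
            if cursor is None:
                break


class FakeTicketsSource(PaginatedSource):
    """Hypothetical connector: only the source-specific parts are written here."""

    PAGES = {
        None: {"items": [{"id": 1}, {"id": 2}], "next": "p2"},
        "p2": {"items": [{"id": 3}], "next": None},
    }

    def fetch_page(self, cursor):
        return self.PAGES[cursor]  # a real connector would call the API here

    def parse_records(self, page):
        return page["items"]

    def next_cursor(self, page):
        return page["next"]


records = list(FakeTicketsSource().read())
```

The design choice being illustrated is that each new connector is a few short methods, which is what makes it plausible for a community to share maintenance across many APIs.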
15. Predictions for APIs
An API is not just about exposing data; it is the programmatic version of a product, with all the business logic tied to it.
Because of this, there will always be fragmentation in the API world, and a need to cover the long tail to break down these silos.
19. Limitations of current ELT explain the growing need for data engineers.
Only the most popular connectors
Current ELT solutions plateau at ~170 connectors and can’t cover the long tail because of maintenance costs and ROI considerations.
Can’t handle custom use cases
Customers can't customize pre-built connectors, nor create new ones.
Counter-productive row-based pricing
Charging on active rows prevents mid- and high-scale replications (APIs, databases...) and is unpredictable.
20. Data Engineers need a scalable way to cover all data pipelines
Covers the long tail of connectors
Extensible and non-opinionated to address your exact needs
Fair, compute-based pricing
26. We grew the biggest community around data integration. [updated]
[Figure: three charts over time (Oct. to Nov.) showing GitHub stars, Slack members, and code contributors for Airbyte compared with Grouparoo, Rudderstack, and Meltano.]
27. “We are past the golden age of Hadoop and Spark”
28. Topics (notes from our call with event organizers)
*they do want Michel to talk about whatever he thinks is important*
20 min talk + 5 min Q&A
Talking at 10:40am PST on 6/8
Need a slide deck
Michel will be speaking directly after the Keynote speaker (author of Platform Revolution)
Some ideas for the talk:
1. APIs
2. OSS connectors
3. The whole vision
a. Why it makes sense to have OSS connectors
b. Why it makes sense to maintain certain APIs
c. “Airbyte has the community and platform to rule them all”
4. Integration is fragmented
a. History of integrations and types of integrations overview
5. He can do a plug for the maintainer program and ask people to contribute to Airbyte
a. This is the best community/audience to give a call to action to contribute to Airbyte
They really want to hear about Airbyte’s VISION
● Moving data from A to B
● Community led growth
● Long-tail of APIs
● How we see APIs changing and evolving
● Fragmentation in integrations today is a “trillion dollar issue” and Airbyte aims to be the platform to solve it all
Title for the talk: The Evolution of Data Movement
29. Potential agenda (in order)
*This is the Airbyte vision + our thoughts on the evolution of data movement
1. API evolution: 1990 → 2000 → today (cheaper storage makes it practical to move all data)
2. And now ETL → ELT
3. To solve the long-tail of APIs, you need a Community based approach
4. OSS - why it’s critical for the future of API integrations (and the scalability of it)
5. CDK: Why developer tooling is important (API Specific)
6. Future predictions for APIs?
30. 1890’s Data Movement and Analytics
In 1880, prior to computers, it took over seven years for the U.S.
Census Bureau to process the collected information and complete
a final report. In response, inventor Herman Hollerith produced the
“tabulating machine,” which was used in the 1890 census. The
tabulating machine could systematically process data recorded on
punch cards. With this device, the 1890 census was finished in 18
months.
Interesting Read -
https://www.dataversity.net/brief-history-analytics/#