Version 1.0
Airbyte for Data Engineering
In Data Engineer's Lunch #50, we will introduce Airbyte and discuss
how it can be used for data engineering
Arpan Patel
Engineer @ Anant
Airbyte
● Open-source data integration tool -> EL(T)
● 140+ out-of-the-box connectors
● Custom or new connectors, access to CDK
● Database replication with Change Data Capture
● Normalization and custom transformations via dbt
● Full-grade scheduler
● Real-time monitoring
● Incremental updates
● Manual full refresh
● Integration with Kubernetes and Airflow
● Cloud hosting & management
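The difference between "incremental updates" and "manual full refresh" can be sketched in a few lines of Python. This is a conceptual illustration only, not Airbyte's implementation: the function names, the `state` dictionary, and the `updated_at` cursor field are all assumptions made for the example.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]  # one source row, e.g. {"id": 1, "updated_at": "2022-01-01"}


def full_refresh(source: Iterable[Record]) -> List[Record]:
    """Re-read every record from the source, ignoring any saved state."""
    return list(source)


def incremental(source: Iterable[Record], cursor_field: str, state: dict) -> List[Record]:
    """Read only records whose cursor value is newer than the saved state."""
    last_seen = state.get(cursor_field)
    new_records = [
        r for r in source
        if last_seen is None or r[cursor_field] > last_seen
    ]
    if new_records:
        # Persist the highest cursor value so the next run starts from here.
        state[cursor_field] = max(r[cursor_field] for r in new_records)
    return new_records


# Hypothetical sample data for the illustration.
rows = [
    {"id": 1, "updated_at": "2022-01-01"},
    {"id": 2, "updated_at": "2022-01-02"},
]
state = {}
first_run = incremental(rows, "updated_at", state)   # picks up both rows
rows.append({"id": 3, "updated_at": "2022-01-03"})
second_run = incremental(rows, "updated_at", state)  # picks up only the new row
```

A full refresh costs more per run but needs no saved state; an incremental sync depends on a reliable cursor field in the source.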
Airbyte Data Syncing
● Select the data streams to replicate
○ Airbyte exposes every stream a source API offers and
lets you select only the ones you want to
replicate.
● Opt for normalized schemas or JSON format
○ Explode all nested API objects into separate
tables, or get a serialized JSON.
● Transform your data via dbt
○ You can add your own custom transformations
using the dbt integration right in the app.
● Airbyte’s API
● Recipes
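To make the "normalized schemas or JSON format" choice concrete, here is a minimal sketch of both outputs for one nested record. It handles only one level of nesting (lists of objects), and the child-table naming (`orders_items`, `orders_id`) is an assumption for the example, not Airbyte's actual naming scheme.

```python
import json


def serialize(record):
    """Raw option: keep the whole record as one serialized JSON string."""
    return json.dumps(record, sort_keys=True)


def normalize(record, table, key="id"):
    """Normalized option: split nested lists of objects into child tables."""
    parent = {}
    tables = {table: [parent]}
    for field, value in record.items():
        if isinstance(value, list) and all(isinstance(v, dict) for v in value):
            # Each nested object becomes a row that points back to its parent.
            child = f"{table}_{field}"
            tables[child] = [{f"{table}_{key}": record[key], **v} for v in value]
        else:
            parent[field] = value
    return tables


# Hypothetical API object with one nested list.
order = {
    "id": 7,
    "customer": "acme",
    "items": [{"sku": "a", "qty": 2}, {"sku": "b", "qty": 1}],
}
raw = serialize(order)
tables = normalize(order, "orders")
```

Here `tables["orders"]` holds the flat parent row and `tables["orders_items"]` holds one row per nested item, each carrying the parent's id.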
Airbyte Pipeline Visibility
● Real-time monitoring
○ Error logging
● Notification for failed syncs
○ You can set up a webhook to get notified when a
sync fails.
● Debugging autonomy
○ Modify and debug pipelines as you see fit,
without waiting.
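A webhook receiver for failed-sync notifications could look roughly like the sketch below, using only the Python standard library. The payload fields `connection` and `error` are assumptions for illustration; the slides do not describe the body Airbyte actually sends, so check it before relying on any field name.

```python
import json
from http.server import BaseHTTPRequestHandler


def format_alert(body: bytes) -> str:
    """Turn a webhook body into a one-line alert (field names are assumed)."""
    payload = json.loads(body)
    connection = payload.get("connection", "unknown connection")
    error = payload.get("error", "no error message provided")
    return f"Sync failed for {connection}: {error}"


class SyncFailureHandler(BaseHTTPRequestHandler):
    """Accepts POSTed notifications and prints a short alert for each one."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        alert = format_alert(self.rfile.read(length))
        print(alert)  # swap for email, chat, or paging in a real setup
        self.send_response(204)  # acknowledge with no response body
        self.end_headers()
```

To try it locally, serve the handler with `http.server.HTTPServer(("", 8080), SyncFailureHandler).serve_forever()` and point the webhook URL at that port (the port number is arbitrary).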
Airbyte Open Source Deploy Options
● Local -> Docker
○ Some users on Macs with an M1 chip have
reported problems running Airbyte
● Airbyte Cloud
● AWS -> EC2
● GCP -> Compute Engine
● Azure -> VM
● Kubernetes (K8s)
● DigitalOcean
● Oracle -> Cloud Infrastructure VM
Demo
● Spin up Airbyte on Gitpod
● Read a CSV file from GitHub and store it locally as JSON
● Extract and load (E+L) from one Postgres instance to another
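The CSV-to-JSON step of the demo amounts to reading rows and writing one JSON object per row. The sketch below shows that conversion in plain Python, outside Airbyte; the sample data is made up, the function name is hypothetical, and fetching the file from GitHub is left out so the example runs offline.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text: str) -> str:
    """Convert CSV text into newline-delimited JSON, one object per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(row) for row in reader)


# Made-up sample standing in for the CSV file hosted on GitHub.
sample = "id,name\n1,Ada\n2,Grace\n"
output = csv_to_json_lines(sample)
```

Note that `csv.DictReader` yields every value as a string, so numeric columns stay quoted in the output unless they are cast explicitly.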
Strategy: Scalable Fast Data
Architecture: Cassandra, Spark, Kafka
Engineering: Node, Python, JVM, CLR
Operations: Cloud, Container
Rescue: Downtime!! I need help.
www.anant.us | solutions@anant.us | (855) 262-6826
3 Washington Circle, NW | Suite 301 | Washington, DC 20037
