Successfully reported this slideshow.
Your SlideShare is downloading. ×

Airbyte-ModernDataStack

Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Upcoming SlideShare
Terratest
Terratest
Loading in …3
×

Check these out next

1 of 11 Ad

Airbyte-ModernDataStack

Download to read offline

"Airbyte - Data acquisition for Modern Data Platforms
In this session, you will learn the basic tenets of a modern data platform. Subsequently, you will understand the role of Airbyte in building the data platform. We will deep dive into how Airbyte's plugin type architecture enables rapid data acquisition of data from variety of data sources. We will also explore extensibility of Airbyte, by going through its Connector development kit."

"Airbyte - Data acquisition for Modern Data Platforms
In this session, you will learn the basic tenets of a modern data platform. Subsequently, you will understand the role of Airbyte in building the data platform. We will deep dive into how Airbyte's plugin type architecture enables rapid data acquisition of data from variety of data sources. We will also explore extensibility of Airbyte, by going through its Connector development kit."

Advertisement
Advertisement

More Related Content

Similar to Airbyte-ModernDataStack (20)

More from Knoldus Inc. (20)

Advertisement

Recently uploaded (20)

Airbyte-ModernDataStack

  1. 1. Presented By: Ram Indukuri Airbyte - ELT in Modern Data Stack
  2. 2. KnolX Etiquettes Punctuality Join the session 5 minutes prior to the session start time. We start on time and conclude on time! Feedback Make sure to submit a constructive feedback for all sessions as it is very helpful for the presenter. Silent Mode Keep your mobile devices in silent mode, feel free to move out of session in case you need to attend an urgent call. Avoid Disturbance Avoid unwanted chit chat during the session.
  3. 3. Our Agenda 01 The Holy Grail 02 Modern Data Stack 03 Transformations 04 Airbyte 05 Demo
  4. 4. Holy Grail of Modern Data Architecture - Data Mesh
  5. 5. Modern data stack ELT Data Pipeline Cloud Based Data Lake Data Transformation Tool Modern BI Tools 1 2 3 4
  6. 6. Modern data stack 1 2 3 4
  7. 7. # PYTHON # connect to db using wrapper around psycopg2 db = DatabaseConnection(db='db', user='username', password='password') # grab data from db and load into memory df = db.run_query("SELECT * FROM cleaned_table;") df = pd.DataFrame(df, columns=['user_id', 'series_id', 'timestamp']) # calculate time since first visit df = df.assign(time_since_first=df.groupby('user_id', sort=False).timestamp.apply(lambda x: x - x.min())) # calculate time since last visit df = df.assign(time_since_last=df.sort_values(['timestamp'], ascending=True).groupby('user_id', sort=False)['timestamp'].transform(pd.Series.diff)) # save df to compressed csv df.to_csv('transform_time_test.gz', compression='gzip') -- SQL equivalent -- increase the working memory (be careful with this) set work_mem='600MB'; -- create a dual index on the partition CREATE INDEX IF NOT EXISTS user_time_index ON table(user_id, timestamp); -- calculate time since last visit and time since first visit in one pass SELECT *, AGE(timestamp, LAG(timestamp, 1, timestamp) OVER w) AS time_since_last, AGE(timestamp, FIRST_VALUE(timestamp) OVER w) AS time_since_first FROM table WINDOW w AS (PARTITION BY user_id ORDER BY timestamp); SQL has taken over as dominant transformation tool
  8. 8. dbt
  9. 9. Airbyte - Plugin Architecture
  10. 10. Airbyte Internal Architecture

×