4. AIRBYTE FEATURES
DATA REPLICATION
● Scheduled updates
● Manual full refresh
● Change Data Capture for databases
TRANSFORMATION
● Full control over the data
● Normalized schemas
● Custom transformation via dbt
IN FULL CONTROL
● Real-time monitoring
● Notification for failed syncs
● Debugging autonomy
5. Airbyte
● Concept, installation, source, destination, configuration
○ Pandas under the hood
○ No way to synchronize folders, only files
● S3 with CSV: 10K, 10M, and 1K files with 10K rows each
● S3 to Snowflake
○ The 10M load does not work, possibly an AWS instance issue
○ Loading from a local file to Snowflake works
● Oracle RDS to Snowflake
○ Overwrite and incremental syncs
○ Custom transformations
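The difference between an override (overwrite) sync and an incremental sync can be sketched in a few lines. This is a minimal illustration of the general idea, not Airbyte's implementation: the function names, the in-memory lists standing in for the Oracle source and the Snowflake destination, and the `updated_at` cursor column are all assumptions made for the example.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def sync_overwrite(source: List[Row], destination: List[Row]) -> None:
    """Full refresh: drop whatever the destination holds and copy everything."""
    destination.clear()
    destination.extend(dict(row) for row in source)


def sync_incremental(source: List[Row], destination: List[Row],
                     cursor_field: str, last_cursor: Optional[Any]) -> Optional[Any]:
    """Append only rows whose cursor value is newer than the saved cursor.

    Returns the cursor value to persist for the next run.
    """
    new_rows = [row for row in source
                if last_cursor is None or row[cursor_field] > last_cursor]
    destination.extend(dict(row) for row in new_rows)
    if new_rows:
        return max(row[cursor_field] for row in new_rows)
    return last_cursor


# Hypothetical source table with an `updated_at` cursor column.
source = [{"id": 1, "updated_at": 10}, {"id": 2, "updated_at": 20}]
target: List[Row] = []

cursor = sync_incremental(source, target, "updated_at", None)    # first run copies both rows
source.append({"id": 3, "updated_at": 30})
cursor = sync_incremental(source, target, "updated_at", cursor)  # second run appends only id 3
```

The incremental variant only works when the source has a column that grows with every insert or update; without one, an overwrite is the safer choice.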
6. Test
● EC2 instance with Airbyte
● Ingest data from S3 to Snowflake
○ 10K
○ 10M
● Ingest data from RDS Oracle DB to Snowflake
● Custom dbt transformation
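CSV test files of a given size, such as the 10K and 10M cases above, could be produced with a small generator like the one below. This is only a sketch: the slides do not say how the files were created, and the column names (`id`, `category`, `price`, `quantity`) are hypothetical.

```python
import csv
import io
import random
from typing import TextIO


def write_mock_csv(out: TextIO, n_rows: int, seed: int = 0) -> None:
    """Write a header line plus n_rows of pseudo-random data to a text stream."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "category", "price", "quantity"])
    for i in range(n_rows):
        writer.writerow([
            i,
            rng.choice("ABCD"),
            round(rng.uniform(1, 100), 2),
            rng.randint(1, 50),
        ])


# Small case kept in memory; count the lines to confirm the size.
buffer = io.StringIO()
write_mock_csv(buffer, 10_000)
line_count = buffer.getvalue().count("\n")  # header + data rows
```

For the larger case, passing a file opened with `open(path, "w", newline="")` instead of a `StringIO` avoids holding the whole file in memory.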
9. The Firebolt difference
A new focus on speed and efficiency at scale
EFFICIENCY
Do more with less. 10x price-performance advantage through greater HW efficiency & choice
SPEED
Up to 182x faster speed at scale by optimizing storage, indexing and engine
SCALE
Elastic scale at speed across ETL, semi-structured data and thousands of users
12. Second try
● Mock generated data
○ Star schema
○ 1M rows (Fact)
○ 1K rows (Dimensions)
● Parquet files in S3
● Loading data with a Firebolt script
● External, fact and dimension tables
● 7 different queries (joins, WHERE and LIKE clauses, and window functions)
● No aggregate or join indexes (Firebolt)
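A scaled-down version of this setup can be reproduced locally to try the queries before running them on a warehouse. The sketch below uses SQLite from the Python standard library as a stand-in for the real engines, builds only two of the dimensions, and uses 1,000 fact rows and 10 dimension rows instead of 1M and 1K. The function name `build_star_schema`, the `employee_name` column, and the generated values are assumptions for illustration.

```python
import random
import sqlite3


def build_star_schema(conn: sqlite3.Connection, n_fact: int = 1_000,
                      n_dim: int = 10, seed: int = 0) -> None:
    """Create two dimension tables and one fact table filled with mock data."""
    rng = random.Random(seed)
    conn.executescript("""
        CREATE TABLE dim_employee (employee_id INTEGER PRIMARY KEY, employee_name TEXT);
        CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, product_name  TEXT);
        CREATE TABLE fact_sales (
            employee_id INTEGER, product_id INTEGER,
            price REAL, quantity INTEGER, my_date TEXT
        );
    """)
    conn.executemany("INSERT INTO dim_employee VALUES (?, ?)",
                     [(i, f"employee_{i}") for i in range(n_dim)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(i, f"product_{i}") for i in range(n_dim)])
    # Every fact row points at an existing dimension key, so inner joins keep all rows.
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
        [(rng.randrange(n_dim), rng.randrange(n_dim),
          round(rng.uniform(1, 100), 2), rng.randint(1, 20),
          f"{rng.randint(2000, 2100)}-01-01")
         for _ in range(n_fact)])
    conn.commit()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM fact_sales fs
    INNER JOIN dim_employee ON fs.employee_id = dim_employee.employee_id
    INNER JOIN dim_product ON fs.product_id = dim_product.product_id
""").fetchone()[0]
```

Because no indexes beyond the primary keys are created, this loosely mirrors the "no aggregate or join indexes" condition, although SQLite timings say nothing about Firebolt performance.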
13. Query Sentence Rows
Query 1
SELECT dim_employee.employee_id, dim_product.product_id, dim_time.time_id, dim_store.store_id,
       dim_sales_type.sales_type_id, fs.price, fs.quantity, fs.yn
FROM fact_sales fs
INNER JOIN dim_employee ON fs.employee_id = dim_employee.employee_id
INNER JOIN dim_product ON fs.product_id = dim_product.product_id
INNER JOIN dim_time ON fs.time_id = dim_time.time_id
INNER JOIN dim_store ON fs.store_id = dim_store.store_id
INNER JOIN dim_sales_type ON fs.sales_type_id = dim_sales_type.sales_type_id;
Rows: 994,969

Query 2
SELECT dim_employee.employee_id, dim_product.product_id, dim_time.time_id, dim_store.store_id,
       dim_sales_type.sales_type_id, fs.price, fs.quantity, fs.my_date
FROM fact_sales fs
INNER JOIN dim_employee ON fs.employee_id = dim_employee.employee_id
INNER JOIN dim_product ON fs.product_id = dim_product.product_id
INNER JOIN dim_time ON fs.time_id = dim_time.time_id
INNER JOIN dim_store ON fs.store_id = dim_store.store_id
INNER JOIN dim_sales_type ON fs.sales_type_id = dim_sales_type.sales_type_id
WHERE fs.my_date BETWEEN '2050-01-01' AND '2100-12-31';
Rows: 531,438

Query 3
SELECT dim_employee.employee_id, dim_product.product_id, dim_time.time_id, dim_store.store_id,
       dim_sales_type.sales_type_id, fs.price, fs.quantity, fs.my_date, dim_product.product_name
FROM fact_sales fs
INNER JOIN dim_employee ON fs.employee_id = dim_employee.employee_id
INNER JOIN dim_product ON fs.product_id = dim_product.product_id
INNER JOIN dim_time ON fs.time_id = dim_time.time_id
INNER JOIN dim_store ON fs.store_id = dim_store.store_id
INNER JOIN dim_sales_type ON fs.sales_type_id = dim_sales_type.sales_type_id
WHERE fs.my_date BETWEEN '2050-01-01' AND '2100-12-31'
  AND dim_product.product_name LIKE '%eto%'
ORDER BY dim_product.product_name;
Rows: 3,305

Query 4
SELECT dim_employee.employee_id, dim_product.product_id,
       SUM(fs.price) AS total, MAX(fs.quantity) AS max_quantity
FROM fact_sales fs
INNER JOIN dim_employee ON fs.employee_id = dim_employee.employee_id
INNER JOIN dim_product ON fs.product_id = dim_product.product_id
GROUP BY dim_employee.employee_id, dim_product.product_id
ORDER BY total DESC;
Rows: 551,277
14. Query Sentence Rows
Query 5
SELECT dim_employee.employee_id, dim_product.product_id, fs.category, fs.random,
       RANK() OVER (PARTITION BY fs.category ORDER BY fs.random DESC) AS rank
FROM fact_sales fs
INNER JOIN dim_employee ON fs.employee_id = dim_employee.employee_id
INNER JOIN dim_product ON fs.product_id = dim_product.product_id
ORDER BY rank;
Rows: 997,933

Query 6
SELECT DISTINCT dim_employee.employee_id, dim_product.product_id,
       SUM(fs.price) OVER (PARTITION BY dim_employee.employee_id, dim_product.product_id) AS total,
       MAX(fs.quantity) OVER (PARTITION BY dim_employee.employee_id, dim_product.product_id) AS max_quantity
FROM fact_sales fs
INNER JOIN dim_employee ON fs.employee_id = dim_employee.employee_id
INNER JOIN dim_product ON fs.product_id = dim_product.product_id
ORDER BY total DESC;
Rows: 552,133

Query 7
SELECT DISTINCT fs.employee_id, fs.product_id,
       SUM(fs.price) OVER (PARTITION BY dim_employee.employee_id, dim_product.product_id) AS total,
       MAX(fs.quantity) OVER (PARTITION BY dim_employee.employee_id, dim_product.product_id) AS max_quantity
FROM fact_sales fs
INNER JOIN dim_employee ON fs.employee_id = dim_employee.employee_id
INNER JOIN dim_product ON fs.product_id = dim_product.product_id
ORDER BY total DESC;
Rows: 552,133
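Queries 6 and 7 express the aggregation from query 4 with DISTINCT and window functions instead of GROUP BY. The sketch below checks, on a tiny in-memory SQLite table, that the two forms return the same groups. It is an illustration under stated assumptions: the four sample rows are made up, the joins are left out for brevity, and SQLite 3.25 or newer is assumed for window-function support. It does not explain why the slides report 551,277 rows for query 4 and 552,133 rows for queries 6 and 7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (employee_id INTEGER, product_id INTEGER,
                             price REAL, quantity INTEGER);
    INSERT INTO fact_sales VALUES
        (1, 1, 10.0, 2), (1, 1, 5.0, 7), (1, 2, 3.0, 1), (2, 1, 8.0, 4);
""")

# Form used in query 4: one output row per (employee, product) group.
group_by_sql = """
    SELECT employee_id, product_id,
           SUM(price) AS total, MAX(quantity) AS max_quantity
    FROM fact_sales
    GROUP BY employee_id, product_id
"""

# Form used in queries 6 and 7: every row carries its partition's aggregates,
# then DISTINCT collapses the duplicates.
window_sql = """
    SELECT DISTINCT employee_id, product_id,
           SUM(price) OVER (PARTITION BY employee_id, product_id) AS total,
           MAX(quantity) OVER (PARTITION BY employee_id, product_id) AS max_quantity
    FROM fact_sales
"""

grouped = sorted(conn.execute(group_by_sql).fetchall())
windowed = sorted(conn.execute(window_sql).fetchall())
```

The window form makes the engine compute the aggregates for every input row before deduplicating, which is presumably why it was included as a heavier variant of the same question.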