Traveloka’s Data Journey
Stories and lessons learned on building a scalable data pipeline at Traveloka.
Very Early Days...
Very Early days
Diagram components: Applications & Services; Summarizer; Internal Dashboard; Report Scripts + Crontab
- Raw Activity
- Key Value
- Time Series
Full... Split & Shard!
Raw, KV, and Time Series DB
Diagram components: Applications & Services; Internal Dashboard; Report Scripts + Crontab; Raw Activity (Sharded); Key Value DB (Sharded); Time Series Summary; Summarizer
Lessons Learned
1. UNIX principle: “Do One Thing and Do It Well”
2. Split use cases based on SLA & query pattern
3. Choose scalable tech based on growth estimates
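The slides do not say how records were assigned to shards. As a minimal sketch of one common approach, assuming hash-based routing on a record key, the idea behind the sharded Raw Activity and Key Value stores could look like this. `NUM_SHARDS`, `shard_for`, and `route` are hypothetical names used only for illustration.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, not taken from the slides


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a record key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def route(records, num_shards: int = NUM_SHARDS):
    """Group (key, value) records by the shard that would store them."""
    shards = {i: [] for i in range(num_shards)}
    for key, value in records:
        shards[shard_for(key, num_shards)].append((key, value))
    return shards


# Illustrative data only.
records = [("user-1", "search"), ("user-2", "book"), ("user-1", "pay")]
placement = route(records)
```

Because the hash is stable, every record for the same key lands on the same shard, which keeps per-key lookups on a single node.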
Throughput?
Kafka comes to the rescue
Diagram components: Applications & Services; Kafka as Datahub; Raw data consumer; Raw Activity (Sharded); Key Value (Sharded); labels: insert, update
Lessons Learned
1. Use something that can handle higher throughput for cases with high write volume, like tracking
2. Decouple publish and consume
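To illustrate "decouple publish and consume", here is a toy in-memory sketch. It is not Kafka and does not use any Kafka client API. The `DataHub` class and the consumer names are hypothetical; they only mimic the idea of an append-only log that each consumer reads at its own pace.

```python
class DataHub:
    """Toy append-only log standing in for a Kafka-style datahub (illustrative only)."""

    def __init__(self):
        self._log = []       # every published event, in order
        self._offsets = {}   # consumer name -> next position to read

    def publish(self, event):
        """Append an event; the publisher never waits for any consumer."""
        self._log.append(event)
        return len(self._log) - 1

    def poll(self, consumer, max_events=100):
        """Return the next batch for one consumer and advance only its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._log[start:start + max_events]
        self._offsets[consumer] = start + len(batch)
        return batch


hub = DataHub()
for event in ({"user": "u1", "action": "search"}, {"user": "u2", "action": "book"}):
    hub.publish(event)

raw_batch = hub.poll("raw-consumer")             # reads everything published so far
kv_batch = hub.poll("kv-updater", max_events=1)  # slower consumer, unaffected by the other
```

A slow or failed consumer only delays its own offset, so the applications that publish tracking events are not blocked by it.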
We need a Data Warehouse and a BI Tool, and we need them fast!
Diagram components: Raw Activity (Sharded); Other sources; Python ETL (temporary solution); Star Schema DW on Postgres; Periscope BI Tool
Lessons Learned
1. Think about the DW from the beginning of the data pipeline
2. BI Tools: Do not reinvent the wheel
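For readers unfamiliar with the term, a star schema keeps measurements in a central fact table that references small dimension tables. The sketch below uses Python's built-in `sqlite3` as a stand-in for Postgres so that it is self-contained. All table names, column names, and rows are invented for illustration and are not Traveloka's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: small, descriptive lookups.
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);

-- Fact table: one row per event, pointing at the dimensions.
CREATE TABLE fact_booking (
    date_id INTEGER REFERENCES dim_date(date_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    amount REAL
);

INSERT INTO dim_date VALUES (1, '2017-01-01'), (2, '2017-01-02');
INSERT INTO dim_product VALUES (1, 'flight'), (2, 'hotel');
INSERT INTO fact_booking VALUES (1, 1, 100.0), (1, 2, 50.0), (2, 1, 70.0);
""")

# A typical BI query: aggregate the facts, grouped by a dimension attribute.
rows = conn.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_booking AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
```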
“Have” to adopt big data
Postgres couldn’t handle the load!
Diagram components: Raw Activity (Sharded); Other sources; Python ETL (temporary solution); Star Schema DW on Redshift; Periscope BI Tool
Lesson Learned
1. Choose the specific tech that best fits the use case
Scaling out in MongoDB every so often is not manageable...
Diagram components: Kafka as Datahub; Gobblin as Consumer; Raw Activity on S3
Lesson Learned
1. MongoDB Shard: Scalability needs to be tested!
“Have” to adopt big data
Diagram components: Kafka as Datahub; Gobblin as Consumer; Raw Activity on S3; Processing on Spark; Star Schema DW on Redshift
Lessons Learned
1. Processing has to be easy to scale
2. Scale processing separately for day-to-day jobs and backfill jobs
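One way to read "scale processing separately" is that a daily job handles a single date, while a backfill is split into date-range chunks that can run on separately sized resources. The sketch below shows only the chunking logic. The function names and chunk size are assumptions, and no Spark API is involved.

```python
from datetime import date, timedelta


def daily_job_range(run_date):
    """The day-to-day job processes exactly one date."""
    return [(run_date, run_date)]


def backfill_chunks(start, end, chunk_days):
    """Split an inclusive date range into chunks of at most chunk_days days."""
    chunks = []
    current = start
    while current <= end:
        chunk_end = min(current + timedelta(days=chunk_days - 1), end)
        chunks.append((current, chunk_end))
        current = chunk_end + timedelta(days=1)
    return chunks


# Illustrative dates only: a ten-day backfill in chunks of four days.
chunks = backfill_chunks(date(2017, 1, 1), date(2017, 1, 10), chunk_days=4)
```

Each chunk can then be submitted as its own job, so a large backfill does not compete with the daily run for the same capacity.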
Near Real Time on Big Data is challenging
Diagram components: Kafka as Datahub; MemSQL for Near Real Time DB
Lesson Learned
1. Dig into the requirements until they are very specific. For data, this relates to:
1) latency SLA
2) query pattern
3) accuracy
4) processing requirement
5) tools integration
No OPS!!!
Open your mind to any combination of tech!
Diagram components: PubSub as Datahub; DataFlow for Stream Processing; Key Value on DynamoDB
Lessons Learned
1. Combining cloud providers is possible, but be careful about latency
2. During a research project, always prepare plans B & C, plus a proper buffer in the timeline
3. Autoscale!
More autoscale!
Diagram components: PubSub as Datahub; BigQuery for Near Real Time DB
Lesson Learned
1. Autoscale = cost monitoring
Caveat
Autoscale != everything solved
e.g. PubSub default quota is 200 MB/s (it can be increased, but only by manual request)
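The quota caveat can be turned into a quick back-of-the-envelope check. This sketch uses the 200 MB/s figure quoted on the slide. The event rate and event size are made-up inputs, and treating 1 MB as 1,000,000 bytes is an assumption.

```python
PUBSUB_DEFAULT_QUOTA_MB_S = 200  # figure quoted on the slide


def throughput_mb_s(events_per_second, avg_event_bytes):
    """Estimated publish throughput in MB/s (assuming 1 MB = 1,000,000 bytes)."""
    return events_per_second * avg_event_bytes / 1_000_000


def quota_usage(events_per_second, avg_event_bytes, quota_mb_s=PUBSUB_DEFAULT_QUOTA_MB_S):
    """Fraction of the quota consumed; values near or above 1.0 need a quota increase."""
    return throughput_mb_s(events_per_second, avg_event_bytes) / quota_mb_s


# Hypothetical workload: 100,000 events per second at 1,000 bytes each.
usage = quota_usage(events_per_second=100_000, avg_event_bytes=1_000)
```

Running such a check against growth estimates shows when a manual quota request should be filed before traffic reaches the limit.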
More autoscale!
Diagram components: Kafka as Datahub; Gobblin as Consumer; Raw Activity on S3; Processing on Spark; Hive & Presto on Qubole as Query Engine; BI & Exploration Tools
Lessons Learned
1. Scale as granularly as possible; in this case, scale compute and storage separately
2. Separate BI, which has a well-defined SLA, from exploration use cases
WRAP UP
Diagram labels: Consumer of Data; Streaming; Batch; Traveloka App; Kafka; ETL; Data Warehouse; S3 Data Lake; Batch Ingest; Android, iOS; DOMO Analytics UI; NoSQL DB; Traveloka Services; Ingest: Cloud Pub/Sub; Storage: Cloud Storage; Pipelines: Cloud Dataflow; Analytics: BigQuery; Monitoring; Logging; Hive, Presto Query
Key Lessons Learned
● Keep scalability in mind -- especially disks filling up.. :)
● Scale as granularly as possible -- compute, storage
● Scalability needs to be tested (of course!)
● Do one thing and do it well; dig into your requirements -- SLA, query pattern
● Decouple publish and consume -- publisher availability is very important!
● Choose tech that is specific to the use case
● Be careful of gotchas! There's no silver bullet...
THE FUTURE
Future Roadmap
● In the past, we saw problems/needs, looked for technology that could solve them, and plugged it into the existing pipeline.
● It works well.
● But after some time, we ended up maintaining a lot of different components.
● Multiple clusters:
○ Kafka
○ Spark
○ Hive/Presto
○ Redshift
○ etc
● Multiple data entry points for analysts:
○ BigQuery
○ Hive/Presto
○ Redshift
Our Goal
● Simplify our data architecture.
● Provide a single data entry point for data analysts/scientists, for both streaming and batch data.
● Do this without compromising what we can do now.
● Reliability, speed, and scale.
● Less or no ops.
● We also want to make migration as simple/easy as possible.
How will we achieve this?
● There are a few options that we are considering right now.
● Some of them introduce new technologies/components.
● Some of them make use of our existing technology to its maximum potential.
● We are trying exciting (relatively) new technologies:
○ Google BigQuery
○ Google Dataprep on Dataflow
○ AWS Athena
○ AWS Redshift Spectrum
○ etc
Plan to simplify
Diagram components: Cloud Pub/Sub; Cloud Dataflow; BigQuery; Cloud Storage; Kubernetes Cluster; Collector; Managed services; BI & Analytics UI; BigTable; REST API; ML Models
Plan to simplify
● Seems promising, but…
● It needs to be tested.
● Does it cover all the use cases that we need?
● Query migration?
● Costs?
● Maintainability?
● Potential problems?
See You at the Next Event!
Thank You

Traveloka's data journey — Traveloka data meetup #2
