Traveloka Data Meetup v1.0.0
How to Feed a Data Hungry Organization
Part One
Traveloka Data Culture
Part 1: Traveloka Data Culture
Five Characteristics of a Data Hungry Organization
• Driven Decision
• Learn from Mistakes
• Better Understanding
• Uncertainty and Variation
• High Quality Data
Our responsibility is to turn data into consumable insights
DATA TEAM
BETTER BUSINESS DECISION
We need the brightest people to fill our needs and create the future
Skills: Mathematics, Business, Programming
Some of the skills in mathematics
Mathematics
Optimization
Decision Theory
Statistics
Differential Equations
Time Series
Some of the skills in business
Business
Strategy
Finance
Economics
Some of the skills in programming
Programming
Data Wrangling
Modelling
Big Data
This is how we structure our team
Data Team
• Data Governance
• Machine Learning Engineering
• Data Analysis
• Data Science
• Data Engineering
Houston, we have a problem.
• DW: Tens of Terabytes, Hundreds of ETLs
• Kafka: Hundreds of Topics, Millions of Messages per Hour, Hundreds of Megabytes per Second
• S3: Hundreds of Terabytes
• Redshift: Tens of Thousands of Queries Daily
• DOMO: Thousands of Cards, Hundreds of Users
• PeriscopeData: Thousands of Dashboards, Hundreds of Users
We need state of the art technology to feed data hungry people
• Ingestion: Gobblin
• Data Lake: AWS S3
• Batch Processing: Spark, Airflow, Hadoop2, Python, Java App
• Data Warehouse: Redshift, MongoDB, PostgreSQL
• Datahub: Pubsub, Kafka
• Stream Processing: DataFlow, MemSQL Pipeline
• Near Real Time DW: GCP BigQuery, MemSQL
• Real Time DB: AWS DynamoDB
Ingestion, Processing, Storage, Presentation
• Source DB: Mongo, PostgreSQL
• App / Services: Java App
• Analytics Tools: PeriscopeData, Spark, R, Domo, Dataiku, Holistics, Keboola
• ML Tools, Library, and Services: Jupyter, Zeppelin, Caffe, DataDog, TensorFlow, Cloud Vision API
• Query Engine: Qubole, Presto, Hive
Part Two
Data Engineering
Part 2: Data Engineering
Fast Food, Or…?
MINDSETS
Managed service for focus
So we could focus more on the use cases
Real Time Pipeline
• 5 min data delivery SLA; real latency ~10 s
• 100 ms query SLA; real latency ~10 ms (p95)
• Key-value data, queried by services/apps
• Autoscale
• Self service for each engineering team; we provide governance, guidance, building blocks, and consultation
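A query SLA stated against a p95 latency can be checked mechanically. The following is a minimal sketch, not the team's actual tooling: the nearest-rank percentile method, the function names, and the sample latencies are all assumptions made for illustration.

```python
def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    # Ceiling of pct * n / 100, computed without floating point rounding.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def meets_sla(latencies_ms, sla_ms=100, pct=95):
    """True when the pct-th percentile latency is within the SLA."""
    return percentile(latencies_ms, pct) <= sla_ms


# Hypothetical query latencies in milliseconds (not real measurements).
samples_ms = [8, 9, 10, 7, 12, 9, 11, 10, 8, 250]
print(percentile(samples_ms, 95), meets_sla(samples_ms))
```

With only ten samples, a single slow query decides the p95, which is one reason to evaluate the SLA over a large window of requests.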
Near Real Time Pipeline
• Raw data, queried by BI tools
• 5 min data delivery SLA; real latency ~5 s
• YAML for schema definition (a format built and defined by ourselves)
• Self service for data analysts, with guidance and governance
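The slides do not show the YAML schema format itself, so the sketch below is purely illustrative. It assumes a schema that lists field names, types, and a required flag, shown here as the dictionary a YAML parser would produce; the table name, field names, and `validate` helper are hypothetical.

```python
# Hypothetical schema, written as the dict a YAML parser might return for:
#   table: booking_events
#   fields:
#     - {name: booking_id, type: string, required: true}
#     - {name: amount, type: float, required: true}
#     - {name: coupon, type: string, required: false}
SCHEMA = {
    "table": "booking_events",
    "fields": [
        {"name": "booking_id", "type": "string", "required": True},
        {"name": "amount", "type": "float", "required": True},
        {"name": "coupon", "type": "string", "required": False},
    ],
}

# Mapping from (assumed) schema type names to Python types.
TYPE_MAP = {"string": str, "float": (int, float), "integer": int, "boolean": bool}


def validate(record, schema):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    known = set()
    for field in schema["fields"]:
        name = field["name"]
        known.add(name)
        if name not in record or record[name] is None:
            if field.get("required", False):
                errors.append(f"missing required field: {name}")
            continue
        if not isinstance(record[name], TYPE_MAP[field["type"]]):
            errors.append(f"wrong type for {name}: expected {field['type']}")
    for name in record:
        if name not in known:
            errors.append(f"unknown field: {name}")
    return errors


print(validate({"booking_id": "b1", "amount": 12.5}, SCHEMA))
print(validate({"amount": "x", "channel": "app"}, SCHEMA))
```

Keeping the schema declarative is what makes self service plausible: an analyst edits a small text file, and shared code like the validator above enforces it.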
But MemSQL is not a managed service; it runs on EC2.
It is easy to scale, but it does not autoscale yet.
So we are moving to… v2!!
Currently in usability testing by analysts.
Self service, of course!
Analytical Pipeline
• Heavy data processing
• Queried by BI tools
• 6 hour data delivery SLA
Interesting features:
• Custom dev/prod environment, for self service!
• Custom framework, on top of Spark
• Custom Airflow, with a separate queue for backfill
• EMR autoscale for backfill
• Redshift microbatch bulk load
• etc...
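The "microbatch bulk load" idea in the list above can be sketched as grouping rows into small fixed-size batches and loading each batch in one call. This is an assumption-laden illustration, not the team's framework: `bulk_load` is a hypothetical callback standing in for whatever performs the load (for Redshift, typically a COPY of staged files), and the batch size is arbitrary.

```python
from itertools import islice


def micro_batches(rows, batch_size):
    """Yield consecutive lists of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load_in_micro_batches(rows, bulk_load, batch_size=1000):
    """Call bulk_load once per micro-batch and return the number of rows loaded."""
    loaded = 0
    for batch in micro_batches(rows, batch_size):
        bulk_load(batch)  # hypothetical loader, e.g. stage a file and copy it
        loaded += len(batch)
    return loaded


# Usage with a stand-in loader that just records the batches it receives.
received = []
print(load_in_micro_batches(range(5), received.append, batch_size=2), received)
```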
Part Three
Data Science in Traveloka
Part 3: Data Science in Traveloka
Three Things to Discuss Today
• Data Science Purpose
• Tools of the Trade
• Model Evaluations and Applications
Novia is 25 years old. She is single, outspoken, and mathematically gifted. As a student, she was deeply interested in calculus and statistics, and also participated in the International Mathematical Olympiad.
a. Novia is a data scientist
b. Novia is a data scientist and is active as a Mathematical Olympiad tutor
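The slide does not state the answer. Assuming the question is which statement is more probable, one way to reason about it is the conjunction rule, with A standing for "Novia is a data scientist" and B for "Novia is active as a Mathematical Olympiad tutor" (the symbols are introduced here for illustration):

```latex
P(A \cap B) = P(A)\,P(B \mid A) \le P(A), \qquad \text{since } 0 \le P(B \mid A) \le 1 .
```

Under that reading, statement (b) can never be more probable than statement (a), however well the description seems to fit a tutor.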
Consider a regular six-sided die with four green faces and two red faces. The die will be rolled 20 times and the sequence of greens (G) and reds (R) will be recorded. Choose one sequence from a set of three. Which one is the most likely outcome?
• RGRRR
• GRGRRR
• GRRRRR
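The die puzzle can be checked with exact arithmetic. The sketch below assumes the question is whether the chosen sequence appears on consecutive rolls somewhere in the 20 rolls (the slide does not spell this out); the function names are illustrative. It first computes the probability of each sequence on its own, then the probability that it appears within n rolls by tracking how much of the pattern has been matched so far.

```python
from fractions import Fraction

# Four green faces and two red faces on a fair six-sided die.
P_FACE = {"G": Fraction(4, 6), "R": Fraction(2, 6)}


def sequence_probability(seq):
    """Probability that len(seq) consecutive rolls produce exactly seq."""
    prob = Fraction(1)
    for face in seq:
        prob *= P_FACE[face]
    return prob


def probability_appears(pattern, n_rolls):
    """Probability that pattern occurs as consecutive rolls within n_rolls."""
    m = len(pattern)

    def next_state(state, face):
        # state = number of leading pattern characters currently matched.
        seen = pattern[:state] + face
        for k in range(min(m, len(seen)), -1, -1):
            if seen.endswith(pattern[:k]):
                return k

    dist = {0: Fraction(1)}  # probability of each partial-match state
    hit = Fraction(0)        # probability the pattern has already appeared
    for _ in range(n_rolls):
        new_dist = {}
        for state, p_state in dist.items():
            for face, p_face in P_FACE.items():
                nxt = next_state(state, face)
                if nxt == m:
                    hit += p_state * p_face
                else:
                    new_dist[nxt] = new_dist.get(nxt, Fraction(0)) + p_state * p_face
        dist = new_dist
    return hit


for seq in ("RGRRR", "GRGRRR", "GRRRRR"):
    print(seq, sequence_probability(seq), float(probability_appears(seq, 20)))
```

GRGRRR is just RGRRR with an extra G in front, so every run of rolls that contains GRGRRR also contains RGRRR. The shorter sequence therefore cannot be the less likely one, even though the longer one looks more "representative" of a mostly green die.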
Remember This:
The goal of a data science exercise is to help us make a good business decision
• Logic
• Alternatives
• Information
• Preferences
“if they learn nothing else about decision analysis from their studies, distinction between outcome and decisions will have been worth the price of admission”
Ron Howard, Professor at Stanford University, Father of Decision Analysis
Outcome \ Decision | Good decision | Bad decision
Good outcome | Took a taxi and arrived safely | Drove home and arrived safely
Bad outcome | Took a taxi and was involved in an accident | Drove home and was involved in an accident
Data Science Framework: CRISP-DM
• Business
• Data
• Data Prep
• Model
• Evaluation
• Deployment
Common Sense
“Hiding within those mounds of data is knowledge that could change the life of a patient, or change the world”
-Atul Butte, Stanford-
We use open source libraries for data science
Wrangling
• data.table
• dplyr
• sparkR
• sparklyr
• pandas
• pyspark
Visualization
• ggplot
• matplotlib
• seaborn
• shiny
Statistics
• R
• JAGS
• STAN
• Python
• Julia
Machine Learning
• scikit-learn
• caret
• e1071
• fbprophet
Are we using the algorithm? Or being used by it?
Classification
• Linear Models
• Naïve Bayes Classifier
• Support Vector Classifier
• Vowpal Wabbit Classifier
• Random Forest
• Decision Trees
• Neural Network
• Extreme Gradient Boosted Trees
• Many more algos!
Prediction
• Linear Models
• Nystroem Regressor
• Support Vector Regressor
• Vowpal Wabbit Regressor
• Random Forest
• Decision Trees
• Neural Network
• Extreme Gradient Boosted Trees
• More Algos!
Libraries
• Scikit-learn
• Caret
• TensorFlow
• …
We need more than just off the shelf libraries to feed data hungry people
• Bayesian Network
• Markov Chain Monte Carlo
Model Evaluation: judging the usefulness of your model
Rule #1: Never ever peek at the test set during training/validation
Rule #2: You can never satisfy all the metrics; pick one or two metrics as your decision criteria beforehand
Rule #3: Always do comparative statics on the final model
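Rule #1 is easier to follow when the test set is carved out once, up front, and left alone until the final check. A minimal sketch, assuming a simple random split (the fractions, seed, and function name are illustrative, and time-ordered data would need a different split):

```python
import random


def three_way_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once with a fixed seed, then cut into train / validation / test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]                  # touched only for the final evaluation
    val = rows[n_test:n_test + n_val]     # used for model and metric selection
    train = rows[n_test + n_val:]         # used for fitting
    return train, val, test


train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))
```

Model selection and tuning then happen against the validation rows only; the test rows are scored once, with the metrics chosen beforehand as Rule #2 asks.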
Comparative Statics: commonly used as feature importance analysis
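Comparative statics, in the sense used here, means changing one input while holding the others fixed and watching how the model's output moves. The sketch below is an illustration under stated assumptions: `toy_model` is a hypothetical stand-in for a fitted model, and the feature names are invented.

```python
def comparative_statics(predict, baseline, feature, values):
    """Vary one feature over values, keep the rest at baseline,
    and return (value, prediction) pairs."""
    results = []
    for value in values:
        scenario = dict(baseline)  # copy so the baseline is never mutated
        scenario[feature] = value
        results.append((value, predict(scenario)))
    return results


def toy_model(x):
    """Hypothetical stand-in for a fitted model's prediction function."""
    return 2 * x["discount"] + x["days_to_departure"]


baseline = {"discount": 0, "days_to_departure": 10}
print(comparative_statics(toy_model, baseline, "discount", [0, 5, 10]))
```

A feature whose sweep barely moves the prediction matters little at that baseline; repeating the sweep from a few different baselines guards against reading too much into a single point.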
Remember the end goal: decisions
What should we do?
What might happen?
“But in my view, obsessive customer focus is by far the most protective of Day 1 vitality”
Our data is telling us:
• What do they want?
• Do we serve their needs?
• Are they trying to leave us?
My name is Jeff
Thank you!

How to Feed a Data Hungry Organization – by Traveloka Data Team

  • 1. Traveloka Data Meetup v1.0.0 How to Feed a Data Hungry Organization
  • 3. Part 1: Traveloka Data Culture Five Characteristics of Data Hungry Organization Driven Decision Learn from Mistakes Better Understanding Uncertainty and Variation High Quality Data Data Hungry Organization
  • 4. Part 1: Traveloka Data Culture Our responsibility is to turn data into consumable insights DATA TEAM BETTER BUSINESS DECISION
  • 5. Part 1: Traveloka Data Culture We need the brightest people to fill our needs and create the future Mathematics Business Programming Skills
  • 6. Part 1: Traveloka Data Culture Some of the skills in mathematics Mathematics Optimization Decision Theory Statistics Differential Equations Time Series
  • 7. Part 1: Traveloka Data Culture Some of the skills in business Business Strategy Finance Economics
  • 8. Part 1: Traveloka Data Culture Some of the skills in programming Programming Data Wrangling Modelling Big Data
  • 9. Part 1: Traveloka Data Culture This is how we structure our team Data Team Data Governance Machine Learning Engineering Data Analysis Data Science Data Engineering
  • 10. Part 1: Traveloka Data Culture Houston, We have a problem. DW Tens of Terabytes Hundreds of ETLs Kafka Hundreds of topics Millions of Messages per Hour Hundreds of Megabytes per Second S3 Hundreds of Terabytes Redshift Tens of Thousand Queries Daily DOMO Thousands of Cards Hundreds of Users PeriscopeData Thousands of Dashboards Hundreds of Users
  • 11. Part 1: Traveloka Data Culture We need state of the art technology to feed data hungry people Ingestion Gobblin Data Lake AWS S3 Batch Processing Spark, Airflow, Hadoop2, Python, Java App Data Warehouse Redshift, MongoDB, PostgreSQL Datahub Pubsub, Kafka Stream Processing DataFlow, MemSQL Pipeline Near Real Time DW GCP BigQuery, MemSQL Real Time DB AWS DynamoDB Ingestion Processin g Storage Presentation Source DB Mongo, PostgreSQL App / Services Java App Analytics Tools PeriscopeData, Spark, R, Domo Dataiku Holistics, Keboola ML Tools, Library, and Services Jupyter, Zeppelin, Caffe, DataDog, TensorFlow, Cloud Vision API Query Engine Qubole, Presto, Hive
  • 13. Part 2: Data Engineering Fast Food, Or…?
  • 14. Part 2: Data Engineering MINDSETS Managed service for focus So we could focus more on the use cases
  • 15. Part 2: Data Engineering MINDSETS Managed service for focus So we could focus more on the use cases
  • 16. Part 2: Data Engineering Real Time Pipeline 5 min data delivery SLA. Real latency ~ 10s 100 ms query SLA. Real latency ~ 10ms (p95) Key value data, query by service/app Autoscale - Self service for each engineering team we provide governance, guidance, building blocks, and consultation
  • 17. Part 2: Data Engineering Real Time Pipeline
  • 18. Part 2: Data Engineering Near Real Time Pipeline Raw data, query by BI Tools 5 min data delivery SLA. Real latency ~ 5s Using Yaml for Schema definition (built and defined by ourselves) Self service for data analysts! with guidance and governance
  • 19. Part 2: Data Engineering Near Real Time Pipeline
  • 20. Part 2: Data Engineering Near Real Time Pipeline But MemSQL is not a managed service; it runs on EC2. It is easy to scale, but it does not autoscale yet. So we are moving to… v2!! Currently in usability testing by analysts. Self service, of course!
  • 21. Part 2: Data Engineering Near Real Time Pipeline
  • 22. Part 2: Data Engineering Analytical Pipeline: Heavy data processing, queried by BI tools. 6 hour data delivery SLA.
  • 23. Part 2: Data Engineering Analytical Pipeline Interesting features: • Custom dev/prod environment, for self service! • Custom framework, on top of Spark • Custom Airflow, with a separate queue for backfill • EMR autoscale for backfill • Redshift microbatch bulk load • etc...
  • 24. Part 2: Data Engineering Summary
  • 25. Part Three Data Science in Traveloka
  • 26. Part 3: Data Science in Traveloka Three Things to Discuss Today: Data Science Purpose; Tools of the Trade; Model Evaluations and Applications
  • 28. Part 3: Data Science in Traveloka Novia is 25 years old. She is single, outspoken, and mathematically gifted. As a student, she was deeply interested in calculus and statistics, and also participated in the International Mathematical Olympiad. a. Novia is a data scientist. b. Novia is a data scientist and is active as a Mathematical Olympiad tutor.
  • 29. Part 3: Data Science in Traveloka Consider a regular six-sided die with four green faces and two red faces. The die will be rolled 20 times and the sequence of greens (G) and reds (R) will be recorded. Choose one sequence from a set of three. Which one is the most likely outcome? RGRRR, GRGRRR, GRRRRR
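The die question can be checked by direct computation. The sketch below assumes the classic reading of the puzzle, in which a sequence "wins" if it appears as a consecutive run anywhere in the 20 rolls; the slide does not spell this out, so treat that reading as an assumption. It computes both the probability of rolling each sequence exactly and the probability that it appears within 20 rolls. The function names are illustrative.

```python
from fractions import Fraction

# Four green faces and two red faces out of six.
FACE_PROB = {"G": Fraction(4, 6), "R": Fraction(2, 6)}


def exact_probability(sequence):
    """Probability of rolling exactly this sequence in len(sequence) rolls."""
    prob = Fraction(1)
    for face in sequence:
        prob *= FACE_PROB[face]
    return prob


def appears_probability(pattern, n_rolls=20):
    """Probability that `pattern` occurs as a consecutive run within n_rolls."""
    m = len(pattern)

    def step(matched, face):
        # Longest prefix of `pattern` that is a suffix of what we have matched plus `face`.
        seen = pattern[:matched] + face
        for length in range(len(seen), -1, -1):
            if seen.endswith(pattern[:length]):
                return length

    # dist[k] = probability that the last k rolls match the first k faces of the pattern.
    # State m (full match) is absorbing.
    dist = [Fraction(0)] * (m + 1)
    dist[0] = Fraction(1)
    for _ in range(n_rolls):
        new_dist = [Fraction(0)] * (m + 1)
        new_dist[m] = dist[m]
        for matched in range(m):
            for face, p in FACE_PROB.items():
                new_dist[step(matched, face)] += dist[matched] * p
        dist = new_dist
    return dist[m]


for seq in ("RGRRR", "GRGRRR", "GRRRRR"):
    print(seq, exact_probability(seq), float(appears_probability(seq)))
# Exact-sequence probabilities: RGRRR = 2/243, GRGRRR = 4/729, GRRRRR = 2/729
```

GRGRRR looks more "representative" of a mostly green die, but it contains RGRRR, so every roll history that shows GRGRRR also shows RGRRR. The shorter sequence therefore cannot be less likely, which is the same conjunction rule behind the Novia question on slide 28.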
  • 30. Part 3: Data Science in Traveloka
  • 31. Part 3: Data Science in Traveloka
  • 32. Part 3: Data Science in Traveloka Remember this: the goal of a data science exercise is to help us make a good business decision. Logic, Alternatives, Information, Preferences
  • 33. Part 3: Data Science in Traveloka “if they learn nothing else about decision analysis from their studies, distinction between outcome and decisions will have been worth the price of admission” Ron Howard, Professor at Stanford University, Father of Decision Analysis. Decisions vs. Outcome: Good decision, good outcome: took a taxi and arrived safely. Bad decision, good outcome: drove home and arrived safely. Good decision, bad outcome: took a taxi and was involved in an accident. Bad decision, bad outcome: drove home and was involved in an accident.
  • 34. Part 3: Data Science in Traveloka Three Things to Discuss Today: Data Science Purpose; Tools of the Trade; Model Evaluations and Applications
  • 35. Part 3: Data Science in Traveloka Data Science Framework: CRISP-DM. Business, Data, Data Prep, Model, Evaluation, Deployment, Common Sense
  • 36. Part 3: Data Science in Traveloka “Hiding within those mounds of data is knowledge that could change the life of a patient, or change the world” -Atul Butte, Stanford- We use open source libraries for data science. Wrangling: • data.table • dplyr • sparkR • sparklyr • pandas • pyspark Visualization: • ggplot • matplotlib • seaborn • shiny Statistics: • R • JAGS • STAN • Python • Julia Machine Learning: • scikit-learn • caret • e1071 • fbprophet
  • 37. Part 3: Data Science in Traveloka Are we using the algorithm? Or being used by it? Classification: Linear Models, Naïve Bayes Classifier, Support Vector Classifier, Vowpal Wabbit Classifier, Random Forest, Decision Trees, Neural Network, Extreme Gradient Boosted Trees, many more algos! Prediction: Linear Models, Nystroem Regressor, Support Vector Regressor, Vowpal Wabbit Regressor, Random Forest, Decision Trees, Neural Network, Extreme Gradient Boosted Trees, more algos! • Scikit-learn • Caret • TensorFlow • …
  • 38. Part 3: Data Science in Traveloka We need more than just off the shelf libraries to feed data hungry people: Bayesian Network, Markov Chain Monte Carlo
  • 39. Part 3: Data Science in Traveloka Three Things to Discuss Today: Data Science Purpose; Tools of the Trade; Model Evaluations and Applications
  • 40. Part 3: Data Science in Traveloka Model Evaluation: judging the usefulness of your model. Rule #1: Never ever peek at the test set during training/validation. Rule #2: You can never satisfy all the metrics; pick one or two metrics as your decision criteria beforehand. Rule #3: Always do comparative statics on the final model.
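A minimal sketch of Rule #1: split the data once, up front, into train, validation, and test sets, and leave the test set untouched until the final model is chosen. The 60/20/20 proportions, the fixed seed, and the helper name `three_way_split` are illustrative assumptions, not something stated on the slide.

```python
import random


def three_way_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once with a fixed seed, then carve out disjoint test, validation, and train sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Stand-in data: 100 row ids instead of real records.
train, val, test = three_way_split(range(100))

# Tune and compare models on `train` and `val` only.
# Score on `test` exactly once, with the metric chosen beforehand (Rule #2).
print(len(train), len(val), len(test))
```

Fixing the seed keeps the split reproducible, so the same rows stay in the test set across experiments.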
  • 41. Part 3: Data Science in Traveloka Comparative statics is commonly used for feature importance analysis
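Comparative statics here means shifting one input at a time while holding the others at a baseline, then reading off how the model's prediction moves. The sketch below shows the idea in plain Python. The `toy_model`, its coefficients, and the feature names are hypothetical stand-ins for a fitted model, chosen only to make the example runnable.

```python
def comparative_statics(predict, baseline, deltas):
    """Change in prediction when each feature is shifted alone, others held at baseline."""
    base_prediction = predict(baseline)
    effects = {}
    for name, delta in deltas.items():
        shifted = dict(baseline)
        shifted[name] = baseline[name] + delta
        effects[name] = predict(shifted) - base_prediction
    return effects


# Hypothetical stand-in for a fitted model (not a real Traveloka model).
def toy_model(x):
    return 3 * x["discount_pct"] + 1 * x["days_to_departure"] - 2 * x["num_stops"]


baseline = {"discount_pct": 10, "days_to_departure": 30, "num_stops": 1}
deltas = {"discount_pct": 1, "days_to_departure": 1, "num_stops": 1}

effects = comparative_statics(toy_model, baseline, deltas)

# Rank features by the size of their effect on the prediction.
ranking = sorted(effects, key=lambda name: abs(effects[name]), reverse=True)
print(effects, ranking)
```

For a non-linear model the effects depend on the chosen baseline and step sizes, so it is worth repeating the exercise at several baselines before reading the ranking as feature importance.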
  • 42. Part 3: Data Science in Traveloka Remember the end goal: decisions. What should we do? What might happen?
  • 43. “But in my view, obsessive customer focus is by far the most protective of Day 1 vitality” Our data is telling us: • What do they want? • Do we serve their needs? • Are they trying to leave us? Part 3: Data Science in Traveloka My name is Jeff