How to Feed a Data Hungry Organization – by Traveloka Data Team

In Traveloka's inaugural Data Meetup, held in April 2017, Ainun Najib (Head of Data), Dr. Philip Thomas (Lead Data Scientist), and Rendy B. Junior (Lead Data Engineer) shared the journey that Traveloka's Data Team has taken so far, so that the audience can learn from the struggles and triumphs of managing Traveloka's burgeoning data.

You will learn more about:
1) Data culture in Traveloka
2) Data engineering in Traveloka
3) Data science in Traveloka

To follow our LinkedIn page, visit bit.ly/TravelokaLinkedInPage


Safe Harbor Statement

Our discussion may include predictions, estimates or other information that might be considered conclusive. While these conclusive statements represent our current judgment on the best practices, they are subject to risks and uncertainties that could cause actual results to differ materially. You are cautioned not to place undue reliance on our statements, which reflect our opinions only as of the date of this presentation. Please keep in mind that we are not obligating ourselves to revise or publicly release the results of any revision to these presentation materials in light of new information or future events.

  1. Traveloka Data Meetup v1.0.0: How to Feed a Data Hungry Organization
  2. Part One: Traveloka Data Culture
  3. Part 1: Traveloka Data Culture. Five Characteristics of Data Hungry Organization: Driven Decision, Learn from Mistakes, Better Understanding Uncertainty and Variation, High Quality Data. Data Hungry Organization
  4. Our responsibility is to turn data into consumable insights. DATA TEAM; BETTER BUSINESS DECISION
  5. We need the brightest people to fill our needs and create the future: Mathematics, Business, Programming Skills
  6. Some of the skills in mathematics: Optimization, Decision Theory, Statistics, Differential Equations, Time Series
  7. Some of the skills in business: Strategy, Finance, Economics
  8. Some of the skills in programming: Data Wrangling, Modelling, Big Data
  9. This is how we structure our team. Data Team: Data Governance, Machine Learning Engineering, Data Analysis, Data Science, Data Engineering
  10. Houston, we have a problem.
      • DW: Tens of Terabytes, Hundreds of ETLs
      • Kafka: Hundreds of Topics, Millions of Messages per Hour, Hundreds of Megabytes per Second
      • S3: Hundreds of Terabytes
      • Redshift: Tens of Thousands of Queries Daily
      • DOMO: Thousands of Cards, Hundreds of Users
      • PeriscopeData: Thousands of Dashboards, Hundreds of Users
  11. We need state of the art technology to feed data hungry people
      • Stages: Ingestion, Processing, Storage, Presentation
      • Source DB: Mongo, PostgreSQL
      • App / Services: Java App
      • Ingestion: Gobblin
      • Data Lake: AWS S3
      • Batch Processing: Spark, Airflow, Hadoop2, Python, Java App
      • Data Warehouse: Redshift, MongoDB, PostgreSQL
      • Datahub: Pubsub, Kafka
      • Stream Processing: DataFlow, MemSQL Pipeline
      • Near Real Time DW: GCP BigQuery, MemSQL
      • Real Time DB: AWS DynamoDB
      • Analytics Tools: PeriscopeData, Spark, R, Domo, Dataiku, Holistics, Keboola
      • ML Tools, Library, and Services: Jupyter, Zeppelin, Caffe, DataDog, TensorFlow, Cloud Vision API
      • Query Engine: Qubole, Presto, Hive
  12. Part Two: Data Engineering
  13. Part 2: Data Engineering. Fast Food, Or…?
  14. MINDSETS. Managed service for focus, so we could focus more on the use cases
  15. MINDSETS. Managed service for focus, so we could focus more on the use cases
  16. Real Time Pipeline
      • 5 min data delivery SLA. Real latency ~ 10 s
      • 100 ms query SLA. Real latency ~ 10 ms (p95)
      • Key value data, query by service/app
      • Autoscale
      • Self service for each engineering team: we provide governance, guidance, building blocks, and consultation
  17. Real Time Pipeline
  18. Near Real Time Pipeline
      • Raw data, query by BI Tools
      • 5 min data delivery SLA. Real latency ~ 5 s
      • Using YAML for schema definition (built and defined by ourselves)
      • Self service for data analysts! With guidance and governance
  19. Near Real Time Pipeline
  20. Near Real Time Pipeline. But MemSQL is not a managed service; it runs on EC2. It is easy to scale, but does not autoscale yet. So we are moving to… v2!! Currently in usability testing by analysts. Self service, of course!
  21. Near Real Time Pipeline
  22. Analytical Pipeline: heavy data processing, query by BI Tools, 6 hour data delivery SLA
  23. Analytical Pipeline. Interesting features:
      • Custom dev/prod environment, for self service!
      • Custom framework, on top of Spark
      • Custom Airflow, separated queue for backfill
      • EMR autoscale for backfill
      • Redshift microbatch bulk load
      • etc...
  24. Summary
  25. Part Three: Data Science in Traveloka
  26. Part 3: Data Science in Traveloka. Three Things to Discuss Today: Data Science Purpose, Tools of the Trade, Model Evaluations and Applications
  27. Three Things to Discuss Today: Data Science Purpose, Tools of the Trade, Model Evaluations and Applications
  28. Novia is 25 years old. She is single, outspoken, and mathematically gifted. As a student, she was deeply interested in calculus and statistics, and also participated in the International Mathematical Olympiad.
      a. Novia is a data scientist
      b. Novia is a data scientist and is active as a mathematical Olympiad tutor
  29. Consider a regular six-sided die with four green faces and two red faces. The die will be rolled 20 times and the sequence of greens (G) and reds (R) will be recorded. Choose one sequence from a set of three. Which one is the most likely outcome?
      • RGRRR
      • GRGRRR
      • GRRRRR
  30. Part 3: Data Science in Traveloka
  31. Part 3: Data Science in Traveloka
  32. Remember This: The goal of a data science exercise is to help us make a good business decision. Logic, Alternatives, Information, Preferences
  33. “if they learn nothing else about decision analysis from their studies, distinction between outcome and decisions will have been worth the price of admission” Ron Howard, Professor at Stanford University, Father of Decision Analysis
      Decisions versus outcome:
      • Good decision, good outcome: Took a taxi and arrived safely
      • Bad decision, good outcome: Drove home and arrived safely
      • Good decision, bad outcome: Took a taxi and was involved in an accident
      • Bad decision, bad outcome: Drove home and was involved in an accident
  34. Three Things to Discuss Today: Data Science Purpose, Tools of the Trade, Model Evaluations and Applications
  35. Data Science Framework: CRISP-DM. Business, Data, Data Prep, Model, Evaluation, Deployment, Common Sense
  36. “Hiding within those mounds of data is knowledge that could change the life of a patient, or change the world” -Atul Butte, Stanford-
      We use open source libraries for data science:
      • Wrangling: data.table, dplyr, sparkR, sparklyr, pandas, pyspark
      • Visualization: ggplot, matplotlib, seaborn, shiny
      • Statistics: R, JAGS, STAN, Python, Julia
      • Machine Learning: scikit-learn, caret, e1071, fbprophet
  37. Are we using the algorithm? Or being used by it?
      • Classification: Linear Models, Naïve Bayes Classifier, Support Vector Classifier, Vowpal Wabbit Classifier, Random Forest, Decision Trees, Neural Network, Extreme Gradient Boosted Trees, many more algos!
      • Prediction: Linear Models, Nystroem Regressor, Support Vector Regressor, Vowpal Wabbit Regressor, Random Forest, Decision Trees, Neural Network, Extreme Gradient Boosted Trees, more algos!
      • Libraries: Scikit-learn, Caret, TensorFlow, …
  38. We need more than just off the shelf libraries to feed data hungry people: Bayesian Network, Markov Chain Monte Carlo
  39. Three Things to Discuss Today: Data Science Purpose, Tools of the Trade, Model Evaluations and Applications
  40. Model Evaluation: judging the usefulness of your model
      • Rule #1: Never ever peek at the test set during training/validation
      • Rule #2: You can never satisfy all the metrics; pick one or two metrics as your decision criteria beforehand
      • Rule #3: Always do comparative statics on the final model
  41. Comparative Statics: commonly used as feature importance analysis
  42. Remember the end goal: decisions. What should we do? What might happen?
  43. “But in my view, obsessive customer focus is by far the most protective of Day 1 vitality”
      Our data is telling us:
      • What do they want?
      • Do we serve their needs?
      • Are they trying to leave us?
      My name is Jeff
  44. Thank you!
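
Slide 16 contrasts a 100 ms query SLA with an observed latency of about 10 ms at p95. As a minimal sketch of how such a check could be computed: the latency samples below are made up, and the nearest-rank percentile method is an assumption for illustration, not a description of Traveloka's monitoring.

```python
import math


def percentile_nearest_rank(samples, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # 1-based rank of the smallest value covering at least p percent of samples.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def meets_sla(samples_ms, sla_ms, p=95):
    """True if the p-th percentile latency is within the SLA."""
    return percentile_nearest_rank(samples_ms, p) <= sla_ms


# Hypothetical query latencies in milliseconds, with one slow outlier.
latencies_ms = [4, 6, 7, 8, 8, 9, 9, 10, 10, 11,
                12, 9, 8, 7, 6, 5, 9, 10, 8, 250]

p95 = percentile_nearest_rank(latencies_ms, 95)
print(f"p95 = {p95} ms, within 100 ms SLA: {meets_sla(latencies_ms, 100)}")
```

A percentile-based SLA tolerates the single 250 ms outlier here, whereas a check on the maximum latency would fail.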
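
Slide 23 lists "Redshift microbatch bulk load" as a feature of the analytical pipeline. The general idea of micro-batching, grouping a stream of records into small fixed-size batches before each load, can be sketched as follows. The function name and batch size are hypothetical, and the actual load step is deliberately left out because the slides do not describe it.

```python
from itertools import islice


def micro_batches(records, batch_size):
    """Yield successive lists of at most batch_size records from any iterable."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Seven hypothetical records grouped into batches of three.
batches = list(micro_batches(range(7), 3))
for batch in batches:
    # A real pipeline would hand each batch to its bulk loader here.
    print(batch)
```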
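
For the die question on slide 29, a short calculation makes the comparison concrete. This sketch assumes independent rolls with P(G) = 4/6 and P(R) = 2/6, and compares the probability that each candidate appears as one specific run of consecutive rolls.

```python
from fractions import Fraction

# Four green faces and two red faces on a six-sided die.
FACE_PROBABILITY = {"G": Fraction(4, 6), "R": Fraction(2, 6)}


def run_probability(sequence):
    """Probability of rolling exactly this sequence in len(sequence) rolls."""
    probability = Fraction(1)
    for face in sequence:
        probability *= FACE_PROBABILITY[face]
    return probability


candidates = ["RGRRR", "GRGRRR", "GRRRRR"]
probabilities = {seq: run_probability(seq) for seq in candidates}
most_likely = max(probabilities, key=probabilities.get)

for seq, prob in probabilities.items():
    print(seq, prob)
print("most likely:", most_likely)
```

The second sequence is the first one with an extra G in front, so every occurrence of GRGRRR in the 20 rolls also contains RGRRR. The shorter sequence can therefore never be the less likely one, even though the longer one looks more "representative" of a mostly green die.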
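
Rule #1 on slide 40 (never peek at the test set during training/validation) is usually enforced by setting the test rows aside once, before any modelling starts. A minimal standard-library sketch, assuming a simple random split; the function name, fractions, and seed are illustrative choices, not taken from the talk.

```python
import random


def three_way_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle rows once and split them into train, validation, and test lists."""
    if not 0 < val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical row identifiers standing in for a real dataset.
train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))
```

Model fitting and tuning would then use only `train` and `val`; `test` is evaluated once, on the final model.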
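
Slides 40 and 41 recommend comparative statics on the final model and note that it is commonly used as feature importance analysis. The idea is to move one input at a time while holding the others at a baseline, then compare the model's output. The toy model and feature names below are hypothetical stand-ins for a fitted model.

```python
def comparative_statics(model, baseline, deltas):
    """Change in model output when each input shifts by its delta, others held fixed."""
    base_output = model(baseline)
    effects = {}
    for name, delta in deltas.items():
        shifted = dict(baseline)  # copy so the baseline itself is never modified
        shifted[name] += delta
        effects[name] = model(shifted) - base_output
    return effects


def toy_model(x):
    # Hypothetical scoring function with an interaction term.
    return 10 * x["discount"] - 4 * x["rank"] + x["discount"] * x["rank"]


baseline = {"discount": 2, "rank": 3}
effects = comparative_statics(toy_model, baseline, {"discount": 1, "rank": 1})
print(effects)
```

Because of the interaction term, the measured effects depend on the chosen baseline, which is why the analysis is normally repeated at several representative baselines.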
