At Traveloka's inaugural Data Meetup, held in April 2017, Ainun Najib (Head of Data), Dr. Philip Thomas (Lead Data Scientist), and Rendy B. Junior (Lead Data Engineer) shared the journey that Traveloka's Data Team has taken so far, so that the audience could learn from the struggles and triumphs of managing Traveloka's burgeoning data.
You will learn more about:
1) Data culture in Traveloka
2) Data engineering in Traveloka
3) Data science in Traveloka
To follow our LinkedIn page, visit bit.ly/TravelokaLinkedInPage
Safe Harbor Statement
Our discussion may include predictions, estimates or other information that might be considered conclusive. While these conclusive statements represent our current judgment on the best practices, they are subject to risks and uncertainties that could cause actual results to differ materially. You are cautioned not to place undue reliance on our statements, which reflect our opinions only as of the date of this presentation. Please keep in mind that we are not obligating ourselves to revise or publicly release the results of any revision to these presentation materials in light of new information or future events.
3. Part 1: Traveloka Data Culture
Five Characteristics of a Data-Hungry Organization
Driven Decision
Learn from Mistakes
Better Understanding
Uncertainty and Variation
High Quality Data
4. Part 1: Traveloka Data Culture
Our responsibility is to turn data into consumable insights
DATA TEAM
BETTER BUSINESS DECISION
5. Part 1: Traveloka Data Culture
We need the brightest people to fill our needs and create the future
Skills: Mathematics, Business, Programming
6. Part 1: Traveloka Data Culture
Some of the skills in mathematics
Mathematics
Optimization
Decision Theory
Statistics
Differential Equations
Time Series
7. Part 1: Traveloka Data Culture
Some of the skills in business
Business
Strategy
Finance
Economics
8. Part 1: Traveloka Data Culture
Some of the skills in programming
Programming
Data Wrangling
Modelling
Big Data
9. Part 1: Traveloka Data Culture
This is how we structure our team
Data Team
Data Governance
Machine Learning Engineering
Data Analysis
Data Science
Data Engineering
10. Part 1: Traveloka Data Culture
Houston, we have a problem.
DW: Tens of Terabytes, Hundreds of ETLs
Kafka: Hundreds of Topics, Millions of Messages per Hour, Hundreds of Megabytes per Second
S3: Hundreds of Terabytes
Redshift: Tens of Thousands of Queries Daily
DOMO: Thousands of Cards, Hundreds of Users
PeriscopeData: Thousands of Dashboards, Hundreds of Users
11. Part 1: Traveloka Data Culture
We need state of the art technology to feed data hungry people
Ingestion: Gobblin
Data Lake: AWS S3
Batch Processing: Spark, Airflow, Hadoop2, Python, Java App
Data Warehouse: Redshift, MongoDB, PostgreSQL
Datahub: Pubsub, Kafka
Stream Processing: DataFlow, MemSQL Pipeline
Near Real Time DW: GCP BigQuery, MemSQL
Real Time DB: AWS DynamoDB
Stages: Ingestion, Processing, Storage, Presentation
Source DB: Mongo, PostgreSQL
App / Services: Java App
Analytics Tools: PeriscopeData, Spark, R, Domo, Dataiku, Holistics, Keboola
ML Tools, Library, and Services: Jupyter, Zeppelin, Caffe, DataDog, TensorFlow, Cloud Vision API
Query Engine: Qubole, Presto, Hive
14. Part 2: Data Engineering
MINDSETS
Managed service for focus
So we could focus more on the use cases
16. Part 2: Data Engineering
Real Time Pipeline
5 min data delivery SLA. Real latency ~ 10s
100 ms query SLA. Real latency ~ 10ms (p95)
Key value data, query by service/app
Autoscale; self service for each engineering team
We provide governance, guidance, building blocks, and consultation
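The slide quotes a 100 ms query SLA against a measured p95 of roughly 10 ms. As a rough illustration of what a p95 check involves, here is a minimal sketch. It assumes the nearest-rank percentile definition and hypothetical function names; the deck does not say how the latency is actually measured.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so that pct=0 still returns the minimum.
    rank = max(math.ceil(pct * len(ordered) / 100), 1)
    return ordered[rank - 1]


def meets_query_sla(latencies_ms, sla_ms=100, pct=95):
    """True when the pct-th percentile latency is within the SLA.

    The 100 ms default mirrors the SLA on the slide; everything else
    here is an assumption made for illustration.
    """
    return percentile(latencies_ms, pct) <= sla_ms


if __name__ == "__main__":
    # Hypothetical measurements: mostly fast queries with one slow outlier.
    observed = [8, 9, 10, 11, 12, 9, 10, 10, 11, 250]
    print(percentile(observed, 95), meets_query_sla(observed))
```

A percentile target tolerates a small share of slow queries, which is why a single outlier does not necessarily break the SLA on a large enough sample.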
18. Part 2: Data Engineering
Near Real Time Pipeline
Raw data, query by BI Tools
5 min data delivery SLA. Real latency ~ 5s
Using YAML for schema definition (built and defined by ourselves)
Self service for data analysts, with guidance and governance!
19. Part 2: Data Engineering
Near Real Time Pipeline
20. Part 2: Data Engineering
Near Real Time Pipeline
But MemSQL is not a managed service; it runs on EC2.
It is easy to scale, but it does not autoscale yet.
So we are moving to… v2!!
Currently in usability testing by analysts.
Self service, of course!
21. Part 2: Data Engineering
Near Real Time Pipeline
22. Part 2: Data Engineering
Analytical Pipeline
Heavy data processing, query by BI Tools
6 hour data delivery SLA
23. Part 2: Data Engineering
Analytical Pipeline
Interesting features:
• Custom dev/prod environment, for self service!
• Custom framework, on top of Spark
• Custom Airflow, separate queue for backfill
• EMR autoscale for backfill
• Redshift microbatch bulk load
• etc...
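The deck does not describe how the Redshift microbatch bulk load is built. As a generic sketch of the microbatching idea only, the snippet below groups a stream of records into fixed-size chunks so that each chunk can be handed to a single bulk load. The `microbatches` helper and the `load_batch` callback are hypothetical names, not part of Traveloka's framework or of any Redshift API.

```python
from itertools import islice


def microbatches(records, batch_size):
    """Yield lists of at most batch_size records, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load_in_microbatches(records, batch_size, load_batch):
    """Call load_batch once per microbatch and return the number of calls.

    load_batch is a hypothetical stand-in for whatever performs the
    actual bulk load.
    """
    calls = 0
    for batch in microbatches(records, batch_size):
        load_batch(batch)
        calls += 1
    return calls


if __name__ == "__main__":
    loaded = []
    calls = load_in_microbatches(range(7), 3, loaded.append)
    print(calls, loaded)
```

Batching trades a little latency for far fewer load operations, which is generally how bulk-load interfaces are meant to be used.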
26. Part 3: Data Science in Traveloka
Three Things to Discuss Today
Data Science Purpose
Tools of the Trade
Model Evaluations and Applications
28. Novia is 25 years old. She is single, outspoken, and mathematically gifted. As a student, she was deeply interested in calculus and statistics, and also participated in the International Mathematical Olympiad.
a. Novia is a data scientist
b. Novia is a data scientist and is active as a Mathematical Olympiad tutor
Part 3: Data Science in Traveloka
29. Part 3: Data Science in Traveloka
Consider a regular six-sided die with four green faces and two red faces. The die will be rolled 20 times and the sequence of greens (G) and reds (R) will be recorded.
Choose one sequence from a set of three. Which one is the most likely outcome?
RGRRR
GRGRRR
GRRRRR
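The puzzle can be checked numerically. The slide does not say how a chosen sequence is scored, so the sketch below assumes two readings: the probability of rolling exactly that sequence, and the probability that it appears as a contiguous run somewhere in the 20 rolls. The function names are illustrative. Note that GRGRRR contains RGRRR, so any run of rolls that contains the longer sequence also contains the shorter one, and the longer one can never be the more likely of the two.

```python
from fractions import Fraction

# Four green faces and two red faces on a six-sided die.
FACE_PROB = {"G": Fraction(4, 6), "R": Fraction(2, 6)}


def sequence_probability(seq):
    """Probability of rolling exactly seq in len(seq) rolls."""
    prob = Fraction(1)
    for face in seq:
        prob *= FACE_PROB[face]
    return prob


def appears_probability(pattern, rolls=20):
    """Probability that pattern occurs as a contiguous run within `rolls` rolls.

    State k means the last k rolls match the first k faces of pattern
    and the pattern has not been completed yet.
    """
    m = len(pattern)

    def step(k, face):
        # Longest prefix of pattern that is a suffix of pattern[:k] + face.
        s = pattern[:k] + face
        for length in range(len(s), 0, -1):
            if s.endswith(pattern[:length]):
                return length
        return 0

    dist = [Fraction(0)] * m
    dist[0] = Fraction(1)
    hit = Fraction(0)
    for _ in range(rolls):
        new = [Fraction(0)] * m
        for k, p in enumerate(dist):
            if p == 0:
                continue
            for face, face_p in FACE_PROB.items():
                nxt = step(k, face)
                if nxt == m:
                    hit += p * face_p  # pattern completed on this roll
                else:
                    new[nxt] += p * face_p
        dist = new
    return hit


if __name__ == "__main__":
    for seq in ("RGRRR", "GRGRRR", "GRRRRR"):
        exact = sequence_probability(seq)
        within = appears_probability(seq)
        print(seq, exact, round(float(within), 4))
```

Exact fractions are used so that the comparison between sequences is not affected by floating-point rounding.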
32. Remember This:
The goal of a data science exercise is to help us make a good business decision
Logic
Alternatives
Information
Preferences
Part 3: Data Science in Traveloka
33. “if they learn nothing else about decision analysis from their studies, distinction between outcome and decisions will have been worth the price of admission”
Ron Howard, Professor at Stanford University
Father of Decision Analysis
Part 3: Data Science in Traveloka
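Decisions versus Outcome
Good decision, good outcome: Took a taxi and arrived safely
Bad decision, good outcome: Drove home and arrived safely
Good decision, bad outcome: Took a taxi and was involved in an accident
Bad decision, bad outcome: Drove home and was involved in an accident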
34. Part 3: Data Science in Traveloka
Three Things to Discuss Today
Data Science Purpose
Tools of the Trade
Model Evaluations and Applications
35. Data Science Framework: CRISP-DM
Business
Data
Data Prep
Model
Evaluation
Deployment
Common Sense
Part 3: Data Science in Traveloka
36. “Hiding within those mounds of data is knowledge that could change the life of a patient, or change the world”
-Atul Butte, Stanford-
We use open source libraries for data science
Wrangling
• data.table
• dplyr
• sparkR
• sparklyr
• pandas
• pyspark
Visualization
• ggplot
• matplotlib
• seaborn
• shiny
Statistics
• R
• JAGS
• STAN
• Python
• Julia
Machine Learning
• scikit-learn
• caret
• e1071
• fbprophet
Part 3: Data Science in Traveloka
37. Are we using the algorithm? Or being used by it?
Classification
• Linear Models
• Naïve Bayes Classifier
• Support Vector Classifier
• Vowpal Wabbit Classifier
• Random Forest
• Decision Trees
• Neural Network
• Extreme Gradient Boosted Trees
• Many more algos!
Prediction
• Linear Models
• Nystroem Regressor
• Support Vector Regressor
• Vowpal Wabbit Regressor
• Random Forest
• Decision Trees
• Neural Network
• Extreme Gradient Boosted Trees
• More algos!
Libraries
• Scikit-learn
• Caret
• TensorFlow
• …
Part 3: Data Science in Traveloka
38. We need more than just off the shelf libraries to feed data hungry people
Bayesian Network
Markov Chain Monte Carlo
Part 3: Data Science in Traveloka
39. Part 3: Data Science in Traveloka
Three Things to Discuss Today
Data Science Purpose
Tools of the Trade
Model Evaluations and Applications
40. Model Evaluation: judging the usefulness of your model
Rule #1
Never ever peek at the test set during training/validation
Rule #2
You can never satisfy all the metrics; pick one or two metrics as your decision criteria beforehand
Rule #3
Always do comparative statics on the final model
Part 3: Data Science in Traveloka
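Rule #1 is easier to follow when the test set is carved out once, up front, and left untouched until the final evaluation. The sketch below is one minimal way to do that with the standard library only. The function name, the 60/20/20 proportions, and the fixed seed are all assumptions for illustration, not something the deck specifies.

```python
import random


def three_way_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle rows once and split them into train, validation, and test.

    The test portion should be set aside and used only for the final
    evaluation; tune models on train and validation alone.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = three_way_split(range(100))
    print(len(train), len(val), len(test))
```

Fixing the seed means the same rows land in the test set every time, so results from different experiments stay comparable.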
42. Remember the end goal: decisions
What should we do?
What might happen
Part 3: Data Science in Traveloka
43. “But in my view, obsessive customer focus is by far the most protective of Day 1 vitality”
Our data is telling us:
• What do they want?
• Do we serve their needs?
• Are they trying to leave us?
Part 3: Data Science in Traveloka
My name is Jeff