Ankara Big Data Meetup - Bilkent Cyberpark, August 5, 2015
Data Science at Udemy
Larry Wai
Principal Data Scientist @Udemy
Overview of talk
● What is data science?
● Udemy in a nutshell
● Data science projects at Udemy
● Data science work cycle
● What does it mean to be a data scientist?
What is data science?
data science in consumer internet = application of the scientific method using big data computational methods to
ascertain, predict, and utilize user behavior for business purposes
Inherits from three historical schools of thought
1. Research of natural phenomena using the scientific method
○ e.g. physics, astronomy
○ data science arises from replacing the study of natural phenomena with the study of user behavior
2. Research of computational methods
○ e.g. mathematics, computer science
○ data science arises from pushing the limits of existing methods to compute that which could not be
computed before
3. Research of human behavior
○ e.g. economics, psychology
○ data science arises from applying big data to the study of microscopic human behavior, i.e. millions of
users x thousands of items = billions of user-item calculations
Other definitions (too general IMO):
● data science > statistics (only); stats does not require engineering skills
● data science > computer science (only); engineering does not require training in the scientific method
● data science > business analytics (only); analytics does not require engineering skills nor training in the scientific
method
Udemy in a nutshell
● consumer online education marketplace
● instructors get 50% of enrollment fee
● no certification requirements
● typical enrollment price point (paid) is $20-$40
● get to critical mass (instructors and students)
in each language through marketing
● above critical mass, leverage marketplace
(organic) driven growth
● Udemy currently has ~7 million students, ~30
thousand courses
● relevance of search and recommendations is
key to fostering growth
● learning goal data science is key to fostering
long term growth
[Chart: Google search trends for selected online education companies]
● Udemy (blue). Exponential marketplace growth.
● Coursera (yellow), Udacity (red), Lynda (green).
Incremental growth.
● note: this chart convinced me to join Udemy :)
Udemy web site
Data science projects at Udemy
search & recommendation
● real time recommendation (web, mobile)
● real time search
● batch e-mail recommendation
learning goals
● course learning process optimization
● learning goal paths
● career learning goals
+ more projects
Search and recommendation (in experiment)
Feature classes
● course historical averages
● personal historical behavior
● search term matching
Overall ranking strategy
● compute global score per visitor per
course per day
● consider modules as filters on the total
available inventory
● the module score will be the sum of the
global course scores for the top N
courses in the module
● individual courses are ranked within
each module according to the global
course score
[Figure: courses 1-12 grouped into modules A, B, and C]
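The ranking strategy above can be sketched in a few lines of Python. This is only an illustration, not Udemy's implementation: the names `rank_modules`, `global_scores`, and `modules`, the example scores, and the choice of N = 2 are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def rank_modules(
    global_scores: Dict[str, float],
    modules: Dict[str, List[str]],
    top_n: int = 2,
) -> List[Tuple[str, float, List[str]]]:
    """Rank modules by the summed global score of their top-N courses.

    global_scores: hypothetical per-visitor, per-day score for each course.
    modules: each module acts as a filter on the inventory (a list of course ids).
    """
    ranked = []
    for name, course_ids in modules.items():
        # Rank courses within the module by their global course score.
        ordered = sorted(course_ids, key=lambda c: global_scores[c], reverse=True)
        # Module score = sum of the global scores of the top N courses.
        module_score = sum(global_scores[c] for c in ordered[:top_n])
        ranked.append((name, module_score, ordered))
    # Modules with the highest score come first.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Illustrative scores only.
scores = {"course 1": 0.9, "course 2": 0.2, "course 3": 0.5,
          "course 5": 0.6, "course 6": 0.7, "course 7": 0.1}
modules = {"module A": ["course 1", "course 2", "course 3"],
           "module B": ["course 5", "course 6", "course 7"]}
result = rank_modules(scores, modules, top_n=2)
```

Because every module reuses the same global course score, adding a new module in this sketch only requires a new filter, not a new model.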
Learning goals (conceptual stage)
Course learning goal clustering
● goals are hierarchical
● goals are linked
● goals are dynamic
Overall learning goal strategy
● continuously update learning goal
clustering
● quantify and evaluate student progress
towards learning goals
● identify learning goal paths according
to desired careers or hobbies
[Figure: goals 1-6 linked to course A and course B]
Data science work cycle
[Diagram: cycle of five stages]
● exploratory analysis
● model building
● model deployment
● experiment setup
● data collection
ideal cycling time is ~days to ~weeks
Exploratory analysis
● data to be explored can in general be defined
as a multi-dimensional cube, a.k.a.
“hypercube”, where each side of the
hypercube is an exploratory “dimension” and
the “measures” of the user behavior are
aggregates in each cell
● the hypercube is the minimal representation
required for the exploratory analysis; e.g. we
minimize cardinality for continuous variables
● the human mind is unable to easily
comprehend more than 3 dimensions,
therefore exploratory analysis must be broken
down into actions which project the entire
hypercube onto different dimensions in
sequence
● goal for the analyst is to understand the multi-
dimensional user behavior, which may take
many projections in sequence (~100)
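A small sketch may make the hypercube and its projections concrete. It assumes three hypothetical dimensions (`country`, `device`, `price_band`) and two measures (sessions, enrollments); none of these names come from the talk. The continuous price is bucketed first, which is one way to keep cardinality low.

```python
from collections import defaultdict

DIMENSIONS = ("country", "device", "price_band")  # hypothetical dimensions


def price_band(price: float) -> str:
    """Bucket a continuous variable to keep the cube's cardinality low."""
    if price < 20:
        return "<20"
    if price <= 40:
        return "20-40"
    return ">40"


def build_hypercube(events):
    """Aggregate raw events into cells keyed by one value per dimension."""
    cube = defaultdict(lambda: {"sessions": 0, "enrollments": 0})
    for country, device, price, enrolled in events:
        cell = cube[(country, device, price_band(price))]
        cell["sessions"] += 1
        cell["enrollments"] += int(enrolled)
    return dict(cube)


def project(cube, keep):
    """Project the cube onto the dimensions in `keep`, summing the measures."""
    idx = [DIMENSIONS.index(d) for d in keep]
    out = defaultdict(lambda: {"sessions": 0, "enrollments": 0})
    for key, measures in cube.items():
        sub = tuple(key[i] for i in idx)
        for name, value in measures.items():
            out[sub][name] += value
    return dict(out)


# Made-up events: (country, device, price, enrolled?)
events = [
    ("TR", "web", 25.0, True),
    ("TR", "mobile", 15.0, False),
    ("US", "web", 30.0, False),
    ("US", "web", 50.0, True),
]
cube = build_hypercube(events)
by_device = project(cube, keep=("device",))
by_country_price = project(cube, keep=("country", "price_band"))
```

Each call to `project` corresponds to one exploratory action: the analyst looks at one or two dimensions at a time and repeats with a different `keep` argument.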
model building
● platforms such as R allow us to leverage open
source modeling packages and compare
models with relatively low overhead
● most user behavior features are non-linear
and correlated; thus, the simplest “black box”
non-linear models which handle correlations
are practical to use, e.g. decision trees
● use residuals on a holdout set to validate the model
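The holdout check can be illustrated with a deliberately tiny example. The sketch below is written in plain Python rather than R, fits a one-split regression tree (a decision stump) as a stand-in for a full decision tree, and uses synthetic data; `fit_stump` and all variable names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree: choose the threshold with the lowest squared error."""
    best = None
    # Skip the smallest value so that both sides of the split are non-empty.
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        l_mean, r_mean = mean(left), mean(right)
        sse = (sum((y - l_mean) ** 2 for y in left)
               + sum((y - r_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, l_mean, r_mean)
    _, t, l_mean, r_mean = best
    return lambda x: l_mean if x < t else r_mean


# Synthetic non-linear response: a step at x = 0.5 plus noise.
random.seed(0)
xs = [random.random() for _ in range(200)]
ys = [(1.0 if x > 0.5 else 0.2) + random.gauss(0, 0.05) for x in xs]

# Hold out the last 50 points; they are never seen during fitting.
train_x, train_y = xs[:150], ys[:150]
hold_x, hold_y = xs[150:], ys[150:]

model = fit_stump(train_x, train_y)
residuals = [y - model(x) for x, y in zip(hold_x, hold_y)]
rmse_model = mean(r * r for r in residuals) ** 0.5

# Baseline for comparison: always predict the training mean.
baseline = mean(train_y)
rmse_baseline = mean((y - baseline) ** 2 for y in hold_y) ** 0.5
```

If the holdout residuals are no smaller than those of the constant baseline, the model has not learned anything that generalizes.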
model deployment
● standardized predictive model markup language (PMML) allows abstraction of models in deployment
● “plug-in” model deployment is agile because no new production code is needed for model updates
● shifts focus of algo development from production code development to data mining methods
● this approach allows a single person to build and deploy models quickly
● this approach is cutting edge and is being tested now at Udemy
model building
● create training dataset
● create predictive model, e.g. decision trees, random forest
● offline analysis; residuals; feature importance
model storage
● predictive model store (PMML format)
model deployment
● in memory model; load on initialization; periodic updates
model scoring
● loop through courses, compute feature vector per course
● compute score per course
● sort by score
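The "plug-in" idea can be sketched without any PMML tooling. In the example below a dictionary stands in for the PMML model store and a plain Python callable stands in for a parsed model; no real PMML library is used, and `ModelStore`, `Scorer`, and the feature names are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

FeatureVector = Dict[str, float]
Model = Callable[[FeatureVector], float]


class ModelStore:
    """Stand-in for the predictive model store (PMML files in the talk)."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}

    def publish(self, name: str, model: Model) -> None:
        self._models[name] = model

    def load(self, name: str) -> Model:
        return self._models[name]


class Scorer:
    """Keeps the model in memory; this code does not change when the model does."""

    def __init__(self, store: ModelStore, name: str) -> None:
        self._store, self._name = store, name
        self.refresh()  # load on initialization

    def refresh(self) -> None:
        """Call periodically to pick up a newly published model."""
        self._model = self._store.load(self._name)

    def rank(self, courses: Dict[str, FeatureVector]) -> List[Tuple[str, float]]:
        # Loop through courses, compute a score per course, sort by score.
        scored = [(cid, self._model(fv)) for cid, fv in courses.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)


store = ModelStore()
store.publish("ranker", lambda fv: fv["avg_rating"])    # first model version
scorer = Scorer(store, "ranker")
courses = {"course 1": {"avg_rating": 4.1, "enroll_rate": 0.30},
           "course 2": {"avg_rating": 4.6, "enroll_rate": 0.05}}
first = scorer.rank(courses)

store.publish("ranker", lambda fv: fv["enroll_rate"])   # updated model
scorer.refresh()                                        # same production code
second = scorer.rank(courses)
```

The point of the design is visible in the last three lines: the ranking changes after `refresh()` although `Scorer` itself was not touched.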
experiment setup
Practical requirements for experiments, a.k.a. A/B tests
● need enough users to measure an interesting
effect
● conversely, if an effect is not large enough to
measure, then it is not interesting, at least from a
data science point of view, and potentially from a
business point of view
● e.g. an interesting effect from a business point of
view would be +5% relative lift of conversion rate
● to detect a +5% relative lift at 95% confidence level
(on, say, a typical 1 conversion per 10 sessions),
need roughly 30,000 sessions in each of the A and B
samples, i.e. >60,000 sessions in total
● ideally, would like to measure lift within ~days; so
need >60,000 sessions per day
● Udemy currently has >200,000 sessions per day
(but 2 years ago it was more like 20,000 sessions
per day, so 10x slower to run experiments)
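The sizing arithmetic can be reproduced with the standard normal-approximation formula for comparing two proportions. The talk does not say which test or statistical power was assumed, so the two-sided z-test and the `power` parameter below are assumptions; `sessions_per_arm` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist


def sessions_per_arm(base_rate, relative_lift, confidence=0.95, power=0.5):
    """Approximate sessions needed per arm for a two-sided two-proportion z-test."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)  # 0 when power is 50%
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# 1 conversion per 10 sessions, +5% relative lift (10.0% -> 10.5%).
n_at_50_power = sessions_per_arm(0.10, 0.05)             # power = 0.5
n_at_80_power = sessions_per_arm(0.10, 0.05, power=0.8)  # stricter design
```

With 50% power this gives a little over 28,000 sessions per arm, in line with the ~30,000 quoted above. Asking for 80% power roughly doubles the requirement, which is worth keeping in mind when traffic is limited.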
1. smoke test (~few days)
○ 1% for test variant(s)
○ verify that nothing is broken
○ 40% CONTROL_1, 40% CONTROL_2
○ validate that control is set up correctly
2. initial ramp (~1 week)
○ 5-10% for test variant(s)
○ sizing depends upon whether we’ve tested
something like this before, and any
revenue concerns
3. intermediate ramp (~few weeks)
○ 25%-50% for test variant
○ 40%-50% for CONTROL_1
4. final ramp / launch
○ 90% for test variant
○ 10% for CONTROL_1 (optional); turn off
after a few weeks of monitoring
○ rename “test” as new baseline
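One common way to implement such a ramp is deterministic hash bucketing, sketched below. The talk does not describe the assignment mechanism, so the hashing scheme, the `RAMP` table, and the `UNASSIGNED` label are assumptions; the shares are taken from the stages above (using the upper end of each range where one is given), and the initial ramp is omitted because its control shares are not stated.

```python
import hashlib

# Shares per stage; any remainder is left out of the experiment.
RAMP = {
    "smoke": [("TEST", 0.01), ("CONTROL_1", 0.40), ("CONTROL_2", 0.40)],
    "intermediate": [("TEST", 0.50), ("CONTROL_1", 0.50)],
    "final": [("TEST", 0.90), ("CONTROL_1", 0.10)],
}


def bucket(visitor_id: str, experiment: str) -> float:
    """Map a visitor deterministically to a point in [0, 1)."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    return int(digest[:8], 16) / 16 ** 8


def assign(visitor_id: str, experiment: str, stage: str) -> str:
    """Return the variant whose cumulative share covers the visitor's bucket."""
    point = bucket(visitor_id, experiment)
    upper = 0.0
    for variant, share in RAMP[stage]:
        upper += share
        if point < upper:
            return variant
    return "UNASSIGNED"


variant_now = assign("visitor-42", "new_ranker", "smoke")
variant_later = assign("visitor-42", "new_ranker", "final")
```

Because `TEST` is listed first and only grows, a visitor who saw the test variant during the smoke test keeps seeing it in later stages of this sketch.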
data collection
● data should be collected at the most granular
level, e.g. typically per visitor per item per day
● data should be pre-arranged in a way which
facilitates fast hypercube production, i.e. star
schema
● most granular data is located at the star core
● experiment variants can be incorporated as
an additional dimension in one of the star
limbs
“star schema” (with intermediate mapping)
● core table with grouping fields A, B, C
● limb table with grouping field A
● limb table with grouping fields A, B
● limb table with grouping field B
● limb table with grouping fields B, C
● limb table with grouping fields A, B, D
● mapping table with grouping field C and other field D
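A small in-memory SQLite example shows how limb tables can be derived from the core. It reads the limbs as pre-aggregations of the core table, which is an interpretation of the diagram rather than something the talk states. The concrete fields (A = visitor, B = day, C = course, D = category), table names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Core table: most granular data, grouping fields A, B, C.
CREATE TABLE core (visitor TEXT, day TEXT, course TEXT, enrollments INTEGER);
INSERT INTO core VALUES
  ('v1', '2015-08-01', 'c1', 1),
  ('v1', '2015-08-02', 'c2', 0),
  ('v2', '2015-08-01', 'c1', 1),
  ('v2', '2015-08-01', 'c3', 1);

-- Mapping table: grouping field C (course) and other field D (category).
CREATE TABLE course_category (course TEXT PRIMARY KEY, category TEXT);
INSERT INTO course_category VALUES
  ('c1', 'dev'), ('c2', 'dev'), ('c3', 'design');

-- Limb table with grouping field B (day).
CREATE TABLE limb_day AS
  SELECT day, SUM(enrollments) AS enrollments FROM core GROUP BY day;

-- Limb table with grouping fields A, B, D, built through the mapping table.
CREATE TABLE limb_visitor_day_category AS
  SELECT c.visitor, c.day, m.category, SUM(c.enrollments) AS enrollments
  FROM core AS c JOIN course_category AS m ON m.course = c.course
  GROUP BY c.visitor, c.day, m.category;
""")

per_day = dict(conn.execute("SELECT day, enrollments FROM limb_day"))
per_visitor_day_category = conn.execute(
    "SELECT visitor, day, category, enrollments "
    "FROM limb_visitor_day_category ORDER BY visitor, day, category"
).fetchall()
```

An experiment variant could be added in the same way as `category`: as one more grouping column on a limb table, so that hypercubes can be split by variant without touching the core.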
What does it mean to be a data scientist?
A successful data scientist is somebody who can independently execute the entire data
science work cycle on the time scale of days to weeks.
Important personal factors
● technical chops in math, computational methods, and the scientific method
● a genuine research interest in the underlying user behavior
● good intuition for how the business works
Important environmental factors
● top-down understanding of and commitment to data science
● excellent data architect
● best practices data science infrastructure
Udemy is hiring!
https://about.udemy.com/careers/

More Related Content

What's hot

H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum ShachamH2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
Sri Ambati
 
Machine Learning with PyCaret
Machine Learning with PyCaretMachine Learning with PyCaret
Machine Learning with PyCaret
Databricks
 
Graph-Powered Machine Learning
Graph-Powered Machine Learning Graph-Powered Machine Learning
Graph-Powered Machine Learning
GraphAware
 
Rakuten - Recommendation Platform
Rakuten - Recommendation PlatformRakuten - Recommendation Platform
Rakuten - Recommendation Platform
Karthik Murugesan
 
Data Con LA 2019 - Big Data Modeling with Spark SQL: Make data valuable by Ja...
Data Con LA 2019 - Big Data Modeling with Spark SQL: Make data valuable by Ja...Data Con LA 2019 - Big Data Modeling with Spark SQL: Make data valuable by Ja...
Data Con LA 2019 - Big Data Modeling with Spark SQL: Make data valuable by Ja...
Data Con LA
 
Architecting for Data Science
Architecting for Data ScienceArchitecting for Data Science
Architecting for Data Science
Johann Schleier-Smith
 
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
Romeo Kienzler
 
Rakshit (Rocky) Bhatt Resume - 2022
Rakshit (Rocky) Bhatt Resume - 2022Rakshit (Rocky) Bhatt Resume - 2022
Rakshit (Rocky) Bhatt Resume - 2022
Rakshit (Rocky) Bhatt
 
Choosing data warehouse considerations
Choosing data warehouse considerationsChoosing data warehouse considerations
Choosing data warehouse considerations
Aseem Bansal
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWS
Sri Ambati
 
Big data, Analytics and Beyond
Big data, Analytics and BeyondBig data, Analytics and Beyond
Big data, Analytics and Beyond
QuantUniversity
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
Big Data Spain
 
Ikanow oanyc summit
Ikanow oanyc summitIkanow oanyc summit
Ikanow oanyc summit
Open Analytics
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ waze
Ido Shilon
 
ODSC East 2018
ODSC East 2018ODSC East 2018
ODSC East 2018
Cameron Sim
 
Data analysis@network programming
Data analysis@network programmingData analysis@network programming
Data analysis@network programming
Rama .
 
Eecs6893 big dataanalytics-lecture1
Eecs6893 big dataanalytics-lecture1Eecs6893 big dataanalytics-lecture1
Eecs6893 big dataanalytics-lecture1
Aravindharamanan S
 
Role of Analytics in Digital Business
Role of Analytics in Digital BusinessRole of Analytics in Digital Business
Role of Analytics in Digital Business
Srinath Perera
 
The Rise of Streaming SQL and Evolution of Streaming Applications
The Rise of Streaming SQL and Evolution of Streaming ApplicationsThe Rise of Streaming SQL and Evolution of Streaming Applications
The Rise of Streaming SQL and Evolution of Streaming Applications
Srinath Perera
 
Data Science with Python - WeCloudData
Data Science with Python - WeCloudDataData Science with Python - WeCloudData
Data Science with Python - WeCloudData
WeCloudData
 

What's hot (20)

H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum ShachamH2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
 
Machine Learning with PyCaret
Machine Learning with PyCaretMachine Learning with PyCaret
Machine Learning with PyCaret
 
Graph-Powered Machine Learning
Graph-Powered Machine Learning Graph-Powered Machine Learning
Graph-Powered Machine Learning
 
Rakuten - Recommendation Platform
Rakuten - Recommendation PlatformRakuten - Recommendation Platform
Rakuten - Recommendation Platform
 
Data Con LA 2019 - Big Data Modeling with Spark SQL: Make data valuable by Ja...
Data Con LA 2019 - Big Data Modeling with Spark SQL: Make data valuable by Ja...Data Con LA 2019 - Big Data Modeling with Spark SQL: Make data valuable by Ja...
Data Con LA 2019 - Big Data Modeling with Spark SQL: Make data valuable by Ja...
 
Architecting for Data Science
Architecting for Data ScienceArchitecting for Data Science
Architecting for Data Science
 
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
 
Rakshit (Rocky) Bhatt Resume - 2022
Rakshit (Rocky) Bhatt Resume - 2022Rakshit (Rocky) Bhatt Resume - 2022
Rakshit (Rocky) Bhatt Resume - 2022
 
Choosing data warehouse considerations
Choosing data warehouse considerationsChoosing data warehouse considerations
Choosing data warehouse considerations
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWS
 
Big data, Analytics and Beyond
Big data, Analytics and BeyondBig data, Analytics and Beyond
Big data, Analytics and Beyond
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
 
Ikanow oanyc summit
Ikanow oanyc summitIkanow oanyc summit
Ikanow oanyc summit
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ waze
 
ODSC East 2018
ODSC East 2018ODSC East 2018
ODSC East 2018
 
Data analysis@network programming
Data analysis@network programmingData analysis@network programming
Data analysis@network programming
 
Eecs6893 big dataanalytics-lecture1
Eecs6893 big dataanalytics-lecture1Eecs6893 big dataanalytics-lecture1
Eecs6893 big dataanalytics-lecture1
 
Role of Analytics in Digital Business
Role of Analytics in Digital BusinessRole of Analytics in Digital Business
Role of Analytics in Digital Business
 
The Rise of Streaming SQL and Evolution of Streaming Applications
The Rise of Streaming SQL and Evolution of Streaming ApplicationsThe Rise of Streaming SQL and Evolution of Streaming Applications
The Rise of Streaming SQL and Evolution of Streaming Applications
 
Data Science with Python - WeCloudData
Data Science with Python - WeCloudDataData Science with Python - WeCloudData
Data Science with Python - WeCloudData
 

Similar to Data Science at Udemy

VII Jornadas eMadrid "Education in exponential times". "Open Analytics in an ...
VII Jornadas eMadrid "Education in exponential times". "Open Analytics in an ...VII Jornadas eMadrid "Education in exponential times". "Open Analytics in an ...
VII Jornadas eMadrid "Education in exponential times". "Open Analytics in an ...
eMadrid network
 
Exploring learning analytics
Exploring learning analyticsExploring learning analytics
Exploring learning analytics
Jisc
 
Building Data Products with Python (Georgetown)
Building Data Products with Python (Georgetown)Building Data Products with Python (Georgetown)
Building Data Products with Python (Georgetown)
Benjamin Bengfort
 
fINAL Lesson_1_Course_Introduction_v1.pptx
fINAL Lesson_1_Course_Introduction_v1.pptxfINAL Lesson_1_Course_Introduction_v1.pptx
fINAL Lesson_1_Course_Introduction_v1.pptx
dataKarthik
 
Open Learning Analytics Strategy for Student Success: The North Carolina Stat...
Open Learning Analytics Strategy for Student Success: The North Carolina Stat...Open Learning Analytics Strategy for Student Success: The North Carolina Stat...
Open Learning Analytics Strategy for Student Success: The North Carolina Stat...
Joshua
 
SoLAR Flare 2015 - Turning Learning Analytics Research into Practice at Tribal
SoLAR Flare 2015 - Turning Learning Analytics Research into Practice at TribalSoLAR Flare 2015 - Turning Learning Analytics Research into Practice at Tribal
SoLAR Flare 2015 - Turning Learning Analytics Research into Practice at Tribal
Chris Ballard
 
Research in to Practice: Building and implementing learning analytics at Tribal
Research in to Practice: Building and implementing learning analytics at TribalResearch in to Practice: Building and implementing learning analytics at Tribal
Research in to Practice: Building and implementing learning analytics at Tribal
LACE Project
 
5 Practical Steps to a Successful Deep Learning Research
5 Practical Steps to a Successful  Deep Learning Research5 Practical Steps to a Successful  Deep Learning Research
5 Practical Steps to a Successful Deep Learning Research
Brodmann17
 
Prospect for learning analytics to achieve adaptive learning model
Prospect for learning analytics to achieve adaptive learning modelProspect for learning analytics to achieve adaptive learning model
Prospect for learning analytics to achieve adaptive learning model
Open Cyber University of Korea
 
What is Machine Learning Operations (MLOps)?
What is Machine Learning Operations (MLOps)?What is Machine Learning Operations (MLOps)?
What is Machine Learning Operations (MLOps)?
Leonardo Moraes
 
Learning Analytics in Education: Using Student’s Big Data to Improve Teaching
Learning Analytics in Education:  Using Student’s Big Data to Improve TeachingLearning Analytics in Education:  Using Student’s Big Data to Improve Teaching
Learning Analytics in Education: Using Student’s Big Data to Improve Teaching
Rafael Scapin, Ph.D.
 
Phase 1 Learning Analytics Intro Slides
Phase 1 Learning Analytics Intro SlidesPhase 1 Learning Analytics Intro Slides
Phase 1 Learning Analytics Intro Slides
Paul Bailey
 
Intro to jisc Learning Analytics March 16
Intro to jisc Learning Analytics March 16Intro to jisc Learning Analytics March 16
Intro to jisc Learning Analytics March 16
Paul Bailey
 
Internship Presentation.pdf
Internship Presentation.pdfInternship Presentation.pdf
Internship Presentation.pdf
vishwajeetparmar1
 
UKSG Jisc learninganalytics-3june2016
UKSG Jisc learninganalytics-3june2016UKSG Jisc learninganalytics-3june2016
UKSG Jisc learninganalytics-3june2016
Paul Bailey
 
Lak2018: Scaling Nationally: Seven Lesson Learned
Lak2018:  Scaling Nationally: Seven Lesson LearnedLak2018:  Scaling Nationally: Seven Lesson Learned
Lak2018: Scaling Nationally: Seven Lesson Learned
mwebbjisc
 
2019 DSA 105 Introduction to Data Science Week 3
2019 DSA 105 Introduction to Data Science Week 32019 DSA 105 Introduction to Data Science Week 3
2019 DSA 105 Introduction to Data Science Week 3
Ferdin Joe John Joseph PhD
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
Yalçın Yenigün
 
H2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin LedellH2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin Ledell
Sri Ambati
 
Performance Management to Program Evaluation: Creating a Complementary Connec...
Performance Management to Program Evaluation: Creating a Complementary Connec...Performance Management to Program Evaluation: Creating a Complementary Connec...
Performance Management to Program Evaluation: Creating a Complementary Connec...
nicholes21
 

Similar to Data Science at Udemy (20)

VII Jornadas eMadrid "Education in exponential times". "Open Analytics in an ...
VII Jornadas eMadrid "Education in exponential times". "Open Analytics in an ...VII Jornadas eMadrid "Education in exponential times". "Open Analytics in an ...
VII Jornadas eMadrid "Education in exponential times". "Open Analytics in an ...
 
Exploring learning analytics
Exploring learning analyticsExploring learning analytics
Exploring learning analytics
 
Building Data Products with Python (Georgetown)
Building Data Products with Python (Georgetown)Building Data Products with Python (Georgetown)
Building Data Products with Python (Georgetown)
 
fINAL Lesson_1_Course_Introduction_v1.pptx
fINAL Lesson_1_Course_Introduction_v1.pptxfINAL Lesson_1_Course_Introduction_v1.pptx
fINAL Lesson_1_Course_Introduction_v1.pptx
 
Open Learning Analytics Strategy for Student Success: The North Carolina Stat...
Open Learning Analytics Strategy for Student Success: The North Carolina Stat...Open Learning Analytics Strategy for Student Success: The North Carolina Stat...
Open Learning Analytics Strategy for Student Success: The North Carolina Stat...
 
SoLAR Flare 2015 - Turning Learning Analytics Research into Practice at Tribal
SoLAR Flare 2015 - Turning Learning Analytics Research into Practice at TribalSoLAR Flare 2015 - Turning Learning Analytics Research into Practice at Tribal
SoLAR Flare 2015 - Turning Learning Analytics Research into Practice at Tribal
 
Research in to Practice: Building and implementing learning analytics at Tribal
Research in to Practice: Building and implementing learning analytics at TribalResearch in to Practice: Building and implementing learning analytics at Tribal
Research in to Practice: Building and implementing learning analytics at Tribal
 
5 Practical Steps to a Successful Deep Learning Research
5 Practical Steps to a Successful  Deep Learning Research5 Practical Steps to a Successful  Deep Learning Research
5 Practical Steps to a Successful Deep Learning Research
 
Prospect for learning analytics to achieve adaptive learning model
Prospect for learning analytics to achieve adaptive learning modelProspect for learning analytics to achieve adaptive learning model
Prospect for learning analytics to achieve adaptive learning model
 
What is Machine Learning Operations (MLOps)?
What is Machine Learning Operations (MLOps)?What is Machine Learning Operations (MLOps)?
What is Machine Learning Operations (MLOps)?
 
Learning Analytics in Education: Using Student’s Big Data to Improve Teaching
Learning Analytics in Education:  Using Student’s Big Data to Improve TeachingLearning Analytics in Education:  Using Student’s Big Data to Improve Teaching
Learning Analytics in Education: Using Student’s Big Data to Improve Teaching
 
Phase 1 Learning Analytics Intro Slides
Phase 1 Learning Analytics Intro SlidesPhase 1 Learning Analytics Intro Slides
Phase 1 Learning Analytics Intro Slides
 
Intro to jisc Learning Analytics March 16
Intro to jisc Learning Analytics March 16Intro to jisc Learning Analytics March 16
Intro to jisc Learning Analytics March 16
 
Internship Presentation.pdf
Internship Presentation.pdfInternship Presentation.pdf
Internship Presentation.pdf
 
UKSG Jisc learninganalytics-3june2016
UKSG Jisc learninganalytics-3june2016UKSG Jisc learninganalytics-3june2016
UKSG Jisc learninganalytics-3june2016
 
Lak2018: Scaling Nationally: Seven Lesson Learned
Lak2018:  Scaling Nationally: Seven Lesson LearnedLak2018:  Scaling Nationally: Seven Lesson Learned
Lak2018: Scaling Nationally: Seven Lesson Learned
 
2019 DSA 105 Introduction to Data Science Week 3
2019 DSA 105 Introduction to Data Science Week 32019 DSA 105 Introduction to Data Science Week 3
2019 DSA 105 Introduction to Data Science Week 3
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
H2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin LedellH2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin Ledell
 
Performance Management to Program Evaluation: Creating a Complementary Connec...
Performance Management to Program Evaluation: Creating a Complementary Connec...Performance Management to Program Evaluation: Creating a Complementary Connec...
Performance Management to Program Evaluation: Creating a Complementary Connec...
 

Recently uploaded

一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
aguty
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
nyvan3
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
Márton Kodok
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
zsafxbf
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
bmucuha
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
ywqeos
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
eoxhsaa
 
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
asyed10
 
一比一原版南昆士兰大学毕业证如何办理
一比一原版南昆士兰大学毕业证如何办理一比一原版南昆士兰大学毕业证如何办理
一比一原版南昆士兰大学毕业证如何办理
ugydym
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
agdhot
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
vasanthatpuram
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
Vietnam Cotton & Spinning Association
 
一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理
keesa2
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
slg6lamcq
 

Recently uploaded (20)

一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
 

Data Science at Udemy

  • 1. Ankara Big Data Meetup - Bilkent Cyberpark, August 5, 2015
    Data Science at Udemy
    Larry Wai, Principal Data Scientist @Udemy
  • 2. Overview of talk
    ● What is data science?
    ● Udemy in a nutshell
    ● Data science projects at Udemy
    ● Data science work cycle
    ● What does it mean to be a data scientist?
  • 3. What is data science?
    data science in consumer internet = application of the scientific method using big data computational methods to ascertain, predict, and utilize user behavior for business purposes
    Inherits from three historical schools of thought
    1. Research of natural phenomena using the scientific method
      ○ e.g. physics, astronomy
      ○ data science arises from substituting the study of natural phenomena with study of user behavior
    2. Research of computational methods
      ○ e.g. mathematics, computer science
      ○ data science arises from pushing the limits of existing methods to compute that which could not be computed before
    3. Research of human behavior
      ○ e.g. economics, psychology
      ○ data science arises from applying big data to the study of microscopic human behavior, i.e. millions of users x thousands of items = billions of user-item calculations
    Other definitions (too general IMO):
    ● data science > statistics (only); stats does not require engineering skills
    ● data science > computer science (only); engineering does not require training in the scientific method
    ● data science > business analytics (only); analytics does not require engineering skills nor training in the scientific method
  • 4. Udemy in a nutshell
    ● consumer online education marketplace
    ● instructors get 50% of enrollment fee
    ● no certification requirements
    ● typical enrollment price point (paid) is $20-$40
    ● get to critical mass (instructors and students) in each language through marketing
    ● above critical mass, leverage marketplace (organic) driven growth
    ● Udemy currently has ~7 million students, ~30 thousand courses
    ● relevance of search and recommendations is key to fostering growth
    ● learning goal data science is key to fostering long term growth
    [Chart: Google search trends for selected online education companies]
    ● Udemy (blue). Exponential marketplace growth.
    ● Coursera (yellow), Udacity (red), Lynda (green). Incremental growth.
    ● note: this chart convinced me to join Udemy :)
  • 5. Udemy web site
  • 6. Data science projects at Udemy
    search & recommendation
    ● real time recommendation (web, mobile)
    ● real time search
    ● batch e-mail recommendation
    learning goals
    ● course learning process optimization
    ● learning goal paths
    ● career learning goals
    + more projects
  • 7. Search and recommendation (in experiment)
    Feature classes
    ● course historical averages
    ● personal historical behavior
    ● search term matching
    Overall ranking strategy
    ● compute global score per visitor per course per day
    ● consider modules as filters on the total available inventory
    ● the module score will be the sum of the global course scores for the top N courses in the module
    ● individual courses are ranked within each module according to the global course score
    [Diagram: courses 1-12 grouped into modules A, B, C]
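    The ranking strategy on this slide can be sketched roughly as follows. This is only an illustration, not Udemy's implementation: the function name rank_modules, the toy scores, and the choice to order modules by descending module score are all assumptions (the slide defines the module score but does not say how it is used).

```python
def rank_modules(global_scores, modules, top_n=3):
    """Rank courses inside each module, then score each module.

    global_scores: {course_id: global score for one visitor on one day}
    modules:       {module_name: [course_id, ...]}  (modules act as filters)
    Returns a list of (module_name, module_score, ranked_course_ids).
    """
    ranked = []
    for name, course_ids in modules.items():
        # Courses are ranked within the module by their global score.
        courses = sorted(
            (c for c in course_ids if c in global_scores),
            key=lambda c: global_scores[c],
            reverse=True,
        )
        # Module score = sum of the global scores of its top N courses.
        module_score = sum(global_scores[c] for c in courses[:top_n])
        ranked.append((name, module_score, courses))
    # Assumption: modules are displayed in descending module score.
    ranked.sort(key=lambda m: m[1], reverse=True)
    return ranked


# Toy data (hypothetical scores).
scores = {"course 1": 0.9, "course 2": 0.4, "course 3": 0.7,
          "course 4": 0.2, "course 5": 0.8}
modules = {"module A": ["course 1", "course 2", "course 4"],
           "module B": ["course 3", "course 5"]}
layout = rank_modules(scores, modules, top_n=2)
```

    With top_n=2, module B (0.8 + 0.7) outranks module A (0.9 + 0.4), even though module A holds the single best course.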
  • 8. Learning goals (conceptual stage)
    Course learning goal clustering
    ● goals are hierarchical
    ● goals are linked
    ● goals are dynamic
    Overall learning goal strategy
    ● continuously update learning goal clustering
    ● quantify and evaluate student progress towards learning goals
    ● identify learning goal paths according to desired careers or hobbies
    [Diagram: goals 1-6, courses A and B]
  • 9. Data science work cycle
    [Cycle diagram with five stages: exploratory analysis, model building, model deployment, experiment setup, data collection]
    ideal cycling time is ~days to ~weeks
  • 10. Exploratory analysis
    ● data to be explored can in general be defined as a multi-dimensional cube, a.k.a. “hypercube”, where each side of the hypercube is an exploratory “dimension” and the “measures” of the user behavior are aggregates in each cell
    ● the hypercube is the minimal representation required for the exploratory analysis; e.g. we minimize cardinality for continuous variables
    ● the human mind is unable to easily comprehend more than 3 dimensions, therefore exploratory analysis must be broken down into actions which project the entire hypercube onto different dimensions in sequence
    ● goal for the analyst is to understand the multi-dimensional user behavior, which may take many projections in sequence (~100)
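    A minimal sketch of the hypercube idea, assuming additive measures (sums or counts) so that a projection is just a re-aggregation over the dropped dimensions. The dimension names, the rows, and the helpers build_hypercube and project are made up for illustration.

```python
from collections import defaultdict


def build_hypercube(rows, dimensions, measure):
    """Aggregate raw rows into cells keyed by one value per dimension."""
    cube = defaultdict(float)
    for row in rows:
        cell = tuple(row[d] for d in dimensions)
        cube[cell] += row[measure]
    return dict(cube)


def project(cube, dimensions, keep):
    """Project the cube onto a subset of dimensions by summing out the rest."""
    idx = [dimensions.index(d) for d in keep]
    out = defaultdict(float)
    for cell, value in cube.items():
        out[tuple(cell[i] for i in idx)] += value
    return dict(out)


# Hypothetical granular data: one row per (country, device, day) observation.
rows = [
    {"country": "TR", "device": "web", "day": "Mon", "enrollments": 3},
    {"country": "TR", "device": "mobile", "day": "Mon", "enrollments": 1},
    {"country": "US", "device": "web", "day": "Tue", "enrollments": 5},
    {"country": "US", "device": "web", "day": "Mon", "enrollments": 2},
]
dims = ["country", "device", "day"]
cube = build_hypercube(rows, dims, "enrollments")

# Two projections in sequence, each small enough to read at a glance.
by_country = project(cube, dims, ["country"])
by_device_day = project(cube, dims, ["device", "day"])
```

    Non-additive measures such as conversion rates would need numerator and denominator carried separately in each cell and divided only after projecting.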
  • 11. model building
    ● platforms such as R allow us to leverage open source modeling packages and compare models with relatively low overhead
    ● most user behavior features are non-linear and correlated; thus, the simplest “black box” non-linear models which handle correlations are practical to use, e.g. decision trees
    ● use residuals on holdout to validate model
  • 12. model deployment
    ● standardized predictive model markup language (PMML) allows abstraction of models in deployment
    ● “plug-in” model deployment is agile because no new production code is needed for model updates
    ● shifts focus of algo development from production code development to data mining methods
    ● this approach allows a single person to build and deploy models quickly
    ● this approach is cutting edge and is being tested now at Udemy
    [Diagram: model pipeline]
    ● model building: create training dataset; create predictive model, e.g. decision trees, random forest; offline analysis; residuals; feature importance
    ● model storage: predictive model store (PMML format)
    ● model deployment: in memory model; load on initialization; periodic updates
    ● model scoring: loop through courses, compute feature vector per course; compute score per course; sort by score
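    The deployment and scoring steps can be sketched as below. This is a simplified stand-in under stated assumptions: the model is a plain Python callable rather than something parsed from a PMML file (a real setup would load it through a PMML evaluator, not shown here), and InMemoryModel, rank_courses, featurize, and the toy weights are hypothetical names and values.

```python
class InMemoryModel:
    """Holds the current model in memory; the loader is the only plug-in point."""

    def __init__(self, loader):
        self._loader = loader
        self._predict = loader()      # load on initialization

    def refresh(self):
        self._predict = self._loader()  # periodic update, no new production code

    def score(self, features):
        return self._predict(features)


def rank_courses(model, courses, visitor, featurize):
    """Loop through courses, compute a feature vector and score, sort by score."""
    scored = [(model.score(featurize(visitor, c)), c["id"]) for c in courses]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [course_id for _, course_id in scored]


def load_toy_model():
    # Stand-in for reading the latest model from the model store.
    return lambda f: 0.5 * f[0] + 2.0 * f[1]


def featurize(visitor, course):
    # Hypothetical features: a course average and a personal-interest match.
    return [course["avg_rating"], 1.0 if course["topic"] in visitor["interests"] else 0.0]


model = InMemoryModel(load_toy_model)
courses = [
    {"id": "c1", "avg_rating": 4.0, "topic": "python"},
    {"id": "c2", "avg_rating": 4.8, "topic": "yoga"},
    {"id": "c3", "avg_rating": 3.0, "topic": "python"},
]
visitor = {"interests": {"python"}}
ranking = rank_courses(model, courses, visitor, featurize)
```

    Swapping in a retrained model only changes what the loader returns, which is the sense in which model updates need no new production code.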
  • 13. experiment setup
    Practical requirements for experiments, a.k.a. A/B tests
    ● need enough users to measure an interesting effect
    ● conversely, if an effect is not large enough to measure, then it is not interesting, at least from a data science point of view, and potentially from a business point of view
    ● e.g. an interesting effect from a business point of view would be +5% relative lift of conversion rate
    ● to detect a +5% relative lift at 95% confidence level (on, say, a typical 1 conversion per 10 sessions), need about 30,000 sessions in each of the A and B samples, i.e. >60,000 sessions in total
    ● ideally, would like to measure lift within ~days; so need >60,000 sessions per day
    ● Udemy currently has >200,000 sessions per day (but 2 years ago it was more like 20,000 sessions per day, so 10x slower to run experiments)
    Ramp stages
    1. smoke test (~few days)
      ○ 1% for test variant(s)
      ○ verify that nothing is broken
      ○ 40% CONTROL_1, 40% CONTROL_2
      ○ validate that control is set up correctly
    2. initial ramp (~1 week)
      ○ 5-10% for test variant(s)
      ○ sizing depends upon whether we’ve tested something like this before, and any revenue concerns
    3. intermediate ramp (~few weeks)
      ○ 25%-50% for test variant
      ○ 40%-50% for CONTROL_1
    4. final ramp / launch
      ○ 90% for test variant
      ○ 10% for CONTROL_1 (optional); turn off after a few weeks of monitoring
      ○ rename “test” as new baseline
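    The ~30,000 sessions per sample figure can be reproduced approximately with a back-of-the-envelope calculation. The sketch below assumes a two-sided z-test on the difference of two conversion rates, with both variances taken at the baseline rate and no statistical power term; the slide does not state which formula was used, so this is one plausible reading. The function name sessions_per_arm is hypothetical.

```python
import math
from statistics import NormalDist


def sessions_per_arm(baseline_rate, relative_lift, confidence=0.95):
    """Sessions per sample so that the lift just reaches significance."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # ~1.96 for 95%
    delta = baseline_rate * relative_lift                # absolute lift
    # Variance of the difference of two proportions, per session in each arm.
    variance = 2 * baseline_rate * (1 - baseline_rate)
    return math.ceil(z ** 2 * variance / delta ** 2)


n = sessions_per_arm(baseline_rate=0.10, relative_lift=0.05)
total = 2 * n
```

    Under these assumptions n comes out a little under 28,000, in line with the rounded 30,000 on the slide. Requiring 80% power in addition would roughly double the number.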
  • 14. data collection
    ● data should be collected at the most granular level, e.g. typically per visitor per item per day
    ● data should be pre-arranged in a way which facilitates fast hypercube production, i.e. star schema
    ● most granular data is located at the star core
    ● experiment variants can be incorporated as an additional dimension in one of the star limbs
    [Diagram: “star schema” (with intermediate mapping)]
    ● core table with grouping fields A, B, C
    ● limb table with grouping field A
    ● limb table with grouping fields A, B
    ● limb table with grouping field B
    ● limb table with grouping fields B, C
    ● limb table with grouping fields A, B, D
    ● mapping table with grouping field C and other field D
  • 15. What does it mean to be a data scientist?
    A successful data scientist is somebody who can independently execute the entire data science work cycle on the time scale of days to weeks.
    Important personal factors
    ● technical chops in math, computational methods, and the scientific method
    ● a genuine research interest in the underlying user behavior
    ● good intuition for how the business works
    Important environmental factors
    ● top-down knowledgeability and commitment to data science
    ● excellent data architect
    ● best practices data science infrastructure
  • 16. Udemy is hiring!
    https://about.udemy.com/careers/