AdmitSee Analytics
Mike Yung
mikeyung	
yungmsh	
yungmsh
Context
Part I: The Model
–  Can we build a model that predicts a student’s chances* of being admitted into college?
Part II: The Essay
–  What insights can we glean from the Common App essay?
*There are some resources that ‘calculate’ your chances based on your GPA, SAT, and demographics, but none (at least publicly available) that take into account detailed factors such as specific extracurriculars, academic trajectory, the Common App essay, etc.
Part I: The Model
Can we build a model that predicts a student’s chances of being admitted into college?
Evaluating the Model
Target Metric: Precision
Test Set (%, one column per model)
Accuracy    88.1   87.8   88.0
Precision   62.8   61.9   57.7
Recall      25.4   21.6   37.1
[Figure: ROC Curve (Test); x-axis: False Positive Rate, y-axis: True Positive Rate]
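The three metrics in the table can be computed from confusion-matrix counts. The sketch below uses hypothetical counts (they are not the deck's data) to show how a model can reach high accuracy with modest precision and low recall when admitted students are the minority class.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Precision: of the students predicted "admitted", how many really were
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the students really admitted, how many the model found
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts for illustration only (not taken from the deck)
acc, prec, rec = classification_metrics(tp=30, fp=20, tn=870, fn=80)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

Choosing precision as the target metric means favoring fewer false "admitted" predictions, at the cost of missing some students who would have been admitted.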
Interpreting the Model
Logistic Regression Model
Variable            e^Coefficient (odds ratio)
Leader              2.26
Student Gov         1.69
Varsity Sport       1.58
Sports Captain      1.29
Award               1.22
Community Service   1.21
SAT Score           1.0003
SAT Times Taken     0.39
*Note: this is only a subset of all variables used in the model
How to Interpret?
If you aren’t already in a leadership position, taking one will more than double your odds of being admitted.
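The interpretation above is stated in odds, not probability. A minimal sketch of the conversion follows; the 10% baseline admission probability is a hypothetical value chosen for illustration, and `apply_odds_ratio` is an illustrative helper name, not something from the deck.

```python
def apply_odds_ratio(p_baseline, odds_ratio):
    """Scale the odds implied by p_baseline and convert back to a probability."""
    odds = p_baseline / (1 - p_baseline)   # probability -> odds
    new_odds = odds * odds_ratio           # apply the odds ratio
    return new_odds / (1 + new_odds)       # odds -> probability


# Hypothetical baseline of 10%; 2.26 is the "Leader" odds ratio from the table
p_new = apply_odds_ratio(0.10, 2.26)
print(f"{p_new:.3f}")
```

With a low baseline, multiplying the odds by 2.26 roughly doubles the probability; at higher baselines the change in probability is smaller than the change in odds.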
Part II: The Essay
What insights can we glean from the Common App essay?
What’s in an Essay?
Topic            Top words
Family           mother, father, family, parent, sister, ...
Music/Arts       music, play, piano, perform, theater, ...
Culture          culture, world, language, travel, chinese, ...
Sports           team, game, coach, player, season, ...
Personal Story   feel, think, friend, love, life, moment, ...
Science          research, science, technology, math, ...
Career           work, success, career, educate, community, ...
[Figure: Topic Distribution of a Sample Essay]
A 2-D Representation of College Essays
[Figure: 2-D scatter plot; each point represents a college/university. Plot labels: Cultural Focus, Career Driven, Scientific Focus, Personality Driven]
Step     Vectorize Text   Topic Modeling   Visualize in 2D   Clustering
Method   TF-IDF           NMF              PCA               K-Means
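The four steps in the table (TF-IDF, NMF, PCA, K-Means) can be sketched with scikit-learn as below. This is a minimal sketch under stated assumptions, not the author's code: the toy corpus, the school names, the number of topics and clusters, the one-document-per-school aggregation, and the choice to cluster on the 2-D coordinates are all hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF, PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus: one aggregated "essay document" per school
essays_by_school = {
    "School A": "research science technology math computer experiment",
    "School B": "science research math technology engineering robot",
    "School C": "mother father family parent sister home",
    "School D": "family mother culture language travel world",
    "School E": "team game coach player season sports",
    "School F": "coach team player game career work success",
}
docs = list(essays_by_school.values())

# Step 1: vectorize text with TF-IDF (rows = schools, columns = words)
X = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Step 2: topic modeling with NMF (rows = schools, columns = topic weights)
nmf = NMF(n_components=3, init="nndsvda", random_state=0, max_iter=500)
W = nmf.fit_transform(X)

# Step 3: project the topic weights to 2-D with PCA for plotting
coords = PCA(n_components=2, random_state=0).fit_transform(W)

# Step 4: cluster the 2-D points with K-Means (assumed to follow the projection)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(coords)

for school, (x, y), label in zip(essays_by_school, coords, labels):
    print(f"{school}: ({x:.2f}, {y:.2f}) cluster {label}")
```

Clustering on the full topic-weight matrix `W` instead of the 2-D coordinates is an equally plausible reading of the slide; the deck does not say which was used.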
A 2-D Representation of College Essays
[Figure: the same plot with individual schools labeled: Bowdoin, Skidmore, Wellesley, Amherst, Middlebury, CalTech, MIT, CMU, Purdue, Arizona State, Michigan State, Cal State LB, SD State]
A 2-D Representation of College Essays
[Figure: the same plot with groups of schools labeled: STEM, State Schools, Liberal Arts, Mixed, Ivy League]
Final Thoughts
•  Limitation of data
   –  Only enough to model ‘top school’ admittance
   –  With more data:
      •  School-level model
      •  Graduate school model
•  Explore deeper feature-engineering
   –  Interaction effects (e.g. Varsity*Captain)
   –  Deeper effects (e.g. Hispanic student leading an African-American society)
•  Refine topic modeling with LDA
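An interaction effect such as Varsity*Captain is usually encoded as the product of the two indicator columns, so the model can learn an extra effect for students who are both. The sketch below uses hypothetical flags for four applicants; the variable names are illustrative.

```python
import numpy as np

# Hypothetical binary flags for four applicants (1 = yes, 0 = no)
varsity = np.array([1, 1, 0, 0])
captain = np.array([1, 0, 1, 0])

# The interaction column is 1 only when both flags are 1
varsity_x_captain = varsity * captain

# Design matrix: the two main effects plus the interaction term
X = np.column_stack([varsity, captain, varsity_x_captain])
print(X)
```

In a logistic regression, the coefficient on the third column would capture how much being a varsity captain changes the log-odds beyond the sum of the two separate effects.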
Thank You
Did You Know?
•  If you gain an additional 100 points on your SAT, you can increase your odds of being admitted by 3%
•  If you take the SATs one more time, you can reduce your odds of being admitted by 61%
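Both figures can be checked against the logistic regression table, assuming the SAT Score value of 1.0003 is an odds ratio per SAT point and the SAT Times Taken value of 0.39 is an odds ratio per additional sitting. The sketch below is only the arithmetic; the variable names are illustrative.

```python
sat_odds_ratio_per_point = 1.0003   # from the table, assumed to be per SAT point
retake_odds_ratio = 0.39            # from the table, assumed to be per extra sitting

# Odds ratios multiply, so 100 points compounds the per-point ratio 100 times
odds_ratio_100_points = sat_odds_ratio_per_point ** 100
pct_change_100_points = (odds_ratio_100_points - 1) * 100

# One extra sitting multiplies the odds by 0.39, a 61% reduction
pct_change_retake = (retake_odds_ratio - 1) * 100

print(f"+100 SAT points: {pct_change_100_points:+.1f}% odds")
print(f"one more sitting: {pct_change_retake:+.1f}% odds")
```

Because 1.0003 is rounded to four decimals, the 3% figure is approximate: any per-point ratio between 1.00025 and 1.00035 would round to the same table entry and give roughly 2.5% to 3.6%.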
mikeyung	
yungmsh	
yungmsh
Appendix 1: Ensemble Model Pipeline
[Figure: model pipeline diagram]
Inputs: Non-Essay Features (GPA, SAT, Demographics, Extra-curriculars, etc.); Essay Features (Word Sophistication, Latent Topics)
Feature Engineering: TFIDF --> NMF
Prediction Modeling (Model Pipeline, Grid Search): Probabilities m1_prob, m2_prob
Final Predictions: Accepted / Rejected
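One way to read the diagram is as a stacked ensemble: two base models each emit a probability (`m1_prob`, `m2_prob`), and a final model turns those into Accepted/Rejected. The sketch below is an assumption-heavy illustration, not the author's pipeline: the data is synthetic, and the split of features between the two base models, the choice of base and final estimators, and the grid values are all hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_predict, train_test_split

# Synthetic stand-in data: first 8 columns play "non-essay", last 4 "essay" features
X, y = make_classification(n_samples=400, n_features=12, n_informative=6, random_state=0)
X_non_essay, X_essay = X[:, :8], X[:, 8:]
idx_train, idx_test = train_test_split(
    np.arange(len(y)), test_size=0.25, stratify=y, random_state=0
)
y_train = y[idx_train]

base_1 = LogisticRegression(max_iter=1000)
base_2 = RandomForestClassifier(n_estimators=100, random_state=0)

# Out-of-fold probabilities, so the final model is not trained on leaked fits
m1_prob = cross_val_predict(
    base_1, X_non_essay[idx_train], y_train, cv=5, method="predict_proba"
)[:, 1]
m2_prob = cross_val_predict(
    base_2, X_essay[idx_train], y_train, cv=5, method="predict_proba"
)[:, 1]
stack_train = np.column_stack([m1_prob, m2_prob])

# Final model tuned by grid search, scored on precision (the deck's target metric)
final = GridSearchCV(
    LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]},
    scoring="precision", cv=3,
)
final.fit(stack_train, y_train)

# At prediction time, refit the base models on all training rows
base_1.fit(X_non_essay[idx_train], y_train)
base_2.fit(X_essay[idx_train], y_train)
stack_test = np.column_stack([
    base_1.predict_proba(X_non_essay[idx_test])[:, 1],
    base_2.predict_proba(X_essay[idx_test])[:, 1],
])
final_pred = final.predict(stack_test)  # 1 = Accepted, 0 = Rejected
print(final_pred[:10])
```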
Appendix 2: ROC Curve for Train Set
Target Metric: Precision
Train Set (%, one column per model)
Accuracy    92.4   91.2   94.2
Precision   93.1   91.2   90.1
Recall      45.7   37.2   63.3
Logistic Regression underfits
Random Forest overfits
[Figure: ROC Curve (Train); x-axis: False Positive Rate, y-axis: True Positive Rate]
Appendix 3: NMF Visualized
Step     Vectorize Text   Topic Modeling   Analyze
Method   TF-IDF           NMF (Fit)        NMF (Transform)
Visualization of NMF
Old Matrix (Word vs. Essay) ≈ Semantic Reference (Word vs. Topic) x New Matrix (Topic vs. Essay)
Dimensions: (Words x Essays) ≈ (Words x Topics) x (Topics x Essays)
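The factorization pictured above can be checked on a small random matrix. This sketch keeps the slide's orientation (words as rows, essays as columns); the matrix sizes and the random data are hypothetical.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
n_words, n_essays, n_topics = 8, 5, 3

# Old Matrix (Word vs. Essay): any non-negative matrix, e.g. TF-IDF weights
V = rng.random((n_words, n_essays))

model = NMF(n_components=n_topics, init="nndsvda", random_state=0, max_iter=1000)
W = model.fit_transform(V)   # Semantic Reference (Word vs. Topic)
H = model.components_        # New Matrix (Topic vs. Essay)

# The product of the two non-negative factors approximates the original matrix
approx = W @ H
rel_error = np.linalg.norm(V - approx) / np.linalg.norm(V)
print(W.shape, H.shape, f"relative error = {rel_error:.3f}")
```

scikit-learn is more commonly fed the transposed matrix (essays as rows), in which case `components_` is topic-by-word and `fit_transform` returns essay-by-topic weights; the factorization is the same up to that transpose.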
Appendix 4: NMF Semantic Reference Table
Non-Negative Matrix Factorization
Semantic Reference (Word vs. Topic)
Topic            Words
Family           mother, father, family, parent, sister, ….
Music/Arts       music, play, piano, perform, theater, …
Culture          culture, world, language, travel, american, …
Sports           team, game, coach, player, season, …
Personal/Story   …
Science          research, science, computer, technology, math, …
Career           work, education, career, success, community, …
