Data Science that Vacation.
Using Data Science to find where you should take your next vacation.
WIFI: Eastern Foundry Guest
Password: FoundryGuest@!!
http://bit.ly/ds-event
TJ Stalcup
Lead DC Mentor @Thinkful
API Evangelist @WealthEngine
Pokemon Master
About Us
Jennifer
Thinkful Student
Recent Graduate (applause)
What's your name?
What do you do?
Why are you interested in data science?
About you
Online Bootcamp since 2012. We have worked
with over 6000 students around the world
paired up with over 300 mentors.
We get you ready for a career and guarantee
your first job
92% success rate
About Thinkful
Local DC Crew
Vacationing is fun.
Planning a vacation is not.
Data science can help.
TONIGHT: Data, Vacation, and AI
A text analyzer to take your writeup of your dream vacation and find your best match.
To do that we need 3 things:
A set of vacation reviews (we're going to use hotel reviews)
A text based model for hotel matching
Dream vacation descriptions
What we're building:
The data tonight is a sample of reviews of 1000 hotels collected by Datafiniti, available
on Kaggle.
Has information about the hotel (name, location, etc)
Information about the reviewer
Review Text
Rating
The Data
Text processing is a slow and involved process.
With a small sample we can build a model and perform matching relatively quickly.
Why is it slow?
Why only 1000 hotels?
Text data is often referred to as 'unstructured data'.
But what is structured data?
Let's talk about text
Structured Data
Name        Email            Date of Signup
TJ Stalcup  tj@thinkful.com  12/13/2017
...         ...              ...
This data is nice. It's a table with columns and we know what to
expect.
Unstructured Data
This data is not as nice. It's unpredictable, varying in length and we
don't really know what's what. It just kind of looks like one big thing.
The text above (and this text here) is unstructured data....
The Problems with Unstructured
Unstructured data gives us a few specific problems:
- What is a data point?
- How do we compare data?
- What parts of the data matter?
An example
This is our test sentence.
So what parts of this sentence matter?
What are our data points?
An example
This is our test sentence.
The words matter! And whitespace gives us a way to find them.
An example
This is our test sentence.
This  is  our  test  sentence.
1     1   1    1     1
An example
This is our test sentence.
And this is a second sentence.
This  is  our  test  sentence.
1     1   1    1     1
0     1   0    0     1
We've taken our data and turned it into a table.
We added structure!
Bag of words
This is called a 'bag of words' approach. (It's also called vectorizing.)
We took our initial sentence and created a bag for each word.
Then we counted how many times each word appears in each sentence.
Words are columns, sentences are rows, and each cell is a count.
This  is  our  test  sentence.
1     1   1    1     1
0     1   0    0     1
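As a rough sketch of the counting step, here is one way to build that table in plain Python. The function name `bag_of_words` is made up for illustration, and unlike the slide it adds a column for every word it sees, including the new words in the second sentence.

```python
from collections import Counter

def bag_of_words(sentences):
    """Split each sentence on whitespace and count tokens per sentence."""
    tokenized = [s.split() for s in sentences]

    # Vocabulary: every distinct token, in order of first appearance.
    vocab = []
    for tokens in tokenized:
        for tok in tokens:
            if tok not in vocab:
                vocab.append(tok)

    # One row per sentence, one column per vocabulary word.
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[word] for word in vocab])
    return vocab, rows

vocab, rows = bag_of_words([
    "This is our test sentence.",
    "And this is a second sentence.",
])
print(vocab)
for row in rows:
    print(row)
```

The first five columns line up with the table on the slide. Note that 'This' and 'this' land in separate columns, because no cleanup has been done yet.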
Punctuation and case
However, in looking at our example, something should seem logically
off.
This is our test sentence.
And this is a second sentence.
This  is  our  test  sentence.
1     1   1    1     1
0     1   0    0     1
Punctuation and case
'This' and 'this' are counted as different words because the
computer compares text exactly, and the case differs.
This is why you (almost) always preprocess text data.
This  is  our  test  sentence.
1     1   1    1     1
0     1   0    0     1
Back to the example
This is our test sentence. ---> this is our test sentence
And this is a second sentence. ---> and this is a second sentence
this  is  our  test  sentence
1     1   1    1     1
1     1   0    0     1
Getting rid of case and punctuation makes comparisons easier and
more effective (particularly on small data)
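A minimal sketch of that cleanup step, using only the Python standard library. The helper name `preprocess` is an assumption for illustration, and real pipelines often do more (handling numbers, accents, contractions and so on).

```python
import string

def preprocess(text):
    """Lowercase the text and strip ASCII punctuation."""
    lowered = text.lower()
    # str.maketrans with a third argument maps those characters to None,
    # which deletes them during translate().
    return lowered.translate(str.maketrans("", "", string.punctuation))

print(preprocess("This is our test sentence."))
print(preprocess("And this is a second sentence."))
```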
Stop words
But there's more!
Some words don't matter. They don't really tell us anything.
These are called 'stop words'.
Things like 'it', 'is', 'the' are usually just thrown out.
Back to the example
This is our test sentence. ---> this our test sentence
And this is a second sentence. ---> this second sentence
this  our  test  sentence
1     1    1     1
1     0    0     1
Now we have vectors of the essentials for each sentence.
This is something we can build a model on!
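Stop word removal can be sketched like this. The `STOP_WORDS` set is a tiny hand-picked list chosen only to reproduce the example; NLP libraries ship much longer lists, and which words count as stop words is a judgment call.

```python
# Hypothetical, deliberately tiny stop word list for this example only.
STOP_WORDS = {"a", "and", "is", "it", "the"}

def remove_stop_words(text):
    """Drop stop words from already lowercased, punctuation-free text."""
    return " ".join(word for word in text.split() if word not in STOP_WORDS)

print(remove_stop_words("this is our test sentence"))
print(remove_stop_words("and this is a second sentence"))
```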
The Model
Our model is going to be a Random Forest.
A random forest is an ensemble of decision trees to predict the most
likely class of an outcome variable.
What does that mean?
Decision Trees
A set of rules that get us to a prediction, in the form of a tree.
You can think of it like a computer building a version of 20
questions.
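To make the 20 questions idea concrete, here is a hand-written tree as nested `if` statements. The questions and the labels are invented for illustration; a real decision tree learns its questions from the training data.

```python
def classify(counts):
    """Walk a tiny hand-written decision tree over word counts."""
    # Each `if` is one yes/no question; each return is a leaf of the tree.
    if counts.get("beach", 0) >= 1:
        if counts.get("kids", 0) >= 1:
            return "family beach resort"
        return "quiet beach hotel"
    if counts.get("museum", 0) >= 1:
        return "city hotel"
    return "no clear match"

print(classify({"beach": 1, "kids": 2}))
print(classify({"museum": 1}))
```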
Decision Trees - Golf?
Random Forest
A random forest builds a lot of different decision trees and then lets
each one vote.
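The voting step might look something like the sketch below. The three "trees" are hand-written one-question rules and the labels 'beach' and 'city' are made up; a real random forest learns many deeper trees, each from a random sample of the rows and features.

```python
from collections import Counter

def tree_beach(counts):
    # Question: does the text contain the word 'beach'?
    return "beach" if counts["beach"] >= 1 else "city"

def tree_sun(counts):
    # Question: does the text contain the word 'sun' 2 or more times?
    return "beach" if counts["sun"] >= 2 else "city"

def tree_museum(counts):
    # Question: does the text contain the word 'museum'?
    return "city" if counts["museum"] >= 1 else "beach"

# An odd number of trees and two classes means the vote cannot tie.
FOREST = [tree_beach, tree_sun, tree_museum]

def predict(text):
    """Let every tree vote and return the most common answer."""
    counts = Counter(text.lower().split())
    votes = Counter(tree(counts) for tree in FOREST)
    return votes.most_common(1)[0][0]

print(predict("sun sun sand and a quiet beach"))
print(predict("a museum near the old town"))
```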
Our questions will be things like "Contains the word 'beach'" or
"Contains the word 'sun' 2 or more times".
The Notebook
We're going to use a Google-hosted Python notebook to build this
model.
http://bit.ly/ideal-vacation
Shortcomings
Our model has a few weaknesses:
-What about relative frequency?
-What about context?
Relative Frequency
It has a nice beach.
vs
10 pages of text that says the word beach once.
Relative Frequency
Each one scores a 1 for beach.
TF-IDF is one answer. It weights each word by how often it appears
relative to the length of the text, and down-weights words that show up
in most texts. So the word beach in a ten word sentence counts more than
one mention in 10000 words.
http://bit.ly/tfidf-wiki
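Here is a rough sketch of one textbook form of the calculation: term frequency (count divided by document length) times the log of inverse document frequency. The function name and the sample reviews are made up, and libraries use different smoothing and normalization, so their numbers will not match these.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: score} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores

# Hypothetical reviews, invented for this example.
docs = [
    "it has a nice beach",
    "the hotel was far from the beach and the rooms were small and the pool was closed",
    "the city hotel was near the museum",
]
scores = tf_idf(docs)
print(scores[0]["beach"], scores[1]["beach"])
```

The short review scores higher for 'beach' than the long one, even though each mentions it exactly once.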
Context
'I hate beaches and love cities'
vs
'I love beaches and hate cities'
Our model would see these as the same thing.
Context - N-Grams
We can get a sense of context with n-grams. Each feature is a set of
words rather than individual words.
So we'd get features like 'love cities' and 'hate beaches' rather than
'love' 'cities' 'hate' 'beaches'.
http://bit.ly/ngram-wiki
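A short sketch of pulling two-word n-grams (bigrams) out of a sentence. The helper name `ngrams` is chosen for illustration.

```python
def ngrams(text, n=2):
    """Return every run of n consecutive words in the text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

first = ngrams("I hate beaches and love cities")
second = ngrams("I love beaches and hate cities")
print(first)
print(second)
```

The two sentences contain exactly the same single words, but 'hate beaches' only shows up in the first, so a model built on bigrams can tell them apart.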
There's a lot more
This all falls under the banner of Natural Language Processing, or
NLP, one of the largest and most exciting fields of data science and
artificial intelligence.
It's the basis for things like chatbots and Siri and the Turing test itself.
There is a lot of fun to be had in this space.
Data Science @ Thinkful
Flexible, project-based curriculum to help you become the data
scientist you want to be
You don’t just learn skills, you get to make things
Mentor support from experts in the industry
Also, there's a job guarantee
Link for the third party audit jobs report:
https://www.thinkful.com/bootcamp-jobs-stats
Thinkful Graduates: 92% Job Placement Rate
Learning Mentor
Career Mentor
Program Manager
Local Community
You
Unprecedented Support
http://bit.ly/dc-ds-trial
Initial 2-week trial course
Start with Python and Statistics
Unlimited Q&A Sessions
Option to continue with full bootcamp
Financing & scholarships available
Offer valid for tonight only
Aaron Lamphere
Trial Program Manager
Thinkful Two Week Trial

Data Science Your Vacation

  • 1. Data Science that Vacation. Using Data Science to find where you should take your next vacation. WIFI: Eastern Foundry Guest Password: FoundryGuest@!! http://bit.ly/ds-event
  • 2. TJ Stalcup Lead DC Mentor @Thinkful API Evangelist @WealthEngine Pokemon Master About Us Jennifer Thinkful Student Recent Graduate (applause)
  • 4. What's your name? What do you do? Why are you interested in data science? About you
  • 5. Online Bootcamp since 2012. We have worked with over 6000 students around the world paired up with over 300 mentors. We get you ready for a career and guarantee your first job 92% success rate About Thinkful Local DC Crew
  • 6. Vacationing is fun. Planning a vacation is not. Data science can help. TONIGHT: Data, Vacation, and AI
  • 7. A text analyzer to take your writeup of your dream vacation and find your best match. To do that we need 3 things: A set of vacation reviews (we're going to use hotel reviews) A text based model for hotel matching Dream vacation descriptions What we're building:
  • 8. The data tonight is a sample of reviews of 1000 hotels collected by Datafinity, available on Kaggle . Has information about the hotel (name, location, etc) Information about the reviewer Review Text Rating here The Data
  • 9. Text processing is a slow and involved process This way we can make a model and perform matching in a relatively quick amount of time Why is it slow? Why only 1000 hotels?
  • 10. Text data is often referred to as 'unstructured data'. But what is structure data? Let's talk about text
  • 11. Structured Data NameName EmailEmail Date of SignupDate of Signup TJ Stalcup tj@thinkful.com 12/13/2017 ... ... ... This data is nice. It's a table with columns and we know what to expect.
  • 12. Unstructured Data This data is not as nice. It's unpredictable, varying in length and we don't really know what's what. It just kind of looks like one big thing. The text above (and this text here) is unstructured data....
  • 13. The Problems with Unstructured Unstructured data gives us a few specific problems: - What is a data point? - How do we compare data? - What parts of the data matter?
  • 14. An example This is our test sentence. So what parts of this sentence matter? What are our data points?
  • 15. An example This is our test sentence. The words matter! And whitespace gives us a way to find them.
  • 16. An example This is our test sentence. ThisThis isis ourour testtest sentence.sentence. 1 1 1 1 1
  • 17. An example This is our test sentence. And this is a second sentence. ThisThis isis ourour testtest sentence.sentence. 1 1 1 1 1 0 1 0 0 1 We've taken our data and turned it into a table. We added structure!
  • 18. Bag of words This is called a 'bag of words' approach. (It's also called vectorizing.) We took our initial sentence and created a bag for each word. Count the number of times we found a word that matched. Words are columns, rows are counts ThisThis isis ourour testtest sentence.sentence. 1 1 1 1 1 0 1 0 0 1
  • 19. Punctuation and case However, in looking at our example, something should seem logically off. This is our test sentence. And this is a second sentence. Columns: This | is | our | test | sentence. First sentence: 1 | 1 | 1 | 1 | 1. Second sentence: 0 | 1 | 0 | 0 | 1.
  • 20. Punctuation and case Things like 'This' and 'this' are not considered equal because the computer doesn't see them as the same. The case is a difference. This is why you (almost) always preprocess text data. Columns: This | is | our | test | sentence. First sentence: 1 | 1 | 1 | 1 | 1. Second sentence: 0 | 1 | 0 | 0 | 1.
  • 21. Back to the example This is our test sentence. ---> this is our test sentence; And this is a second sentence. ---> and this is a second sentence. Columns: this | is | our | test | sentence. First sentence: 1 | 1 | 1 | 1 | 1. Second sentence: 1 | 1 | 0 | 0 | 1. Getting rid of case and punctuation makes comparisons easier and more effective (particularly on small data).
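A rough sketch of that preprocessing step, assuming ASCII punctuation only (the helper name `preprocess` is invented for this example and is not taken from the notebook):

```python
import string


def preprocess(text):
    """Lowercase the text and strip ASCII punctuation characters."""
    strip_punct = str.maketrans("", "", string.punctuation)
    return text.lower().translate(strip_punct)


raw = ["This is our test sentence.", "And this is a second sentence."]
cleaned = [preprocess(s) for s in raw]

# Use the words of the first cleaned sentence as columns, as on the slide.
columns = cleaned[0].split()
counts = [[s.split().count(word) for word in columns] for s in cleaned]
print(columns)
print(counts)
```

After cleaning, 'this' and 'sentence' match across both sentences, which is the change visible in the second row of the table.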
  • 22. Stop words But there's more! Some words don't matter. They don't really tell us anything. These are called 'stop words'. Things like 'it', 'is', 'the' are usually just thrown out.
  • 23. Back to the example This is our test sentence. ---> this our test sentence; And this is a second sentence. ---> this second sentence. Columns: this | our | test | sentence. First sentence: 1 | 1 | 1 | 1. Second sentence: 1 | 0 | 0 | 1. Now we have vectors of the essentials for each sentence. This is something we can build a model on!
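Stop word removal might look like the sketch below. The stop list here is a tiny hypothetical one chosen so the output matches the slide; real stop lists are much longer and would usually drop 'this' and 'our' as well.

```python
# Hypothetical, deliberately tiny stop list for illustration only.
STOP_WORDS = {"is", "and", "a", "it", "the"}


def remove_stop_words(text):
    """Drop every whitespace-separated token that is in STOP_WORDS."""
    return " ".join(tok for tok in text.split() if tok not in STOP_WORDS)


cleaned = ["this is our test sentence", "and this is a second sentence"]
essentials = [remove_stop_words(s) for s in cleaned]

# Vectorize against the words of the first sentence, as in the slide's table.
columns = essentials[0].split()
vectors = [[s.split().count(word) for word in columns] for s in essentials]
print(essentials)
print(vectors)
```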
  • 24. The Model Our model is going to be a Random Forest. A random forest is an ensemble of decision trees that predicts the most likely class of an outcome variable. What does that mean?
  • 25. Decision Trees A set of rules that get us to a prediction, in the form of a tree. You can think of it like a computer building a version of 20 questions.
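To make the '20 questions' idea concrete, here is a hand-written tree over word counts. Everything in it is hypothetical: the questions, the thresholds, and the labels are invented for illustration, whereas a real decision tree learns its questions from the training data.

```python
def tiny_tree(word_counts):
    """Walk a fixed set of yes/no questions until a leaf (a prediction) is reached."""
    # Question 1: does the text mention 'beach' at all?
    if word_counts.get("beach", 0) >= 1:
        # Question 2: does it mention 'sun' two or more times?
        if word_counts.get("sun", 0) >= 2:
            return "sunny beach hotel"
        return "beach hotel"
    # Question 3: does it mention 'museum'?
    if word_counts.get("museum", 0) >= 1:
        return "city hotel"
    return "other"


print(tiny_tree({"beach": 1, "sun": 2}))
print(tiny_tree({"museum": 3}))
```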
  • 27. Random Forest A random forest builds a lot of different decision trees and then lets each one vote. Our questions will be things like "Contains the word 'beach'" or "Contains the word 'sun' 2 or more times".
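The voting step alone can be sketched as follows. Note the assumption: these three 'trees' are hand-written one-question rules with made-up labels, so this shows only how votes are combined. A real random forest also trains each tree on a random sample of the data and features, which this sketch leaves out.

```python
from collections import Counter


# Three hypothetical one-question "trees"; each returns a class label.
def tree_beach(counts):
    return "beach hotel" if counts.get("beach", 0) >= 1 else "city hotel"


def tree_sun(counts):
    return "beach hotel" if counts.get("sun", 0) >= 2 else "city hotel"


def tree_museum(counts):
    return "city hotel" if counts.get("museum", 0) >= 1 else "beach hotel"


def forest_predict(trees, counts):
    """Let every tree vote and return the label with the most votes."""
    votes = Counter(tree(counts) for tree in trees)
    return votes.most_common(1)[0][0]


trees = [tree_beach, tree_sun, tree_museum]
print(forest_predict(trees, {"beach": 2, "sun": 3}))
print(forest_predict(trees, {"beach": 1, "sun": 1, "museum": 1}))
```

Using an odd number of trees with two labels avoids ties in this toy setup.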
  • 28. The Notebook We're going to use a Google-hosted Python notebook to build this model. http://bit.ly/ideal-vacation
  • 29. Shortcomings Our model has a few weaknesses: -What about relative frequency? -What about context?
  • 30. Relative Frequency It has a nice beach. vs 10 pages of text that says the word beach once.
  • 31. Relative Frequency Each one scores a 1 for beach. TF-IDF is the answer. It weights each word by how often it appears relative to the length of the text, and discounts words that show up in most documents. So the word beach in a ten-word sentence counts more than one mention in 10000 words. http://bit.ly/tfidf-wiki
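A minimal TF-IDF sketch in plain Python, assuming the textbook form (term frequency times log of total documents over documents containing the word). Libraries typically use smoothed variants, so their numbers will differ. The three reviews are invented for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: score} dict per document using tf * log(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    df = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores


short_review = "it has a nice beach"                       # 5 words
long_review = " ".join(["filler"] * 9999 + ["beach"])      # 10000 words
other_review = "great museums and busy streets"            # no 'beach'

scores = tf_idf([short_review, long_review, other_review])
print(scores[0]["beach"], scores[1]["beach"])
```

The third review matters: if every document contained 'beach', its inverse document frequency would be log(1) = 0 and the word would score zero everywhere.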
  • 32. Context 'I hate beaches and love cities' vs 'I love beaches and hate cities' Our model would see these as the same thing.
  • 33. Context - N-Grams We can get a sense of context with n-grams. Each feature is a sequence of adjacent words rather than an individual word. So we'd get features like 'love cities' and 'hate beaches' rather than 'love' 'cities' 'hate' 'beaches'. http://bit.ly/ngram-wiki
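A short sketch of bigram extraction (the helper name `ngrams` is chosen for this example). It shows that the two sentences from the Context slide have identical single-word features but different two-word features.

```python
def ngrams(text, n=2):
    """Return every run of n adjacent lowercase tokens, joined by spaces."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


first = "I hate beaches and love cities"
second = "I love beaches and hate cities"

# Same bag of single words...
same_words = sorted(first.lower().split()) == sorted(second.lower().split())
# ...but different bigrams.
first_bigrams = ngrams(first)
second_bigrams = ngrams(second)
print(same_words)
print(first_bigrams)
print(second_bigrams)
```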
  • 34. There's a lot more This all falls under the banner of Natural Language Processing, or NLP, one of the largest and most exciting fields of data science and artificial intelligence. It's the basis for things like chatbots and Siri and the Turing test itself. There is a lot of fun to be had in this space.
  • 35. Data Science @ Thinkful Flexible, project-based curriculum to help you become the data scientist you want to be You don’t just learn skills, you get to make things Mentor support from experts in the industry Also, there's a job guarantee
  • 36. Thinkful Graduates: 92% Job Placement Rate. Link for the third party audit jobs report: https://www.thinkful.com/bootcamp-jobs-stats
  • 37. Unprecedented Support: Learning Mentor, Career Mentor, Program Manager, Local Community, You
  • 38. Thinkful Two Week Trial http://bit.ly/dc-ds-trial Initial 2-week trial course. Start with Python and Statistics. Unlimited Q&A Sessions. Option to continue with full bootcamp. Financing & scholarships available. Offer valid for tonight only. Aaron Lamphere, Trial Program Manager