Data Science Your Vacation
1. Data Science that Vacation.
Using Data Science to find where you should take your next vacation.
WIFI: Eastern Foundry Guest
Password: FoundryGuest@!!
http://bit.ly/ds-event
2. TJ Stalcup
Lead DC Mentor @Thinkful
API Evangelist @WealthEngine
Pokemon Master
About Us
Jennifer
Thinkful Student
Recent Graduate (applause)
5. About Thinkful
Online bootcamp since 2012. We have worked with over 6,000 students around the world, paired up with over 300 mentors.
We get you ready for a career and guarantee your first job.
92% success rate
Local DC Crew
7. What we're building:
A text analyzer that takes your write-up of your dream vacation and finds your best match.
To do that we need 3 things:
A set of vacation reviews (we're going to use hotel reviews)
A text-based model for hotel matching
Dream vacation descriptions
8. The Data
The data tonight is a sample of reviews of 1,000 hotels collected by Datafiniti, available on Kaggle here.
Information about the hotel (name, location, etc.)
Information about the reviewer
Review text
Rating
9. Why only 1000 hotels?
Text processing is a slow and involved process.
Working with a sample means we can build a model and perform matching in a relatively short amount of time.
Why is it slow?
10. Let's talk about text
Text data is often referred to as 'unstructured data'.
But what is structured data?
11. Structured Data
Name        Email            Date of Signup
TJ Stalcup  tj@thinkful.com  12/13/2017
...         ...              ...
This data is nice. It's a table with columns and we know what to
expect.
12. Unstructured Data
This data is not as nice. It's unpredictable, varying in length and we
don't really know what's what. It just kind of looks like one big thing.
The text above (and this text here) is unstructured data....
13. The Problems with Unstructured Data
Unstructured data gives us a few specific problems:
- What is a data point?
- How do we compare data?
- What parts of the data matter?
14. An example
This is our test sentence.
So what parts of this sentence matter?
What are our data points?
15. An example
This is our test sentence.
The words matter! And whitespace gives us a way to find them.
16. An example
This is our test sentence.
This  is  our  test  sentence.
1     1   1    1     1
17. An example
This is our test sentence.
And this is a second sentence.
This  is  our  test  sentence.
1     1   1    1     1
0     1   0    0     1
We've taken our data and turned it into a table.
We added structure!
18. Bag of words
This is called a 'bag of words' approach. (It's also called vectorizing.)
We took our initial sentence and created a bag for each word, then counted the number of times we found a word that matched.
Words are columns; each row holds one sentence's counts.
This  is  our  test  sentence.
1     1   1    1     1
0     1   0    0     1
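The counting above can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's code. It assumes the columns are simply the whitespace-separated tokens of the first sentence, and that no preprocessing has happened yet, so 'This' and 'this' are still different words.

```python
# Minimal bag-of-words sketch: no lowercasing or punctuation removal yet.
sentences = ["This is our test sentence.", "And this is a second sentence."]

# Columns: the whitespace-separated tokens of the first sentence.
vocabulary = sentences[0].split()


def bag_of_words(sentence, vocabulary):
    """Return one count per vocabulary word for this sentence."""
    tokens = sentence.split()
    return [tokens.count(word) for word in vocabulary]


# Rows: one count vector per sentence.
table = [bag_of_words(s, vocabulary) for s in sentences]

print(vocabulary)  # ['This', 'is', 'our', 'test', 'sentence.']
for row in table:
    print(row)
# [1, 1, 1, 1, 1]
# [0, 1, 0, 0, 1]
```

Libraries such as scikit-learn's `CountVectorizer` do the same job at scale. They lowercase and tokenize differently by default, though, so their columns will not match this table exactly.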
19. Punctuation and case
However, looking at our example, something should seem off.
This is our test sentence.
And this is a second sentence.
This  is  our  test  sentence.
1     1   1    1     1
0     1   0    0     1
20. Punctuation and case
Words like 'This' and 'this' are not counted as equal, because to the computer the difference in case makes them different strings.
This is why you (almost) always preprocess text data.
This  is  our  test  sentence.
1     1   1    1     1
0     1   0    0     1
21. Back to the example
This is our test sentence. ---> this is our test sentence
And this is a second sentence. ---> and this is a second sentence
this  is  our  test  sentence
1     1   1    1     1
1     1   0    0     1
Getting rid of case and punctuation makes comparisons easier and more effective (particularly on small datasets).
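A minimal preprocessing step reproduces the cleaned-up table. It assumes "punctuation" means the ASCII characters in Python's `string.punctuation`; the function name `preprocess` is just for illustration.

```python
import string

# Translation table that deletes every ASCII punctuation character.
STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def preprocess(text):
    """Lowercase the text and remove ASCII punctuation."""
    return text.lower().translate(STRIP_PUNCTUATION)


sentences = ["This is our test sentence.", "And this is a second sentence."]
cleaned = [preprocess(s) for s in sentences]
print(cleaned)  # ['this is our test sentence', 'and this is a second sentence']

# Same counting as before, now on the cleaned text.
vocabulary = cleaned[0].split()
table = [[s.split().count(word) for word in vocabulary] for s in cleaned]
print(table)  # [[1, 1, 1, 1, 1], [1, 1, 0, 0, 1]]
```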
22. Stop words
But there's more!
Some words don't matter. They don't really tell us anything.
These are called 'stop words'.
Things like 'it', 'is', 'the' are usually just thrown out.
23. Back to the example
This is our test sentence. ---> this our test sentence
And this is a second sentence. ---> this second sentence
this  our  test  sentence
1     1    1     1
1     0    0     1
Now we have vectors of the essentials for each sentence.
This is something we can build a model on!
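Stop-word removal can be added to the same sketch. The stop-word list here is a hypothetical one, picked only so that it reproduces the slide's example. Published lists are much longer: NLTK's English list, for instance, also drops "this" and "our".

```python
import string

# Hypothetical stop-word list chosen to match the slide's example.
STOP_WORDS = {"a", "and", "is", "it", "the"}

STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def tokenize(text):
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    words = text.lower().translate(STRIP_PUNCTUATION).split()
    return [word for word in words if word not in STOP_WORDS]


sentences = ["This is our test sentence.", "And this is a second sentence."]
tokens = [tokenize(s) for s in sentences]
print(tokens)  # [['this', 'our', 'test', 'sentence'], ['this', 'second', 'sentence']]

# Columns come from the first sentence's remaining words.
vocabulary = tokens[0]
table = [[t.count(word) for word in vocabulary] for t in tokens]
print(table)  # [[1, 1, 1, 1], [1, 0, 0, 1]]
```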
24. The Model
Our model is going to be a Random Forest.
A random forest is an ensemble of decision trees that work together to predict the most likely class of an outcome variable.
What does that mean?
25. Decision Trees
A set of rules that get us to a prediction, in the form of a tree.
You can think of it like a computer building its own version of 20 Questions.
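To make the 20 Questions picture concrete, here is a hand-written tree in plain Python. The questions, thresholds, and hotel labels are all hypothetical. A real decision tree learns which questions to ask, and in what order, from the training data.

```python
import re


def count_word(text, word):
    """How many times `word` appears in `text` (case-insensitive)."""
    return re.findall(r"[a-z]+", text.lower()).count(word)


def vacation_tree(description):
    """A hypothetical, hand-written decision tree: each `if` is one question."""
    if count_word(description, "beach") >= 1:
        if count_word(description, "sun") >= 2:
            return "Beach Resort"
        return "Seaside Inn"
    if count_word(description, "museum") >= 1:
        return "City Hotel"
    return "Mountain Lodge"


print(vacation_tree("Sun, more sun, and a long sandy beach"))  # Beach Resort
print(vacation_tree("A rainy beach walk"))  # Seaside Inn
print(vacation_tree("Every museum in town"))  # City Hotel
```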
27. Random Forest
A random forest builds a lot of different decision trees and then lets
each one vote.
Our questions will be things like "Contains the word 'beach'" or
"Contains the word 'sun' 2 or more times".
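The voting step can be sketched in plain Python. For illustration, the three trees below are written by hand and use hypothetical hotel labels. In a real random forest, each tree is instead learned from a random bootstrap sample of the training rows and considers a random subset of the features at each split. That randomness is what makes the trees disagree in useful ways.

```python
import re
from collections import Counter


def count_word(text, word):
    """How many times `word` appears in `text` (case-insensitive)."""
    return re.findall(r"[a-z]+", text.lower()).count(word)


# Three hypothetical trees, each asking different questions.
def tree_1(text):
    return "Beach Resort" if count_word(text, "beach") >= 1 else "City Hotel"


def tree_2(text):
    if count_word(text, "sun") >= 2:
        return "Beach Resort"
    return "City Hotel" if count_word(text, "museum") >= 1 else "Mountain Lodge"


def tree_3(text):
    return "Mountain Lodge" if count_word(text, "snow") >= 1 else "City Hotel"


FOREST = [tree_1, tree_2, tree_3]


def forest_predict(text):
    """Let every tree vote and return the label with the most votes."""
    votes = Counter(tree(text) for tree in FOREST)
    return votes.most_common(1)[0][0]


print(forest_predict("Sun, sun and a quiet beach"))  # Beach Resort
print(forest_predict("A museum every morning"))  # City Hotel
```

This sketch counts hard votes. scikit-learn's `RandomForestClassifier`, by contrast, averages the trees' predicted class probabilities.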
28. The Notebook
We're going to use a Google-hosted Python notebook to build this model.
Notebook: http://bit.ly/ideal-vacation
31. Relative Frequency
Each one scores a 1 for beach.
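TF-IDF (term frequency-inverse document frequency) is the answer. It weights each word by its relative frequency: how often it appears in a text, offset by how common it is across all the texts.
So the word 'beach' in a ten-word sentence counts more than one mention in 10,000 words.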
http://bit.ly/tfidf-wiki
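Here is a small sketch of one common TF-IDF variant. It assumes term frequency is the word's count divided by the document's length, and IDF is log(N / df), where N is the number of documents and df is the number of documents containing the word. Libraries differ in the details (scikit-learn's `TfidfVectorizer`, for example, smooths the IDF by default). The three reviews are made up for illustration.

```python
import math


def tf_idf(word, document, corpus):
    """TF-IDF of `word` in `document`, using length-normalized TF and log(N/df) IDF."""
    words = document.lower().split()
    tf = words.count(word) / len(words)  # frequency relative to document length
    df = sum(1 for doc in corpus if word in doc.lower().split())  # docs containing word
    if df == 0:
        return 0.0
    idf = math.log(len(corpus) / df)  # words found in fewer documents weigh more
    return tf * idf


# Made-up reviews: one short, one 10,000 words long, one without "beach".
short_review = "we loved the beach and the pool at this hotel"  # 10 words
long_review = " ".join(["room"] * 9999 + ["beach"])  # 10,000 words
other_review = "the city was busy and loud"
corpus = [short_review, long_review, other_review]

short_score = tf_idf("beach", short_review, corpus)
long_score = tf_idf("beach", long_review, corpus)
print(short_score > long_score)  # True
```

Both reviews mention 'beach' exactly once and share the same IDF. The short review therefore scores 1,000 times higher, purely because of its length.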
32. Context
'I hate beaches and love cities'
vs
'I love beaches and hate cities'
Both sentences contain exactly the same words, so our model would see these as the same thing.
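A quick check in plain Python shows the problem. It assumes a bag of words built by lowercasing and splitting on whitespace, and the two word counts come out identical.

```python
from collections import Counter


def bag_of_words(text):
    """Word counts after lowercasing and splitting on whitespace."""
    return Counter(text.lower().split())


loves_cities = bag_of_words("I hate beaches and love cities")
loves_beaches = bag_of_words("I love beaches and hate cities")
print(loves_cities == loves_beaches)  # True: word order is thrown away
```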
33. Context - N-Grams
We can get a sense of context with n-grams. Each feature is a sequence of n adjacent words rather than an individual word.
So we'd get features like 'love cities' and 'hate beaches' rather than
'love' 'cities' 'hate' 'beaches'.
http://bit.ly/ngram-wiki
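A minimal bigram (n = 2) sketch in plain Python shows how the two sentences stop looking alike. The function name `ngrams` is just for illustration.

```python
def ngrams(text, n=2):
    """Every run of `n` adjacent words, lowercased."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


loves_cities = ngrams("I hate beaches and love cities")
loves_beaches = ngrams("I love beaches and hate cities")
print(loves_cities)
# ['i hate', 'hate beaches', 'beaches and', 'and love', 'love cities']
print(loves_beaches)
# ['i love', 'love beaches', 'beaches and', 'and hate', 'hate cities']
print(set(loves_cities) == set(loves_beaches))  # False
```

scikit-learn's `CountVectorizer` and `TfidfVectorizer` can produce these features directly through their `ngram_range` parameter. For example, `ngram_range=(1, 2)` gives single words plus bigrams.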
34. There's a lot more
This all falls under the banner of Natural Language Processing, or
NLP, one of the largest and most exciting fields of data science and
artificial intelligence.
It's the basis for things like chatbots, Siri, and the Turing test itself.
There is a lot of fun to be had in this space.
35. Data Science @ Thinkful
Flexible, project-based curriculum to help you become the data
scientist you want to be
You don’t just learn skills, you get to make things
Mentor support from experts in the industry
Also, there's a job guarantee
36. Link for the third-party audit jobs report:
https://www.thinkful.com/bootcamp-jobs-stats
Thinkful Graduates: 92% Job Placement Rate
38. Thinkful Two Week Trial
http://bit.ly/dc-ds-trial
Initial 2-week trial course
Start with Python and Statistics
Unlimited Q&A Sessions
Option to continue with full bootcamp
Financing & scholarships available
Offer valid for tonight only
Aaron Lamphere
Trial Program Manager