Machine Learning for
Non-technical People
Slater Victoroff
Designed by freepik.com
YOU!
The non-technical
audience interested in
Learning about Machine
Learning!
Who is this talk for?
Who am I?
• Slater Victoroff
• Olin College of Engineering
• Typical young entrepreneur in a
hoodie and flip-flops
• Someone who cares very
deeply about machine learning
• CEO of indico
What is Machine
Learning?
Such a big buzzword.
Here’s what it comes down to in a human definition:
A class of computer algorithms and mathematical
models that allow machines to perform general tasks,
like identifying human faces in photos. The models
are used to make predictions and decisions, which
you can then use to solve real world problems, such
as understanding how your customers feel about
your brand across various social media channels.
The neat thing is that instead of hiring 100 people to
analyze 1,000 data points each, you could get a
single machine to do it in a fraction of the time.
Quick Poll
Can you use machine learning in the
following industries?
Factories
Smart Phones
Robots
Human Robots
NOT HUMAN ROBOTS
Machine Learning is
Blurry
Language is blurry — sarcasm, etc.
Where there’s a gray area,
machine learning can help.
Computers struggle with the world
when it is inconsistent.
Say you’re a brand and you want to know what
people are saying about your brand.
You look through everyone talking about
your brand on Twitter, Facebook, etc.
Now you want to look at how popular
those people are to find your influencers.
And finally, you want to know… what are they talking about?
With the old spreadsheet approach, we simply ignored these
problems because they sat in a gray area we couldn’t reach.
A social media example
Machine
learning is born
in very ordinary
circumstances
• Marty McFly ended up in 1955, which is the same year
that the first branch of ML emerged (AI movie to come
later)
• During the Cold War, Georgetown and IBM found ML
useful because they wanted to translate large amounts of
Russian text for analysis
• MIT went after the image side, teaching computers to
recognize objects and scenes. They tried to teach the
computer to look at a picture and tell whether it shows
a bird or a plant.
Machine Translation will
be a Solved Problem in
Three to Five Years
- Optimistic Researcher 1954
CSAIL
• The Computer Science and Artificial Intelligence
Laboratory, known as CSAIL, is the largest
research laboratory at MIT and one of the world’s
most important centers of information technology
research.
• Formed in 2003; its AI roots go back to the project that
Marvin Minsky and John McCarthy started at MIT in 1959
We’re pretty sure we bit
off more than we can
chew here
- ALPAC 1966
• Committees were spun up to assess progress in
translation and recognition.
• In one solid decade, we effectively made no
progress. We had one-off ML systems.
• We could teach a computer to understand one
sentence by showing it that one sentence.
• We made no progress, spent a lot of money, and
cut the research. It was the death of an era.
During that time…
Time Passes
Arnold brings us back!
Machine Learning Goes
Mainstream
Thumbs up?
Sentiment classification
using
machine learning
techniques.
Bo Pang, Lillian Lee, and Shivakumar
Vaithyanathan.
Sentiment analysis = determine if a piece of text is
positive or negative.
How do we do it?
Well, we map each word to its sentiment and give
the words a score.
AKA: A Lexicon-based approach
Sentiment Analysis
Word Positivity
Great 0.9
Terrible 0.1
Alright 0.6
Mediocre 0.4
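To make the lexicon idea concrete, here is a minimal sketch in Python. The four scores come from the table above; the function name `lexicon_score`, the neutral fallback of 0.5, and the choice to average the scores are assumptions made for illustration, not the method from the talk.

```python
# Word scores copied from the slide's table (0 = negative, 1 = positive).
LEXICON = {"great": 0.9, "terrible": 0.1, "alright": 0.6, "mediocre": 0.4}
NEUTRAL = 0.5  # assumed score when no known word appears


def lexicon_score(text):
    """Average the positivity of every word that appears in the lexicon."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else NEUTRAL


print(lexicon_score("The food was great!"))            # one known word
print(lexicon_score("Great view, terrible service."))  # two words that cancel out
print(lexicon_score("I went to the store."))           # no known words
```

Averaging is only one way to combine word scores; it already shows how a strongly positive and a strongly negative word can cancel each other out.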
This sandwich isn’t
bad
Words Positivity
Isn’t bad 0.6
Isn’t good 0.3
Ain’t half-bad 0.73
Above average 0.7
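The phrase table above hints at why single words are not enough. The sketch below compares a word-only lookup with a phrase-aware one. The phrase scores are from the slide; the entry for "bad" (0.2), the function names, and the fallback of 0.5 are assumptions added for the sake of the example.

```python
WORDS = {"great": 0.9, "terrible": 0.1, "alright": 0.6, "mediocre": 0.4,
         "bad": 0.2}  # "bad" is an assumed entry, not from the slides
PHRASES = {"isn't bad": 0.6, "isn't good": 0.3,
           "ain't half-bad": 0.73, "above average": 0.7}


def normalize(text):
    """Lowercase the text and turn curly apostrophes into straight ones."""
    return text.lower().replace("\u2019", "'")


def word_score(text):
    """Score using single words only."""
    tokens = [w.strip(".,!?") for w in normalize(text).split()]
    hits = [WORDS[w] for w in tokens if w in WORDS]
    return sum(hits) / len(hits) if hits else 0.5


def phrase_score(text):
    """Prefer known multi-word phrases; fall back to single words."""
    cleaned = normalize(text)
    hits = [score for phrase, score in PHRASES.items() if phrase in cleaned]
    return sum(hits) / len(hits) if hits else word_score(text)


sentence = "This sandwich isn't bad"
print(word_score(sentence))    # 0.2, because it only sees "bad"
print(phrase_score(sentence))  # 0.6, because it sees "isn't bad"
```

Every new turn of phrase needs its own table entry, which is one reason the lexicon approach struggles with the gray areas of language.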
“I have to say, that while most of
my experiences at tourist traps
have been horrendous, the one I
recently went to broke the pattern.”
• Many humans can’t figure out the sentiment of this
sentence
• Gray areas of language = why sentiment analysis is
quite a difficult problem for computers to solve
How do we know how well we’re doing?
How do we know how good AI is?
• Well, it’s hard
• Take a spreadsheet
• Label each piece of text for positive vs.
negative
• Guess which words made it positive or negative
• Train the model on half of the spreadsheet and
then make predictions on the other half
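The steps above can be sketched in a few lines of Python. Everything here is illustrative: the eight labeled rows are made up, and the word-voting "model" is a deliberately simple stand-in for a real learning algorithm.

```python
import random
from collections import Counter


def split_half(rows, seed=0):
    """Shuffle the labeled rows, then use one half to train and one to test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    middle = len(rows) // 2
    return rows[:middle], rows[middle:]


def train_model(rows):
    """Guess which words are positive or negative by counting votes."""
    votes = Counter()
    for text, label in rows:
        for word in text.lower().split():
            votes[word] += 1 if label == "positive" else -1

    def predict(text):
        total = sum(votes[w] for w in text.lower().split())
        return "positive" if total > 0 else "negative"

    return predict


def accuracy(predict, rows):
    """Fraction of rows where the prediction matches the human label."""
    return sum(predict(text) == label for text, label in rows) / len(rows)


# Made-up spreadsheet: one (text, label) pair per row.
rows = [
    ("great service", "positive"), ("terrible service", "negative"),
    ("great food", "positive"), ("terrible food", "negative"),
    ("really great", "positive"), ("really terrible", "negative"),
    ("great value", "positive"), ("terrible value", "negative"),
]

train, test = split_half(rows)
model = train_model(train)
print("accuracy on the held-out half:", accuracy(model, test))
```

The score is computed only on rows the model never saw during training, which is the whole point of holding back half of the spreadsheet.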
Then what?
Train
Test
Still, it’s not that simple
Performance metrics
Overfitting
Customer Did they buy?
1 No
2 No
3 No
4 No
5 No
6 Yes
7 No
8 No
9 No
10 No
11 No
12 Yes
13 No
14 No
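The table above is a handy way to see why accuracy can mislead. In this small sketch, the customer data is from the slide, while the "always say No" model is a hypothetical baseline added for illustration.

```python
# The 14 customers from the slide: only customers 6 and 12 bought.
bought = ["No"] * 14
bought[5] = bought[11] = "Yes"

# A lazy "model" that predicts nobody ever buys.
always_no = ["No"] * len(bought)

accuracy = sum(p == a for p, a in zip(always_no, bought)) / len(bought)
buyers_found = sum(p == "Yes" and a == "Yes" for p, a in zip(always_no, bought))

print(f"accuracy: {accuracy:.0%}")  # 86%
print(f"buyers found: {buyers_found} of {bought.count('Yes')}")  # 0 of 2
```

A model that never finds a single buyer still scores about 86% accuracy, simply because most customers did not buy.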
Performance Metrics
- Accuracy isn’t necessarily the best performance metric
- Predicting sentiment is a very different problem depending on whether the text
you’re making predictions on consists of Amazon reviews, tweets, or medical
journals
- It also depends on how much data you’ve got
- When you teach a computer what sentiment is, you end up showing it a huge
number of examples. Depending on the data you’ve got, the number of examples
you might use ranges from a few hundred to hundreds of millions
- It’s not fair to use those examples to check your model’s accuracy — you already
know the answers
Performance Metrics
Learn more about sentiment analysis and
performance metrics:
What Even Is Sentiment Analysis?
Precision: fraction of retrieved instances that are relevant
Recall: fraction of relevant instances that are retrieved
Precision vs Recall
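As a rough illustration of the two definitions, here is a small Python sketch. The true buyers (customers 6 and 12) come from the earlier customer table; the model's guesses (customers 6, 7, and 8) and the function name `precision_recall` are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # items that are both retrieved and relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


predicted_buyers = {6, 7, 8}  # hypothetical model output
actual_buyers = {6, 12}       # from the customer table

precision, recall = precision_recall(predicted_buyers, actual_buyers)
print(f"precision: {precision:.2f}")  # 1 of 3 guesses was right
print(f"recall: {recall:.2f}")        # 1 of 2 buyers was found
```

Guessing more customers tends to raise recall and lower precision, which is why the two numbers are usually reported together.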
Overfitting
This product left me with a deep feeling of regret.
This film left me with a deep feeling of regret,
love, and hopelessness for a life not lived.
I #love these new @nike shoes
Overfitting
• Overfitting means you “fail to generalise to examples outside of
your training set”
• In other words…you’re living under a rock. You’re great at
recognizing everything under your rock, but you don’t
understand the rest of the world
• Domain is a factor — there are so many different kinds of text
(scientific journal articles vs. tweets)
• No one model is going to be the best at every kind of text
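One way to picture "living under a rock" is a model that memorizes its training examples word for word. The sketch below is an exaggerated, hypothetical case: the second training row and all of the labels are assumptions, and real overfitting is usually subtler than pure memorization.

```python
def train_memorizer(rows):
    """An extreme overfitter: it remembers training texts exactly."""
    memory = dict(rows)
    return lambda text: memory.get(text, "unknown")


def accuracy(predict, rows):
    """Fraction of rows where the prediction matches the label."""
    return sum(predict(text) == label for text, label in rows) / len(rows)


# Under the rock: product reviews (labels are assumed for illustration).
train = [
    ("This product left me with a deep feeling of regret.", "negative"),
    ("This product is great.", "positive"),
]

# The rest of the world: a film review and a tweet (labels are assumed).
other_domains = [
    ("This film left me with a deep feeling of regret, love, and "
     "hopelessness for a life not lived.", "positive"),
    ("I #love these new @nike shoes", "positive"),
]

model = train_memorizer(train)
print("training set:", accuracy(model, train))           # 1.0
print("other domains:", accuracy(model, other_domains))  # 0.0
```

A perfect score on the training set says nothing about text from another domain, which is why models are checked on examples they have not seen.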
KNOWLEDGE = POWER
Email us: contact@indico.io


Editor's Notes

  • #34 For a more in-depth look at sentiment analysis, see this post: https://indico.io/blog/what-is-sentiment-analysis/