The document provides an introduction to sentiment analysis. It defines sentiment as the expression of emotions, evaluations, or stances. Sentiment analysis involves classifying text into sentiment values like positive, negative, or neutral based on features like unigrams. The document discusses how sentiment analysis is difficult due to differences in human annotations, and shows precision and recall scores from different student annotations to demonstrate this difficulty. It then provides an overview of the basic process for automatic sentiment analysis using machine learning classifiers trained on annotated datasets.
2. What is sentiment?
Expression of:
- an emotion (I am happy)
- an evaluation (Great idea!)
- a stance (I support the bill)
3. What is sentiment?
Expression of:
- an emotion (I am happy)
- an evaluation (Great idea!)
- a stance (I support the bill)
Involves a perspective, a target (named entities) and a sentiment value.
Kermit was thrilled about the idea!
4. Sentiment analysis is difficult!!

Student 1:
  Sentiment   Precision   Recall
  Negative        71%       90%
  Neutral         96%       87%
  Positive        77%       92%

Student 2:
  Sentiment   Precision   Recall
  Negative        88%       66%
  Neutral         86%       97%
  Positive        91%       65%

Student 3:
  Sentiment   Precision   Recall
  Negative        79%       91%
  Neutral         96%       90%
  Positive        80%       92%

71% of the mentions labeled "Negative" by student 1 were also labeled "Negative" by student 2 or 3 (or both).
29% of the mentions labeled "Negative" by student 1 were labeled neutral (or positive) by both of the other students.
5. Sentiment analysis is difficult!!
(per-student precision/recall tables repeated from slide 4)

66% of the mentions labeled "Negative" by student 1 or 2 (or both) were also labeled "Negative" by student 3.
34% of the mentions labeled "Negative" by student 1 and 2 were not labeled "Negative" by student 3.
6. Sentiment analysis is difficult!!
(per-student precision/recall tables repeated from slide 4)

Neutral is "easy" because 70% of all mentions are neutral. Thus, always saying "Neutral" will be correct 70% of the time and lets you recall 100% of the neutral messages.
7. Sentiment analysis is difficult!!

#tvvv neeeeee :( domien is out ;o ik blijf vanje houden domien!
(Dutch, roughly: "#tvvv nooooo :( domien is out ;o I'll keep loving you, domien!")

Eindelijk verlost van @belgacom! Surfen gaat een pak vlotter met @telenet :-)
(Dutch, roughly: "Finally rid of @belgacom! Surfing is a lot smoother with @telenet :-)")
8. Sentiment analysis is difficult!!

#tvvv neeeeee :( domien is out ;o ik blijf vanje houden domien!
ບ"ມ$ຕ&ນໄມ)ຖ+ກອອກoຂ)າພະເຈ&າຍ5ງຮ5ກທ9ານເປ5ນຕ&ນໄມ)!

Eindelijk verlost van @belgacom! Surfen gaat een pak vlotter met @telenet :-)
ສ<ດທ)າຍຈາກຕ&ນໄມ)ເກມບ>ນແມ9ນ@າຍຂAນໄວທCມ$ປ9າໄມ)
9. Automatic Sentiment Analysis - Basic strategy
(1) Training phase:
Mention → Tokenization, POS tagging, … → Features (unigrams)
Human annotation → Label / Action / prediction
Features + labels → Learning → Classifier model: feature weights per class ("count table")
10. Automatic Sentiment Analysis - Basic strategy
(2) Operational phase:
Mention → Tokenization, POS tagging, … → Features (unigrams)
Features + Classifier model (feature weights per class, the "count table") → Classification → Label / Action / prediction
11. Automatic Sentiment Analysis
Training set:
  neeeeee :( domien is out                        = Negative
  ik blijf vanje houden domien!                   = Positive
  eindelijk verlost van @belgacom!                = Negative
  surfen gaat een pak vlotter met @telenet :-)    = Positive
  …                                               = …
12. "Bag of Words"
  "neeeeee :( domien is out"                      = Negative
  {"domien", "is", "neeeeee", "out", ":("}        = Negative
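To make the bag-of-words step concrete, here is a minimal sketch in Clojure (the language introduced in Part 2). The helper names tokenize and bag-of-words and the plain whitespace split are simplifying assumptions for illustration, not the production pipeline.

(require '[clojure.string :as str])

;; Split a mention on whitespace and keep the unigrams as a set:
;; word order and repeated words are discarded.
(defn tokenize [mention]
  (str/split (str/lower-case mention) #"\s+"))

(defn bag-of-words [mention]
  (set (tokenize mention)))

(bag-of-words "neeeeee :( domien is out")
;; => #{"neeeeee" ":(" "domien" "is" "out"}   (a set, so print order is arbitrary)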
14. Bayes rule of conditional probabilities:

P[Negative | "ik ben blij"] = ( P[Negative] x P["ik ben blij" | Negative] ) / P["ik ben blij"]

  - P[Negative]: the prior (over all mentions)
  - P["ik ben blij" | Negative]: the likelihood
  - P["ik ben blij"]: the evidence (same for all sentiments)

("ik ben blij" is Dutch for "I am happy".)

Chain rule:
P["ik ben blij" | Neg.] = P["ik" | Neg.] (unigram) x P["ben" | Neg., "ik"] (bigram) x P["blij" | Neg., "ik ben"] (trigram)
15. Naïve Bayes approximation:

P[Neg. | "ik ben blij"] = P[Neg.] x P["ik" | Neg.] x P["ben" | Neg.] x P["blij" | Neg.]
P[Pos. | "ik ben blij"] = P[Pos.] x P["ik" | Pos.] x P["ben" | Pos.] x P["blij" | Pos.]

(the per-unigram probabilities come from the unigram counts table)

Classification algorithm:
"Positive" if P[Pos. | "ik ben blij"] > P[Neg. | "ik ben blij"]
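A minimal, self-contained Clojure sketch of this whole scheme on the toy training set from slide 11. The whitespace tokenizer and the add-one (Laplace) smoothing of the unigram counts are assumptions the slides do not spell out; the function names are hypothetical.

(require '[clojure.string :as str])

(def training-set
  [["neeeeee :( domien is out"                     :negative]
   ["ik blijf vanje houden domien!"                :positive]
   ["eindelijk verlost van @belgacom!"             :negative]
   ["surfen gaat een pak vlotter met @telenet :-)" :positive]])

(defn tokenize [mention]
  (str/split (str/lower-case mention) #"\s+"))

;; Training phase: per class, the prior and the unigram "count table".
(defn train [examples]
  (into {}
        (for [[label rows] (group-by second examples)]
          (let [tokens (mapcat (comp tokenize first) rows)]
            [label {:prior  (/ (count rows) (count examples))
                    :counts (frequencies tokens)
                    :total  (count tokens)}]))))

;; P[class] x product over unigrams of P[unigram | class], with add-one smoothing.
(defn score [model label mention]
  (let [{:keys [prior counts total]} (model label)
        vocab (->> (vals model) (mapcat (comp keys :counts)) set count)]
    (reduce * prior
            (for [token (tokenize mention)]
              (/ (inc (get counts token 0))
                 (+ total vocab))))))

;; Operational phase: pick the class with the highest score.
(defn classify [model mention]
  (key (apply max-key #(score model (key %) mention) model)))

(def model (train training-set))
(classify model "ik ben blij")
;; => :positive on this toy data ("ik" occurs only in the positive examples;
;;    "ben" and "blij" are unseen)

Add-one smoothing is there so that a single unseen unigram such as "ben" does not force the whole product to zero.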
16. Improvements over Naïve Bayes
- Better features (see the bigram sketch after this list):
  - Bigrams, trigrams
  - Parts of speech
  - Tf/idf weighting
  - Grammatical dependencies (e.g. negation marking)
  - Named entities
- Alternative strategies to calculate feature weights from counts:
  - Transformed Normalized Weighted Naïve Bayes
  - Mutual Information
  - Maximum entropy
- Other approaches:
  - Sentiment lexicons (cf. current classifier)
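As one illustration of the "better features" bullet, a possible way to extend the unigram features with bigrams; joining each bigram into a single "_"-separated string is an arbitrary representation choice, and tokenize is the same assumed whitespace splitter as in the earlier sketch.

(require '[clojure.string :as str])

(defn tokenize [mention]
  (str/split (str/lower-case mention) #"\s+"))

;; Adjacent token pairs, each flattened into one feature string.
(defn bigrams [tokens]
  (map (fn [[a b]] (str a "_" b)) (partition 2 1 tokens)))

(defn features [mention]
  (let [tokens (tokenize mention)]
    (concat tokens (bigrams tokens))))

(features "ik ben blij")
;; => ("ik" "ben" "blij" "ik_ben" "ben_blij")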
17. Evaluation
- In terms of Precision, Recall, F1, Accuracy, … (see the sketch after this list)
- Very good on "simple" tasks (comparable to humans)
  - e.g. spam detection
  - in general, tasks for which grammar and context are not important (negation, source/target/perspective roles, …)
- But rather bad on "difficult" tasks, including sentiment analysis (worse than humans)
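For reference, the first three measures computed for a single class from true-positive / false-positive / false-negative counts; a minimal sketch, and the counts below are made-up numbers, not taken from the student tables.

;; Precision, recall and F1 for one sentiment class.
(defn precision [tp fp]   (/ tp (+ tp fp)))
(defn recall    [tp fneg] (/ tp (+ tp fneg)))
(defn f1-score  [p r]     (/ (* 2 p r) (+ p r)))

(let [tp 71  fp 29  fneg 10
      p  (precision tp fp)
      r  (recall tp fneg)]
  {:precision (double p) :recall (double r) :f1 (double (f1-score p r))})
;; => {:precision 0.71, :recall 0.8765..., :f1 0.7845...}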
19. Many unresolved issues…
- Other languages (unsupervised learning / bootstrapping)
- Source/Target resolution
- Classifiers trained on one dataset/topic do not perform well on other datasets/topics
- …
20. …and opportunities
Many information extraction problems can be cast as classification problems:
- Assigning tags to mentions
- Predicting the number of likes/retweets/… of mentions
- Deciding whom to send/assign a message to
- …
In general, any problem where things must be "labeled", "decided" or "predicted", with a limited number of alternatives, and for which training data is available (can be user feedback!)
And our users generate massive amounts of data!!
→ don't hesitate to discuss ideas with me! ←
21.
22.
23. Part 2: Clojure
- Dynamic programming language targeting the JVM (and JavaScript)
- Combines interactive development in a scripting language with an efficient and robust infrastructure for multithreaded programming
- Lisp dialect:
  - (almost) no syntax:
      (+ 1 2)              => 3
      (list '+ 1 2)        => (+ 1 2)
  - code as data:
      (eval (list '+ 1 2)) => 3
24. Part 2: Clojure
- Project management through "leiningen":
  - bash$ lein new test-project
  - add dependencies to project.clj, add code to src/test-project
  - bash$ lein uberjar        => test-project.jar
  - bash$ java -jar test-project.jar
- Online demo…
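To illustrate the "add dependencies to project.clj" step, a possible project.clj for the test-project example; the extra library, the version numbers and the :main namespace are illustrative assumptions, not the demo's actual configuration.

;; project.clj sketch: one dependency added by hand next to Clojure itself.
(defproject test-project "0.1.0-SNAPSHOT"
  :description "Toy project for the demo"
  :dependencies [[org.clojure/clojure "1.11.1"]
                 [org.clojure/data.json "2.4.0"]]
  :main test-project.core
  :profiles {:uberjar {:aot :all}})

With this in place, "lein uberjar" builds a standalone jar that "java -jar" can run via the :main namespace.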