935 speaking characters
111,395 lines
144 female characters
778 male characters
Top 5 Male Characters by Verbiage (overall rank in parentheses)
1. Gloucester (1)
2. Hamlet (2)
3. Iago (3)
4. Falstaff (4)
5. King Henry V (5)
Top 5 Female Characters by Verbiage (overall rank in parentheses)
1. Queen Margaret (12)
2. Cleopatra (19)
3. Helena (21)
4. Rosalind (24)
5. Portia (25)
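The deck does not show how the rankings were computed. The sketch below is one way to produce them. It assumes the corpus is a list of (speaker, gender, line) tuples and that "verbiage" means total words spoken. Both are assumptions, and the quotations are only a toy sample. Each speaker keeps an overall rank, so a gender-filtered list can show numbers like "(12)", as the female list does.

```python
from collections import Counter

# Hypothetical toy input: (speaker, gender, spoken line). The deck does not say
# how its corpus is stored; this tuple layout is an assumption for illustration.
dialogue = [
    ("Hamlet", "M", "To be, or not to be, that is the question"),
    ("Gloucester", "M", "Now is the winter of our discontent"),
    ("Gloucester", "M", "Made glorious summer by this sun of York"),
    ("Cleopatra", "F", "Give me my robe, put on my crown"),
]

def verbiage_ranking(lines):
    """Return (speaker, word_count, overall_rank), most words first.

    Assumes 'verbiage' means total whitespace-separated words spoken.
    """
    words = Counter()
    for speaker, _gender, text in lines:
        words[speaker] += len(text.split())
    ordered = words.most_common()
    return [(name, count, rank) for rank, (name, count) in enumerate(ordered, start=1)]

def top_by_gender(lines, gender, n=5):
    """Top-n speakers of one gender, keeping their rank among all speakers."""
    genders = {speaker: g for speaker, g, _ in lines}
    ranked = verbiage_ranking(lines)
    return [row for row in ranked if genders[row[0]] == gender][:n]

for name, count, rank in top_by_gender(dialogue, "F"):
    print(f"{name} ({rank}): {count} words")
```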
Measuring Sentiment in Shakespeare I
Sentiment Analysis Matches Traditional Literary Classification!

Play              Positive   Neutral   Negative   Overall
Macbeth           8.7%       81.2%     10.1%      Negative
Much Ado          11.7%      82.4%     5.8%       Positive
Henry IV          8.3%       84.8%     6.9%       Positive
A Winter’s Tale   9.1%       84.7%     6.2%       Positive
Merry Wives       28%        67%       4.3%       Positive
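The deck does not name its sentiment scorer. The sketch below shows how a per-play breakdown like the one above could be tabulated. It assumes each line already has a compound score in [-1, 1] and buckets it with +/-0.05 cut-offs, which is the usual convention for VADER's compound score. The scorer and the thresholds are both assumptions. The "Overall" rule used here (Positive when positive lines outnumber negative ones) is consistent with every row of the table, but it is an inference, not the deck's stated method.

```python
def classify(score, threshold=0.05):
    """Map a compound sentiment score in [-1, 1] to a label.

    The +/-0.05 cut-off is an assumption (VADER's usual convention);
    the deck does not state which scorer or thresholds it used.
    """
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

def sentiment_profile(scores):
    """Percentage of lines in each class, plus an overall label.

    Overall is Positive when positive lines outnumber negative ones;
    ties fall to Negative here, which is an arbitrary choice.
    """
    if not scores:
        raise ValueError("need at least one scored line")
    counts = {"positive": 0, "neutral": 0, "negative": 0}
    for s in scores:
        counts[classify(s)] += 1
    total = len(scores)
    shares = {label: 100 * n / total for label, n in counts.items()}
    overall = "Positive" if counts["positive"] > counts["negative"] else "Negative"
    return shares, overall

# Hypothetical per-line scores for a single play.
scores = [0.6, 0.0, 0.2, -0.4, 0.0, 0.3, -0.02, 0.0, -0.7, 0.0]
shares, overall = sentiment_profile(scores)
print({k: f"{v:.1f}%" for k, v in shares.items()}, overall)
```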
Play / Character           Positive   Neutral   Negative   Overall
Merry Wives                28%        68.9%     4.2%       Positive
Falstaff in Merry Wives    27%        67%       5%         Positive
Henry IV                   8.3%       84.8%     6.9%       Positive
Falstaff in Henry IV       8.5%       83.4%     8.2%       Positive
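To compare a single character with the play as a whole, as in the Falstaff rows above, the same bucketing can be restricted to one speaker's lines. This is a minimal sketch. The (play, speaker, score) records and the +/-0.05 thresholds are hypothetical and do not come from the deck's data.

```python
from collections import Counter

# Hypothetical scored lines: (play, speaker, compound score). The field layout
# and the +/-0.05 thresholds are assumptions, not the deck's documented setup.
scored = [
    ("Merry Wives", "Falstaff", 0.5),
    ("Merry Wives", "Falstaff", 0.0),
    ("Merry Wives", "Mistress Ford", 0.4),
    ("Merry Wives", "Ford", -0.3),
    ("Henry IV", "Falstaff", -0.2),
    ("Henry IV", "Falstaff", 0.0),
    ("Henry IV", "Hal", 0.1),
    ("Henry IV", "Hotspur", 0.0),
]

def label(score, threshold=0.05):
    """Bucket a compound score into positive / neutral / negative."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

def profile(play, speaker=None):
    """Percentage of positive/neutral/negative lines in a play,
    optionally restricted to a single speaker such as Falstaff."""
    labels = Counter(
        label(score)
        for p, who, score in scored
        if p == play and (speaker is None or who == speaker)
    )
    total = sum(labels.values())
    if total == 0:
        raise ValueError("no matching lines")
    # Counter returns 0 for missing keys, so absent classes report 0.0%.
    return {k: round(100 * labels[k] / total, 1) for k in ("positive", "neutral", "negative")}

for play in ("Merry Wives", "Henry IV"):
    print(play, profile(play), "| Falstaff:", profile(play, "Falstaff"))
```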
Model Name                       Hist/Com Accuracy   Hist/Trag Accuracy   Baseline
Logistic Regression Classifier   63%                 66%                  51/48, 61/39
K Nearest Neighbors Classifier   52%                 52%                  51/48, 61/39
Support Vector Classifier        63%                 61%                  51/48, 61/39
Random Forest Classifier         64%                 64%                  51/48, 61/39
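The Baseline column appears to give the class split for each task. Reading 51/48 as Hist/Com and 61/39 as Hist/Trag, in column order, is an assumption. A classifier only adds value if it beats the majority-class baseline, which is the accuracy of always predicting the larger class. The sketch below computes that baseline from a list of labels and compares the Hist/Trag accuracies against it. The label list is hypothetical and only reproduces a 61/39 split.

```python
from collections import Counter

def class_split(labels):
    """Percentage of examples in each class, largest class first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(cls, 100 * n / total) for cls, n in counts.most_common()]

def majority_baseline(labels):
    """Accuracy (%) of always predicting the most frequent class."""
    return class_split(labels)[0][1]

# Hypothetical labels reproducing a 61/39 split; which genre is the
# majority class is an assumption (the baseline only depends on the split).
labels = ["history"] * 61 + ["tragedy"] * 39
baseline = majority_baseline(labels)

# Accuracies from the Hist/Trag column of the model table.
models = {"Logistic Regression": 66, "K Nearest Neighbors": 52,
          "Support Vector": 61, "Random Forest": 64}
for name, acc in models.items():
    print(f"{name}: {acc - baseline:+.0f} points vs. majority baseline")
```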
Model Name            Training Accuracy   Testing Accuracy   Baseline
Logistic Regression   66%                 66%                51/48

Confusion Matrix
3705 (TP)   2771 (FP)
2185 (FN)   4715 (TN)
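The usual summary statistics follow directly from the four confusion-matrix counts. The slide does not say which genre is treated as the positive class, so the sketch below uses the counts exactly as labelled. It computes accuracy, precision, recall and F1 with their standard definitions.

```python
def metrics(tp, fp, fn, tn):
    """Standard summary statistics for a binary confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)  # of predicted positives, the share that were correct
    recall = tp / (tp + fn)     # of actual positives, the share that were found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Counts copied from the confusion matrix on the slide.
for name, value in metrics(tp=3705, fp=2771, fn=2185, tn=4715).items():
    print(f"{name}: {value:.3f}")
```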

The Data Analytics of Shakespeare
