An Analysis of Machine- and Human-Analytics in Classification
Authors:
1. Gary K. L. Tam (Swansea University)
2. Vivek Kothari (University of Oxford)
3. Min Chen (University of Oxford)
Presented by:
Subhashis Hazarika
(Ohio State University)
Major Contribution
• An information-theoretic model that explains why a human-driven visual
analytics model of classification performs better than a purely
machine-learning model.
Overview
• Consider two classification case studies.
• Create a decision tree classifier by applying standard ML algorithms.
• Create a decision tree classifier using visual analytics guided by the “soft
knowledge” of a human model developer.[1]
• Use information theory to explain why the human-centric approach
performs better than the ML approach.
• Quantify the “soft knowledge” that the human-centric approach takes
advantage of.
[1]: G.K.L. Tam, H. Fang, A.J. Aubrey, P.W. Grant, D. Marshall, M. Chen. “Visualization of Time-Series Data in
Parameter Space for Understanding Facial Dynamics”. EuroVis 2011.
Case Study A (Facial Dynamics Data)
• Input and feature extraction:
– 68 raw facial videos, each classified as one of four expressions (smile, sadness, surprise, anger).
– For each video, 14 time series were extracted, representing different temporal facial features.
– For each time series, 23 quantitative measures were obtained.
– This results in 14 × 23 = 322 attributes/features per video.
• Create Decision Tree using a Parallel Coordinates based visual analytics
system.
• Create a Decision Tree with standard ML algorithms (C4.5 or CART).
A: Visual Analytics Approach
A: Interactive Visualization
A: Outliers and anomalies
A: Building the D-tree
A: VA vs. ML
Case Study B (Visualization Image Classification)
• Input and feature extraction:
– 4 × 49 = 196 JPEG images classified as one of four types (bubble chart, treemap, parallel coordinates, bar graph).
– For each image, 222 features were extracted using different image classification and clustering methods.
• Create Decision Tree using a Parallel Coordinates based visual analytics
system.
• Create a Decision Tree with standard ML algorithms (C4.5 or CART).
B: Building the D-Tree
B: Comparative Evaluation
The Team
• Case Study A:
– Conducted by 7 researchers with expertise in vision, visual analytics, computer graphics
and machine learning.
– The human-centric D-Tree was constructed by a researcher who specialized in graphics
and acquired knowledge of computer vision and visual analytics during the project.
• Case Study B:
– Conducted by 2 researchers with expertise in image processing and visual analytics.
– Human-centric D-Tree was constructed by a researcher with 8 months of experience in
visual analytics.
But Why? Some Empirical Observations
• O1: Overview and Axis Distribution.
– A machine-centric approach examines many cut positions on all the axes and greedily
picks the cut with the highest quality measure.
– A human model developer, by contrast, usually first obtains a general overview of the data and
identifies important axes with promising patterns before paying detailed attention to
these axes.
• O2: General Agreement amongst Statistics.
– ML algorithms use only one metric to determine the cut.
– The HC approach can evaluate more than one statistic to decide the cut.
• O3: Look-ahead.
– Humans’ insight into the consequences often influences the current decision.
– Humans’ look-ahead ability enables multi-step judgement, while the ML algorithms
focus only on the current decision.
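The machine-centric behaviour described in O1 can be sketched as follows. This is a minimal illustration, not the algorithm used in the case studies: it assumes information gain as the single quality measure and midpoints between adjacent values as candidate cuts, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_greedy_cut(rows, labels):
    """Try every midpoint cut on every axis; return (gain, axis, threshold)."""
    base = entropy(labels)
    best = (0.0, None, None)
    for axis in range(len(rows[0])):
        values = sorted(set(row[axis] for row in rows))
        # Candidate cuts: midpoints between adjacent distinct values (an assumption).
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [lab for row, lab in zip(rows, labels) if row[axis] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[axis] > threshold]
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = base - remainder
            if gain > best[0]:
                best = (gain, axis, threshold)
    return best


# Hypothetical toy data: two axes, two classes.
rows = [(0.1, 5.0), (0.2, 1.0), (0.8, 4.0), (0.9, 2.0)]
labels = ["smile", "smile", "anger", "anger"]
gain, axis, threshold = best_greedy_cut(rows, labels)
print(f"best cut: axis {axis} at {threshold} (gain {gain:.2f} bits)")
```

The search looks only at the current split and a single metric, which is the contrast O2 and O3 draw with the human developer.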
But Why? Some Empirical Observations
• O4: Outliers.
– Where possible, model developers avoid axes with outliers, as these may be unreliable.
– Such reasoning is not available in the ML algorithms.
• O5: Cut Positions on an Axis.
– Humans look for a cut or cuts that would allow each class to expand beyond the current
instances in the training set.
– ML algorithms decide the cuts at the very edges of a particular class.
• O6: Human (Domain) Knowledge.
– Humans incorporate their domain knowledge into their model construction process.
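The difference described in O5 can be illustrated with a small sketch. It assumes two classes that are separable on a single axis and uses hypothetical values; the "margin" rule here (cutting midway through the gap) is only one simple way to leave each class room to expand, not necessarily what the model developers did.

```python
def edge_cut(class_a_values):
    """Cut placed at the very edge of class A (its largest training value)."""
    return max(class_a_values)


def margin_cut(class_a_values, class_b_values):
    """Cut placed midway in the gap between the classes, leaving both room to expand."""
    return (max(class_a_values) + min(class_b_values)) / 2


def classify(value, cut):
    """Values at or below the cut are labelled A, the rest B."""
    return "A" if value <= cut else "B"


# Hypothetical training values on one axis; class A lies below class B.
a_train = [1.0, 1.5, 2.0]
b_train = [6.0, 6.5, 7.0]
unseen_a = 2.5  # a new class-A instance slightly beyond the training range

edge_label = classify(unseen_a, edge_cut(a_train))
margin_label = classify(unseen_a, margin_cut(a_train, b_train))
print(edge_label, margin_label)
```

With the edge cut the unseen instance falls on the wrong side, while the margin cut still assigns it to class A.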
Information Flow
Information Theoretic Analysis
• Estimated world population: 7.4 billion.
• Assume each person has 5 variations of each of the 4 expressions.
• Number of possible scenarios to capture: 7.4 billion × 5 × 4 = 148 billion.
• The maximal entropy is log2(148 billion) ≈ 37.1 bits.
• We know only 68 cases (the raw training videos).
• That is 1.7 × 10^-8 bits (a drop in the ocean).
• [ML] Optimistically, assume the categorization retains 50% of the mutual
information. That leaves 8.5 × 10^-9 bits of information.
Information Theoretic Analysis
• [VA] The model developer may know some 200 people reasonably well and
can recall their 5 variations of 4 expressions with ease. Conservatively, that is
equivalent to 4068 videos instead of 68 (200 × 5 × 4 + 68), representing 1.0 × 10^-6 bits of
known information.
• [VA] Given an arbitrary facial image, the developer can also
reconstruct an expression using imagination, e.g. at least 1 variation per
expression. This ability amounts to 29.6 billion videos, representing 7.4
bits of known information. (This ability shows up in determining outliers.)
• 7.4 bits vs. 8.5 × 10^-9 bits: roughly 871 million times more
information content.
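The estimates above can be reproduced with a few lines of arithmetic. This sketch assumes, as the figures suggest, that "known information" is taken as the fraction of scenarios covered multiplied by the maximal entropy; the variable and function names are illustrative only.

```python
import math

WORLD_POPULATION = 7.4e9
VARIATIONS = 5    # variations per expression, per person
EXPRESSIONS = 4   # smile, sadness, surprise, anger

scenarios = WORLD_POPULATION * VARIATIONS * EXPRESSIONS  # 148 billion
max_entropy = math.log2(scenarios)                       # about 37.1 bits


def known_bits(n_videos):
    """Assumed estimate: fraction of scenarios covered, scaled by the maximal entropy."""
    return n_videos / scenarios * max_entropy


ml_raw = known_bits(68)                                        # 68 training videos
ml_retained = 0.5 * ml_raw                                     # 50% retained by categorization
va_recall = known_bits(68 + 200 * VARIATIONS * EXPRESSIONS)    # 4068 videos
va_imagination = known_bits(WORLD_POPULATION * EXPRESSIONS)    # 29.6 billion videos
ratio = va_imagination / ml_retained

print(f"max entropy:    {max_entropy:.1f} bits")
print(f"ML raw:         {ml_raw:.1e} bits")
print(f"ML retained:    {ml_retained:.1e} bits")
print(f"VA recall:      {va_recall:.1e} bits")
print(f"VA imagination: {va_imagination:.1f} bits")
print(f"ratio:          {ratio / 1e6:.0f} million")
```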
Soft Knowledge and Soft Models
• Soft Knowledge: The uncaptured information that is not available to the
machine-centric approach.
• Soft Model: A model that makes decisions based on soft knowledge.
• Examples:
1. Given a facial photo (input), imagine how the person would smile (output).
2. Given a video (input), determine if it is an outlier (output).
3. Given a set of points on an axis (input), decide how many cuts and where they are
(output).
Soft Models
Conclusion
• There is an overwhelming amount of information available to the
human-centric approach in the form of soft knowledge that cannot be utilized by a
machine-centric approach.
• It is necessary to understand and quantify the information flow in both
machine- and human-centric approaches to help design a mixed model
that performs much better.
• The human model developer can never be cast aside.
