Deep Learning
Graph Models for Deep Learning
Stephen Donald Huff, PhD
• Title of Course:
o “Graph Models for Deep Learning”
• Target Audience:
o Information Technology Professionals with a Basic Knowledge
of Graph Theory and Its Practical Applications;
Prerequisites
• Basic Understanding of Application-Specific
(“Traditional” or “Rule-Based”) Methods for
Statistical Analysis;
• Basic Understanding of Deep Learning Theory
(Graph/Network Theory);
• Basic General Understanding of Information
Technology;
Optional Course Requisite(s):
• Basic Working Knowledge of Python Programming
• Primary Skills Developed:
o Improved Ability to Discriminate, Differentiate and Conceptualize Appropriate
Implementations of Application-Specific (“Traditional” or “Rule-Based”)
Methods versus Deep Learning Methods of Statistical Analyses and Data
Modeling;
o Improved General Understanding of Graph Models as Deep Learning
Concepts;
o State-of-the-Art Awareness of Deep Learning Applications within the Fields of
Character Recognition and Computer Vision;
• Secondary Skills Developed:
o Basic working knowledge of the Python-based manipulation of Keras,
Microsoft Cognitive Toolkit, Theano and TensorFlow deep learning platforms;
o Basic ability to compare/contrast similar implementations of practical, graph-based
solutions in Keras using Microsoft Cognitive Toolkit, Theano and/or
TensorFlow back-end systems;
MODULE 1
Introduction, Review of Background Concepts and
Technological Context
Introduction:
optimal course outcomes and focus
• Generally inform the student regarding state-of-the-art in deep
learning
• Specifically inform the student regarding the role of graph models
within the realm of deep learning, including background concepts
and current technological state of the art
• Provide basic awareness of Python-based implementations of
various deep learning systems in Keras, using Microsoft Cognitive
Toolkit, Theano and TensorFlow as back-end platforms (the actual
“deep learning software”)
Review of Background Concepts:
Statistical inference and statistical models
• Uses data analysis to deduce properties of underlying probability distributions
• Infers population properties, assuming the observed data are a sample from a larger set
• Contrasts with descriptive statistics
o Strictly concerned with properties of observed data
o Does not assume the data come from a larger population
• Three levels of modeling assumptions (the parametric and non-parametric approaches are contrasted in the sketch after this list)
o Fully parametric
• Data generated by a process fully described by probability distributions with a finite number of
unknown parameters
• E.g., generalized linear models
o Non-parametric
• Minimal assumptions about the data generating process
• E.g., data drawn as a simple random sample from a continuous probability distribution, whose
median can be estimated directly by the sample median
o Semi-parametric
• Hybrid solution, models separated into “structural” and “random” variation components
• One component is treated parametrically and the other non-parametrically
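As a concrete illustration, the following minimal Python sketch contrasts the fully parametric and non-parametric approaches on one sample; the normal distribution, sample size and seed are illustrative assumptions, and NumPy/SciPy are assumed dependencies.

```python
# A minimal sketch contrasting parametric and non-parametric inference.
# Distribution, sample size and seed are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1000)  # hypothetical observed sample

# Fully parametric: assume a normal distribution and estimate its two
# unknown parameters (mean and standard deviation) from the data.
mu_hat, sigma_hat = stats.norm.fit(data)

# Non-parametric: no distributional assumption; the sample median is
# computed directly from the observed values.
median_hat = np.median(data)

print(f"parametric fit: mu={mu_hat:.3f}, sigma={sigma_hat:.3f}")
print(f"non-parametric: median={median_hat:.3f}")
```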
Review of Background Concepts:
Relevant Application-Specific (“Traditional” or “Rule-Based”)
Methods of Statistical Analysis (with Specific Focus on “Big Data”
Problems)
• General Development Paradigm: One Problem -> One Solution/Model ->
One Algorithm
• Reduced utility when applied to “Big Data” and/or multivariate problems
• Example: linear regression
o Related examples: ANOVA, ANCOVA, MANOVA, MANCOVA, t-test and F-test
o Requires multiple assumptions (often violated in practice)
o Models/predicts a scalar response (dependent variable) from one explanatory variable
(independent variable)
• Example: multiple linear regression
o Simple regression extended to multiple explanatory variables
• Example: multivariate linear regression
o Model predicts multiple correlated response (dependent) variables (all three variants are sketched below)
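The three regression variants above can be sketched in a few lines of Python; scikit-learn is an assumed dependency here (any least-squares routine would do), and the coefficients and sample sizes are arbitrary.

```python
# A minimal sketch of simple, multiple and multivariate linear regression.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Simple linear regression: one explanatory variable, one scalar response.
x = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * x[:, 0] + rng.normal(0, 1, size=100)
print(LinearRegression().fit(x, y).coef_)        # ~[3.0]

# Multiple linear regression: several explanatory variables, one response.
X = rng.uniform(0, 10, size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 1, size=100)
print(LinearRegression().fit(X, y).coef_)        # ~[1.0, -2.0, 0.5]

# Multivariate linear regression: several (correlated) response variables.
Y = np.column_stack([y, 0.5 * y + rng.normal(0, 1, size=100)])
print(LinearRegression().fit(X, Y).coef_.shape)  # (2, 3): one row per response
```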
Review of Background Concepts:
o Example: non-linear regression (curve-fitting)
• Examples: exponential, logarithmic, trigonometric, power and Gaussian functions; Lorenz curves
• Numerical optimization algorithms often required
» Many potential local minima; careful optimization reduces the risk of accepting a local minimum in place of the global minimum
» Parameter estimates depend on the optimization algorithm and its starting values (input/output less objective)
o Example: logistic regression (estimation of logistic model parameters)
• A logistic (logit) model usually applies to binary dependent variable(s)
» Log-odds of an event's probability (often binary, labelled "0"/"1" - pass/fail, win/lose, alive/dead, etc.) is a linear
combination of the predictor (independent) variables
o Example: decision trees (decision support tools, flow charts)
• Use a tree-like graph/model of decisions and their associated consequences
» Visualization of algorithms that contain only conditional control statements (e.g., IF/THEN and looping statements)
» Contemporarily used in operations research, specifically in decision analysis, to identify the strategy most likely to reach
a specific goal
• Also used in machine learning applications (all three methods are sketched below)
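The sketch below illustrates all three rule-based methods on synthetic data; SciPy and scikit-learn are assumed dependencies, and the curve shape, model settings and data are illustrative only.

```python
# A minimal sketch of non-linear regression, logistic regression and a
# decision tree; data and hyperparameters are illustrative assumptions.
import numpy as np
from scipy.optimize import curve_fit
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)

# Non-linear regression: fit an exponential curve with a numerical optimizer.
def expo(x, a, b):
    return a * np.exp(b * x)

x = np.linspace(0, 2, 50)
y = expo(x, 2.0, 1.5) + rng.normal(0, 0.1, size=50)
params, _ = curve_fit(expo, x, y, p0=(1.0, 1.0))  # a starting guess reduces
print(params)                                     # the risk of a bad local minimum

# Logistic regression: a binary ("0"/"1") outcome whose log-odds are a
# linear combination of the predictors.
X = rng.normal(size=(200, 2))
labels = (X @ np.array([2.0, -1.0]) + rng.normal(0, 0.5, size=200) > 0).astype(int)
logit = LogisticRegression().fit(X, labels)

# Decision tree: conditional (IF/THEN) splits learned from the same data.
tree = DecisionTreeClassifier(max_depth=3).fit(X, labels)
print(logit.score(X, labels), tree.score(X, labels))
```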
Technological Context:
Predictive Modeling and Machine Learning
• Overlaps heavily with machine learning (the terms are often used synonymously); commercially
referred to as “predictive analytics”
• Examples (Predictors and Classifiers; a few are compared in the sketch below)
o GLM (Generalized Linear Models)
o Logistic regression
o Generalized additive models
o Robust regression
o Semiparametric regression
o Ordinary Least Squares
o Random forests
o Boosted trees
o Naive Bayes
o k-nearest neighbor algorithm
o Majority classifier
o Support vector machines
o CART (Classification and Regression Trees)
o MARS (Multivariate Adaptive Regression Splines)
o ACE (Alternating Conditional Expectations) and AVAS (Additivity and Variance Stabilization)
o Neural Networks
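As a quick orientation, the sketch below cross-validates a handful of the listed classifiers on one synthetic task; the dataset, model choices and default hyperparameters are assumptions, not recommendations.

```python
# A minimal sketch comparing several classifiers from the list above.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
models = {
    "random forest": RandomForestClassifier(random_state=0),
    "naive Bayes": GaussianNB(),
    "k-nearest neighbor": KNeighborsClassifier(),
    "support vector machine": SVC(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
    print(f"{name:24s} mean accuracy = {scores.mean():.3f}")
```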
Technological Context:
Graph Theory and Neural Networks
• Graphs are mathematical structures that model pairwise relations between entities
o Comprised of vertices (or “nodes” or “points”) connected by edges (or “arcs” or “lines”)
o “Undirected” graphs have two-way edges connecting their vertices
o “Directed” graphs have one-way edges connecting their vertices
• Application: linguistics
o Language syntax and compositional semantics modeled as hierarchical graphs
• Application: physics and chemistry
o Quantitative modeling of the three-dimensional structures of complex atomic assemblies according to their topologies
o Structural modeling of atoms and bonds within molecules
• Application: social sciences
o Quantitative and qualitative exploration of social phenomena (interpersonal relationships, personality and popularity, rumor spreading, acquaintanceship/friendship
matrices, etc...)
• Application: biology
o Ecological niche modeling, migration modeling, epidemiology
• Application: mathematics
o Geometric operations, topological studies (knot theory), many, many others
• Application: computer science
o Representation, modeling, visualization of networks (routing table optimization, website link mapping and quantification, social media analyses, and many, many
more)
• Neural networks are a popular example, loosely based on biological neural systems
o The variety and utility of these graph-based solutions is rapidly advancing and expanding in many directions; “Deep Learning” is a popular variant (a minimal graph representation is sketched below)
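The sketch below shows a minimal graph representation using only the Python standard library; the adjacency-list structure and node names are illustrative assumptions (libraries such as networkx offer richer equivalents).

```python
# A minimal sketch of directed vs. undirected graphs as adjacency lists.
from collections import defaultdict

def add_edge(graph, u, v, directed=False):
    """Record an edge; an undirected edge is stored in both directions."""
    graph[u].add(v)
    if not directed:
        graph[v].add(u)

undirected_g = defaultdict(set)
add_edge(undirected_g, "A", "B")               # two-way: A-B and B-A
directed_g = defaultdict(set)
add_edge(directed_g, "A", "B", directed=True)  # one-way: A -> B only

print("B" in undirected_g["A"], "A" in undirected_g["B"])  # True True
print("B" in directed_g["A"], "A" in directed_g["B"])      # True False
```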
Deep Learning (and Graph Models)
• Deep learning methods “learn data representations”
o As opposed to manually developed, task-specific (“traditional” or “rule-based”) algorithms, summarized above
• Utilize “cascading” graphs: multiple layers of hierarchically connected nonlinear processing units (“a
network of neural networks”; see the layer-cascade sketch below)
o Each successive layer uses the output from the previous layer as input (hence, the term “cascade”)
• Typically used for feature extraction and transformation
o Review: as opposed to engineered features, which typically require extensive manual intervention during development
• Learn multiple levels of data representations corresponding to different levels of abstraction
o These levels form a hierarchy of concepts, as described by the graph (network) architecture
• General goal is identification of informative, discriminating and independent data “features” (to support
visualization, pattern recognition, structural analyses, etc...)
o A feature is an individual, measurable property/characteristic of an observed phenomenon
o Without “data learning” methods, features must be engineered (e.g., development of characteristic templates),
which is costly and time-consuming
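The layer cascade can be made concrete with a minimal Keras sketch; the layer sizes, activations and input dimension below are illustrative assumptions.

```python
# A minimal sketch of a "cascading" graph in Keras: each layer consumes
# the previous layer's output. Sizes and activations are assumptions.
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(64, activation="relu", input_shape=(100,)),  # layer 1: raw features in
    Dense(32, activation="relu"),                      # layer 2: consumes layer 1's output
    Dense(16, activation="relu"),                      # layer 3: higher-level abstraction
    Dense(1, activation="sigmoid"),                    # output: e.g., a binary decision
])
model.summary()  # prints the layer-by-layer hierarchy of the graph
```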
Deep Learning (and Graph Models)
• Requires “training” – learning methods may be supervised (e.g., classification), unsupervised (e.g.,
pattern analysis), or both (semi-supervised); both modes are sketched after the lists below
o Supervised feature learning requires labeled input data
• Supervised neural networks, multilayer perceptrons, supervised dictionary learning, etc...
o Unsupervised feature learning uses unlabeled data
• Dictionary learning, independent component analysis, autoencoders, matrix factorization, clustering methods, etc...
• Multiple contemporary architectures (partial list of a growing enumeration)
o Deep neural networks
o Deep belief networks
o Recurrent neural networks
o Etc...
• Multiple contemporary applications (partial list of a growing enumeration, now including deployments
to personal devices)
o Bioinformatics/drug design
o Gaming (occasionally outclassing human experts)
o Natural language processing
o Social network filtering
o Machine translation
o Speech recognition
o Audio recognition
o Computer vision
o Etc...
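The two training modes can be sketched side by side in Keras; the data shapes, labels and hyperparameters below are illustrative assumptions.

```python
# A minimal sketch of supervised vs. unsupervised training in Keras.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

X = np.random.rand(1000, 20)  # hypothetical observations

# Supervised: labeled inputs drive a classifier.
y = (X.sum(axis=1) > 10).astype("float32")  # hypothetical labels
clf = Sequential([Dense(16, activation="relu", input_shape=(20,)),
                  Dense(1, activation="sigmoid")])
clf.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
clf.fit(X, y, epochs=5, verbose=0)

# Unsupervised: an autoencoder learns features from the unlabeled data by
# reconstructing its own input through a narrow bottleneck layer.
ae = Sequential([Dense(8, activation="relu", input_shape=(20,)),   # encoder
                 Dense(20, activation="sigmoid")])                 # decoder
ae.compile(optimizer="adam", loss="mse")
ae.fit(X, X, epochs=5, verbose=0)  # the target is the input itself
```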
Relevant code samples
• Keras overhead code, with underlying back-end-specific implementations
of salient concepts in Microsoft Cognitive Toolkit, Theano and TensorFlow (see the back-end selection sketch below)
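A minimal sketch of the back-end handoff, assuming the multi-backend Keras releases that supported all three engines: the back-end is chosen via the KERAS_BACKEND environment variable (or ~/.keras/keras.json) before Keras is imported, and the model-definition code itself is unchanged.

```python
# A minimal sketch of back-end selection in multi-backend Keras.
import os
os.environ["KERAS_BACKEND"] = "tensorflow"  # or "theano" or "cntk"

from keras import backend as K
print(K.backend())  # confirms which back-end is actually in use

# The "overhead" model definition is back-end agnostic:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([Dense(10, activation="softmax", input_shape=(784,))])
model.compile(optimizer="sgd", loss="categorical_crossentropy")
```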
Lesson Conclusion
• Traditional (application-specific or rule-based) algorithms become unwieldy and untenable as
solutions to many “big data” problems
• Advanced statistical methods remain relevant to some “big data” problems, and some still
outperform AI/ML (Artificial Intelligence/Machine Learning) methods; however, many complex
problems cannot be directly analyzed using these methods
• Graph-based methods provide a flexible, robust means of modeling many complex and otherwise
intractable problems
• Deep learning methods represent a promising new frontier within the realm of data science,
especially as applied to “big data” problems
• Graph theory describes the underlying structures supporting contemporary deep learning
technology
• This course will provide general insight into the deep learning domain while focusing specifically on
the role of graph models within these systems; for basic awareness, concepts are presented via Python-based
implementations within a Keras environment, backed by Microsoft Cognitive
Toolkit, Theano and TensorFlow systems, with relevant code-based examples
