Deep Learning
Graph Models for Deep Learning
Stephen Donald Huff, PhD
• Title of Course:
o “Graph Models for Deep Learning”
• Target Audience:
o Information Technology Professionals with a Basic Knowledge
of Graph Theory and Its Practical Applications;
Prerequisites
• Basic Understanding of Application-Specific
(“Traditional” or “Rule-Based”) Methods for
Statistical Analysis;
• Basic Understanding of Deep Learning Theory
(Graph/Network Theory);
• Basic General Understanding of Information
Technology;
Optional Course Requisite(s):
• Basic Working Knowledge of Python Programming
• Primary Skills Developed:
o Improved Ability to Discriminate, Differentiate and Conceptualize Appropriate
Implementations of Application-Specific (“Traditional” or “Rule-Based”)
Methods versus Deep Learning Methods of Statistical Analyses and Data
Modeling;
o Improved General Understanding of Graph Models as Deep Learning
Concepts;
o State-of-the-Art Awareness of Deep Learning Applications within the Fields of
Character Recognition and Computer Vision;
• Secondary Skills Developed:
o Basic working knowledge of the Python-based manipulation of Keras,
Microsoft Cognitive Toolkit, Theano and TensorFlow deep learning platforms;
o Basic ability to compare/contrast similar implementations of practical, graph-based
solutions in Keras using Microsoft Cognitive Toolkit, Theano and/or
TensorFlow back-end systems;
MODULE 1
Introduction, Review of Background Concepts and
Technological Context
Introduction:
optimal course outcomes and focus
• Generally inform the student regarding state-of-the-art in deep
learning
• Specifically inform the student regarding the role of graph models
within the realm of deep learning, including background concepts
and current technological state of the art
• Provide basic awareness of Python-based implementations of
various deep learning systems in Keras, using Microsoft Cognitive
Toolkit, Theano and TensorFlow as back-end platforms (the actual
“deep learning software”)
Review of Background Concepts:
Statistical inference and statistical models
• Uses data analysis to deduce properties of underlying probability distributions
• Infers population properties, assuming the observed data are a sample from a larger set
• Contrasts with descriptive statistics
o Strictly concerned with properties of observed data
o Does not assume the data come from a larger population
• Three levels of modeling assumptions (the parametric and non-parametric approaches are contrasted in the sketch after this list)
o Fully parametric
• Data generated by a process fully described by probability distributions with a finite number of
unknown parameters
• E.g., generalized linear models
o Non-parametric
• Minimal assumptions about the data generating process
• E.g., data drawn as a simple random sample from a continuous probability distribution, whose
median can be estimated directly by the sample median
o Semi-parametric
• Hybrid solution, models separated into “structural” and “random” variation components
• One component is treated parametrically and the other non-parametrically
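As a concrete illustration, the following minimal Python sketch contrasts the fully parametric and non-parametric approaches on one sample; the normal distribution, sample size and seed are illustrative assumptions, and NumPy/SciPy are assumed dependencies.

```python
# A minimal sketch contrasting parametric and non-parametric inference.
# Distribution, sample size and seed are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1000)  # hypothetical observed sample

# Fully parametric: assume a normal distribution and estimate its two
# unknown parameters (mean and standard deviation) from the data.
mu_hat, sigma_hat = stats.norm.fit(data)

# Non-parametric: no distributional assumption; the sample median is
# computed directly from the observed values.
median_hat = np.median(data)

print(f"parametric fit: mu={mu_hat:.3f}, sigma={sigma_hat:.3f}")
print(f"non-parametric: median={median_hat:.3f}")
```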
Review of Background Concepts:
Relevant Application-Specific (“Traditional” or “Rule-Based”)
Methods of Statistical Analysis (with Specific Focus on “Big Data”
Problems)
• General Development Paradigm: One Problem -> One Solution/Model ->
One Algorithm
• Reduced utility when applied to “Big Data” and/or multivariate problems
• Example: linear regression
o Related examples: ANOVA, ANCOVA, MANOVA, MANCOVA, t-test and F-test
o Requires multiple assumptions (often violated in practice)
o Models/predicts a scalar response (dependent variable) from one explanatory variable
(independent variable)
• Example: multiple linear regression
o Simple regression extended to multiple explanatory variables
• Example: multivariate linear regression
o Model predicts multiple correlated response (dependent) variables (all three variants are sketched below)
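The three regression variants above can be sketched in a few lines of Python; scikit-learn is an assumed dependency here (any least-squares routine would do), and the coefficients and sample sizes are arbitrary.

```python
# A minimal sketch of simple, multiple and multivariate linear regression.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Simple linear regression: one explanatory variable, one scalar response.
x = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * x[:, 0] + rng.normal(0, 1, size=100)
print(LinearRegression().fit(x, y).coef_)        # ~[3.0]

# Multiple linear regression: several explanatory variables, one response.
X = rng.uniform(0, 10, size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 1, size=100)
print(LinearRegression().fit(X, y).coef_)        # ~[1.0, -2.0, 0.5]

# Multivariate linear regression: several (correlated) response variables.
Y = np.column_stack([y, 0.5 * y + rng.normal(0, 1, size=100)])
print(LinearRegression().fit(X, Y).coef_.shape)  # (2, 3): one row per response
```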
Review of Background Concepts:
o Example: non-linear regression (curve-fitting)
• Examples: exponential, logarithmic, trigonometric, power and Gaussian functions; Lorenz curves
• Numerical optimization algorithms often required
» Many potential local minima; careful optimization reduces the risk of accepting a local minimum in place of the global minimum
» Parameter estimates depend on the optimization algorithm and its starting values (input/output less objective)
o Example: logistic regression (estimation of logistic model parameters)
• A logistic (logit) model usually applies to binary dependent variable(s)
» Log-odds of an event's probability (often binary, labelled "0"/"1" - pass/fail, win/lose, alive/dead, etc.) is a linear
combination of the predictor (independent) variables
o Example: decision trees (decision support tools, flow charts)
• Use a tree-like graph/model of decisions and their associated consequences
» Visualization of algorithms that contain only conditional control statements (e.g., IF/THEN and looping statements)
» Contemporarily used in operations research, specifically in decision analysis, to identify the strategy most likely to reach
a specific goal
• Also used in machine learning applications (all three methods are sketched below)
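The sketch below illustrates all three rule-based methods on synthetic data; SciPy and scikit-learn are assumed dependencies, and the curve shape, model settings and data are illustrative only.

```python
# A minimal sketch of non-linear regression, logistic regression and a
# decision tree; data and hyperparameters are illustrative assumptions.
import numpy as np
from scipy.optimize import curve_fit
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)

# Non-linear regression: fit an exponential curve with a numerical optimizer.
def expo(x, a, b):
    return a * np.exp(b * x)

x = np.linspace(0, 2, 50)
y = expo(x, 2.0, 1.5) + rng.normal(0, 0.1, size=50)
params, _ = curve_fit(expo, x, y, p0=(1.0, 1.0))  # a starting guess reduces
print(params)                                     # the risk of a bad local minimum

# Logistic regression: a binary ("0"/"1") outcome whose log-odds are a
# linear combination of the predictors.
X = rng.normal(size=(200, 2))
labels = (X @ np.array([2.0, -1.0]) + rng.normal(0, 0.5, size=200) > 0).astype(int)
logit = LogisticRegression().fit(X, labels)

# Decision tree: conditional (IF/THEN) splits learned from the same data.
tree = DecisionTreeClassifier(max_depth=3).fit(X, labels)
print(logit.score(X, labels), tree.score(X, labels))
```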
Technological Context:
Predictive Modeling and Machine Learning
• Overlaps heavily with machine learning (the terms are often used synonymously); commercially
referred to as “predictive analytics”
• Examples (Predictors and Classifiers; a few are compared in the sketch below)
o GLM (Generalized Linear Models)
o Logistic regression
o Generalized additive models
o Robust regression
o Semiparametric regression
o Ordinary Least Squares
o Random forests
o Boosted trees
o Naive Bayes
o k-nearest neighbor algorithm
o Majority classifier
o Support vector machines
o CART (Classification and Regression Trees)
o MARS (Multivariate Adaptive Regression Splines)
o ACE (Alternating Conditional Expectations) and AVAS (Additivity and Variance Stabilization)
o Neural Networks
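As a quick orientation, the sketch below cross-validates a handful of the listed classifiers on one synthetic task; the dataset, model choices and default hyperparameters are assumptions, not recommendations.

```python
# A minimal sketch comparing several classifiers from the list above.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
models = {
    "random forest": RandomForestClassifier(random_state=0),
    "naive Bayes": GaussianNB(),
    "k-nearest neighbor": KNeighborsClassifier(),
    "support vector machine": SVC(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
    print(f"{name:24s} mean accuracy = {scores.mean():.3f}")
```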
Technological Context:
Graph Theory and Neural Networks
• Graphs are mathematical structures that model pairwise relations between entities
o Comprised of vertices (or “nodes” or “points”) connected by edges (or “arcs” or “lines”)
o “Undirected” graphs have two-way edges connecting their vertices
o “Directed” graphs have one-way edges connecting their vertices
• Application: linguistics
o Language syntax and compositional semantics modeled as hierarchical graphs
• Application: physics and chemistry
o Quantitative modeling of the three-dimensional structures of complex atomic assemblies according to their topologies
o Structural modeling of atoms and bonds within molecules
• Application: social sciences
o Quantitative and qualitative exploration of social phenomena (interpersonal relationships, personality and popularity, rumor spreading, acquaintanceship/friendship
matrices, etc...)
• Application: biology
o Ecological niche modeling, migration modeling, epidemiology
• Application: mathematics
o Geometric operations, topological studies (knot theory), many, many others
• Application: computer science
o Representation, modeling, visualization of networks (routing table optimization, website link mapping and quantification, social media analyses, and many, many
more)
• Neural networks are a popular example, loosely based on biological neural systems
o The variety and utility of these graph-based solutions is rapidly advancing and expanding in many directions; “Deep Learning” is a popular variant (a minimal graph representation is sketched below)
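The sketch below shows a minimal graph representation using only the Python standard library; the adjacency-list structure and node names are illustrative assumptions (libraries such as networkx offer richer equivalents).

```python
# A minimal sketch of directed vs. undirected graphs as adjacency lists.
from collections import defaultdict

def add_edge(graph, u, v, directed=False):
    """Record an edge; an undirected edge is stored in both directions."""
    graph[u].add(v)
    if not directed:
        graph[v].add(u)

undirected_g = defaultdict(set)
add_edge(undirected_g, "A", "B")               # two-way: A-B and B-A
directed_g = defaultdict(set)
add_edge(directed_g, "A", "B", directed=True)  # one-way: A -> B only

print("B" in undirected_g["A"], "A" in undirected_g["B"])  # True True
print("B" in directed_g["A"], "A" in directed_g["B"])      # True False
```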
Deep Learning (and Graph Models)
• Deep learning methods “learn data representations”
o As opposed to manually developed, task-specific (“traditional” or “rule-based”) algorithms, summarized above
• Utilize “cascading” graphs: multiple layers of hierarchically connected nonlinear processing units (“a
network of neural networks”; see the layer-cascade sketch below)
o Each successive layer uses the output from the previous layer as input (hence, the term “cascade”)
• Typically used for feature extraction and transformation
o Review: as opposed to engineered features, which typically require extensive manual intervention during development
• Learn multiple levels of data representations corresponding to different levels of abstraction
o These levels form a hierarchy of concepts, as described by the graph (network) architecture
• General goal is identification of informative, discriminating and independent data “features” (to support
visualization, pattern recognition, structural analyses, etc...)
o A feature is an individual, measurable property/characteristic of an observed phenomenon
o Without “data learning” methods, features must be engineered (e.g., development of characteristic templates),
which is costly and time-consuming
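The layer cascade can be made concrete with a minimal Keras sketch; the layer sizes, activations and input dimension below are illustrative assumptions.

```python
# A minimal sketch of a "cascading" graph in Keras: each layer consumes
# the previous layer's output. Sizes and activations are assumptions.
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(64, activation="relu", input_shape=(100,)),  # layer 1: raw features in
    Dense(32, activation="relu"),                      # layer 2: consumes layer 1's output
    Dense(16, activation="relu"),                      # layer 3: higher-level abstraction
    Dense(1, activation="sigmoid"),                    # output: e.g., a binary decision
])
model.summary()  # prints the layer-by-layer hierarchy of the graph
```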
Deep Learning (and Graph Models)
• Requires “training” – learning methods may be supervised (e.g., classification), unsupervised (e.g.,
pattern analysis), or both (semi-supervised); both modes are sketched after the lists below
o Supervised feature learning requires labeled input data
• Supervised neural networks, multilayer perceptrons, supervised dictionary learning, etc...
o Unsupervised feature learning uses unlabeled data
• Dictionary learning, independent component analysis, autoencoders, matrix factorization, clustering methods, etc...
• Multiple contemporary architectures (partial list of a growing enumeration)
o Deep neural networks
o Deep belief networks
o Recurrent neural networks
o Etc...
• Multiple contemporary applications (partial list of a growing enumeration, now including deployments
to personal devices)
o Bioinformatics/drug design
o Gaming (occasionally outclassing human experts)
o Natural language processing
o Social network filtering
o Machine translation
o Speech recognition
o Audio recognition
o Computer vision
o Etc...
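The two training modes can be sketched side by side in Keras; the data shapes, labels and hyperparameters below are illustrative assumptions.

```python
# A minimal sketch of supervised vs. unsupervised training in Keras.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

X = np.random.rand(1000, 20)  # hypothetical observations

# Supervised: labeled inputs drive a classifier.
y = (X.sum(axis=1) > 10).astype("float32")  # hypothetical labels
clf = Sequential([Dense(16, activation="relu", input_shape=(20,)),
                  Dense(1, activation="sigmoid")])
clf.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
clf.fit(X, y, epochs=5, verbose=0)

# Unsupervised: an autoencoder learns features from the unlabeled data by
# reconstructing its own input through a narrow bottleneck layer.
ae = Sequential([Dense(8, activation="relu", input_shape=(20,)),   # encoder
                 Dense(20, activation="sigmoid")])                 # decoder
ae.compile(optimizer="adam", loss="mse")
ae.fit(X, X, epochs=5, verbose=0)  # the target is the input itself
```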
Relevant code samples
• Keras overhead code, with underlying back-end-specific implementations
of salient concepts in Microsoft Cognitive Toolkit, Theano and TensorFlow (see the back-end selection sketch below)
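A minimal sketch of the back-end handoff, assuming the multi-backend Keras releases that supported all three engines: the back-end is chosen via the KERAS_BACKEND environment variable (or ~/.keras/keras.json) before Keras is imported, and the model-definition code itself is unchanged.

```python
# A minimal sketch of back-end selection in multi-backend Keras.
import os
os.environ["KERAS_BACKEND"] = "tensorflow"  # or "theano" or "cntk"

from keras import backend as K
print(K.backend())  # confirms which back-end is actually in use

# The "overhead" model definition is back-end agnostic:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([Dense(10, activation="softmax", input_shape=(784,))])
model.compile(optimizer="sgd", loss="categorical_crossentropy")
```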
Lesson Conclusion
• Traditional (application-specific or rule-based) algorithms become unwieldy and untenable as
solutions to many “big data” problems
• Advanced statistical methods remain relevant to some “big data” problems, and some still
outperform AI/ML (Artificial Intelligence/Machine Learning) methods; however, many complex
problems cannot be directly analyzed using these methods
• Graph-based methods provide a flexible, robust means of modeling many complex and otherwise
intractable problems
• Deep learning methods represent a promising new frontier within the realm of data science,
especially as applied to “big data” problems
• Graph theory describes the underlying structures supporting contemporary deep learning
technology
• This course will provide general insight into the deep learning domain while focusing specifically on
the role of graph models within these systems; for basic awareness, concepts are presented via Python-based
implementations within a Keras environment, backed by Microsoft Cognitive
Toolkit, Theano and TensorFlow systems, with relevant code-based examples
