Here is a basic Linear Algebra review for the Machine Learning class. This is actually growing into a new class on the mathematics of Intelligent Systems, in which I will be teaching:
1.- Linear Algebra - From the basics to the Cayley-Hamilton Theorem with applications
2.- Mathematical Analysis - from sets to the Riemann Integral
3.- Topology - Mostly in Hilbert Spaces
4.- Optimization - Convex functions, KKT conditions, Duality Theory, etc.
The stuff is going to be interesting...
This Logistic Regression Presentation will help you understand how a Logistic Regression algorithm works in Machine Learning. In this tutorial video, you will learn what supervised learning is, what a classification problem is and some of its associated algorithms, what Logistic Regression is, how it works through simple examples, the math behind Logistic Regression, how it differs from Linear Regression, and some Logistic Regression applications. At the end, you will also see an interesting demo in Python on how to predict the number present in an image using Logistic Regression.
Below topics are covered in this Machine Learning Algorithms Presentation:
1. What is supervised learning?
2. What is classification? What are some of its solutions?
3. What is logistic regression?
4. Comparing linear and logistic regression
5. Logistic regression applications
6. Use case - Predicting the number in an image
What is Machine Learning: Machine Learning is an application of Artificial Intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.
- - - - - - - -
About Simplilearn Machine Learning course:
A form of artificial intelligence, Machine Learning is revolutionizing the world of computing as well as all people’s digital interactions. Machine Learning powers such innovative automated technologies as recommendation engines, facial recognition, fraud protection and even self-driving cars. This Machine Learning course prepares engineers, data scientists and other professionals with the knowledge and hands-on skills required for certification and job competency in Machine Learning.
- - - - - - -
Why learn Machine Learning?
Machine Learning is taking over the world, and with that comes a growing need among companies for professionals who know the ins and outs of Machine Learning.
The Machine Learning market size is expected to grow from USD 1.03 Billion in 2016 to USD 8.81 Billion by 2022, at a Compound Annual Growth Rate (CAGR) of 44.1% during the forecast period.
- - - - - -
What skills will you learn from this Machine Learning course?
By the end of this Machine Learning course, you will be able to:
1. Master the concepts of supervised, unsupervised and reinforcement learning, and of modeling.
2. Gain practical mastery over the principles, algorithms, and applications of Machine Learning through a hands-on approach that includes working on 28 projects and one capstone project.
3. Acquire thorough knowledge of the mathematical and heuristic aspects of Machine Learning.
4. Understand the concepts and operation of support vector machines, kernel SVM, naive Bayes, decision tree classifiers, random forest classifiers, logistic regression, K-nearest neighbors, K-means clustering and more.
5. Be able to model a wide variety of robust Machine Learning algorithms, including deep learning, clustering, and recommendation systems.
- - - - - - -
5. Linear Algebra for Machine Learning: Singular Value Decomposition and Principal Component Analysis (Ceni Babaoglu, PhD)
The seminar series will focus on the mathematical background needed for machine learning. The first set of seminars is on "Linear Algebra for Machine Learning". Here are the slides of the fifth part, which discusses singular value decomposition and principal component analysis.
Here are the slides of the first part, which discussed linear systems: https://www.slideshare.net/CeniBabaogluPhDinMat/linear-algebra-for-machine-learning-linear-systems/1
Here are the slides of the second part, which discussed basis and dimension:
https://www.slideshare.net/CeniBabaogluPhDinMat/2-linear-algebra-for-machine-learning-basis-and-dimension
Here are the slides of the third part, which discussed factorization and linear transformations:
https://www.slideshare.net/CeniBabaogluPhDinMat/3-linear-algebra-for-machine-learning-factorization-and-linear-transformations-130813437
Here are the slides of the fourth part, which discussed eigenvalues and eigenvectors:
https://www.slideshare.net/CeniBabaogluPhDinMat/4-linear-algebra-for-machine-learning-eigenvalues-eigenvectors-and-diagonalization
3. Linear Algebra for Machine Learning: Factorization and Linear Transformations (Ceni Babaoglu, PhD)
The seminar series will focus on the mathematical background needed for machine learning. The first set of seminars is on "Linear Algebra for Machine Learning". Here are the slides of the third part, which discusses factorization and linear transformations.
Here is the link to the first part, which discussed linear systems: https://www.slideshare.net/CeniBabaogluPhDinMat/linear-algebra-for-machine-learning-linear-systems/1
Here are the slides of the second part, which discussed basis and dimension:
https://www.slideshare.net/CeniBabaogluPhDinMat/2-linear-algebra-for-machine-learning-basis-and-dimension
In this tutorial, we will learn the following topics:
+ Voting Classifiers
+ Bagging and Pasting
+ Random Patches and Random Subspaces
+ Random Forests
+ Boosting
+ Stacking
The generalized additive model is an excellent tool for analyzing non-linear functions. This document describes the strategy to adopt for such an analysis.
Principal Component Analysis, or PCA, is a statistical method that lets you summarize the information contained in large data tables by means of a smaller set of "summary indices" that can be more easily visualized and analyzed.
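For a concrete sense of what such summary indices look like, here is a minimal sketch in R using the base prcomp function (the built-in iris data stand in for a large data table; none of this comes from the original presentation):

# Minimal sketch: PCA with base R's prcomp on the iris measurements
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
summary(pca)        # proportion of variance explained by each summary index
head(pca$x[, 1:2])  # scores on the first two principal components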
This was a presentation I gave to my firm's internal CPE in December 2012. It related to correlation and simple regression models and how we can utilize these statistics in both income and market approaches.
Market Practice Series (Credit Losses Modeling) (Yahya Kamel)
The Central Bank of Egypt (CBE) adopted IFRS in 2008. In particular, IAS 39 discusses implementing a model that can derive the incurred credit losses for a pool of receivables/loans, which was left quite open to market development and practical initiatives.
For its part, the CBE adopted the same approach, which led to some widely different market practices, logics, and interpretations that have sometimes been questionable on a wide scale.
So I thought to develop some materials that can serve as practical guidance for quantifying credit risk, using different simple models based on the Basel II definitions of the risk components.
The intended users of this material are credit risk professionals who conduct risk analysis, implement risk management policies, and/or are in charge of quantifying the credit risk of a loan portfolio (corporate and retail).
Also, other professionals or officers complying with IFRS or CBE GAAP.
Social Media is often overlooked as a sales channel. This presentation gives some practical ideas on how to approach this channel and develop it for lead generation.
In this work, we study H∞ control of a wind turbine fuzzy model over a finite frequency (FF) interval. Less conservative results are obtained by using Finsler's lemma, the generalized Kalman-Yakubovich-Popov (gKYP) lemma and a linear matrix inequality (LMI) approach, with several additional slack parameters; the resulting conditions are expressed as LMIs that can be solved numerically and efficiently, ensuring that such fuzzy systems are admissible with a prescribed H∞ disturbance attenuation level. The FF H∞ performance approach allows the state feedback design to target a specific frequency interval; a simulation example is given to validate our results.
Bayesian Inference and Uncertainty Quantification for Inverse Problems (Matt Moores)
So-called “inverse” problems arise when the parameters of a physical system cannot be directly observed. The mapping between these latent parameters and the space of noisy observations is represented as a mathematical model, often involving a system of differential equations. We seek to infer the parameter values that best fit our observed data. However, it is also vital to obtain accurate quantification of the uncertainty involved with these parameters, particularly when the output of the model will be used for forecasting. Bayesian inference provides well-calibrated uncertainty estimates, represented by the posterior distribution over the parameters. In this talk, I will give a brief introduction to Markov chain Monte Carlo (MCMC) algorithms for sampling from the posterior distribution and describe how they can be combined with numerical solvers for the forward model. We apply these methods to two examples of ODE models: growth curves in ecology, and thermogravimetric analysis (TGA) in chemistry. This is joint work with Matthew Berry, Mark Nelson, Brian Monaghan and Raymond Longbottom.
We have implemented a multiple precision ODE solver based on high-order fully implicit Runge-Kutta (IRK) methods. This ODE solver uses Gauss-type formulas of any order, and can be accelerated by using (1) MPFR as the multiple precision floating-point arithmetic library, (2) real tridiagonalization, as supported in SPARK3, of the linear equations to be solved in the simplified Newton method used as the inner iteration, (3) a mixed precision iterative refinement method, (4) parallelization with OpenMP, and (5) embedded formulas for IRK methods. In this talk, we describe the reasons why we adopted these accelerations, and show the efficiency of the ODE solver through numerical experiments such as the Kuramoto-Sivashinsky equation.
Finite-difference modeling, accuracy, and boundary conditions (Arthur Weglein)
This short report gives a brief review of the finite-difference modeling method used in MOSRP and its boundary conditions, as preparation for the Green's theorem RTM. The first part gives the finite-difference formulae we used and the second part describes the implemented boundary conditions. The last part, using two examples, points out some impacts of the accuracy of source fields on the modeling results.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
This paper studies an approximate dynamic programming (ADP) strategy for a class of nonlinear switched systems in which external disturbances are considered. A neural network (NN) is used to estimate the unknown parts of both the actor and the critic for the corresponding nominal system. Training is carried out simultaneously, based on minimizing the squared error of the Hamilton function. The closed-loop tracking error is shown to converge to a region of attraction around the origin, in the uniformly ultimately bounded (UUB) sense. Simulation results demonstrate the effectiveness of the ADP-based controller.
Tucker tensor analysis of Matern functions in spatial statistics (Alexander Litvinenko)
1. Motivation: improve statistical models
2. Motivation: disadvantages of matrices
3. Tools: Tucker tensor format
4. Tensor approximation of Matern covariance function via FFT
5. Typical statistical operations in Tucker tensor format
6. Numerical experiments
Abstract : Motivated by the recovery and prediction of electricity consumption time series, we extend Nonnegative Matrix Factorization to take into account external features as side information. We consider general linear measurement settings, and propose a framework which models non-linear relationships between external features and the response variable. We extend previous theoretical results to obtain a sufficient condition on the identifiability of NMF with side information. Based on the classical Hierarchical Alternating Least Squares (HALS) algorithm, we propose a new algorithm (HALSX, or Hierarchical Alternating Least Squares with eXogeneous variables) which estimates NMF in this setting. The algorithm is validated on both simulated and real electricity consumption datasets as well as a recommendation system dataset, to show its performance in matrix recovery and prediction for new rows and columns.
We approach the screening problem - i.e. detecting which inputs of a computer model significantly impact the output - from a formal Bayesian model selection point of view. That is, we place a Gaussian process prior on the computer model and consider the $2^p$ models that result from assuming that each of the subsets of the $p$ inputs affect the response. The goal is to obtain the posterior probabilities of each of these models. In this talk, we focus on the specification of objective priors on the model-specific parameters and on convenient ways to compute the associated marginal likelihoods. These two problems that normally are seen as unrelated, have challenging connections since the priors proposed in the literature are specifically designed to have posterior modes in the boundary of the parameter space, hence precluding the application of approximate integration techniques based on e.g. Laplace approximations. We explore several ways of circumventing this difficulty, comparing different methodologies with synthetic examples taken from the literature.
Authors: Gonzalo Garcia-Donato (Universidad de Castilla-La Mancha) and Rui Paulo (Universidade de Lisboa)
Generalized Bradley-Terry Modelling of Football Results (htstatistics)
A generalization of the Davidson model for modelling paired comparisons, incorporating a tied outcome, applied to football (soccer) results from the English first division.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT (Subhajit Sahu)
Abstract: Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It comes, however, with the precondition that the input graph has no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad, and Procure.FYI's Co-Founder
Unleashing the Power of Data: Choosing a Trusted Analytics Platform (Enterprise Wired)
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to reduce the work per iteration, and the other is to reduce the number of iterations; these goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices (those with the same in-links) helps reduce duplicate computations and thus could also reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes can be easily calculated; this could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which could reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
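As a rough illustration of the first technique, skipping computation on already-converged vertices, here is a hypothetical R sketch (the function and its arguments are assumptions for illustration, not code from the report; A is taken to be a column-stochastic matrix with no dangling nodes):

# Hypothetical sketch: power iteration that freezes converged vertices
pagerank_skip <- function(A, d = 0.85, tol = 1e-10, max_iter = 100) {
  n <- nrow(A)
  r <- rep(1 / n, n)
  active <- rep(TRUE, n)               # vertices still being updated
  for (iter in seq_len(max_iter)) {
    r_new <- r
    contrib <- as.vector(A %*% r)      # a real implementation would only
    r_new[active] <- (1 - d) / n + d * contrib[active]  # recompute active rows
    active <- active & (abs(r_new - r) > tol)  # freeze converged vertices
    r <- r_new
    if (!any(active)) break
  }
  r
}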
1. Generalized Nonlinear Models in R
Heather Turner (1,2), David Firth (2) and Ioannis Kosmidis (3)
(1) Independent consultant, (2) University of Warwick, UK, (3) UCL, UK
2. Generalized Linear Models
A GLM is made up of a linear predictor
η = β0 + β1x1 + ... + βpxp
and two functions:
a link function that describes how the mean, E(Y) = µ, depends on the linear predictor: g(µ) = η
a variance function that describes how the variance, Var(Y), depends on the mean: Var(Y) = φV(µ), where the dispersion parameter φ is a constant
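As an illustrative sketch (not part of the slides), a Poisson log-linear GLM in base R pairs the link g(µ) = log µ with the variance function V(µ) = µ and φ = 1:

# Poisson GLM: log link, Var(Y) = mu (dispersion phi = 1)
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
treatment <- gl(3, 3)   # three groups of three observations
fit <- glm(counts ~ treatment, family = poisson(link = "log"))
summary(fit)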
3. Generalized Nonlinear Models
A generalized nonlinear model (GNM) is the same as a GLM except that we have
g(µ) = η(x; β)
where η(x; β) is nonlinear in the parameters β.
Equivalently, it is an extension of the nonlinear least squares model in which the variance of Y is allowed to depend on the mean.
Using a nonlinear predictor can produce a more parsimonious and interpretable model.
4. Example: Mental Health Status
A study of 1660 children from Manhattan recorded their mental
impairment and parents’ socioeconomic status (Agresti, 2002)
[Figure: mosaic plot of mental health status, MHS (well, mild, moderate, impaired), by parents' socioeconomic status, SES (A to F)]
5. Independence
A simple analysis of these data might be to test for independence of
MHS and SES using a chi-squared test.
This is equivalent to testing the goodness-of-fit of the independence
model
log(µrc) = αr + βc
Such a test compares the independence model to the saturated model
log(µrc) = αr + βc + γrc
which may be over-complex.
6. Row-column Association
One intermediate model is the Row-Column association model:
log(µrc) = αr + βc + φrψc
(Goodman, 1979), an example of a multiplicative interaction model.
For the Mental Health data:
## Analysis of Deviance Table
##
## Model 1: Freq ~ SES + MHS
## Model 2: Freq ~ SES + MHS + Mult(SES , MHS)
## Model 3: Freq ~ SES + MHS + SES:MHS
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 15 47.4
## 2 8 3.6 7 43.8 2.3e-07
## 3 0 0.0 8 3.6 0.89
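The deviance table above can be reproduced along the following lines (a sketch assuming a data frame mentalHealth with columns Freq, SES and MHS; in the released gnm package the corresponding dataset names its count column count):

library(gnm)
set.seed(1)  # starting values for Mult() are random (see slide 10)
indep <- glm(Freq ~ SES + MHS, family = poisson, data = mentalHealth)
rc    <- gnm(Freq ~ SES + MHS + Mult(SES, MHS),
             family = poisson, data = mentalHealth)
sat   <- glm(Freq ~ SES + MHS + SES:MHS, family = poisson, data = mentalHealth)
anova(indep, rc, sat, test = "Chisq")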
7. Parameterisation
The independence model was defined earlier in an over-parameterised form:
log(µrc) = αr + βc = (αr + 1) + (βc − 1) = α*r + β*c
Identifiability constraints may be imposed:
to fix a one-to-one mapping between parameter values and distributions
to enable interpretation of parameters
8. Standard Implementation
The standard approach of all major statistical software packages is to
apply the identifiability constraints in the construction of the model
g(µ) = Xβ
so that rank(X) is equal to the number of parameters p.
Then the inverse in the score equations of the IWLS algorithm,
β^(r+1) = (X^T W^(r) X)^(−1) X^T W^(r) z^(r),
exists.
9. Alternative Implementation
The gnm package for R works with over-parameterised models, where
rank(X) < p, and uses the generalised inverse in the IWLS updates:
β^(r+1) = (X^T W^(r) X)^− X^T W^(r) z^(r)
This approach is more useful for GNMs, where it is much harder to
define standard rules for specifying identifiability constraints.
Rather, identifiability constraints can be applied post-fitting for
inference and interpretation.
10. Estimation of GNMs
GNMs present further technical difficulties vs. GLMs
automatic generation of starting values is hard
the likelihood may have multiple optima
The default approach of the gnm function in package gnm is to:
generate starting values randomly for nonlinear parameters and use a GLM fit for linear parameters
use one-parameter-at-a-time Newton method to update
nonlinear parameters
use the generalized IWLS to update all parameters
Consequently, the parameterisation returned is random.
11. Parameterisation of RC Model
The RC model is invariant to changes in scale or location of the
interaction parameters:
log(µrc) = αr + βc + φrψc
= αr + βc + (2φr)(0.5ψc)
= αr + (βc − ψc) + (φr + 1)(ψc)
One way to constrain these parameters is as follows:
φ*r = (φr − Σr wrφr / Σr wr) / sqrt( Σr wr (φr − Σr wrφr / Σr wr)² )
where wr is the row probability, say, so that
Σr wr φ*r = 0 and Σr wr (φ*r)² = 1
12. Row and Column Scores
These scores and their standard errors can be obtained via the
getContrasts function in the gnm package
## Estimate Std. Error
## Mult(., MHS).SESA 1.11 0.30
## Mult(., MHS).SESB 1.12 0.31
## Mult(., MHS).SESC 0.37 0.32
## Mult(., MHS).SESD -0.03 0.27
## Mult(., MHS).SESE -1.01 0.31
## Mult(., MHS).SESF -1.82 0.28
## Estimate Std. Error
## Mult(SES, .).MHSwell 1.68 0.19
## Mult(SES, .).MHSmild 0.14 0.20
## Mult(SES, .).MHSmoderate -0.14 0.28
## Mult(SES, .).MHSimpaired -1.41 0.17
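Output like this can be obtained along the following lines (a sketch continuing the hypothetical rc fit above, with weights chosen to match the constraints on slide 11):

# Row scores, centred and scaled by the marginal row probabilities
rowProbs <- with(mentalHealth, tapply(Freq, SES, sum) / sum(Freq))
getContrasts(rc, pickCoef(rc, "[.]SES"),
             ref = rowProbs, scaleRef = rowProbs, scaleWeights = rowProbs)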
13. Stereotype Model
The stereotype model (Anderson, 1984) is suitable for ordered
categorical data. It is a special case of the multinomial logistic model:
pr(yi = c | xi) = exp(β0c + βc^T xi) / Σr exp(β0r + βr^T xi)
in which only the scale of the relationship with the covariates changes between categories:
pr(yi = c | xi) = exp(β0c + γc β^T xi) / Σr exp(β0r + γr β^T xi)
14. Poisson Trick
The stereotype model can be fitted as a GNM by re-expressing the categorical data as category counts Yi = (Yi1, . . . , Yik).
Assuming a Poisson distribution for Yic, the joint distribution of Yi is Multinomial(Ni, pi1, . . . , pik) conditional on the total count Ni.
The expected counts are then µic = Ni pic and the parameters of the stereotype model can be estimated through fitting
log µic = log(Ni) + log(pic) = αi + β0c + γc Σr βr xir
where the "nuisance" parameters αi ensure that the multinomial denominators are reproduced exactly, as required.
15. Augmented Least Squares
A disadvantage of using the Poisson trick is that the number of
nuisance parameters can be large, making computation slow.
The algorithm can be adapted using augmented least squares.
For an ordinary least squares model,
((y|X)^T (y|X))^(−1) = [ y^T y   y^T X ; X^T y   X^T X ]^(−1) = [ A11   A12 ; A21   A22 ]
where A11, A12 and A22 are functions of y^T y, X^T y and X^T X.
Then it can be shown that
β̂ = (X^T X)^(−1) X^T y = −A21 / A11
requiring only the first row (column) of the inverse to be found.
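This identity is easy to check numerically; the following sketch (not from the slides) compares −A21/A11 with the usual least squares solution on simulated data:

# Verify beta-hat = -A21 / A11 on a small random regression problem
set.seed(42)
n <- 50
X <- cbind(1, rnorm(n), rnorm(n))
y <- X %*% c(2, -1, 0.5) + rnorm(n)
A <- solve(crossprod(cbind(y, X)))      # ((y|X)^T (y|X))^(-1)
cbind(augmented = -A[-1, 1] / A[1, 1],  # -A21 / A11
      direct    = qr.solve(X, y))       # (X^T X)^(-1) X^T y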
16. Application to Nuisance Parameters I
The same approach can be applied to the IWLS algorithm, letting
X̃ = W^(1/2) (z|X)
Now let
X̃ = (U|V)
where V is the part of the design matrix corresponding to the nuisance factor.
U is an nk × p matrix, where n is the number of nuisance parameters, k is the number of categories and p is the number of model parameters, typically with n >> p.
V is an nk × n matrix of dummy variables identifying each individual.
17. Application to Nuisance Parameters II
Then
(X̃^T X̃)^− = [ U^T U   U^T V ; V^T U   V^T V ]^− = [ B11   B12 ; B21   B22 ]
Again, only the first row (column) of this generalised inverse is required to estimate β̂, so we are only interested in B11 and B12:
B11 = (U^T U − U^T V (V^T V)^(−1) V^T U)^−
B12 = −(V^T V)^(−1) V^T U B11
18. Elimination of the Nuisance Factor
U^T U is p × p, therefore not expensive to compute.
V^T V and V^T U can be computed without constructing the large nk × n matrix V, due to the structure of V:
V^T V is diagonal and the non-zero elements can be computed directly
V^T U is equivalent to aggregating the rows of U by levels of the nuisance factor
Thus we only need to construct the U matrix, saving memory and reducing the computational burden.
This approach is invoked using the eliminate argument to gnm.
19. Example: Back Pain Data
For 101 patients, 3 prognostic variables were recorded at baseline,
then after 3 weeks the level of back pain was recorded (Anderson,
1984)
These data can be converted to counts using the
expandCategorical function, giving for the first record:
## x1 x2 x3 pain count id
## 1 1 1 1 worse 0 1
## 1.1 1 1 1 same 1 1
## 1.2 1 1 1 slight.improvement 0 1
## 1.3 1 1 1 moderate.improvement 0 1
## 1.4 1 1 1 marked.improvement 0 1
## 1.5 1 1 1 complete.relief 0 1
20. Back Pain Model
The expanded data set has only 606 records and the total number of
parameters is only 115 (9 nonlinear). So the model is quick to fit:
system.time({
m <- gnm(count ~ id + pain + Mult(pain, x1 + x2 + x3),
family = poisson, data = backPainLong, verbose = FALSE)
})[3]
## elapsed
## 0.268
However, eliminating the linear parameters reduces the run time by
more than two thirds, showing the potential of this technique.
system.time(m2 <- update(m, eliminate = id))[3]
## elapsed
## 0.088
21. Rasch Models
Rasch models are used in Item Response Theory to model the binary
responses of subjects over a set of items.
The simplest one-parameter logistic (1PL) model has the form
log( πis / (1 − πis) ) = αi + γs
The one-dimensional Rasch model extends the 1PL as follows:
log( πis / (1 − πis) ) = αi + βi γs
where βi measures the discrimination of item i: the larger βi, the steeper the item-response function that maps γs to πis.
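In gnm terms this is a binomial model with a Mult term. A hypothetical, self-contained sketch on simulated data (the objects I, S, d and so on are illustrative assumptions, not from the talk):

# Hypothetical sketch: simulate binary responses and fit the 1D Rasch model
library(gnm)
set.seed(1)
I <- 5; S <- 40                          # items and subjects
d <- expand.grid(item = factor(1:I), subject = factor(1:S))
alpha <- rnorm(I); beta <- -runif(I); gamma <- rnorm(S)
eta <- alpha[d$item] + beta[d$item] * gamma[d$subject]
d$resp <- rbinom(nrow(d), 1, plogis(eta))
rasch1d <- gnm(resp ~ item + Mult(item, subject),   # alpha_i + beta_i * gamma_s
               family = binomial, data = d)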
22. Example: US House of Representatives
Votes on 20 roll calls selected by Americans for Democratic Action (ADA)
[Figure: For/Against votes on each of the 20 roll calls, by party (Democrat, Republican, Other):]
BankruptcyOverhaul.Yes
ErgonomicsRuleDisapproval.No
IncomeTaxReduction.No
MarriageTaxReduction.Yes
EstateTaxRelief.Yes
FetalProtection.No
SchoolVouchers.No
TaxCutReconciliationBill.No
CampaignFinanceReform.No
FlagDesecration.No
FaithBasedInitiative.Yes
ChinaNormalizedTradeRelations.Yes
ANWRDrillingBan.Yes
PatientsRightsHMOLiability.No
PatientsBillOfRights.No
DomesticPartnerBenefits.No
USMilitaryPersonnelOverseasAbortions.Yes
AntiTerrorismAuthority.No
EconomicStimulus.No
TradePromotionAuthorityFastTrack.No
23. Complete Separation
For representatives who always vote "For" or "Against" the ADA position, maximum likelihood will produce infinite γs estimates, so that the fitted probabilities are 0 or 1.
Two possible remedies:
1. Add δ to yis and 2δ to the totals nis
hard to quantify the effect of the adjustment
different δ give different results
2. Bias reduction (Firth, 1993; Kosmidis and Firth, 2009)
requires an identifiable parameterization
24. Bias Reduction in the 1D Rasch Model
ML estimates are obtained by solving the score equations, which for the one-dimensional Rasch model with θ = (α^T, β^T, γ^T)^T are
Ut = Σi=1..I Σs=1..S (yis − nis πis) zist = 0
where zist = ∂ηis/∂θt.
The bias reduction method of Kosmidis and Firth (2009) works by adjusting the scores, in this case giving
U*t = Σi=1..I Σs=1..S [ yis + (1/2) his + cis vis − (nis + his) πis ] zist = 0
where vis, his and cis depend on the model parameters.
25. Identifiability in the 1D Rasch Model
In order to identify the parameters in the 1D Rasch model
log( πis / (1 − πis) ) = αi + βi γs
the scale of the βi and the location of the γs must be constrained.
This can be achieved by fixing one of the βi and one of the γs.
Here we will select one βi and one γs at random and fix them to their ML estimates based on data that have been δ-adjusted.
26. Bias Reduction Algorithm
The bias adjustment suggests the following iterative scheme:
1. Evaluate bias-adjusted responses and totals given θ^(i)
2. Fit the 1D Rasch model to the adjusted data using ML
Unfortunately the cis quantities are unbounded and can produce adjusted yis < 0 or yis > nis
redefine yis and nis to avoid this
Adding a further iteration loop to IWLS adds significantly to the computation time, therefore good starting values are important:
if the ML estimates are finite, use these
else use the ML estimates found by δ adjustment
27. Liberality of US Representatives
All the β̂i are < 0; hence a smaller γ̂s implies a larger probability of voting for the ADA position, i.e. more liberal.
28. Comparison Intervals
Adding intervals based on quasi-standard errors that are invariant to
the parameter constraints (Firth and de Menezes, 2004):
29. Summary
Working with over-parameterized models enables a general framework to be implemented for GNMs.
Some of the computational methods from GLMs can be applied directly to GNMs...
...whilst others require much more work!
Further examples can be found in the help files and manual accompanying the gnm package, which is available on CRAN.
30. References
Agresti, A. (2002). Categorical Data Analysis (2nd ed.). New York: Wiley.
Anderson, J. A. (1984). Regression and ordered categorical variables. J. R. Statist. Soc. B 46(1), 1–30.
Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika 80(1), 27–38.
Firth, D. and R. X. de Menezes (2004). Quasi-variances. Biometrika 91, 65–80.
Goodman, L. A. (1979). Simple models for the analysis of association in cross-classifications having ordered categories. J. Amer. Statist. Assoc. 74, 537–552.
Kosmidis, I. and D. Firth (2009). Bias reduction in exponential family nonlinear models. Biometrika 96(4), 793–804.