DEEP VS. DIVERSE
ARCHITECTURES
By Colleen M. Farrelly
SCOPE OF PROBLEM
•The No Free Lunch Theorem suggests that no individual machine learning model
will perform best across all types of data and datasets.
• Social science/behavioral datasets present a particular challenge, as data often contains main
effects and interaction effects, which can be linear or nonlinear with respect to an outcome of
interest.
• In addition, social science datasets often contain outliers and group overlap among
classification outcomes, where someone may have all the risk factors for dropping out or drug
use but does not exhibit the predicted behavior.
•Several machine learning frameworks have nice theoretical properties, including
convergence theorems and universal approximation guarantees, that may be
particularly adept at modeling social science outcomes.
• Superlearners and subsembles have been proven to improve ensemble performance to a level
at least as good as the best model in the ensemble.
• Neural networks with one hidden layer have universal approximation properties, which
guarantee that random mappings to a wide enough layer will come arbitrarily close to a desired
error level for any given function.
• One caveat to this universal approximation is that the layer size needed to obtain these guarantees may be larger than is practical or possible in a model.
• Deep learning attempts to rectify this limitation by adding additional layers to the neural network, where each layer
reduces model error beyond the previous layers’ capabilities.
NEURAL NETWORK GENERAL
OVERVIEW
Image sources: colah.github.io, www.alz.org
•A neural network is a model
based on processing
complex, nonlinear
information the way the
human brain does via a series
of feature mappings.
Arrows denote mapping
functions, which take
one topological space
to another
EXTREME LEARNING MACHINES AND UNIVERSAL APPROXIMATION
•These are a type of shallow, wide neural network.
•This formulation reduces the network to a penalized linear algebra problem rather than iterative training, which is much faster to solve.
•Because it is based on random mappings, it can be shown to converge to correct classification/regression via the Universal Approximation Theorem (likely a result of adequate coverage of the underlying data manifold).
•However, the network width required to reach an arbitrary error level may be computationally infeasible.
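The random-mapping formulation above can be sketched in a few lines: a fixed random hidden layer followed by a ridge-regularized least-squares readout. This is a minimal extreme learning machine assuming only numpy; the function names and toy data are illustrative, not the deck's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, y, n_hidden=200, reg=1e-2):
    """Minimal extreme learning machine: random input weights,
    tanh activation, ridge-regression readout."""
    W = rng.normal(size=(X.shape[1], n_hidden))  # fixed random mapping
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                       # random hidden representation
    # Penalized linear algebra problem instead of iterative training:
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy nonlinear regression: a wide enough random layer approximates sin(x).
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
W, b, beta = elm_fit(X, y)
mse = np.mean((elm_predict(X, W, b, beta) - y) ** 2)
```

Widening `n_hidden` drives training error toward the noise floor, which is the universal-approximation behavior the slide describes, at the cost of a larger linear system to solve.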
DEEP LEARNING
•Deep learning attempts to solve the wide
layer problem by adding depth layers in
neural networks, which can be more
effective and computationally feasible than
extreme learning machines for some
problems.
• This framework is like sifting data with multiple
sifters to distill finer and finer pieces of the data.
•These are computationally intensive and
require architecture design and tuning for
each problem.
• Feed-forward networks are particularly popular, as they can be easily built, tuned, and trained.
• Feed-forward networks also relate to the Universal Approximation Theorem, providing a means to exploit these results without requiring impractically wide layers.
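A deep feed-forward network of the kind discussed here can be built with scikit-learn's `MLPClassifier`. This is a minimal sketch on synthetic data; the (5, 3) hidden-layer shape mirrors the 13-5-3-1 tuned configuration mentioned in the experimental set-up, but the data and hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# 13 input features feeding hidden layers of 5 and 3 units, one binary output.
X, y = make_classification(n_samples=2000, n_features=13, n_informative=4,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(5, 3), max_iter=2000, random_state=0)
mlp.fit(X_tr, y_tr)        # iterative gradient-based training
acc = mlp.score(X_te, y_te)
```

Unlike the extreme learning machine's one-shot linear solve, this model is trained iteratively, which is where the tuning and compute cost mentioned above come in.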
SUPERLEARNERS
•This model is a weighted aggregation of multiple types of models.
• This is analogous to a small-town election: different people have different views of the politics and care about different issues.
•Different modeling methods capture different pieces of the data variance and vote accordingly.
• This leverages each algorithm's strengths while minimizing its weaknesses (an extension of bagging to multi-algorithm ensembles).
• Diversity allows the full ensemble to better explore the geometry underlying the data.
•This combines multiple models while avoiding multiple-testing issues.
THEORY AND PRACTICE
•Superlearners are a type of ensemble of machine learning models,
typically using a set of classifiers or regression models, including linear
models, tree models, and ensemble models like boosting or bagging.
• Superlearners also have some theoretical guarantees about convergence and least
upper bounds on model error relative to algorithms within superlearner framework.
• They also have the ability to rank variables by importance and provide model fits for
each component.
•Deep architectures can be designed as feed-forward data processing
networks, in which functional nodes through which data passes add
information to the dataset regarding optimal partitioning and variable
pairing.
• Recent attempts to create feed-forward deep networks employing random forest or
SVM functions at each mapping show promise as an alternative to the typical neural
network formulation of deep learning.
• It stands to reason that feed-forward deep networks based on other machine learning algorithms, or combinations of algorithms, may enjoy some of these benefits of deep learning.
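A superlearner of the kind described above can be approximated with scikit-learn's stacking ensemble, which fits diverse base learners and lets a meta-learner weight their out-of-fold predictions. This is a simplified sketch: the deck's actual components (random ferns, MARS, conditional inference trees) are not in scikit-learn, so common stand-ins are used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=13, n_informative=4,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Diverse base learners "vote"; the meta-learner weights their
# cross-validated predictions, like the weighted aggregation described above.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gbm", GradientBoostingClassifier(random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
```

The meta-learner's coefficients give the per-component weights, which is one route to the variable/model importance rankings the slide mentions.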
EXPERIMENTAL SET-UP
•Algorithm frameworks tested:
1. Superlearner with random forest, random
ferns, KNN regression, MARS regression,
conditional inference trees, and boosted
regression.
2. Deep feed-forward machine learning model
(mixed deep model) with first hidden layer
of 2 random forest models, a conditional
inference tree model, and a random ferns
model; with second hidden layer of MARS
regression and conditional inference trees;
and a third hidden layer of boosted
regression.
3. Optimally tuned deep feed-forward neural
network model (13-5-3-1 configuration).
4. Deep feed-forward neural network model
with the same hidden layer structure as the
mixed deep model (Model 2).
5. KNN models, including a k=5 regression model, a deep k=5 model with a 10-10-5 hidden layer configuration, and a KNN superlearner model.
•Simulation design:
1. Outcome as yes/no for simplicity of
design (logistic regression problem)
2. 4 true predictors, 9 noise predictors
3. Predictor relationships
1. Purely linear terms (ideal neural network set-
up)
2. Purely nonlinear terms (ideal machine
learning set-up)
3. Mix of linear and nonlinear terms (more likely
in real-world data)
4. Gaussian noise level
1. Low
2. High (more likely in real-world data)
5. Addition of outliers (fraction ~5-10%)
to high noise conditions (mimic
group overlap)
6. Sample sizes of 500, 1000, 2500,
5000, 10000 to test convergence
properties for each condition and
algorithm
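The simulation design above (4 true predictors, 9 noise predictors, linear and nonlinear terms, Gaussian noise, optional outliers mimicking group overlap) might be generated along these lines. This is a hedged sketch: the exact functional forms, coefficients, and noise levels used in the deck are not specified, so the ones below are assumptions.

```python
import numpy as np

def simulate(n, noise_sd=0.5, outlier_frac=0.0, seed=0):
    """Binary outcome from 4 true predictors (2 linear, 2 nonlinear)
    plus 9 pure-noise predictors, with optional label-flip outliers."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 13))                # 4 true + 9 noise columns
    signal = (X[:, 0] + 0.5 * X[:, 1]           # linear terms
              + np.sin(X[:, 2]) + X[:, 3] ** 2  # nonlinear terms
              + rng.normal(scale=noise_sd, size=n))
    y = (signal > np.median(signal)).astype(int)
    # Outliers mimic group overlap: a small fraction of cases have all the
    # risk factors but not the predicted outcome (labels flipped).
    flip = rng.random(n) < outlier_frac
    y[flip] = 1 - y[flip]
    return X, y

# High-noise condition with ~5% outliers at n = 1000:
X, y = simulate(1000, noise_sd=2.0, outlier_frac=0.05)
```

Sweeping `n` over 500, 1000, 2500, 5000, and 10000 while varying `noise_sd` and `outlier_frac` reproduces the convergence grid described above.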
LINEAR RESULTS
•Deep neural networks show strong performance
(linear relationship models show universal
approximation convergence at low sample sizes with
low noise).
•Superlearners seem to perform better than deep architectures built from machine learning models.
•Deep architectures enhance the performance of KNN
models, particularly at low sample sizes, but
superlearners win out.
NONLINEAR RESULTS
•Superlearners dominate performance accuracy at
smaller sample sizes, and machine learning deep
models are competitive at these sample sizes.
•Tuned deep neural networks catch up to this
performance at large sample sizes, particularly with
noise and no outliers.
•Superlearner architectures show performance gains in
KNN regression models across all conditions.
MIXED RESULTS
•Superlearners retain their competitive advantage up until
very large sample sizes, suggesting that deep neural
networks struggle with a mix of linear and nonlinear terms
in a classification/regression model.
•Machine-learning-based deep architectures are
competitive at small sample sizes compared to deep
neural networks when no outliers are present.
•KNN superlearners retain a large advantage, particularly at
low noise with few outliers.
PREDICTING BAR PASSAGE
•Data includes 188 Concord Law
students for whom BAR data exists.
•22 predictors, including admissions
factors and law school grades,
used.
•Mixed deep model, superlearner
model, and tuned deep neural
network model were compared to
assess performance on real-world
data exhibiting linear and nonlinear
relationships with noise and group
overlap.
•70% of data was used to train, with 30% held out as a test set to assess accuracy.

Algorithm                      Accuracy
Deep Machine Learning Network  84.2%
Superlearner Model             100.0%
Tuned Deep Neural Network      68.4%
•Deep neural networks struggle with
the small sample size; using
machine learning map functions
dramatically improves accuracy.
• Sample size requirements for
convergence are a noted limitation of
neural networks in general.
• Previous results suggest performance
depends on choice of hidden layer
activation functions (maps).
•Superlearner yields perfect prediction, with individual component model fits available for inspection.
PREDICTING RETENTION BY
ADVISING
•Data includes 27666 students in 2016
and retention/graduation status at the
end of each term.
•10 predictors—academic,
demographic, and advising factors—
were used.
•Mixed deep model, superlearner
model, and tuned deep neural network
model were compared to assess
performance on real-world data
exhibiting linear and nonlinear
relationships with noise and group
overlap.
•70% of data was used to train, with 30%
held out as a test set to assess
accuracy.
Algorithm                      Accuracy
Deep Machine Learning Network  73.2%
Superlearner Model             74.1%
Tuned Deep Neural Network      74.4%
•Deep neural networks and deep
machine learning models seem to
provide a good processing sequence
to improve model fits iteratively.
• Examining the deep machine learning
model, we see that later layers do weight
prior models as fairly important
predictors, and we see evidence that
these previous layer predictions combine
with other factors in the dataset in these
later layers.
• This suggests that a deep approach can combine earlier layers' predictions with other predictors to iteratively refine model fits.
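The layer-wise behavior described on this slide, where later layers weight earlier layers' predictions and combine them with other factors in the dataset, can be sketched as a simple two-layer cascade. This is an illustrative reconstruction with scikit-learn stand-ins rather than the deck's exact architecture; in practice out-of-fold predictions would be passed forward to avoid the leakage this toy version ignores.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

# Layer 1: two random forests emit predicted probabilities...
layer1 = [RandomForestClassifier(n_estimators=50, random_state=s).fit(X_tr, y_tr)
          for s in (0, 1)]
aug_tr = np.column_stack([X_tr] + [m.predict_proba(X_tr)[:, 1] for m in layer1])
aug_te = np.column_stack([X_te] + [m.predict_proba(X_te)[:, 1] for m in layer1])

# Layer 2: boosting sees the raw predictors plus layer-1 predictions,
# so earlier outputs can combine with other factors in the data.
layer2 = GradientBoostingClassifier(random_state=1).fit(aug_tr, y_tr)
acc = layer2.score(aug_te, y_te)
```

Inspecting `layer2.feature_importances_` on the appended prediction columns shows how heavily later layers weight the prior models, mirroring the observation above.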
PREDICTING ADMISSIONS
•Data involved 905,612 leads from
2016 and various admission
factors.
• Because of low enrollment counts
(~24000), stratified sampling was
used to enrich the training set for all
models.
• Training set contained ~20% of
observations, with ~10% of those
being enrolled students.
•Superlearner/deep models give
very similar model fit specs
(accuracy, AUC, FNR, FPR), and
some individual models (MARS,
random forest, boosted
regression, conditional trees) gave
very good model fit, as well.
•This suggests convergence of most models tested, including individual component models.
•Runtime analysis shows the advantage of
some models over others, with conditional
trees/MARS models showing low runtimes.
•Deep NNs have a runtime advantage over deep ML models and superlearners, mostly as a result of the random forest runtimes.
•A tree/MARS superlearner gave similar performance in a shorter time than the deep NN (~2 minutes).
Algorithm                      Accuracy  AUC   FNR   FPR   Time (Minutes)
Deep Machine Learning Network  98.0%     0.95  0.08  0.02  22
Superlearner Model             98.2%     0.96  0.08  0.01  15
Fast Superlearner Model        98.0%     0.95  0.08  0.02  2
Tuned Deep Neural Network      98.0%     0.95  0.08  0.02  8
CONCLUSIONS
•Deep architectures can provide gains beyond individual models, particularly at lower sample sizes, suggesting deep feed-forward approaches are effective at improving predictive capability.
• This suggests that deep architectures can improve individual models that work well on a
particular problem.
• However, there is evidence that the topology of mappings between layers using these more
complex machine learning functions detracts from the predictive capabilities and universal
approximation property.
•Deep architectures with a variety of algorithms in each layer provide gains
above individual models and achieve good performance at low sample sizes
under real-world conditions.
•However, superlearners provide more robust models with no architecture
design or tuning needed; with group overlap and/or a combination of linear
and nonlinear relationships, they are the best models to use, even at sample
sizes where deep architecture begins to converge.
• Superlearners yield interpretable models and, hence, insight into important relationships
between predictors and an outcome.
SELECTED REFERENCES
• Aliper, A., Plis, S., Artemov, A., Ulloa, A., Mamoshina, P., & Zhavoronkov, A. (2016). Deep learning applications for predicting
pharmacological properties of drugs and drug repurposing using transcriptomic data. Molecular pharmaceutics, 13(7), 2524-2530.
• Altman, N. S. (1992). An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician, 46(3),
175-185.
• Breiman, L. (2001). Random forests. Machine learning, 45(1), 5-32.
• Dekker, G., Pechenizkiy, M., & Vleeshouwers, J. (2009, July). Predicting students drop out: A case study. In Educational Data Mining
2009.
• Devroye, L. (1978). The uniform convergence of nearest neighbor regression function estimators and their application in
optimization. IEEE Transactions on Information Theory, 24(2), 142-151.
• Friedman, J. H. (1991). Multivariate adaptive regression splines. The annals of statistics, 1-67.
• Friedman, J. H., & Meulman, J. J. (2003). Multiple additive regression trees with application in epidemiology. Statistics in medicine, 22(9), 1365-1381.
• Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural networks,
2(5), 359-366.
• Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of
Computational and Graphical statistics, 15(3), 651-674.
• Huang, G. B., Chen, L., & Siew, C. K. (2006). Universal approximation using incremental constructive feedforward networks with
random hidden nodes. IEEE Trans. Neural Networks, 17(4), 879-892.
• Huang, G. B., Wang, D. H., & Lan, Y. (2011). Extreme learning machines: a survey. International Journal of Machine Learning and
Cybernetics, 2(2), 107-122.
• Huberty, C. J., & Lowman, L. L. (2000). Group overlap as a basis for effect size. Educational and Psychological Measurement, 60(4),
543-563.
• Kang, B., & Choo, H. (2016). A deep-learning-based emergency alert system. ICT Express, 2(2), 67-70.
• Lian, H. (2011). Convergence of functional k-nearest neighbor regression estimate with functional responses. Electronic Journal of
Statistics, 5, 31-40.
• Osborne, J. W., & Overbay, A. (2004). The power of outliers (and why researchers should always check for them). Practical
assessment, research & evaluation, 9(6), 1-12.
• Ozuysal, M., Calonder, M., Lepetit, V., & Fua, P. (2010). Fast keypoint recognition using random ferns. IEEE transactions on pattern
analysis and machine intelligence, 32(3), 448-461.
• Pirracchio, R., Petersen, M. L., Carone, M., Rigon, M. R., Chevret, S., & van der Laan, M. J. (2015). Mortality prediction in intensive
care units with the Super ICU Learner Algorithm (SICULA): a population-based study. The Lancet Respiratory Medicine, 3(1), 42-52.
• Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural networks, 61, 85-117.
2021 American Mathematical Society Data Science TalkColleen Farrelly
 

More from Colleen Farrelly (20)

Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Hands-On Network Science, PyData Global 2023
Hands-On Network Science, PyData Global 2023Hands-On Network Science, PyData Global 2023
Hands-On Network Science, PyData Global 2023
 
Modeling Climate Change.pptx
Modeling Climate Change.pptxModeling Climate Change.pptx
Modeling Climate Change.pptx
 
Natural Language Processing for Beginners.pptx
Natural Language Processing for Beginners.pptxNatural Language Processing for Beginners.pptx
Natural Language Processing for Beginners.pptx
 
The Shape of Data--ODSC.pptx
The Shape of Data--ODSC.pptxThe Shape of Data--ODSC.pptx
The Shape of Data--ODSC.pptx
 
Generative AI, WiDS 2023.pptx
Generative AI, WiDS 2023.pptxGenerative AI, WiDS 2023.pptx
Generative AI, WiDS 2023.pptx
 
Emerging Technologies for Public Health in Remote Locations.pptx
Emerging Technologies for Public Health in Remote Locations.pptxEmerging Technologies for Public Health in Remote Locations.pptx
Emerging Technologies for Public Health in Remote Locations.pptx
 
Applications of Forman-Ricci Curvature.pptx
Applications of Forman-Ricci Curvature.pptxApplications of Forman-Ricci Curvature.pptx
Applications of Forman-Ricci Curvature.pptx
 
Geometry for Social Good.pptx
Geometry for Social Good.pptxGeometry for Social Good.pptx
Geometry for Social Good.pptx
 
Topology for Time Series.pptx
Topology for Time Series.pptxTopology for Time Series.pptx
Topology for Time Series.pptx
 
Time Series Applications AMLD.pptx
Time Series Applications AMLD.pptxTime Series Applications AMLD.pptx
Time Series Applications AMLD.pptx
 
An introduction to quantum machine learning.pptx
An introduction to quantum machine learning.pptxAn introduction to quantum machine learning.pptx
An introduction to quantum machine learning.pptx
 
An introduction to time series data with R.pptx
An introduction to time series data with R.pptxAn introduction to time series data with R.pptx
An introduction to time series data with R.pptx
 
NLP: Challenges and Opportunities in Underserved Areas
NLP: Challenges and Opportunities in Underserved AreasNLP: Challenges and Opportunities in Underserved Areas
NLP: Challenges and Opportunities in Underserved Areas
 
Geometry, Data, and One Path Into Data Science.pptx
Geometry, Data, and One Path Into Data Science.pptxGeometry, Data, and One Path Into Data Science.pptx
Geometry, Data, and One Path Into Data Science.pptx
 
Topological Data Analysis.pptx
Topological Data Analysis.pptxTopological Data Analysis.pptx
Topological Data Analysis.pptx
 
Transforming Text Data to Matrix Data via Embeddings.pptx
Transforming Text Data to Matrix Data via Embeddings.pptxTransforming Text Data to Matrix Data via Embeddings.pptx
Transforming Text Data to Matrix Data via Embeddings.pptx
 
Natural Language Processing in the Wild.pptx
Natural Language Processing in the Wild.pptxNatural Language Processing in the Wild.pptx
Natural Language Processing in the Wild.pptx
 
SAS Global 2021 Introduction to Natural Language Processing
SAS Global 2021 Introduction to Natural Language Processing SAS Global 2021 Introduction to Natural Language Processing
SAS Global 2021 Introduction to Natural Language Processing
 
2021 American Mathematical Society Data Science Talk
2021 American Mathematical Society Data Science Talk2021 American Mathematical Society Data Science Talk
2021 American Mathematical Society Data Science Talk
 

Recently uploaded

2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group MeetingAlison Pitt
 
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Jon Hansen
 
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证ppy8zfkfm
 
AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfAI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfMichaelSenkow
 
What is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationWhat is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationmuqadasqasim10
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsBrainSell Technologies
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理cyebo
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理cyebo
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfgreat91
 
Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"John Sobanski
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfscitechtalktv
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理pyhepag
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...ssuserf63bd7
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理pyhepag
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一fztigerwe
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理pyhepag
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证acoha1
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理pyhepag
 

Recently uploaded (20)

2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting
 
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)
 
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
 
AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfAI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdf
 
What is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationWhat is Insertion Sort. Its basic information
What is Insertion Sort. Its basic information
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data Analytics
 
123.docx. .
123.docx.                                 .123.docx.                                 .
123.docx. .
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdf
 
Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdf
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
 
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotecAbortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
 

Deep vs diverse architectures for classification problems

  • 1. DEEP VS. DIVERSE ARCHITECTURES By Colleen M. Farrelly
  • 2. SCOPE OF PROBLEM •The No Free Lunch Theorem suggests that no individual machine learning model will perform best across all types of data and datasets. • Social science/behavioral datasets present a particular challenge, as data often contains main effects and interaction effects, which can be linear or nonlinear with respect to an outcome of interest. • In addition, social science datasets often contain outliers and group overlap among classification outcomes, where someone may have all the risk factors for dropping out or drug use but does not exhibit the predicted behavior. •Several machine learning frameworks have nice theoretical properties, including convergence theorems and universal approximation guarantees, that may be particularly adept at modeling social science outcomes. • Superlearners and subsembles have been proven to improve ensemble performance to a level at least as good as the best model in the ensemble. • Neural networks with one hidden layer have universal approximation properties, which guarantee that random mappings to a wide enough layer will come arbitrarily close to a desired error level for any given function. • One caveat to this universal approximation is the size needed to obtain these guarantees may be larger than is practical or possible in a model. • Deep learning attempts to rectify this limitation by adding additional layers to the neural network, where each layer reduces model error beyond the previous layers’ capabilities.
  • 3. NEURAL NETWORK GENERAL OVERVIEW (image sources: colah.github.io, www.alz.org) •A neural network is a model based on processing complex, nonlinear information the way the human brain does via a series of feature mappings. Arrows denote mapping functions, which take one topological space to another.
  • 4. •These are a type of shallow, wide neural network. •This formulation of neural networks reduces the framework to a penalized linear algebra problem, rather than iterative training (much faster to solve). •Although it is based on random mappings, it has been shown to converge to correct classification/regression via the Universal Approximation Theorem (likely a result of adequate coverage of the underlying data manifold). •However, the width of the network required to converge at an arbitrary error level may be computationally infeasible. EXTREME LEARNING MACHINES AND UNIVERSAL APPROXIMATION
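Reduced to code, an extreme learning machine is short: the hidden-layer weights are drawn at random and never trained, and only the output weights are solved for via a ridge-penalized linear system. A minimal numpy sketch (the layer size, activation, and penalty below are illustrative choices, not prescribed by the slides):

```python
import numpy as np

def elm_fit(X, y, n_hidden=200, alpha=1.0, seed=0):
    """Extreme learning machine: random hidden mapping, then a one-shot
    ridge-regression solve for the output weights (no iterative training)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random, untrained input weights
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                        # random nonlinear feature map
    # Output weights from a penalized linear-algebra problem
    beta = np.linalg.solve(H.T @ H + alpha * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy nonlinear regression to illustrate the one-shot solve
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(500, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2
W, b, beta = elm_fit(X, y)
print(np.mean((elm_predict(X, W, b, beta) - y) ** 2))  # small training MSE
```

The universal approximation caveat from the slide shows up directly here: driving the error arbitrarily low requires growing `n_hidden`, which can make the `n_hidden × n_hidden` solve impractical.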
  • 5. DEEP LEARNING •Deep learning attempts to solve the wide-layer problem by adding depth (additional layers) to neural networks, which can be more effective and computationally feasible than extreme learning machines for some problems. • This framework is like sifting data with multiple sifters to distill finer and finer pieces of the data. •These models are computationally intensive and require architecture design and tuning for each problem. • Feed-forward networks are particularly popular, as they can be easily built, tuned, and trained. • Feed-forward networks also have relations to the Universal Approximation Theorem, providing a means to exploit these results without requiring impractically wide layers.
  • 6. •This model is a weighted aggregation of multiple types of models. • This is analogous to a small-town election. • Different people have different views of politics and care about different issues. •Different modeling methods capture different pieces of the data variance and vote accordingly. • This leverages each algorithm's strengths while minimizing its weaknesses (kind of like an extension of bagging to multi-algorithm ensembles). • Diversity allows the full ensemble to better explore the geometry underlying the data. •This combines multiple models while avoiding multiple testing issues. SUPERLEARNERS
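The voting analogy above can be sketched as stacking: each base model casts out-of-fold probability "votes," and a meta-learner weights them. This is a rough stand-in using scikit-learn models — the slides' actual ensemble includes MARS, random ferns, and conditional inference trees, which have no direct sklearn equivalents, so the base library here is a simplifying assumption:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in data: 13 predictors, 4 informative (as in the simulation)
X, y = make_classification(n_samples=1000, n_features=13,
                           n_informative=4, random_state=0)

base_models = {
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "logit": LogisticRegression(max_iter=1000),
}

# Level-one data: out-of-fold predictions from each base model (the "votes")
Z = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
    for m in base_models.values()
])

# The meta-learner weights the votes; its coefficients show each model's pull
meta = LogisticRegression(max_iter=1000).fit(Z, y)
for name, w in zip(base_models, meta.coef_[0]):
    print(f"{name}: weight {w:.2f}")
```

Because the meta-learner sees only out-of-fold predictions, each base model is scored on data it never trained on, which is what underlies the oracle-type guarantees mentioned on the next slide.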
  • 7. THEORY AND PRACTICE •Superlearners are a type of ensemble of machine learning models, typically using a set of classifiers or regression models, including linear models, tree models, and ensemble models like boosting or bagging. • Superlearners also have theoretical guarantees about convergence and least upper bounds on model error relative to the algorithms within the superlearner framework. • They can also rank variables by importance and provide model fits for each component. •Deep architectures can be designed as feed-forward data-processing networks, in which functional nodes through which data passes add information to the dataset regarding optimal partitioning and variable pairing. • Recent attempts to create feed-forward deep networks employing random forest or SVM functions at each mapping show promise as an alternative to the typical neural network formulation of deep learning. • It stands to reason that feed-forward deep networks based on other machine learning algorithms or combinations of algorithms may enjoy some of these benefits of deep learning.
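The feed-forward idea can be sketched as layer-wise stacking: each layer's model predictions are appended to the feature matrix before the next layer is fit. The layer contents below are simplified stand-ins for the mixed deep model described on the next slide, and the in-sample augmentation omits the cross-fitting a careful implementation would use to limit leakage:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=13,
                           n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Hidden layers: each layer's predicted probabilities become extra features
hidden_layers = [
    [RandomForestClassifier(n_estimators=100, random_state=0),
     DecisionTreeClassifier(max_depth=5, random_state=0)],
]

Z_tr, Z_te = X_tr, X_te
for layer in hidden_layers:
    preds_tr, preds_te = [], []
    for model in layer:
        model.fit(Z_tr, y_tr)
        preds_tr.append(model.predict_proba(Z_tr)[:, 1])
        preds_te.append(model.predict_proba(Z_te)[:, 1])
    Z_tr = np.column_stack([Z_tr] + preds_tr)   # augment features with layer output
    Z_te = np.column_stack([Z_te] + preds_te)

# Output layer: boosted model fit on the augmented features
output = GradientBoostingClassifier(random_state=0).fit(Z_tr, y_tr)
print(output.score(Z_te, y_te))
```

Each later layer can thus weight earlier models' predictions alongside the raw predictors, which is the behavior the retention analysis on slide 13 reports observing.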
  • 8. EXPERIMENTAL SET-UP
•Algorithm frameworks tested:
1. Superlearner with random forest, random ferns, KNN regression, MARS regression, conditional inference trees, and boosted regression.
2. Deep feed-forward machine learning model (mixed deep model) with a first hidden layer of 2 random forest models, a conditional inference tree model, and a random ferns model; a second hidden layer of MARS regression and conditional inference trees; and a third hidden layer of boosted regression.
3. Optimally tuned deep feed-forward neural network model (13-5-3-1 configuration).
4. Deep feed-forward neural network model with the same hidden layer structure as the mixed deep model (Model 2).
5. KNN models, including a k=5 regression model, a deep k=5 model with a 10-10-5 hidden layer configuration, and a
•Simulation design:
1. Outcome as yes/no for simplicity of design (logistic regression problem)
2. 4 true predictors, 9 noise predictors
3. Predictor relationships:
   1. Purely linear terms (ideal neural network set-up)
   2. Purely nonlinear terms (ideal machine learning set-up)
   3. Mix of linear and nonlinear terms (more likely in real-world data)
4. Gaussian noise level:
   1. Low
   2. High (more likely in real-world data)
5. Addition of outliers (fraction ~5-10%) to high-noise conditions (mimic group overlap)
6. Sample sizes of 500, 1000, 2500, 5000, 10000 to test convergence properties for each condition and algorithm
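A generator for data along these lines might look as follows; the functional forms, coefficients, and noise scales are illustrative assumptions, not the study's exact design:

```python
import numpy as np

def simulate(n, linear=True, nonlinear=True,
             noise_sd=1.0, outlier_frac=0.0, seed=0):
    """Binary outcome from 4 true predictors plus 9 pure-noise predictors,
    mirroring the simulation conditions above (functional forms are
    illustrative, not the original study's)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 13))                  # 4 signal + 9 noise columns
    eta = np.zeros(n)
    if linear:
        eta += 1.5 * X[:, 0] - 1.0 * X[:, 1]      # linear main effects
    if nonlinear:
        eta += np.sin(2 * X[:, 2]) + X[:, 2] * X[:, 3]  # nonlinear + interaction
    eta += rng.normal(scale=noise_sd, size=n)     # Gaussian noise (low vs. high)
    y = (eta > 0).astype(int)
    if outlier_frac > 0:                          # flip labels to mimic group overlap
        flip = rng.random(n) < outlier_frac
        y[flip] = 1 - y[flip]
    return X, y

# High-noise, mixed-terms condition with ~5% outliers at n = 1000
X, y = simulate(1000, noise_sd=2.0, outlier_frac=0.05)
print(X.shape, y.mean())
```

Sweeping `n` over the listed sample sizes and toggling `linear`/`nonlinear`, `noise_sd`, and `outlier_frac` reproduces the grid of conditions each algorithm was run on.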
  • 9. LINEAR RESULTS •Deep neural networks show strong performance (linear relationship models show universal approximation convergence at low sample sizes with low noise). •Superlearners seem to perform better than deep models for machine learning ensembles. •Deep architectures enhance the performance of KNN models, particularly at low sample sizes, but superlearners win out.
  • 10. NONLINEAR RESULTS •Superlearners dominate performance accuracy at smaller sample sizes, and machine learning deep models are competitive at these sample sizes. •Tuned deep neural networks catch up to this performance at large sample sizes, particularly with noise and no outliers. •Superlearner architectures show performance gains in KNN regression models across all conditions.
  • 11. MIXED RESULTS •Superlearners retain their competitive advantage up until very large sample sizes, suggesting that deep neural networks struggle with a mix of linear and nonlinear terms in a classification/regression model. •Machine-learning-based deep architectures are competitive at small sample sizes compared to deep neural networks when no outliers are present. •KNN superlearners retain a large advantage, particularly at low noise with few outliers.
  • 12. PREDICTING BAR PASSAGE
•Data includes 188 Concord Law students for whom bar exam data exists.
•22 predictors, including admissions factors and law school grades, were used.
•Mixed deep model, superlearner model, and tuned deep neural network model were compared to assess performance on real-world data exhibiting linear and nonlinear relationships with noise and group overlap.
•70% of data was used to train, with 30% held out as a test set to assess accuracy.
Algorithm | Accuracy
Deep Machine Learning Network | 84.2%
Superlearner Model | 100.0%
Tuned Deep Neural Network | 68.4%
•Deep neural networks struggle with the small sample size; using machine learning map functions dramatically improves accuracy.
• Sample size requirements for convergence are a noted limitation of neural networks in general.
• Previous results suggest performance depends on choice of hidden layer activation functions (maps).
•Superlearner yields perfect prediction, with individual
  • 13. PREDICTING RETENTION BY ADVISING
•Data includes 27,666 students in 2016 and retention/graduation status at the end of each term.
•10 predictors (academic, demographic, and advising factors) were used.
•Mixed deep model, superlearner model, and tuned deep neural network model were compared to assess performance on real-world data exhibiting linear and nonlinear relationships with noise and group overlap.
•70% of data was used to train, with 30% held out as a test set to assess accuracy.
Algorithm | Accuracy
Deep Machine Learning Network | 73.2%
Superlearner Model | 74.1%
Tuned Deep Neural Network | 74.4%
•Deep neural networks and deep machine learning models seem to provide a good processing sequence for improving model fits iteratively.
• Examining the deep machine learning model, we see that later layers weight prior models as fairly important predictors, and that these previous-layer predictions combine with other factors in the dataset in the later layers.
• This suggests that a deep approach can
  • 14. PREDICTING ADMISSIONS
•Data involved 905,612 leads from 2016 and various admission factors.
• Because of low enrollment counts (~24,000), stratified sampling was used to enrich the training set for all models.
• The training set contained ~20% of observations, with ~10% of those being enrolled students.
•Superlearner/deep models give very similar model fit statistics (accuracy, AUC, FNR, FPR), and some individual models (MARS, random forest, boosted regression, conditional trees) gave very good model fit as well.
•This suggests convergence of most models tested, including
•Runtime analysis shows the advantage of some models over others, with conditional tree/MARS models showing low runtimes.
•Deep NNs have an advantage over deep ML models and superlearners, mostly as a result of the random forest runtimes.
•A tree/MARS superlearner gave similar performance in a shorter amount of time than the deep NN (~2 minutes).
Algorithm | Accuracy | AUC | FNR | FPR | Time (Minutes)
Deep Machine Learning Network | 98.0% | 0.95 | 0.08 | 0.02 | 22
Superlearner Model | 98.2% | 0.96 | 0.08 | 0.01 | 15
Fast Superlearner Model | 98.0% | 0.95 | 0.08 | 0.02 | 2
Tuned Deep Neural Network | 98.0% | 0.95 | 0.08 | 0.02 | 8
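The enrichment step can be sketched as stratified index sampling. The fractions and base rate below follow the slide (~20% training fraction, ~10% positives, ~24,000 enrollments among ~905,612 leads, scaled down here), but the helper itself is hypothetical:

```python
import numpy as np

def enrich_training_set(y, train_frac=0.20, pos_frac=0.10, seed=0):
    """Stratified sampling sketch: draw a training set of ~train_frac of all
    observations in which ~pos_frac are positives (enrolled students)."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    n_train = int(train_frac * len(y))
    n_pos = min(int(pos_frac * n_train), len(pos))  # cap at available positives
    idx = np.concatenate([
        rng.choice(pos, size=n_pos, replace=False),
        rng.choice(neg, size=n_train - n_pos, replace=False),
    ])
    rng.shuffle(idx)
    return idx

# Scaled-down stand-in for the admissions data: ~2.6% base rate of enrollments
y = np.zeros(100_000, dtype=int)
y[:2_600] = 1
train_idx = enrich_training_set(y)
print(len(train_idx), y[train_idx].mean())  # 20000 rows, 10% positives
```

Without this enrichment, a 2-3% base rate would let every model reach ~97% accuracy by predicting "not enrolled" for everyone, which is why the slide reports AUC, FNR, and FPR alongside accuracy.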
  • 15. CONCLUSIONS •Deep architectures can provide gains above individual models, particularly at lower sample sizes, suggesting deep feed-forward approaches are efficacious at improving predictive capabilities. • This suggests that deep architectures can improve individual models that work well on a particular problem. • However, there is evidence that the topology of mappings between layers using these more complex machine learning functions detracts from the predictive capabilities and universal approximation property. •Deep architectures with a variety of algorithms in each layer provide gains above individual models and achieve good performance at low sample sizes under real-world conditions. •However, superlearners provide more robust models with no architecture design or tuning needed; with group overlap and/or a combination of linear and nonlinear relationships, they are the best models to use, even at sample sizes where deep architectures begin to converge. • Superlearners yield interpretable models and, hence, insight into important relationships between predictors and an outcome.
  • 17. REFERENCES
• Aliper, A., Plis, S., Artemov, A., Ulloa, A., Mamoshina, P., & Zhavoronkov, A. (2016). Deep learning applications for predicting pharmacological properties of drugs and drug repurposing using transcriptomic data. Molecular Pharmaceutics, 13(7), 2524-2530.
• Altman, N. S. (1992). An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician, 46(3), 175-185.
• Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
• Dekker, G., Pechenizkiy, M., & Vleeshouwers, J. (2009, July). Predicting students drop out: A case study. In Educational Data Mining 2009.
• Devroye, L. (1978). The uniform convergence of nearest neighbor regression function estimators and their application in optimization. IEEE Transactions on Information Theory, 24(2), 142-151.
• Friedman, J. H. (1991). Multivariate adaptive regression splines. The Annals of Statistics, 1-67.
• Friedman, J. H., & Meulman, J. J. (2003). Multiple additive regression trees with application in epidemiology. Statistics in Medicine, 22(9), 1365-1381.
• Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359-366.
• Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 651-674.
• Huang, G. B., Chen, L., & Siew, C. K. (2006). Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Transactions on Neural Networks, 17(4), 879-892.
• Huang, G. B., Wang, D. H., & Lan, Y. (2011). Extreme learning machines: a survey. International Journal of Machine Learning and Cybernetics, 2(2), 107-122.
• Huberty, C. J., & Lowman, L. L. (2000). Group overlap as a basis for effect size. Educational and Psychological Measurement, 60(4), 543-563.
• Kang, B., & Choo, H. (2016). A deep-learning-based emergency alert system. ICT Express, 2(2), 67-70.
• Lian, H. (2011). Convergence of functional k-nearest neighbor regression estimate with functional responses. Electronic Journal of Statistics, 5, 31-40.
• Osborne, J. W., & Overbay, A. (2004). The power of outliers (and why researchers should always check for them). Practical Assessment, Research & Evaluation, 9(6), 1-12.
• Ozuysal, M., Calonder, M., Lepetit, V., & Fua, P. (2010). Fast keypoint recognition using random ferns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3), 448-461.
• Pirracchio, R., Petersen, M. L., Carone, M., Rigon, M. R., Chevret, S., & van der Laan, M. J. (2015). Mortality prediction in intensive care units with the Super ICU Learner Algorithm (SICULA): a population-based study. The Lancet Respiratory Medicine, 3(1), 42-52.
• Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85-117.

Editor's Notes

  1. Computationally expensive in traditional algorithms and rooted in topological maps. Cannot handle lots of variables compared to number of observations. Cannot handle non-independent data. Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural networks, 2(5), 359-366.
  2. Random mappings to reduce MLP to linear system of equations. Huang, G. B., Wang, D. H., & Lan, Y. (2011). Extreme learning machines: a survey. International Journal of Machine Learning and Cybernetics, 2(2), 107-122.
  3. Computationally expensive neural network extension. Still suffers from singularities which hinder performance. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105).
  4. Bagging of different base models (same bootstrap or different bootstrap). van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). Super learner. Statistical applications in genetics and molecular biology, 6(1).