The document summarizes research comparing human and machine vision across various models and datasets. Three key findings are presented: 1) The robustness gap between humans and CNNs is decreasing as newer models match or exceed human performance on most datasets. 2) However, an image-level consistency gap remains, where humans make different errors than models. 3) For many cases, human-model consistency improves when models are trained on datasets an order of magnitude larger. The research aims to benchmark progress in closing these gaps.
This talk will cover various medical applications of deep learning, including tumor segmentation in histology slides, MRI, CT, and X-ray data. It also addresses more complicated tasks such as cell counting, where the challenge is to determine how many objects appear in an image, and covers generative adversarial networks and their medical applications. This presentation is accessible to non-doctors and non-computer scientists.
ICCES 2017 - Crowd Density Estimation Method using Regression Analysis, by Ahmed Gad
The oral presentation of the paper titled "Crowd Density Estimation Method using Multiple Feature Categories and Multiple Regression Models".
This paper was accepted for publication and oral presentation in the 12th IEEE International Conference on Computer Engineering and Systems (ICCES 2017) held from 19 to 20 December 2017 in Cairo, Egypt.
The paper proposed a new method to estimate the number of people within crowded scenes using regression analysis. The two main challenges in crowd density estimation via regression are perspective distortion and non-linearity. The paper addresses perspective distortion using perspective normalization, which recent work has shown to be an effective way to deal with that problem.
The second challenge is addressed by combining features drawn from multiple existing categories, including segmented region, texture, edge, and keypoint features, yielding a feature vector of length 164.
Five regression models are used: GPR, RF, RPF, LASSO, and KNN.
Experimental results show that the proposed method outperforms previous works.
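To illustrate the regression-based counting idea, here is a minimal sketch of one of the five regressors (KNN) applied to 164-dimensional feature vectors. The data is synthetic and hypothetical (the paper's actual features and dataset are not reproduced here); the sketch only shows the mechanic of predicting a crowd count as the average count of the nearest training frames in feature space.

```python
import numpy as np

def knn_regress(train_X, train_y, query, k=3):
    # Euclidean distances from the query frame's features to every training frame
    d = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(d)[:k]
    # Predicted count = mean count of the k nearest training frames
    return train_y[nearest].mean()

rng = np.random.default_rng(0)
# Hypothetical stand-in for the paper's 164-dimensional feature vectors;
# the "count" here is an arbitrary function of the features.
X = rng.random((200, 164))
y = X.sum(axis=1) * 0.5
pred = knn_regress(X, y, X[0], k=3)
print(float(pred))
```

With k=1 a query identical to a training frame recovers that frame's count exactly; larger k trades variance for bias, which is why the paper evaluates several regressors side by side.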
----------------------------------
Ahmed Fawzy Gad
Information Technology (IT) Department
Faculty of Computers and Information (FCI)
Menoufia University, Egypt
Teaching Assistant/Demonstrator
ahmed.fawzy@ci.menofia.edu.eg
---------------------------------
Find me on:
Blog
(Arabic) https://aiage-ar.blogspot.com.eg/
(English) https://aiage.blogspot.com.eg/
YouTube
https://www.youtube.com/AhmedGadFCIT
Google Plus
https://plus.google.com/u/0/+AhmedGadIT
SlideShare
https://www.slideshare.net/AhmedGadFCIT
LinkedIn
https://www.linkedin.com/in/ahmedfgad
reddit
https://www.reddit.com/user/AhmedGadFCIT
ResearchGate
https://www.researchgate.net/profile/Ahmed_Gad13
Academia
https://menofia.academia.edu/Gad
Google Scholar
https://scholar.google.com.eg/citations?user=r07tjocAAAAJ&hl=en
Mendeley
https://www.mendeley.com/profiles/ahmed-gad12
ORCID
https://orcid.org/0000-0003-1978-8574
Stack Overflow
http://stackoverflow.com/users/5426539/ahmed-gad
Twitter
https://twitter.com/ahmedfgad
Facebook
https://www.facebook.com/ahmed.f.gadd
Pinterest
https://www.pinterest.com/ahmedfgad
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutional Neural Networks for Android Devices, by Ahmed Gad
The presentation of my paper titled "NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutional Neural Networks for Android Devices" at the Second International Conference of Innovative Trends in Computer Engineering (ITCE 2019).
The paper proposes a library for implementing convolutional neural networks (CNNs) that run on Android devices. Running the CNN on mobile devices is straightforward and does not require an intermediate model-conversion step, as the library uses the Kivy cross-platform framework.
The CNN layers are implemented in NumPy. You can find their implementation in my GitHub project at this link: https://github.com/ahmedfgad/NumPyCNN
The library itself is also open source, available here: https://github.com/ahmedfgad/NumPyCNNAndroid
There are two modes of operation. The first is training the CNN on the mobile device itself, which is very time-consuming, at least in the current version. The second, preferred mode is to train the CNN on a desktop computer and then deploy the trained model to the mobile device.
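The second mode boils down to serializing trained parameters on the desktop and loading them for inference-only use on the device. Below is a minimal sketch of that pattern in plain NumPy; the single dense layer, the weight shapes, and the file name are hypothetical stand-ins, not the library's actual API.

```python
import os
import tempfile
import numpy as np

def forward(x, W, b):
    # A single fully connected layer with ReLU, standing in for the CNN's layers
    return np.maximum(0.0, x @ W + b)

# "Desktop" side: pretend these weights came out of training
rng = np.random.default_rng(1)
W, b = rng.standard_normal((8, 4)), np.zeros(4)
path = os.path.join(tempfile.mkdtemp(), "weights.npz")
np.savez(path, W=W, b=b)

# "Device" side: load the saved weights and run inference only
loaded = np.load(path)
x = rng.standard_normal(8)
out = forward(x, loaded["W"], loaded["b"])
print(out.shape)  # (4,)
```

Since the layers are pure NumPy, the same forward-pass code can run unchanged on both sides, which is what makes the no-conversion deployment possible.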
This document summarizes Kevin McGuinness' presentation on deep learning for computer vision. It discusses visual attention models and their ability to predict eye gaze, with applications in image cropping, retrieval, and classification. It also covers medical image analysis using deep learning for knee osteoarthritis grading and neonatal brain segmentation. Deep crowd analysis is examined for crowd counting. Finally, interactive deep vision for image segmentation using user interactions is presented.
Rsqrd AI - ML Interpretability: Beyond Feature Importance, by Alessya Visnjic
In this talk, Javier Antorán discusses the importance of uncertainty when it comes to ML interpretability. He offers a new uncertainty-based interpretability technique called CLUE and compares it to existing model interpretability techniques in two usability studies. Javier is a Ph.D. student at the University of Cambridge. His research interests include Bayesian deep learning, uncertainty in machine learning, representation learning, and information theory.
Genetic algorithms are search algorithms inspired by biological evolution that use techniques like mutation, crossover, and selection to evolve solutions to problems. They represent potential solutions as individuals in a population and evolve the population over multiple generations using genetic operators to improve the overall quality of solutions. Genetic programming is a type of genetic algorithm that evolves computer programs to solve problems by genetically breeding populations of computer programs.
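The operators described above can be sketched in a few lines. The following is a minimal, self-contained genetic algorithm for the classic OneMax toy problem (maximize the number of 1 bits); the population size, mutation rate, and tournament selection are illustrative choices, not prescriptions.

```python
import random

def evolve(bits=20, pop_size=30, generations=60, seed=42):
    # OneMax: fitness is the number of 1 bits; the optimum is the all-ones string
    rng = random.Random(seed)
    fitness = lambda ind: sum(ind)
    pop = [[rng.randint(0, 1) for _ in range(bits)] for _ in range(pop_size)]
    for _ in range(generations):
        def pick():
            # Selection: tournament of size 2
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, bits)      # single-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(bits):             # bit-flip mutation, rate 1/bits
                if rng.random() < 1.0 / bits:
                    child[i] ^= 1
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = evolve()
print(sum(best))
```

Genetic programming follows the same loop, but individuals are program trees and crossover swaps subtrees instead of bit-string segments.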
May 2015 talk to SW Data Meetup by Professor Hendrik Blockeel from KU Leuven & Leiden University.
With increasing amounts of ever more complex forms of digital data becoming available, the methods for analyzing these data have also become more diverse and sophisticated. With this comes an increased risk of incorrect use of these methods, and a greater burden on the user to be knowledgeable about their assumptions. In addition, the user needs to know about a wide variety of methods to be able to apply the most suitable one to a particular problem. This combination of broad and deep knowledge is not sustainable.
The idea behind declarative data analysis is that the burden of choosing the right statistical methodology for answering a research question should no longer lie with the user, but with the system. The user should be able to simply describe the problem, formulate a question, and let the system take it from there. To achieve this, we need to find answers to questions such as: what languages are suitable for formulating these questions, and what execution mechanisms can we develop for them? In this talk, I will discuss recent and ongoing research in this direction. The talk will touch upon query languages for data mining and for statistical inference, declarative modeling for data mining, meta-learning, and constraint-based data mining. What connects these research threads is that they all strive to put intelligence about data analysis into the system, instead of assuming it resides in the user.
Hendrik Blockeel is a professor of computer science at KU Leuven, Belgium, and part-time associate professor at Leiden University, The Netherlands. His research interests lie mostly in machine learning and data mining. He has made a variety of research contributions in these fields, including work on decision tree learning, inductive logic programming, predictive clustering, probabilistic-logical models, inductive databases, constraint-based data mining, and declarative data analysis. He is an action editor for Machine Learning and serves on the editorial board of several other journals. He has chaired or organized multiple conferences, workshops, and summer schools, including ILP, ECMLPKDD, IDA and ACAI, and he has been vice-chair, area chair, or senior PC member for ECAI, IJCAI, ICML, KDD, ICDM. He was a member of the board of the European Coordinating Committee for Artificial Intelligence from 2004 to 2010, and currently serves as publications chair for the ECMLPKDD steering committee.
Abstract: Generative models, and in particular adversarial ones, are becoming prevalent in computer vision as they enhance artistic creation, inspire designers, and prove useful in semi-supervised learning and robotics applications.
We will see how to develop the abilities of Generative Adversarial Networks (GANs) to deviate from training examples and generate more original images of fashion designs. Since a limitation of GANs is that they produce raw images of low resolution, we also present solutions for producing vectorized results, and show how the developed method may be useful for image editing.
Exploiting biomedical literature to mine out a large multimodal dataset of rare cancer studies, by Anjani Dhrangadhariya
This document summarizes a study that mined biomedical literature to create a large multimodal dataset of rare cancer studies. The researchers harvested over 15,000 images and corresponding journal articles related to rare cancers from public literature databases. They used both visual and textual classification approaches to identify images of humans, neoplastic tissues, and rare cancers. The textual approach using TF-IDF and SVMs outperformed visual CNN classifiers for all tasks. This created the first dataset aimed at automatically extracting rare cancer images to help address challenges in researching these less prevalent cancers.
Exploiting biomedical literature to mine out a large multimodal dataset of rare cancer studies. Presentation of Anjani K. Dhrangadhariya (Institute of Information Systems, HES-SO Valais-Wallis, Sierre) at SPIE Medical Imaging 2020.
Artem Baklanov - Votes Aggregation Techniques in Geo-Wiki Crowdsourcing Game:... (AIST)
The document describes techniques used to improve the quality of crowdsourced data from the GEO-Wiki project. It discusses preprocessing steps like blur detection, duplicate detection, and vote aggregation algorithms. Blur detection removed 2% of low-quality images, while duplicate detection based on perceptual hashing removed 6% of redundant votes. Benchmarking algorithms on expert-annotated data showed that majority voting performed comparably to more complex algorithms when there were many accurate volunteers and few spammers. Preprocessing improved results by reducing workload and increasing statistical significance.
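The majority-voting baseline that the benchmark found so competitive is simple to state in code. Below is a minimal sketch; the label strings are hypothetical, and the tie-handling policy (returning None so the item can be re-reviewed) is one illustrative choice, not the project's documented behavior.

```python
from collections import Counter

def majority_vote(votes):
    # votes: list of labels from different volunteers for one item
    top = Counter(votes).most_common()
    # Ties go to None so the item can be flagged for expert review
    if len(top) > 1 and top[0][1] == top[1][1]:
        return None
    return top[0][0]

print(majority_vote(["tree", "tree", "field"]))  # tree
print(majority_vote(["tree", "field"]))          # None (tie)
```

The benchmark's finding is intuitive in this light: when most voters are accurate and spammers are rare, the plurality label is almost always correct, so more elaborate weighting schemes have little room to improve on it.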
Class imbalance is a pervasive issue in disease classification from medical images: the class distribution should be balanced while training a model, yet for rare medical diseases, images from affected patients are much harder to come by than images from non-affected patients, resulting in unwanted class imbalance. Various approaches to tackling class imbalance have been explored, each with its fair share of drawbacks. In this research, we propose an outlier detection based image classification technique that can handle even the most extreme cases of class imbalance. Using a dataset of malaria parasitized and uninfected cells, an autoencoder model titled AnoMalNet is first trained on only the uninfected cell images and then used to classify both affected and non-affected cell images by thresholding a loss value. We achieve an accuracy, precision, recall, and F1 score of 98.49%, 97.07%, 100%, and 98.52% respectively, performing better than large deep learning models and other published works. As the proposed approach does not need disease-positive samples during training, it should prove useful in binary disease classification on imbalanced datasets.
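The thresholding step at the heart of this approach can be sketched without the autoencoder itself. The reconstruction-loss values below are synthetic and hypothetical (the real losses come from AnoMalNet); the sketch only shows how a threshold chosen from normal-only training losses separates the two classes at test time.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical reconstruction losses: an autoencoder trained only on
# uninfected cells reconstructs them well (low loss) and reconstructs
# parasitized cells poorly (high loss).
uninfected_train = rng.normal(0.02, 0.005, 500)
uninfected_test = rng.normal(0.02, 0.005, 100)
parasitized_test = rng.normal(0.10, 0.02, 100)

# Choose the threshold from training (normal-only) losses alone,
# e.g. their 99.5th percentile
thresh = np.percentile(uninfected_train, 99.5)

pred_uninfected = uninfected_test > thresh    # True means "classified parasitized"
pred_parasitized = parasitized_test > thresh
print(pred_uninfected.mean(), pred_parasitized.mean())
```

Because the threshold is derived entirely from the uninfected class, no disease-positive samples are needed during training, which is exactly what makes the method suitable for extreme imbalance.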
The field of Artificial Intelligence (AI) has been revitalized in this decade, primarily due to the large-scale application of Deep Learning (DL) and other Machine Learning (ML) algorithms. This has been most evident in applications like computer vision, natural language processing, and game bots. However, extraordinary successes within a short period of time have also had the unintended consequence of causing a sharp difference of opinion in research and industrial communities regarding the capabilities and limitations of deep learning. A few questions you might have heard being asked (or asked yourself) include:
a. We don’t know how Deep Neural Networks make decisions, so can we trust them?
b. Can Deep Learning deal with highly non-linear continuous systems with millions of variables?
c. Can Deep Learning solve the Artificial General Intelligence problem?
The goal of this seminar is to provide a 1,000-foot view of Deep Learning and hopefully answer the questions above. The seminar will touch upon the evolution, current state of the art, and peculiarities of Deep Learning, and share thoughts on using Deep Learning as a tool for developing power system solutions.
This document describes a project on age and gender detection using deep learning and convolutional neural networks. The objective is to train a model on the UTKFace dataset to detect age and gender from facial images. The methodology involves data preprocessing steps such as grayscale conversion, resizing, and normalization. A CNN model is built with convolutional blocks and fully connected layers, and is compiled with binary cross-entropy loss for gender classification and mean absolute error for age detection. The trained model is tested and deployed using Streamlit. Real-world use cases discussed include advertising to targeted audiences and an Android app that detects age from photos.
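The two-headed loss described above (binary cross-entropy for gender, mean absolute error for age) can be made concrete with plain NumPy. The sample predictions and the equal weighting of the two losses are illustrative assumptions; the project's actual model and loss weights are not reproduced here.

```python
import numpy as np

def bce(y_true, p, eps=1e-7):
    # Binary cross-entropy for the gender head (labels in {0, 1})
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def mae(y_true, y_pred):
    # Mean absolute error for the age head (in years)
    return float(np.mean(np.abs(y_true - y_pred)))

gender_true = np.array([1.0, 0.0, 1.0])
gender_pred = np.array([0.9, 0.2, 0.8])
age_true = np.array([25.0, 40.0, 31.0])
age_pred = np.array([27.0, 38.0, 30.0])

# A multi-output model is typically trained on a weighted sum of per-head losses
total = bce(gender_true, gender_pred) + mae(age_true, age_pred)
print(total)
```

Using MAE (in years) for age keeps that head's error directly interpretable, while BCE matches the sigmoid output of the binary gender head.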
Generalizing Scientific Machine Learning and Differentiable Simulation Beyond..., by Chris Rackauckas
This document discusses scientific machine learning and differentiable simulation. It begins by explaining that scientific machine learning uses both data and physical knowledge to make accurate predictions with less data. It then discusses differentiable simulation and how universal differential equations can be used to replace unknown portions of models with neural networks while preserving known physical structure. Several examples are provided of applications in various domains like epidemiology, black hole detection, earthquake engineering, and chemistry. The document emphasizes that understanding the engineering principles and numerical properties of the domain is important for applying these methods stably and efficiently.
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An... (PyData)
Artificial intelligence is emerging as a new paradigm in materials science. This talk describes how physical intuition and (insightful) machine learning can solve the complicated task of structure recognition in materials at the nanoscale.
The document describes a study that used a convolutional neural network with a ConvNeXtLarge architecture to classify skin cancer images into benign and malignant classes. The CNN model was trained on a dataset of 3,297 skin cancer images from Kaggle. It achieved an AUC of 0.91 for classifying the images, demonstrating that the ConvNeXtLarge architecture is effective for this task. The study aims to help early diagnosis and treatment of skin cancers.
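The AUC metric reported above has a useful probabilistic reading: it is the chance that a randomly chosen malignant image receives a higher score than a randomly chosen benign one. A minimal sketch of computing it directly from that definition (with made-up labels and scores for illustration):

```python
import numpy as np

def auc(labels, scores):
    # AUC = P(random positive scores higher than random negative),
    # i.e. the normalized Mann-Whitney U statistic; ties count as half
    labels, scores = np.asarray(labels), np.asarray(scores)
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
         + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(auc(labels, scores))  # 8 of 9 positive/negative pairs ranked correctly
```

This pairwise form is O(n^2) and only suitable for small evaluations; production code would use a rank-based implementation such as scikit-learn's roc_auc_score.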
(Structural) Feature Interactions for Variability-Intensive Systems Testing, by Gilles Perrouin
Presentation given in the "short talks" session in the Dagstuhl seminar 14281 on "Feature Interactions - the Next Generation" , Schloss Dagstuhl, Germany, July 2014.
Generative Adversarial Networks for Robust Medical Image Analysis, by Daniel983829
This document presents two approaches for improving robustness in medical image segmentation using generative adversarial networks (GANs). The first approach, UltraGAN, uses a GAN to enhance the quality of ultrasound images and improve robustness to low image quality. The second approach, MedRobGAN, generates adversarial medical image examples to improve robustness against adversarial attacks. Both methods are evaluated on medical segmentation tasks to validate their effectiveness in improving robustness.
The document provides an overview of deep learning including:
- Its roots in neural networks and how it creates many layers of neurons to learn structured representations of big data
- Key deep learning algorithms like convolutional neural networks (CNNs) and recurrent neural networks (RNNs)
- Applications that have been advanced by deep learning, including computer vision, speech recognition, natural language processing and control
Object Detection on Dental X-ray Images using R-CNN, by Minhazul Arefin
The document describes an object detection model using region-based convolutional neural networks to detect dental caries and root canals in dental x-ray images. The model was trained on a dataset of dental x-rays labeled with caries and root canal objects. Feature extraction was performed using multiple convolutional layers and concatenation. Anchors, multi-scale training, and hard negative mining were used to improve performance. The model achieved 83.45% accuracy for object detection, outperforming other methods. Future work could include working with larger datasets and real-time detection from video.
[ICCV 21] Influence-Balanced Loss for Imbalanced Visual Classification, by Seulki Park
This document proposes a new influence-balanced loss function for training deep neural networks on imbalanced visual classification tasks. It discovers that existing loss functions can lead to overfitting on majority classes. The new loss measures each sample's influence on the decision boundary and downweights influential majority samples to reduce overfitting. Experiments on long-tailed and real-world imbalanced datasets demonstrate state-of-the-art accuracy, especially for minority classes. The method is easy to implement and can improve generalization on imbalanced data.
ESR spectroscopy in liquid food and beverages, by Priyanka Patel
With an increasing population, people increasingly rely on packaged foodstuffs, and packaging requires food preservation. Among the various treatment methods, irradiation is one of the most common and most harmless, as it does not alter the essential micronutrients of food. Although irradiated food does not harm human health, quality assessment is still required to provide consumers with the necessary information. ESR spectroscopy is a sophisticated way to investigate food quality and the free radicals induced during food processing, and ESR spin trapping is useful for detecting highly unstable radicals in food; the antioxidant capability of liquid food and beverages is mainly assessed by the spin trapping technique.
Abstract: Generative models, and in particular adversarial ones, are becoming prevalent in computer vision as they enable enhancing artistic creation, inspire designers, prove usefulness in semi-supervised learning or robotics applications.
We will see how to develop the abilities of Generative Adversarial Networks (GANs) to
deviate from training examples to generate more original images of fashion designs. As a limitation of GANs is the production of raw images of low resolution, we also present solutions to produce vectorized results, and show how the developed method may be useful for image editing.
Exploiting biomedical literature to mine out a large multimodal dataset of ra...Anjani Dhrangadhariya
This document summarizes a study that mined biomedical literature to create a large multimodal dataset of rare cancer studies. The researchers harvested over 15,000 images and corresponding journal articles related to rare cancers from public literature databases. They used both visual and textual classification approaches to identify images of humans, neoplastic tissues, and rare cancers. The textual approach using TF-IDF and SVMs outperformed visual CNN classifiers for all tasks. This created the first dataset aimed at automatically extracting rare cancer images to help address challenges in researching these less prevalent cancers.
Exploiting biomedical literature to mine out a large multimodal dataset of rare cancer studies. Presentation of Anjani K. Dhrangadhariya (Institute of Information Systems, HES-SO Valais-Wallis, Sierre) at SPIE Medical Imaging 2020.
Artem Baklanov - Votes Aggregation Techniques in Geo-Wiki Crowdsourcing Game:...AIST
The document describes techniques used to improve the quality of crowdsourced data from the GEO-Wiki project. It discusses preprocessing steps like blur detection, duplicate detection, and vote aggregation algorithms. Blur detection removed 2% of low-quality images, while duplicate detection based on perceptual hashing removed 6% of redundant votes. Benchmarking algorithms on expert-annotated data showed that majority voting performed comparably to more complex algorithms when there were many accurate volunteers and few spammers. Preprocessing improved results by reducing workload and increasing statistical significance.
Class imbalance is a pervasive issue in the field of disease classification from
medical images. It is necessary to balance out the class distribution while training a model. However, in the case of rare medical diseases, images from affected
patients are much harder to come by compared to images from non-affected
patients, resulting in unwanted class imbalance. Various processes of tackling
class imbalance issues have been explored so far, each having its fair share of
drawbacks. In this research, we propose an outlier detection based image classification technique which can handle even the most extreme case of class imbalance. We have utilized a dataset of malaria parasitized and uninfected cells. An
autoencoder model titled AnoMalNet is trained with only the uninfected cell images at the beginning and then used to classify both the affected and non-affected
cell images by thresholding a loss value. We have achieved an accuracy, precision, recall, and F1 score of 98.49%, 97.07%, 100%, and 98.52% respectively,
performing better than large deep learning models and other published works.
As our proposed approach can provide competitive results without needing the
disease-positive samples during training, it should prove to be useful in binary
disease classification on imbalanced datasets.
The field of Artificial Intelligence (AI) has been revitalized in this decade, primarily due to the large-scale application of Deep Learning (DL) and other Machine Learning (ML) algorithms. This has been most evident in applications like computer vision, natural language processing, and game bots. However, extraordinary successes within a short period of time have also had the unintended consequence of causing a sharp difference of opinion in research and industrial communities regarding the capabilities and limitations of deep learning. A few questions you might have heard being asked (or asked yourself) include:
a. We don’t know how Deep Neural Networks make decisions, so can we trust them?
b. Can Deep Learning deal with highly non-linear continuous systems with millions of variables?
c. Can Deep Learning solve the Artificial General Intelligence problem?
The goal of this seminar is to provide a 1000-feet view of Deep Learning and hopefully answer the questions above. The seminar will touch upon the evolution, current state of the art, and peculiarities of Deep Learning, and share thoughts on using Deep Learning as a tool for developing power system solutions.
This document describes a project on age and gender detection using deep learning and convolutional neural networks. The objective is to train a model on the UTKFace dataset to detect age and gender from facial images. The methodology involves preprocessing steps such as grayscale conversion, resizing, and normalization. A CNN model is built with convolutional blocks and fully connected layers. The model is compiled with binary cross-entropy loss for gender classification and mean absolute error for age detection. The trained model is tested and deployed using Streamlit. Real-world use cases discussed are advertising based on targeted audiences and an Android app that detects age from photos.
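As a rough sketch of the two training objectives mentioned (binary cross-entropy for the gender head, mean absolute error for the age head), computed here with NumPy on made-up predictions; these are not values from the described project:

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    # Gender head: y_true in {0, 1}, p_pred = predicted probability of class 1.
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def mean_absolute_error(y_true, y_pred):
    # Age head: regression target in years.
    return float(np.mean(np.abs(y_true - y_pred)))

# Hypothetical labels and predictions for four faces.
gender_true = np.array([0, 1, 1, 0])
gender_prob = np.array([0.1, 0.8, 0.6, 0.3])
age_true = np.array([25.0, 40.0, 31.0, 60.0])
age_pred = np.array([27.0, 38.0, 35.0, 55.0])

print(round(binary_cross_entropy(gender_true, gender_prob), 3))  # 0.299
print(mean_absolute_error(age_true, age_pred))                   # 3.25
```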
Generalizing Scientific Machine Learning and Differentiable Simulation Beyond... (Chris Rackauckas)
This document discusses scientific machine learning and differentiable simulation. It begins by explaining that scientific machine learning uses both data and physical knowledge to make accurate predictions with less data. It then discusses differentiable simulation and how universal differential equations can be used to replace unknown portions of models with neural networks while preserving known physical structure. Several examples are provided of applications in various domains like epidemiology, black hole detection, earthquake engineering, and chemistry. The document emphasizes that understanding the engineering principles and numerical properties of the domain is important for applying these methods stably and efficiently.
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An... (PyData)
Artificial intelligence is emerging as a new paradigm in materials science. This talk describes how physical intuition and (insightful) machine learning can solve the complicated task of structure recognition in materials at the nanoscale.
The document describes a study that used a convolutional neural network with a ConvNeXtLarge architecture to classify skin cancer images into benign and malignant classes. The CNN model was trained on a dataset of 3,297 skin cancer images from Kaggle. It achieved an AUC of 0.91 for classifying the images, demonstrating the ConvNeXtLarge architecture is effective for this task. The study aims to help early diagnosis and treatment of skin cancers.
(Structural) Feature Interactions for Variability-Intensive Systems Testing Gilles Perrouin
Presentation given in the "short talks" session in the Dagstuhl seminar 14281 on "Feature Interactions - the Next Generation" , Schloss Dagstuhl, Germany, July 2014.
Generative Adversarial Networks for Robust Medical Image Analysis.pdf (Daniel983829)
This document presents two approaches for improving robustness in medical image segmentation using generative adversarial networks (GANs). The first approach, UltraGAN, uses a GAN to enhance the quality of ultrasound images and improve robustness to low image quality. The second approach, MedRobGAN, generates adversarial medical image examples to improve robustness against adversarial attacks. Both methods are evaluated on medical segmentation tasks to validate their effectiveness in improving robustness.
The document provides an overview of deep learning including:
- Its roots in neural networks and how it creates many layers of neurons to learn structured representations of big data
- Key deep learning algorithms like convolutional neural networks (CNNs) and recurrent neural networks (RNNs)
- Applications that have been advanced by deep learning, including computer vision, speech recognition, natural language processing and control
Object Detection on Dental X-ray Images using R-CNN (Minhazul Arefin)
The document describes an object detection model using region-based convolutional neural networks to detect dental caries and root canals in dental x-ray images. The model was trained on a dataset of dental x-rays labeled with caries and root canal objects. Feature extraction was performed using multiple convolutional layers and concatenation. Anchors, multi-scale training, and hard negative mining were used to improve performance. The model achieved 83.45% accuracy for object detection, outperforming other methods. Future work could include working with larger datasets and real-time detection from video.
[ICCV 21] Influence-Balanced Loss for Imbalanced Visual Classification (Seulki Park)
This document proposes a new influence-balanced loss function for training deep neural networks on imbalanced visual classification tasks. It discovers that existing loss functions can lead to overfitting on majority classes. The new loss measures each sample's influence on the decision boundary and downweights influential majority samples to reduce overfitting. Experiments on long-tailed and real-world imbalanced datasets demonstrate state-of-the-art accuracy, especially for minority classes. The method is easy to implement and can improve generalization on imbalanced data.
[Explained] "Partial Success in Closing the Gap between Human and Machine Vision" Geirhos et al., 2021
1. Cognitive Informatics Lab.
Dept. of Intelligence Science and Technology,
Graduate School of Informatics, Kyoto University
Sou Yoshihara (M2)
arXiv:2106.07411
2. Abstract: 3 findings
- The longstanding robustness gap between humans and CNNs is closing
- There is still a substantial image-level consistency gap, meaning that humans make different errors than models
- In many cases, human-to-model consistency improves when the training dataset size is increased by one to three orders of magnitude
This evaluation is open-sourced as a benchmark to track future progress.
(https://github.com/bethgelab/model-vs-human/)
3. Introduction
- Currently, models routinely match and in many cases even outperform humans
- At the same time, it is becoming increasingly clear that models systematically exploit shortcuts shared between training and test data
- Shortcuts are decision rules that perform well on standard benchmarks but fail to transfer to more challenging testing conditions (e.g. real-world scenarios) (Geirhos et al., 2020, arXiv:2004.07780)
A toy example of shortcut learning in neural networks: when trained on a simple dataset of stars and moons, a standard fully connected neural network learns a shortcut strategy, classifying based on location (stars in the top right or bottom left; moons in the top left or bottom right) rather than the shape of the objects (Geirhos et al., 2020).
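The star/moon shortcut can be reproduced in miniature. The sketch below is an illustrative toy, not the paper's experiment: a minimal logistic-regression "network" is trained on two features, a noisy shape cue and a location cue that happens to predict the class perfectly during training. When the location regularity flips at test time, accuracy collapses, because the model learned the shortcut:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, location_matches_class):
    # Feature 0: weak, noisy "shape" cue; feature 1: object location.
    y = rng.integers(0, 2, n)
    shape = y + rng.normal(0, 2.0, n)              # noisy shape cue
    loc = (y if location_matches_class else 1 - y) + rng.normal(0, 0.1, n)
    return np.column_stack([shape, loc]), y

# Training set: location perfectly predicts the class (the shortcut).
X_tr, y_tr = make_data(1000, location_matches_class=True)
# Test set: objects appear at the "wrong" locations.
X_te, y_te = make_data(1000, location_matches_class=False)

# Minimal logistic regression trained by gradient descent.
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X_tr @ w + b)))
    g = p - y_tr
    w -= 0.1 * X_tr.T @ g / len(y_tr)
    b -= 0.1 * g.mean()

def acc(X, y):
    return (((X @ w + b) > 0) == y).mean()

print(f"train accuracy: {acc(X_tr, y_tr):.2f}")  # high: the shortcut works here
print(f"test accuracy:  {acc(X_te, y_te):.2f}")  # collapses once location flips
```

The classifier leans almost entirely on the location feature, so its test accuracy falls far below chance despite near-perfect training accuracy.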
4. Out-of-Distribution (OOD) data
- Out-of-Distribution (OOD) data: testing models on more challenging test cases where there is still a ground-truth category, but certain image statistics differ from the training distribution
- Previous works:
  ○ ImageNet-C (Hendrycks et al., 2019)
  ○ ImageNet-Sketch (Wang et al., 2019)
  ○ Stylized-ImageNet (Geirhos et al., 2019)
  These lack human comparison data.
- The authors tested human observers in a lab on OOD datasets (85K psychophysical trials across 90 participants)
(Figure: example ImageNet-C corruptions.)
5. 17 OOD datasets
(Figure: example images for the OOD manipulations: colour vs. grayscale, low contrast, low-pass/blurring, high-pass, phase noise, true power spectrum vs. power equalisation, true vs. opponent colour, rotation, Eidolon I, Eidolon II, Eidolon III, and uniform noise, shown alongside original ImageNet images.)
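A few of these manipulations are easy to sketch with NumPy. The snippet below is an illustrative approximation, not the authors' exact parameterisation (for instance, the paper's low-pass condition uses Gaussian blur with varying standard deviation; a simple box blur stands in for it here):

```python
import numpy as np

rng = np.random.default_rng(0)

def to_grayscale(img):
    # Luminance-weighted average over the RGB channels.
    return img @ np.array([0.299, 0.587, 0.114])

def reduce_contrast(img, c):
    # Blend towards mid-grey; c=1 keeps the image, c=0 removes all contrast.
    return c * img + (1 - c) * 0.5

def add_uniform_noise(img, width):
    # Additive uniform noise in [-width, width], clipped to the valid range.
    return np.clip(img + rng.uniform(-width, width, img.shape), 0.0, 1.0)

def low_pass(img, k):
    # Crude low-pass filter: k x k box blur per channel (stand-in for
    # the Gaussian blur used in the paper).
    pad = k // 2
    p = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

img = rng.random((32, 32, 3))  # stand-in for a 224x224 ImageNet image
print(to_grayscale(img).shape)                      # (32, 32)
print(reduce_contrast(img, 0.1).std() < img.std())  # True: less contrast
```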
6. 3 axes of models
- Objective function (supervised vs. self-supervised, adversarially trained, and CLIP's joint language-image training)
- Architecture (convolutional vs. vision transformer)
- Training dataset size (ranging from 1M to 1B images)
7. Psychophysical experiments
- Psychophysical experiments in a lab:
  ○ 90 observers were tested in a darkened chamber
  ○ 16 categories (such as chair, dog, airplane, etc.)
  ○ 22" monitor at 1920 × 1200 resolution (refresh rate: 120 Hz)
  ○ Viewing distance: 107 cm
  ○ Target images at the centre subtended 3 × 3 degrees of visual angle
  ○ Each image was shown for 200 ms, followed by a 1/f backward mask
  ○ Measures were taken against COVID-19 risks
8. Models
- 16 categories: the 1000-class decision vector was mapped to the 16 classes using the WordNet hierarchy (Miller, 1995)
- 52 models:
  ○ 24 standard ImageNet-trained CNNs
  ○ 8 self-supervised models
  ○ 6 Big Transfer models
  ○ 5 adversarially trained models
  ○ 5 vision transformers
  ○ 2 semi-weakly supervised models
  ○ Noisy Student
  ○ CLIP
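The 1000-to-16 mapping can be sketched as follows. The class indices and the aggregation by averaging member-class probabilities are illustrative assumptions; the actual mapping is derived from the WordNet hierarchy, and the authors' code may aggregate differently:

```python
import numpy as np

# Toy stand-in for the WordNet-based mapping: each coarse category owns
# a set of fine-grained ImageNet class indices (hypothetical indices here).
COARSE_TO_FINE = {
    "dog":      [151, 207, 254],   # e.g. some dog-breed classes
    "chair":    [423, 559],
    "airplane": [404],
}

def coarse_probs(fine_probs, mapping):
    # Aggregate a 1000-way softmax into coarse-category scores by averaging
    # the probabilities of each category's member classes (one simple choice).
    return {c: float(np.mean(fine_probs[idx])) for c, idx in mapping.items()}

def coarse_decision(fine_probs, mapping):
    scores = coarse_probs(fine_probs, mapping)
    return max(scores, key=scores.get)

fine = np.full(1000, 1e-4)
fine[207] = 0.6          # model puts most of its mass on one dog class
fine /= fine.sum()       # renormalise to a valid distribution
print(coarse_decision(fine, COARSE_TO_FINE))  # dog
```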
9. Metrics
- OOD accuracy (averaged across conditions and datasets)
- Accuracy difference A(m)
- Observed consistency O(m): 1 if humans and a model are either both right or both wrong on an image, 0 otherwise (averaged over images)
- Error consistency E(m): tracks whether the observed consistency is larger than chance alone would produce
Notation: d: dataset; c: condition; s: sample (an image); h: human; m: model; ō_{h,m}(S_{d,c}): expected consistency on the sample set S_{d,c}
10. Error consistency E(m)
- Error consistency E(m) is Cohen's kappa: it indicates whether the observed consistency is larger than what could have been expected by chance (Eq. from Geirhos et al., 2020, arXiv:2006.16736):
  κ = (c_obs − c_exp) / (1 − c_exp)
  c_obs: observed consistency
  c_exp: expected consistency given two independent binomial decision makers with matched accuracies, i.e. only random consistency; for accuracies p_h and p_m (probability of a correct decision), c_exp = p_h · p_m + (1 − p_h) · (1 − p_m)
- Interpretation of Cohen's kappa (Wikipedia):
  < 0: no agreement
  0–0.20: slight
  0.21–0.40: fair
  0.41–0.60: moderate
  0.61–0.80: substantial
  0.81–1: almost perfect
- Why error consistency? Two decision makers with 95% accuracy each will have at least 90% observed consistency (intuitively, they both get most images correct and thus the observed overlap is high), even when their errors fall on completely different images.
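The three consistency quantities are straightforward to compute from binary correctness vectors. The sketch below follows the definitions above and reproduces the 95%-accuracy example, where an observed consistency of 90% nevertheless yields a negative kappa because the two decision makers err on entirely different images:

```python
import numpy as np

def error_consistency(correct_a, correct_b):
    # correct_a, correct_b: boolean arrays, one entry per image (True = correct).
    a = np.asarray(correct_a, dtype=bool)
    b = np.asarray(correct_b, dtype=bool)
    c_obs = np.mean(a == b)                 # both right or both wrong
    p_a, p_b = a.mean(), b.mean()
    # Expected consistency of two independent decision makers with these accuracies.
    c_exp = p_a * p_b + (1 - p_a) * (1 - p_b)
    kappa = (c_obs - c_exp) / (1 - c_exp)   # Cohen's kappa
    return c_obs, c_exp, kappa

# Two decision makers, 95% accuracy each, erring on disjoint images:
# observed consistency sits exactly at its 90% lower bound.
n = 100
a = np.ones(n, bool); a[:5] = False        # wrong on images 0-4
b = np.ones(n, bool); b[5:10] = False      # wrong on images 5-9
c_obs, c_exp, kappa = error_consistency(a, b)
print(round(c_obs, 2), round(c_exp, 3))    # 0.9 0.905
print(round(kappa, 2))                     # -0.05: no agreement beyond chance
```

High observed consistency with negative kappa is exactly why the paper reports error consistency rather than raw overlap.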
11. Results on OOD datasets
The OOD robustness gap between human and machine vision is closing (top), but an image-level consistency gap remains (bottom, especially (d)).
(Figure legend: humans; standard supervised CNNs; self-supervised models; adversarially trained models; vision transformers; Noisy Student; BiT; SWSL; CLIP. ↓ marks models trained on large-scale datasets.)
12. Results on OOD datasets (continued)
(Figure: additional per-dataset results; same legend as the previous slide.)
13. Robustness across models
Self-supervised models: SimCLR variants (SimCLR-x1, SimCLR-x2, SimCLR-x4) show strong generalisation improvements on uniform noise, low contrast, and high-pass images.
This is quite remarkable given that SimCLR models were trained on a different set of augmentations (random crop with flip and resize, colour distortion, and Gaussian blur).
→ Is the defining factor the objective function or the choice of augmentations?
(An illustration of SimCLR: https://ai.googleblog.com/2020/04/advancing-self-supervised-and-semi.html)
14. Objective function vs. the choice of augmentations
Self-supervised models were compared against augmentation-matched supervised baseline models (augmentations: random crop with flip and resize, colour distortion, Gaussian blur).
(Plot legend: triangles: self-supervised models; stars: supervised baselines; blue: ResNet x1; green: ResNet x4; red diamonds: humans.)
The augmentation scheme (rather than the self-supervised objective) indeed made the crucial difference: augmentation-matched supervised baselines show just the same generalisation behaviour.
15. Robustness across models
Adversarially trained models: the more strongly a model is trained adversarially (darker shades of blue), the more susceptible it becomes to (random) image degradations. A simple rotation by 90 degrees leads to a 50% drop in classification accuracy.
Adversarial robustness seems to come at the cost of increased vulnerability to large-scale perturbations.
(Adversarial examples: Goodfellow et al., 2014)
16. Adversarial training increases shape bias
There is a relationship between shape bias and the degree of adversarial training:
shape bias = correct shape decisions / (correct shape decisions + correct texture decisions)
Stimuli: texture-shape cue-conflict images, made by style transfer (Gatys et al., 2016); 1,200 images (16 classes, 75 images per shape label).
URL: https://github.com/rgeirhos/texture-vs-shape/tree/master/stimuli/style-transfer-preprocessed-512 (Geirhos et al., 2019)
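Given per-trial decisions on cue-conflict images, the shape-bias formula reduces to a few lines. The trials below are made-up illustrations, not data from the paper:

```python
def shape_bias(decisions):
    # decisions: (predicted_class, shape_class, texture_class) tuples from
    # cue-conflict trials. Trials where the prediction matches neither cue
    # contribute to neither count, as in the definition above.
    shape = sum(p == s for p, s, t in decisions)
    texture = sum(p == t for p, s, t in decisions)
    return shape / (shape + texture)

trials = [
    ("cat", "cat", "elephant"),       # followed the shape
    ("cat", "cat", "elephant"),
    ("elephant", "cat", "elephant"),  # followed the texture
    ("dog", "cat", "elephant"),       # matched neither cue: ignored
]
print(shape_bias(trials))  # ≈ 0.67 (2 shape vs. 1 texture decision)
```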
17. Robustness across models
Vision transformers: the best vision transformer (ViT-L trained on 14M images) even exceeds human OOD accuracy. Vision transformers trained on 1M images (light green) are already better than standard convolutional models, and show a higher shape bias (results not shown here).
c.f. Tuli et al. (2021), Naseer et al. (2021)
(ViT: Dosovitskiy et al., 2020, arXiv:2010.11929)
↓ : trained on large-scale datasets
18. Robustness across models
Standard models trained on more data: BiT-M, SWSL, Noisy Student. The biggest effect on OOD robustness comes simply from training on larger datasets, not from advanced architectures.
- BiT: Big Transfer (Kolesnikov et al., 2019, arXiv:1912.11370); ResNet152x4 makes each layer four times wider; BiT-M is trained on ImageNet-21k
- SWSL: Semi-Weakly Supervised Learning (Yalniz et al., 2019, arXiv:1905.00546); trained on 940M images
- Noisy Student (Xie et al., 2020, arXiv:1911.04252); trained on 300M images
↓ : trained on large-scale datasets
19. Robustness across models
CLIP is "special" in three respects:
- more data: trained on 400M images
- novel objective: joint language-image supervision
- non-standard architecture: a vision transformer backbone
It is the most human-like model across all metrics. (Radford et al., 2021, arXiv:2103.00020)
↓ : trained on large-scale datasets
20. Error consistency between models ("sketch" dataset)
Do models make errors on the same individual images? Whether a standard supervised model, a self-supervised model, an adversarially trained model, or a vision transformer, all of these models make highly systematic errors, while humans show a very different pattern of errors.
The boundary between humans and some data-rich models, especially CLIP (400M images) and SWSL (940M), is blurry: these models make more human-like errors than standard models do.
(Error consistency analysis on a single dataset, "sketch"; for other datasets see Figures 9, 11, 12, 13 and 14 of the original paper.)
21. Error consistency between models (17 OOD datasets)
Data-rich models approach human-to-human observed consistency, but not human-to-human error consistency. Observed consistency is not a good measure of image-level consistency, since it does not take consistency by chance into account; error consistency tracks whether there is consistency beyond chance.
There is thus still a substantial image-level consistency gap between human and machine vision. However, several models improve over vanilla CNNs, especially BiT-M (trained on 14M images) and CLIP (400M images).
22. Error consistency aggregated over multiple datasets
(Plot markers: ○ convolutional models, ▽ vision transformers, ◇ humans.)
On one group of datasets (sketch, silhouette, edge, cue conflict, low-pass), OOD accuracy is a near-perfect predictor of image-level consistency; data-rich models (e.g. CLIP, SWSL, BiT) in particular narrow the consistency gap to humans. Training on large-scale datasets leads to considerable improvements for both architectures, convolutional and vision transformer.
On the remaining datasets (stylized, colour/greyscale, contrast, high-pass, phase-scrambling, power-equalisation, false colour, rotation, Eidolon I, II and III, as well as uniform noise), the human-machine gap is large; here, more robust models do not show improved error consistency.
23. Error consistency
- It remains an open question why the training dataset appears to have the most important impact on a model's decision boundary as measured by error consistency (as opposed to other aspects of a model's inductive bias).
- Datasets contain various shortcut opportunities (Geirhos et al., 2020), and if two different models are trained on similar data, they might converge to a similar solution simply by exploiting the same shortcuts. Making models more flexible (such as transformers, a generalisation of CNNs) wouldn't change much in this regard.
- What affects error consistency? Dataset vs. architecture; flexibility vs. constraints (next slide).
24. Error consistency
Dataset vs. architecture: error consistency between two identical models trained on very different datasets, such as ImageNet vs. Stylized-ImageNet (SIN, a "texture-less" dataset; Geirhos et al., 2019), is much lower than error consistency between very different models (ResNet-50 vs. VGG-16) trained on the same dataset. → The dataset is more important.
Flexibility vs. constraints: error consistency between ResNet-50 and a highly flexible model (e.g., a vision transformer) is much higher than error consistency between ResNet-50 and a highly constrained model like BagNet-9 (Brendel and Bethge, 2019, arXiv:1904.00760), which extracts features only from small image patches. → Flexibility is more important.
25. Summary
- While self-supervised and adversarially trained models lack OOD robustness, models based on vision transformers and/or trained on large-scale datasets now match or exceed human performance on most datasets.
- The OOD robustness gap between human and machine vision is closing, as the best models now match or exceed human accuracies.
- At the same time, an image-level consistency gap remains; however, this gap is at least in some cases narrowing for models trained on large-scale datasets.
This evaluation is open-sourced as a benchmark to track future progress.
(https://github.com/bethgelab/model-vs-human/)