- tabular data approach to machine learning and when it didn't work
- convolutional neural networks and their application
- deep learning: history and today
- generative adversarial networks
- finding optimal hyperparameters
- joint embeddings
Machine learning in science and industry — day 1 (arogozhnikov)
A course on machine learning in science and industry.
- notions and applications
- nearest neighbours: search and machine learning algorithms
- roc curve
- optimal classification and regression
- density estimation
- Gaussian mixtures and EM algorithm
- clustering, an example of clustering in the OPERA experiment
Machine learning in science and industry — day 2 (arogozhnikov)
- decision trees
- random forest
- boosting: AdaBoost
- reweighting with boosting
- gradient boosting
- learning to rank with gradient boosting
- multiclass classification
- trigger in LHCb
- boosting to uniformity and flatness loss
- particle identification
Introduction to machine learning terminology.
Applications within High Energy Physics and outside HEP.
* Basic problems: classification and regression.
* Nearest neighbours approach and spatial indices
* Overfitting (intro)
* Curse of dimensionality
* ROC curve, ROC AUC
* Bayes optimal classifier
* Density estimation: KDE and histograms
* Parametric density estimation
* Mixtures for density estimation and EM algorithm
* Generative approach vs discriminative approach
* Linear decision rule, intro to logistic regression
* Linear regression
1. Machine Learning in Science and Industry
Day 4, lectures 7 & 8
Alex Rogozhnikov, Tatiana Likhomanenko
Heidelberg, GradDays 2017
2. Recapitulation: linear models
work with many features
bag-of-words for texts / images
SVM + kernel trick or add new features to introduce interaction between features
regularizations to make it stable
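To recall how little code a regularized linear model needs, here is a minimal ridge-regression sketch (an illustration of the recap bullets above, not code from the course; data and parameter names are made up):

```python
import numpy as np

# Ridge regression: a linear model with L2 regularization, fitted in
# closed form as w = (X^T X + alpha * I)^{-1} X^T y.
def fit_ridge(X, y, alpha=1.0):
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=200)

w = fit_ridge(X, y, alpha=1e-3)   # recovers true_w up to noise
```

The regularization term alpha * I keeps the normal equations well-conditioned even with correlated features, which is exactly the "make it stable" point.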
3. Recapitulation: factorization models
recommender systems
$\hat{R}_{u,i} = \langle U_{u,\cdot}, V_{\cdot,i} \rangle$
$\sum_{\text{known } R_{u,i}} \left[ \hat{R}_{u,i} - R_{u,i} \right]^2 \to \min$
learning representations
learning through the prism of interaction between entities
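The factorization above can be learnt by plain gradient descent on the squared error over the known entries; a minimal numpy sketch (illustrative only, the deck does not prescribe an optimizer, and the data here is synthetic):

```python
import numpy as np

# Learn U, V so that <U_u, V_i> approximates the known entries R_{u,i}.
rng = np.random.default_rng(0)
n_users, n_items, k = 8, 6, 2

# a synthetic low-rank "ratings" matrix and a mask of known entries
R = rng.normal(size=(n_users, k)) @ rng.normal(size=(k, n_items))
known = rng.random((n_users, n_items)) < 0.7

U = 0.3 * rng.normal(size=(n_users, k))
V = 0.3 * rng.normal(size=(k, n_items))
lr = 0.02
for _ in range(3000):
    err = (U @ V - R) * known      # residuals on known R_{u,i} only
    U -= lr * err @ V.T            # gradient step for user factors
    V -= lr * U.T @ err            # gradient step for item factors

loss = np.sum(((U @ V - R) * known) ** 2)
```

Rows of U and columns of V are the learnt representations of users and items mentioned above.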
4. Recapitulation: dimensionality reduction
compressing data of large dimensionality
output is of small dimensionality and can be used efficiently by ML techniques
expected to keep most of the information in an accessible form
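A classic instance of this is PCA: project centered data onto its top principal directions. A minimal sketch via SVD (an illustration, not course code; the toy data is essentially one-dimensional so one component keeps almost all information):

```python
import numpy as np

def pca_reduce(X, k):
    Xc = X - X.mean(axis=0)
    # rows of Vt are the principal directions, sorted by singular value
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T            # data in the k-dimensional subspace

rng = np.random.default_rng(0)
# 100 points along one direction in 3-D, plus small noise
t = rng.normal(size=(100, 1))
X = t @ np.array([[3.0, 1.0, -2.0]]) + 0.01 * rng.normal(size=(100, 3))

Z = pca_reduce(X, k=1)              # compressed to 1 dimension
```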
5. Multilayer Perceptron
feed-forward
function which consists of sequentially applying
linear operations
nonlinear operations (activations)
appropriate loss function + optimization
in applications — one or two hidden layers
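The "linear operations + nonlinear activations" recipe can be written out directly; an illustrative forward pass with one hidden layer and a softmax output (shapes and weights are arbitrary, not from the course):

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    h = np.maximum(0.0, x @ W1 + b1)   # linear operation + ReLU activation
    logits = h @ W2 + b2               # final linear operation
    # softmax turns scores into class probabilities
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))            # batch of 5 samples, 4 features
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)
probs = mlp_forward(x, W1, b1, W2, b2)
```

Training then pairs this forward pass with an appropriate loss (e.g. cross-entropy) and a gradient-based optimizer, as the slide says.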
6. Tabular data approach
In rare decays analysis:
momenta, energies, IPs,
particle identification, isolation variables, flight
distances, angles ...
→
machine
learning
→
signal or
background
In a search engine
How many words from query in the document
(url, title, body, links...), TF-IDFs, BM25,
number of links to / from a page, geodistance,
how frequenly a link was clicked before,
estimates of fraud / search spam, ...
→
machine
learning
→
real-valued
estimation of
relevance
This approach works well enough in many areas.
Features can be computed manually or estimated by ML models (stacking).
7. When didn't it work (well enough)?
feature engineering → ML is a widespread
traditional pipeline
which features help to identify a drum on the
picture?
a car?
an animal?
in sound processing
which features are good to detect a particular phoneme?
8. Image recognition today
can classify objects on images with very
high accuracy
even if images are just 500 × 500 and
we have only 400 output neurons, that's
500 × 500 × 400 = 100 000 000
parameters
and that's only a linear model!
1. can we reduce the number of parameters?
2. is there something we're missing about
image data when treating a picture as a
vector?
10. Pooling operation
Another important element for image processing is pooling.
Pooling should reduce size of image representation, but keep things important for further
analysis.
In particular, max-pooling (shown on the picture) is very popular.
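Max-pooling over non-overlapping 2×2 windows can be written in a few lines of numpy (an illustrative sketch of the operation, not course code):

```python
import numpy as np

# Downsample an image by taking the maximum over each 2x2 window.
def max_pool_2x2(img):
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.array([[1, 3, 2, 1],
                [4, 2, 0, 1],
                [5, 6, 1, 2],
                [7, 8, 3, 9]], dtype=float)
pooled = max_pool_2x2(img)
# each output entry is the maximum of one 2x2 block: [[4, 2], [8, 9]]
```

The representation shrinks by a factor of 4, but the strongest responses in each region survive for the next layers.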
15. Problems with deep architectures
In deep architectures, the gradient varies drastically
between parameters (a 10^3–10^6 times difference,
and this is not the limit).
This problem is usually referred to as 'gradient
diminishing'.
Adaptive versions of SGD are used today.
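A toy way to see the effect (my illustration, not from the deck): stack scalar sigmoid layers and multiply the chain-rule factors. Since sigmoid'(z) ≤ 0.25, the gradient reaching the first layers is orders of magnitude smaller than at the last ones:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x, grad = 0.5, 1.0
grad_norms = []
for _ in range(20):                 # 20 stacked scalar sigmoid "layers"
    w = rng.normal()
    z = w * x
    grad *= w * sigmoid(z) * (1 - sigmoid(z))   # chain-rule factor <= 0.25 * |w|
    x = sigmoid(z)
    grad_norms.append(abs(grad))
# grad_norms shrinks layer after layer: exactly the spread of magnitudes
# that plain SGD with a single learning rate struggles with
```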
17. Deep learning (as it was initially thought of)
Deep Belief Network (Hinton et al., 2006) was one of the first ways to
train deep structures.
trained in an unsupervised manner, layer by layer
each next layer should "find useful dependencies" in the previous layer's output
finally, classification is done using the output of the last layer
(optionally, minor fine-tuning of the network is also done)
Why is this important?
we can recognize objects and learn patterns without supervision
gradient backpropagation in the brain??
Practical? No, today deep learning relies solely on stochastic gradient optimization
18. Deep learning today
automatic inference of helpful intermediate representations ...
with deep architecture
mostly, trained all together (end-to-end) with gradient-based methods
19. Inference of high-level information
The ability of neural networks to extract
high-level features can be demonstrated
with artistic style transfer.
The image is smoothly modified to preserve
the activations of some convolutional
layer.
22. AlphaGo
too many combinations to check
no reliable way to assess the quality of a position
AlphaGo has 2 CNNs:
the value network predicts the probability of winning
the policy network predicts the next move of a player
both are given the board (with additional features
for each position on the board)
23. Applications of CNNs
Convolutional neural networks are top-performing on
image and image-like data
image recognition and annotation
object detection
image segmentation
segmentation of neuronal membranes
24. Pitfalls of applying machine
learning
CNNs were applied to jet substructure analyses and
showed promising results on simulated data, but a
simple study shows that a model applied to the output
of a different simulation generator has up to twice the
error (FPR)
thus, the learnt dependencies are simulation-specific
and may turn out to be useless
machine learning uses all dependencies it finds
useful; those may have no relation to real physics
there are approaches to domain adaptation,
but their applicability is disputable
25. Autoencoding
compressed data is treated as a result of an algorithm
trained representations can be made robust to different kinds of noise
26. Autoencoding
Comparison of PCA (left) and autoencoder with 2 neurons in the middle layer (right)
You can also play with it
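The comparison can be mimicked with a tiny linear autoencoder trained by gradient descent (an illustrative sketch with made-up data; with linear layers the autoencoder recovers a PCA-like subspace, and real autoencoders add nonlinearities between the layers):

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 points lying near a 2-D plane inside 5-D space
Z_true = rng.normal(size=(200, 2))
X = Z_true @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(200, 5))

# encode: z = x W_enc (5 -> 2), decode: x_hat = z W_dec (2 -> 5)
W_enc = 0.1 * rng.normal(size=(5, 2))
W_dec = 0.1 * rng.normal(size=(2, 5))
lr = 0.02
for _ in range(4000):
    Z = X @ W_enc                       # compressed representation
    err = Z @ W_dec - X                 # reconstruction residual
    W_dec -= lr * Z.T @ err / len(X)    # gradient steps on the
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)  # reconstruction loss

final_err = np.mean((X @ W_enc @ W_dec - X) ** 2)
```

After training, the 2-D code Z carries almost all the information needed to reconstruct the 5-D input.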
28. An example with fonts
an autoencoder trained to reconstruct symbols from different fonts can generate new fonts
when walking along the latent space
29. Gaze correction (demo)
shift correction + light correction applied
training data was collected specially for this task
31. Recapitulation
analysis of tabular data
convolutional neural networks
and their applications to images and image-like data
many parameters
specific optimization methods
... the amount of data played a crucial role
autoencoding
32. Sequences and Recurrent NNs
sound, text
any information that is time-dependent (daily revenues, stock prices...)
Inspiration: the brain contains information about the past and is able to process
information given in sequences.
33. LSTM: a popular recurrent unit
Those are all tensor operations, see post for details
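One step of a standard LSTM cell really is just a few tensor operations; an illustrative numpy version (shapes, initialization, and the gate ordering i/f/o/g are my choices, not from the course):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    # one big matmul produces all four gates at once, then is split
    z = np.concatenate([x, h]) @ W + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input/forget/output gates
    g = np.tanh(g)                                # candidate cell update
    c = f * c + i * g                             # new cell state (memory)
    h = o * np.tanh(c)                            # new hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W = rng.normal(size=(n_in + n_hidden, 4 * n_hidden))
b = np.zeros(4 * n_hidden)

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in rng.normal(size=(5, n_in)):   # run over a sequence of length 5
    h, c = lstm_step(x, h, c, W, b)
```

The cell state c is what lets the unit carry information across many time steps; the gates decide what to keep, add, and expose.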
34. Character-level text generation
Given a text, the network learns to predict the next character. The trained network is then
used to generate text character by character.
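The generation loop can be sketched as follows; here a smoothed bigram count table stands in for the trained network (a hypothetical stand-in of mine — the deck's model is an RNN, but the character-by-character sampling is the same):

```python
import numpy as np

text = "hello hello hello hello world "
chars = sorted(set(text))
idx = {ch: i for i, ch in enumerate(chars)}

# counts[a, b] = how often character b followed character a in the text
counts = np.ones((len(chars), len(chars)))     # +1 smoothing
for a, b in zip(text, text[1:]):
    counts[idx[a], idx[b]] += 1
probs = counts / counts.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
out = "h"
for _ in range(40):                            # generate character by character
    p = probs[idx[out[-1]]]                    # "network" output: next-char distribution
    out += chars[rng.choice(len(chars), p=p)]  # sample, then feed back
```

With an RNN in place of the count table, the distribution p is conditioned on the whole history rather than on the last character only.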
35. Code generation
Network trained on the linux source:
/*
 * Increment the size file of the new incorrect UI_FILTER group information
 * of the size generatively.
 */
static int indicate_policy(void)
{
    int error;
    if (fd == MARN_EPT) {
        /*
         * The kernel blank will coeld it to userspace.
         */
        if (ss->segment < mem_total)
            unblock_graph_and_set_blocked();
        else
            ret = 1;
        goto bail;
    }
    segaddr = in_SB(in.addr);
    selector = seg / 16;
    setup_works = true;
    for (i = 0; i < blocks; i++) {
        seq = buf[i++];
        bpf = bd->bd.next + i * search;
        if (fd) {
            current = blocked;
        }
    }
    rw->name = "Getjbbregs";
    bprm_self_clearl(&iv->version);
    regs->new = blocks[(BPF_STATS << info->historidac)] | PFMR_CLOBATHINC_SECONDS << 12;
    return segtable;
}
36. Style in RNNs (demo)
The position of the pen at the next moment is predicted.
35 / 66
37. Different strategies for prediction
one can predict a single observation from a sequence
or map a sequence to a sequence
36 / 66
38. Applications of Recurrent NNs
natural language processing
translation
speech recognition
a sound recording is a sequence!
and speech synthesis
demand forecasting in retail systems / stock prices forecasting
37 / 66
39. Mixing RNN and CNN
Sometimes you need both to analyze dependencies in space and time. For example, in
weather forecasting a sequence of photos from satellites can be used to predict snow/rain
38 / 66
40. Why look at a sequence of images?
Animation from CIMSS Satellite Blog
39 / 66
41. (Word) Embeddings
I would like a ? of coffee.
Skip-gram model takes a context around the word:
1 word to the left and one to the right:
a ? of
2 words to the left and right
like a ? of coffee
When a neural network is trained to predict the missing
word, each word is first associated with a (trainable) vector, its representation.
If the model can predict the word well, it has already learnt something very useful about
the language.
40 / 66
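The extraction of (target, context) training pairs described above can be sketched as follows (a hypothetical helper, not the Word2Vec implementation):

```python
def context_pairs(tokens, window):
    """Build (target word, context word) pairs used as skip-gram
    training data, taking `window` words on each side of the target."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

sentence = "i would like a cup of coffee".split()
pairs = context_pairs(sentence, window=1)
```

These pairs are then fed to the network, which learns a vector per word such that the context words become predictable from the target's vector.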
42. Embeddings
Capitals-countries example from Word2Vec. Comparative-superlative example from GloVe.
Similar (synonymic) words have very close representations.
Embeddings are frequently used to build representations, e.g. of text, to extract information
that can be used by various ML techniques.
41 / 66
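The vector-arithmetic regularities (like the capitals-countries example) can be illustrated with hand-made toy vectors; these are not trained embeddings, the values are invented so that the analogy works:

```python
import numpy as np

# toy, hand-made 3-d "embeddings" (purely illustrative, not trained)
emb = {
    "paris":  np.array([1.0, 0.9, 0.1]),
    "france": np.array([1.0, 0.1, 0.9]),
    "rome":   np.array([0.9, 1.0, 0.1]),
    "italy":  np.array([0.9, 0.2, 0.9]),
}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# analogy: paris - france + italy should land near rome
query = emb["paris"] - emb["france"] + emb["italy"]
best = max((w for w in emb if w != "italy"),
           key=lambda w: cosine(query, emb[w]))
```

In real embeddings the same nearest-neighbour search in the trained space recovers such analogies surprisingly often.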
43. RNN + Embeddings
A problem with pure factorization is the cold start:
until there is some feedback about a song, you can't
infer its representation and thus can't recommend it.
To find similar tracks using the track itself:
RNN is trained to convert sound to embedding
optimization goal: embeddings of known similar
tracks should be closer
This makes it possible to recommend new tracks
that have no feedback.
42 / 66
44. Joint Embeddings
Embeddings can be built for images,
sounds, words, phrases...
We can use one space to fit them!
In this example, embeddings of images and
words are shown.
'cat' and 'truck' categories were not
provided in the training.
43 / 66
45. How can we generate new images? (image source)
44 / 66
46. Generative adversarial networks
Discriminator and Generator are working together
Generator converts noise to image
Discriminator distinguishes real images from
generated
Log-loss is used; D(x) is the probability that an image is
fake:
L = E_{x∼real} log(1 − D(x)) + E_{noise} log D(G(noise))
But the optimization is minimax:
L → min_G max_D
45 / 66
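Under the slide's convention that D(x) is the probability of being fake, the loss L can be checked numerically with illustrative values (no trained model here): a discriminator that separates real from generated gets a higher L than a confused one, which is why D maximizes L while G minimizes it.

```python
import numpy as np

def gan_loss(d_real, d_fake):
    """L = E log(1 - D(x_real)) + E log D(G(noise)),
    with D(x) read as the probability that x is fake."""
    return np.mean(np.log(1 - d_real)) + np.mean(np.log(d_fake))

# a good discriminator: low fake-probability on real images, high on generated
good = gan_loss(d_real=np.array([0.1, 0.2]), d_fake=np.array([0.9, 0.8]))
# a confused discriminator: 50/50 everywhere
confused = gan_loss(d_real=np.array([0.5, 0.5]), d_fake=np.array([0.5, 0.5]))
```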
47. Detecting cosmic rays
cosmic rays coming from space have huge
energies
we don't know their source
those can be detected
need to cover large areas with detectors
46 / 66
48. Detecting cosmic rays
cosmic rays coming from space have huge
energies
we don't know their source
those can be detected
need to cover large areas with detectors
CRAYFIS idea
smartphone camera is a detector
the problem is to simulate the camera response to the particles well
Materials were taken from slides (in Russian) by Maxim Borisyak
47 / 66
49. Transforming adversarial networks
We can 'calibrate' the simulation with adversarial networks. These are real data images:
The input is not noise, but the output of a simulator
Panels: simulator output, 1 epoch, 2 epochs, 3 epochs
48 / 66
50. Neural networks summary
NNs are very popular today and are a subject of intensive research
differentiable operations with tensors provide a very good basis for defining useful
predictive models
the structure of a problem can be used effectively in the structure of the model
DL requires large amounts of data and computing resources, which became available only recently
and also numerous quite simple tricks to train models better
NNs are awesome, but not guaranteed to work well on problems without specific
structure
49 / 66
52. Motivation
some algorithms have many hyperparameters
(regularizations, depth, learning rate, min_samples_leaf, ...)
not all of the parameters can be guessed well
checking all combinations takes too much time
How to optimize this process? Maybe automated tuning?
51 / 66
53. Finding optimal hyperparameters
randomly picking parameters is a partial solution
definitely much better than checking all combinations
given a target quality metric, we can optimize it
52 / 66
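Random search can be sketched as follows, assuming a toy, noisy quality function standing in for cross-validated model quality (the parameter names, ranges, and the function itself are invented for illustration):

```python
import random

def quality(depth, lr, rng):
    """Toy stand-in for a cross-validated score: peaks at depth=6,
    lr=0.1, with evaluation noise."""
    return (1.0
            - 0.01 * (depth - 6) ** 2
            - (lr - 0.1) ** 2
            + rng.gauss(0, 0.005))

rng = random.Random(42)
best_score, best_params = float("-inf"), None
for _ in range(100):
    # sample each hyperparameter independently from its own range
    depth = rng.randint(1, 12)
    lr = 10 ** rng.uniform(-3, 0)   # log-uniform for a learning rate
    score = quality(depth, lr, rng)
    if score > best_score:
        best_score, best_params = score, (depth, lr)
```

Unlike a full grid, 100 random draws already cover many distinct values of every parameter, which is why random search usually beats exhaustive grids at the same budget.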
54. Finding optimal hyperparameters
randomly picking parameters is a partial solution
definitely much better than checking all combinations
given a target quality metric, we can optimize it
Problems
no gradient with respect to parameters
noisy results
function reconstruction is a problem
53 / 66
55. Finding optimal hyperparameters
randomly picking parameters is a partial solution
definitely much better than checking all combinations
given a target quality metric, we can optimize it
Problems
no gradient with respect to parameters
noisy results
function reconstruction is a problem
Practical note: before running grid optimization, make sure your metric is stable (e.g. by
training/testing on different subsets).
Overfitting (= getting a too-optimistic estimate of quality on a holdout) through many
attempts is a real issue; it is resolved by one more holdout.
54 / 66
56. Optimal grid search
stochastic optimization (Metropolis-Hastings, annealing)
requires too many evaluations, as it uses only the last checked combination
regression techniques, reusing all known information
(ML to optimize ML!)
55 / 66
57. Optimal grid search using regression
General algorithm (point of grid = set of parameters):
1. evaluations at random points
2. build regression model based on known results
3. select the point with best expected quality according to trained model
4. evaluate quality at this point
5. Go to 2 (until complete happiness)
Question: can we use linear regression in this case?
56 / 66
58. Optimal grid search using regression
General algorithm (point of grid = set of parameters):
1. evaluations at random points
2. build regression model based on known results
3. select the point with best expected quality according to trained model
4. evaluate quality at this point
5. Go to 2 (until complete happiness)
Question: can we use linear regression in this case?
Exploration vs. exploitation trade-off: should we explore poorly-covered regions, or refine
the region currently seen to be optimal?
57 / 66
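The five steps above can be sketched on a 1-d grid (toy objective, invented for illustration). A quadratic surrogate is used rather than a linear one: a purely linear regression is monotone in the parameter, so its "best expected quality" always sits at the edge of the grid.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate(x):
    """Noisy black-box quality, maximal near x = 2 (stands in for
    a cross-validated score at hyperparameter value x)."""
    return -(x - 2.0) ** 2 + rng.normal(0, 0.01)

grid = np.linspace(0, 4, 81)

# 1. evaluations at a few random points
tried = list(rng.choice(grid, size=5, replace=False))
scores = [evaluate(x) for x in tried]

for _ in range(10):
    # 2. build a regression model (quadratic fit) on known results
    coeffs = np.polyfit(tried, scores, deg=2)
    # 3. select the untried point with the best predicted quality
    untried = [x for x in grid if x not in tried]
    x_next = max(untried, key=lambda x: np.polyval(coeffs, x))
    # 4. evaluate quality at this point; 5. repeat
    tried.append(x_next)
    scores.append(evaluate(x_next))

best = tried[int(np.argmax(scores))]
```

Note that this greedy version only exploits the surrogate's maximum; the exploration/exploitation trade-off above is about also visiting regions where the model is uncertain.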
59. Gaussian processes for regression
Some definitions: Y ∼ GP(m, K), where m and K are the mean and covariance functions:
m(x) = E Y(x) represents our prior expectation of quality (may be taken to be zero)
K(x, x̃) = E Y(x) Y(x̃) represents the influence of known results on the expectation of
values at new points
The RBF kernel is used here too: K(x, x̃) = exp(−||x − x̃||² / h²)
Another popular choice: K(x, x̃) = exp(−||x − x̃|| / h)
We can model the posterior distribution of results at each point.
58 / 66
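A minimal numpy sketch of GP posterior prediction with the RBF kernel above (zero prior mean; a small noise term is added to the kernel matrix for numerical stability; the training points are toy values):

```python
import numpy as np

def rbf(a, b, h=1.0):
    """RBF kernel K(x, x~) = exp(-||x - x~||^2 / h^2) on 1-d inputs."""
    return np.exp(-np.subtract.outer(a, b) ** 2 / h ** 2)

def gp_posterior(x_train, y_train, x_new, h=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at new points."""
    K = rbf(x_train, x_train, h) + noise * np.eye(len(x_train))
    K_s = rbf(x_new, x_train, h)
    mean = K_s @ np.linalg.solve(K, y_train)
    # var(x) = K(x, x) - K_s K^{-1} K_s^T, taken on the diagonal
    v = np.linalg.solve(K, K_s.T)
    var = rbf(x_new, x_new, h).diagonal() - np.einsum("ij,ji->i", K_s, v)
    return mean, var

x_train = np.array([0.0, 1.0, 3.0])
y_train = np.array([0.5, 0.9, 0.2])
x_new = np.array([1.0, 2.0, 5.0])
mean, var = gp_posterior(x_train, y_train, x_new)
```

At a training point the posterior variance collapses to (almost) zero, while far from all observations it returns to the prior variance K(x, x) = 1; this is exactly what the next slide uses to find regions that need exploration.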
61. Active learning with gaussian processes
Gaussian processes model posterior distribution at each point of the grid
thus, regions with well-estimated quality are known
and we are able to find regions which need exploration (those have large variance)
See also this demo.
60 / 66
62. Other applications of gaussian processes
Gaussian processes are good at black-box optimization when computations are costly:
Optimizing parameters of Monte-Carlo simulations in HEP
the goal is to decrease mismatch with real data
Optimizing the airfoil of an airplane wing
to avoid testing all configurations, gaussian
processes are used
gaussian processes provide a surrogate model
which combines two sources:
high-fidelity: wind tunnel tests
low-fidelity: simulation
61 / 66
63. Active learning
optimization of hyperparameters is a major use case
can deal with optimization of noisy, non-continuous functions
in a broader sense, active learning deals with situations where the system can request
new information
production systems should adapt dynamically to new situations
new trends / new tracks / new pages
...this user didn't like classical music a year ago, but what if...
active learning is a way to include automated experimenting in machine learning
62 / 66
65. Summary of the lectures
data arrives today in huge amounts
there is much interest in obtaining useful information from it
both in science and in industry
problems of interest become more and more complex
64 / 66
66. Summary of the lectures
data arrives today in huge amounts
there is much interest in obtaining useful information from it
both in science and in industry
problems of interest become more and more complex
machine learning techniques become more sophisticated and more powerful
and influence our world
65 / 66