Deep Learning Challenges of Complex Data Analysis
1. Department of Information Technology
Seminar presentation on
CHALLENGES AND OPPORTUNITIES OF ANALYSING
COMPLEX DATA USING DEEP LEARNING
By: Maru Kindeneh
To: Alemu.K
tibeyinmaru@gmail.com 1
2. Content
1. Introduction
2. Background
2.1. Machine learning and computer aided analysis
2.2. Different types of data
2.3. Feature engineering – Creating structured data from unstructured data
2.4. Deep learning
2.5. Challenges and open problems in deep learning
3. Problem specification
3.1. Related work (literature review)
3.2. Delimitation
4. Method
5. Conclusion
3. 1. Introduction
The era of big data and data analysis is here.
Data generation has been growing exponentially and the world has
started to go through a big data revolution.
Before the big data revolution, a lot of effort was put into designing
different data collection schemes and surveys for data collection.
One of the main challenges with this type of data is that the data
are often very heterogeneous and do not follow a predefined
structure.
The data can also be stored in different formats, have different
quality and granularity, and come from many different sources,
all factors that make the data more difficult to analyse.
4. Continue…
A research field that has shown promising results solving problems
arising due to this type of unstructured data from multiple sources is
deep learning.
Deep learning is a new sub-field of machine learning that has
revolutionized several fields such as image processing, speech
recognition and natural language processing.
How to initialize and train a deep learning model is often a non-trivial
problem that requires expert knowledge.
How to select, configure and train deep learning models is still seen
by many as a “black art”.
5. 2. Background
Machine learning is a sub-field of artificial intelligence (AI) that is
focused on how machines may learn and draw conclusions from data.
In this research there is a special focus on complex data. This type of
data is often hard to analyse with conventional ML methods.
To solve this problem, the data is often first manually crafted into a
new dataset with new features. This is called feature engineering.
In the last few years, a new sub-field of machine learning that can
handle complex data without feature engineering has emerged. This
sub-field is called deep learning (DL).
6. 2.1. Machine Learning And Computer Aided Analysis
Machine learning is a sub-field of computer science which aims to
make computers learn.
Thus, when a machine is presented with a lot of training samples, it
should be able to extrapolate knowledge from the observed data.
The machine would then be able to use the gained knowledge to draw
conclusions about new examples and so support humans in their
decision processes.
7. 2.1.1. Supervised Learning
Supervised learning can be regarded as similar to the learning
process of a teacher teaching a student.
Examples include the classification of images, where a lot of images
and their corresponding classes are presented to the computer, which
generalizes from them.
With this generalized knowledge the computer can later on correctly
classify new images that were not presented during the training
phase.
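The train-then-classify idea can be sketched in a few lines. This is a minimal illustration, assuming a nearest-centroid classifier and a made-up two-feature description of each image; neither the method nor the feature names come from the presentation.

```python
from math import dist

def fit_centroids(samples, labels):
    """Average the training samples of each class into one centroid per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda y: dist(centroids[y], x))

# Hypothetical training set: (brightness, edge density) per image, with its class.
train_x = [(0.9, 0.1), (0.8, 0.2), (0.2, 0.9), (0.1, 0.8)]
train_y = ["sky", "sky", "forest", "forest"]

centroids = fit_centroids(train_x, train_y)
print(predict(centroids, (0.7, 0.3)))  # a sample that was not in the training set
```

The labelled pairs play the role of the teacher; the centroids are the generalized knowledge that is reused on unseen samples.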
8. 2.1.2. Unsupervised Learning
In unsupervised learning, the computer tries to generalize from the data
and to learn its underlying patterns.
This is useful when dividing the data into different clusters or trying
to find samples with similar meanings.
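Clustering without labels can be sketched as follows. This is a minimal illustration, assuming plain k-means with a naive initialisation (the first k points); the presentation does not name a specific clustering algorithm, and the points are made up.

```python
from math import dist

def k_means(points, k, iterations=10):
    """Plain k-means: alternate between assigning points and moving centres."""
    centres = [list(p) for p in points[:k]]  # naive initialisation, assumed for brevity
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest centre.
        assignment = [min(range(k), key=lambda c: dist(centres[c], p)) for p in points]
        # Move each centre to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centre if a cluster is empty
                centres[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return assignment, centres

# Hypothetical unlabelled samples forming two groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.1, 4.9), (0.9, 1.1), (4.8, 5.2)]
assignment, centres = k_means(points, k=2)
print(assignment)
```

No class labels are supplied; the grouping emerges from the distances between samples alone.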
9. 2.1.3. Feature Learning
In feature learning, machine learning algorithms are trained to
create new representations of the data.
Feature learning can be conducted in both a supervised and an
unsupervised manner.
The most common purpose for learning new representations of
data is to reduce the number of dimensions.
A second reason for learning new representations of the data is to
find a more interpretable representation of the data.
Common methods use models such as Restricted Boltzmann Machines
(RBM), Deep Belief Networks (DBN) and autoencoders (AE).
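Dimensionality reduction by a learned representation can be sketched with the simplest possible autoencoder. This is an illustrative assumption, not a model from the presentation: a linear autoencoder with tied weights that compresses 2-D samples to a single number and is trained by gradient descent on the reconstruction error. The data, learning rate and starting weights are made up.

```python
def reconstruction_loss(w, data):
    """Squared error between each sample and its reconstruction w * (w . x)."""
    total = 0.0
    for x in data:
        z = sum(wi * xi for wi, xi in zip(w, x))                  # encode: 2-D -> 1-D
        total += sum((xi - wi * z) ** 2 for wi, xi in zip(w, x))  # decode and compare
    return total

def train(data, w, learning_rate=0.005, steps=200):
    """Gradient descent on the reconstruction loss of a tied-weight linear autoencoder."""
    w = list(w)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            r = [xi - wi * z for wi, xi in zip(w, x)]  # reconstruction residual
            wr = sum(wi * ri for wi, ri in zip(w, r))
            for j in range(len(w)):
                grad[j] += -2.0 * (z * r[j] + wr * x[j])
        w = [wi - learning_rate * g for wi, g in zip(w, grad)]
    return w

# Hypothetical 2-D samples that really vary along one direction only.
data = [(t, t) for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
initial_w = [0.5, 0.1]
w = train(data, initial_w)
print(reconstruction_loss(initial_w, data), reconstruction_loss(w, data))
```

After training, one number per sample is enough to reconstruct the 2-D input, which is the sense in which the new representation has fewer dimensions.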
10. 2.1.4. Multimodal Learning
In multimodal learning a model learns from data with multiple
modalities.
A classical approach to analyse multimodal data is through
multiple kernel learning.
This approach is beneficial in at least two ways:
First, it reduces the importance of the choice of kernel and
hyper-parameters, since kernels with better performance are given more
weight when the kernels are combined.
Second, different kernels may handle different input formats.
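Combining kernels over different modalities can be sketched as a weighted sum. This is a minimal illustration with assumed details: the two kernels, the field names `measurements` and `notes`, and the fixed weights are all hypothetical. In multiple kernel learning the weights would be learned from data rather than set by hand.

```python
from math import exp

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel on numeric vectors."""
    return exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def word_overlap_kernel(a, b):
    """Number of distinct words two texts share; a simple kernel for text."""
    return float(len(set(a.split()) & set(b.split())))

def combined_kernel(sample_a, sample_b, weights):
    """Weighted sum of one kernel per modality (weights assumed non-negative)."""
    numeric = rbf_kernel(sample_a["measurements"], sample_b["measurements"])
    text = word_overlap_kernel(sample_a["notes"], sample_b["notes"])
    return weights[0] * numeric + weights[1] * text

# Hypothetical multimodal samples: numeric measurements plus free text.
a = {"measurements": (1.0, 2.0), "notes": "mild fever cough"}
b = {"measurements": (1.0, 2.0), "notes": "cough and headache"}
similarity = combined_kernel(a, b, weights=(0.5, 0.5))
print(similarity)
```

Each kernel only ever sees the input format it was written for, and a larger weight gives that kernel more influence on the combined similarity.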
11. 2.2. Different Types Of Data
To avoid any further confusion, the following definitions of different
data types will be used:
1. Structured data:
Structured data are data that are structured in a tabular form.
Relational databases are often used to store structured data.
2. Unstructured data:
All data that cannot be stored in tabular form, where each row is
independent of all other rows, are unstructured.
3. Multi-levelled data:
Multi-levelled data are data that are measured at different
levels of granularity.
12. 2.2. Different Types Of Data…
4. Multimodal data:
Multimodal data are data concerning multiple and diverse modalities.
For example, an instance for which both text and images are stored is
multimodal.
Another typical example of multimodal data is video streams, which contain
both a sequence of images and audio.
5. Complex data:
Complex data is the same as high dimensional data.
The complexity of a dataset may be due either to the dataset containing
“many rows as well as many attributes” or to the dataset containing
“non-trivial interactions between attributes”.
13. 2.3. Feature Engineering – Creating Structured Data From Unstructured Data
Most machine learning algorithms require structured data, and thus
they cannot be applied directly to unstructured data.
To bypass this problem, the machine learning algorithm can instead be
applied to a fixed set of features that are extracted from the raw data,
a process called feature engineering.
Manual feature engineering often misses complex high-order
dependencies between variables.
Because of this, deep learning methods that are capable of automatic
feature creation and selection are of great utility.
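Manual feature engineering can be sketched as a function that turns raw text of any length into a fixed-length row. The three features below are hypothetical choices made for illustration; they are not taken from the presentation.

```python
def extract_features(text):
    """Map free text of any length to a fixed-length numeric feature vector."""
    words = text.split()
    word_count = len(words)
    mean_word_length = sum(len(w) for w in words) / word_count if words else 0.0
    digit_share = sum(ch.isdigit() for ch in text) / len(text) if text else 0.0
    return [word_count, mean_word_length, digit_share]

# Hypothetical unstructured inputs.
documents = ["Invoice 2041 is overdue", "Thanks for the quick reply"]
table = [extract_features(doc) for doc in documents]
for row in table:
    print(row)
```

Every document now occupies one row with the same three columns, so a conventional algorithm can be applied. Anything the hand-picked features do not capture, such as word order, is lost, which is the weakness noted above.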
14. Deep learning is a type of representation learning where the machine itself
learns several internal representations from raw data in order to perform
regression or classification.
4.1. Artificial Neural Networks
Artificial neural networks have been around for some time.
4.2. Feed Forward Neural Networks
A feed forward neural network is an artificial neural network where
information only moves in one direction.
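The one-directional flow of information can be sketched as a forward pass. This is a minimal illustration, assuming sigmoid activations and hand-picked weights; the layer sizes and values are hypothetical.

```python
from math import exp

def sigmoid(value):
    return 1.0 / (1.0 + exp(-value))

def dense_layer(inputs, weights, biases):
    """One fully connected layer: weighted sums followed by a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    """Pass the input through the layers in order; no output is ever fed back."""
    activation = inputs
    for weights, biases in layers:
        activation = dense_layer(activation, weights, biases)
    return activation

# Hypothetical, hand-picked weights: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [-1.0]),
]
output = forward([0.0, 0.0], layers)
print(output)
```

Each layer reads only the output of the layer before it, so the computation contains no loops back to earlier layers.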
4.3. Convolutional Neural Networks
Convolutional neural networks (CNNs) are mainly used in image analysis,
although some authors have successfully used CNNs for natural language processing.
Basic features in a small area of an image can be analysed independently
of their position and of the rest of the image.
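The position independence of local features can be sketched with a single hand-written filter. This is a minimal illustration, assuming a 2x2 filter and tiny synthetic images; in a real CNN the filter values are learned.

```python
def convolve2d(image, kernel):
    """Valid 2-D cross-correlation: apply the same kernel at every image position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def place_patch(size, top, left):
    """Blank square image with a bright 2x2 patch at the given position."""
    image = [[0] * size for _ in range(size)]
    for i in range(2):
        for j in range(2):
            image[top + i][left + j] = 1
    return image

kernel = [[1, 1], [1, 1]]  # hypothetical filter that responds to a bright 2x2 patch
response_a = convolve2d(place_patch(5, 0, 0), kernel)
response_b = convolve2d(place_patch(5, 2, 3), kernel)
peak_a = max(max(row) for row in response_a)
peak_b = max(max(row) for row in response_b)
print(peak_a, peak_b)
```

The same small set of weights is reused at every position, so the patch produces the same peak response wherever it appears; only the location of the peak changes.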
2.4. Deep Learning
15. 2.4. Deep Learning…
4.4. Recurrent Neural Networks
Unlike feed forward networks, which are acyclic, recurrent neural
networks (RNNs) contain cycles.
One big problem with RNNs, which also occurs in deep feed forward
networks, is that the gradients in backpropagation tend either to
vanish towards zero or to grow towards infinity.
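The gradient problem can be illustrated with a deliberately simplified model. The sketch assumes a linear recurrence with a single unit, h_t = w * h_(t-1) + x_t, so that the gradient flowing back through T time steps is scaled by w to the power T. Real networks have weight matrices and non-linearities, but the same repeated multiplication is at work.

```python
def gradient_scale(recurrent_weight, time_steps):
    """Scaling of a gradient after flowing back through time_steps steps of the
    simplified recurrence h_t = recurrent_weight * h_(t-1) + x_t."""
    return recurrent_weight ** time_steps

for weight in (0.5, 1.0, 1.5):
    print(weight, gradient_scale(weight, 50))
```

A weight slightly below one makes the gradient shrink towards zero over 50 steps, and a weight slightly above one makes it grow without bound; only a weight of exactly one preserves it.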
4.5. Generative Adversarial Networks
The idea behind a GAN is that you have two artificial neural networks
that compete.
The generator tries to generate examples following the same distribution as
the collected data.
The discriminator tries to distinguish between examples that are generated
by the generator and data that are sampled from the real data
distribution.
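The competition between the two networks can be sketched through their loss functions. The sketch assumes the commonly used cross-entropy losses, with the non-saturating form for the generator, and takes the discriminator's outputs as given probabilities that a sample is real. The networks themselves are omitted and the function names are hypothetical.

```python
from math import log

def discriminator_loss(scores_real, scores_fake):
    """Low when real samples score near 1 and generated samples score near 0."""
    real_term = -sum(log(s) for s in scores_real) / len(scores_real)
    fake_term = -sum(log(1.0 - s) for s in scores_fake) / len(scores_fake)
    return real_term + fake_term

def generator_loss(scores_fake):
    """Low when the discriminator scores generated samples near 1."""
    return -sum(log(s) for s in scores_fake) / len(scores_fake)

# Hypothetical discriminator outputs on real and generated samples.
print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))  # discriminator is doing well
print(generator_loss([0.1, 0.2]))                  # generator is doing badly
```

The two losses pull in opposite directions on the same scores for generated samples, which is what makes the training adversarial.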
16. 2.5. Challenges And Open Problems In Deep Learning
One of the biggest problems is that the gradient based learning used to
train the networks is computationally demanding.
The most obvious drawbacks are the loss of interpretability and the
difficulty of training the networks.
Another big challenge in the training of deep neural networks is the
large number of hyper-parameters.
Even though deep learning methods are more flexible and easier to
modify than many classical methods, there are still some limitations on
how they can be applied to data that are very heterogeneous and
complex.
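The point about the large number of hyper-parameters can be made concrete by counting the configurations in even a small search grid. The hyper-parameter names and candidate values below are hypothetical.

```python
from itertools import product

# Hypothetical search space: four hyper-parameters with a few candidate values each.
search_space = {
    "learning_rate": [0.1, 0.01, 0.001],
    "hidden_layers": [1, 2, 4],
    "units_per_layer": [32, 64, 128, 256],
    "dropout": [0.0, 0.25, 0.5],
}

# Every combination of values is one model that would have to be trained.
configurations = [dict(zip(search_space, values))
                  for values in product(*search_space.values())]
print(len(configurations))
```

The number of configurations is the product of the number of candidates per hyper-parameter, so each added hyper-parameter multiplies the training cost of an exhaustive search.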
17. 3. Problem Specification
Complex data is the same as high dimensional, heterogeneous data.
Much of the research today within the field of deep learning (and machine
learning in general) is focused on developing new methods to analyse data.
However, it is seldom reflected upon how the type, quality and complexity of
the data affect the analysis.
The different types of data that should be investigated are:
Data that are structured as a graph.
Sequential data where both long and short time dependencies exist.
The initial focus in the study of data properties will be:
The granularity of the data, that is, how good the measurements in the data are.
Errors in the data and how they affect the result of an analysis.
The number of interacting agents or parts in the system that has generated the data.
Skewness of different classes in the data.
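One of the listed properties, the skewness of classes, can be quantified with a simple measure. The imbalance ratio below is an assumed choice for illustration and not a metric proposed in the presentation; the labels are made up.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Hypothetical labels: 95 samples of one class and 5 of the other.
labels = ["normal"] * 95 + ["fault"] * 5
print(imbalance_ratio(labels))
```

A ratio of one means perfectly balanced classes; larger values mean that a model can reach high accuracy while ignoring the rare class.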
18. 3. Problem Specification…
To conclude, the research questions of this work are:
Q1 What is complex data and how can it be defined in terms of
metrics?
Q2 What properties of complex data are problematic for the current
deep learning methods to handle, and what are the reasons for this?
Q3 How can the current deep learning methods be refined to handle
these properties of the data, or do new methods have to be
developed?
19. 3.1. Literature review
Toubiana et al. define complex data as data
generated by complex interactions in the studied system.
Haken gives a definition of system complexity: “Systems which are
composed of many parts, or elements, or components which may
be of the same or of different kinds. The components or parts may
be connected in a more or less complicated fashion.”
Haken states that: “The data to be collected often seem to be quite
inexhaustible. In addition it is often impossible to decide which
aspect to choose a priori, and we must instead undergo a learning
process in order to know how to cope with a complex system.”
20. 3.2. Delimitations
This work will be focused on the analysis of complex data, using
deep learning methods, and will therefore only consider and
analyse models for such data.
21. 4. Method
The first question posed in this research aims at finding a profound
definition of complex data for use within the field of data
analysis and deep learning.
The first step of this research will be to conduct a literature study,
to survey current definitions and opinions of complex data.
Several case studies will then be conducted to validate, refine and
consolidate the produced definition, as well as to further study the
properties of complex data and thus also answer the second
research question.
When a sufficient number of case studies have been conducted, a
framework for generating data will be created.
22. 5. Conclusion
Unlike data analysis just some decades ago, the analysis today does
not only comprise data that are stored in well organized tables.
Instead the data are much more diverse and may, for example,
consist of images or text. This has given rise to the term complex data.
However, there is no profound definition of complex data and
the term is often used to highlight that an analysis of the data is
non-trivial.
A sub-field within machine learning that has shown promising
results in analysing complex data in recent years is deep learning.
Even though deep learning has been successful in many fields,
there are still several open problems that need to be solved.