Professor Aboul Ella Hassanien 
Chair of the Scientific Research Group in Egypt (SRGE) 
http://www.egyptscience.net 
Dean of the Faculty of Computers and Information, Beni Suef University
Web site: http://www.fci.cu.edu.eg/~abo/ 
Facebook: http://www.facebook.com/profile.php?id=100000780092307
ResearchGate: http://www.researchgate.net/home.Home.html
CU scholar: http://scholar.cu.edu.eg/abo
Agenda 
• Scientific Research Group in Egypt (SRGE) 
• SRGE Trends and Directions
 Information and Network Security (Geospatial Data 2D/3D)
 Biomedical Informatics (Biomedical/Bioinformatics) 
 Intelligent Technology for blind and deaf people 
 Intelligent Environment and Control System 
 Data mining, graph mining and Social Networks (image and 
data Registration/fusion in remote sensing) 
• Big Data Set and Complex System 
• Data Mining and Intelligent systems 
• Open Discussion
Scientific Research Group in Egypt (SRGE)
Members 
• 1 Professor 
• 15 Assistant Professors 
• 20 Ph.D students 
• 25 M. Sc. students 
• 50 international collaborating researchers from 15 countries
• 10 undergraduate students
Figure: SRGE member numbers (bar chart; vertical axis: number of members, 0 to 50)
20 Faculties and institutes
Objective 
• To encourage young Egyptian researchers, and make it easy for them, to cooperate and to increase their contribution to academic research.
• To integrate the various research efforts of the scientific team to be a source 
of innovation on possible scientific, technological and socio-economic 
trajectories to mould the future of machine intelligence technologies and 
applications. 
• To produce Master/PhD graduates: 
 Who can conduct high quality academic research, 
 Who can publish their research in high quality academic journals, 
 Who can obtain tenure track faculty positions at high ranking research universities, 
 Who are good teachers, and more generally who are good academics
Research map
Publications (2013-2014)
• 2013: more than 100 publications
▫ 32 ISI journal papers
 Elsevier, Springer and other prestigious journals
▫ 60 international conference papers
 IEEE/Springer
▫ Book chapters
 10 (Springer)
▫ Edited books
 Five (Springer)
▫ Edited proceedings
 One
▫ Special issues
 Three
SRGE research tracks (2013-2014) 
• Track-(1) Network and information security
• Track-(2) Biomedical eng. & Bioinformatics
• Track-(3) Intelligent environment and applications
• Track-(4) Intelligent technology for disabled people
• Track-(5) Chem(o)informatics
• Track-(6) Social networks / Big Data and graph mining
Research tracks 
• Track-I Network and Information Security 
▫ Intrusion Detection System
 (Machine Intelligence, Danger Theory, AIS)
▫ Cryptanalysis (Evolutionary optimization) 
▫ Image Authentication and Applications 
▫ Watermarking (vector and raster data) 
▫ Digital Signatures 
▫ Biometrics
 Heart sound recognition
 Face and fingerprint
 Gait processing
Problems: 
- Heart Sound as a biometric 
- Watermarking (Vector data) 
- Image authentication 
- Asymmetric hash function 
- Multi-Biometric-based
Network and Information Security 
Heart Sound Recognition 
Biometric
Network and Information Security 
Blind Source Separation (ICA) 
Blind Source Separation (BSS) is the problem of recovering independent
sources from their observed mixtures alone, when both the mixing
process and the original sources are unknown.
Figures: blind separation of information from galaxy spectra (plot; horizontal axis 0 to 350, vertical axis -0.2 to 1.4); early diagnosis of pathology in the fetus
Network and Information Security 
Watermarking of vector geospatial data and 3D animated objects
Figure panels: geospatial data; 3D animated object
• Geospatial data, or geographic information, is data that identifies the geographic location of features and boundaries on Earth, such as natural or constructed features, oceans, and more.
Research tracks 
• Track-II Biomedical Informatics 
▫ Medical image processing 
 Breast cancer analysis, (sonar, MRI, fMRI, CT) 
 Liver fibrosis and tumour analysis (biopsy, MRI, CT) 
 Medical image annotation 
• Bioinformatics 
Problems: 
- Breast Cancer Case 
- Liver Fibrosis – HCV 
- Content-based image retrieval 
- Formal Concept Analysis (visualize (rule based))
Track-II Biomedical Informatics 
Hepatitis C Virus (HCV) in Egypt
 The World Health Organization has declared hepatitis C a global health problem, with approximately 3% of the world’s population (roughly 170-200 million people) infected with HCV.
 Egypt has one of the highest prevalence rates of hepatitis C in the world; the situation there is considerably worse than the global average.
EGYPT: 14.7% infected with Hepatitis C
Track-II Biomedical Informatics 
Liver Fibrosis
 Stage 0: No fibrosis (fatty liver)
 Stage 1: Portal expansion with fibrosis (<1/3 area)
 Stage 2: Bridging fibrosis (>1/3)
 Stage 3: Marked bridging fibrosis or early cirrhosis (no reason for tissue conversion)
 Stage 4: Definite cirrhosis (<50% of biopsy fibrosis)
 Stage 5: Definite cirrhosis (>50% of biopsy fibrosis)
Challenges: distinguishing the late fibrosis stages from tumor; this needs good segmentation techniques, feature extraction, and classifiers.
Research tracks 
• Track-III : Intelligent Environment 
• Intelligent Water/Air Quality Monitoring 
• Smart Reading Environments 
• Intelligent Lighting system 
• Video Processing 
 (Video annotation/summarization) 
Problems: 
- Monitoring water/air pollution
- Climate Change
Intelligent Environment Track 
Water Quality Monitoring
Intelligent environment and applications track 
Cattle identification 
• Identify the origin of each animal; 
• Trace the path of each animal from 
location to location; 
• Trace each animal exposed to disease; 
• Eradicate or control an animal health 
threat; 
• Retrieve information within hours of an 
outbreak and implement intervention 
strategies; 
• Enhance the safety and security of the 
food chain; 
• Improve consumer confidence; and, 
• Facilitate efficient market transactions by
giving buyers assurance about
the animal's life history.
Arabian horse 
Track –III Intelligent environment and applications 
Arabian Horse identification using Iris pattern 
The Arabian horse is a breed of horse that originated on the Arabian Peninsula. It is one of the oldest breeds, dating back 4,500 years.
Recent developments in iris scanning have led to a new form of equine identification, and research has indicated that the horse's eye could be the most telling identifier.
Intelligent environment and applications 
Cattle Identification Using Muzzle Print Images
Research tracks 
• Track-IV: Intelligent technology for blind and visually impaired people
▫ Text-to-speech processing
▫ Document management for blind and visually impaired people
▫ Developing games for blind and visually impaired people
▫ Mobile applications for blind and visually impaired people
▫ Automatic Sign Language (ASL) recognition for deaf-blind people
Intelligent technology for blind and visually impaired people track
• Tongue Drive System 
• For disabled people, technology may do more than just improve their lives: high-tech tools may give them their lives back.
• Researchers from the Georgia Institute of Technology created the latest device, a mouth retainer that allows people with spinal cord injuries to operate a computer and move an electric wheelchair with only their tongues.
Thought-Controlled Wheelchair 
• Users wear a cap that can read brain signals. Those signals are relayed to a brain-scan electroencephalograph (EEG) on the wheelchair, analyzed by a computer program, and then sent to the wheelchair as commands. Toyota said its next goal is to allow users to think about letters in order to spell words.
Open question: how should the brain signals be analyzed?
Big Data in Complex System 
Data is the new oil
Simple to start 
• What is the largest file you have dealt with so far?
▫ Movies, files, or streaming video that you have used?
▫ What have you observed?
• What is the maximum download speed you get?
• Simple computation
▫ How much time does it take just to transfer the data?
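The "simple computation" can be sketched in a few lines. This is a minimal illustration, not part of the original slides: the 10 Mbit/s link speed and the three example sizes are assumptions chosen only to show the orders of magnitude involved.

```python
def transfer_time_seconds(size_bytes: float, speed_bits_per_second: float) -> float:
    """Time needed to move size_bytes over a link of the given speed."""
    return size_bytes * 8 / speed_bits_per_second


def human(seconds: float) -> str:
    """Render a duration using the largest unit that fits."""
    for unit, length in (("days", 86400), ("hours", 3600), ("minutes", 60)):
        if seconds >= length:
            return f"{seconds / length:.1f} {unit}"
    return f"{seconds:.1f} seconds"


# Hypothetical sustained download speed: 10 Mbit/s (an assumption).
LINK_SPEED = 10e6

for label, size in (("1 GB movie", 1e9), ("1 TB archive", 1e12), ("1 PB data set", 1e15)):
    print(f"{label}: {human(transfer_time_seconds(size, LINK_SPEED))}")
```

Under these assumptions the movie takes minutes, the terabyte archive takes days, and the petabyte data set takes thousands of days (roughly 25 years), which is why merely moving big data is already a problem.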
What is big data? 
• 90% of the data in the world today has 
been created in the last two years alone. 
• This data comes from everywhere: 
▫ sensors used to gather climate 
information, 
▫ posts to social media sites, 
▫ digital pictures and videos, 
▫ cell phone GPS signals, to name a few.
This data is “big data.”
Big Data Born 
• Google, eBay, LinkedIn, and 
Facebook were built around 
Big Data from the beginning. 
• No need to integrate Big Data 
with more traditional sources 
of data and the analytics 
performed upon them 
• No merging Big Data 
technologies with their 
traditional IT infrastructures 
• Big Data could stand alone, 
Big Data analytics could be the 
only focus of analytics
What is Big Data? 
• Big Data is a term applied to data 
sets whose size is beyond the ability 
of commonly used software tools to 
capture, manage, and process the 
data within a tolerable elapsed time. 
• Big Data sizes are a constantly moving target, currently ranging from a few dozen terabytes to many petabytes of data in a single data set. –Wikipedia, October 2014 (http://en.wikipedia.org/wiki/Big_data)
Huge amount of data 
• There are huge volumes of data in the 
world: 
+ From the beginning of recorded time until 2003, we created 5 billion gigabytes (5 exabytes) of data.
+ In 2011, the same amount was created every two days.
+ In 2013, the same amount was created every 10 minutes.
How much data? 
• Google processes 20 PB a day 
(2008) 
• Wayback Machine has 3 PB + 100 
TB/month (3/2009) 
• Facebook has 2.5 PB of user data + 
15 TB/day (4/2009) 
• eBay has 6.5 PB of user data + 50 
TB/day (5/2009)
Big data spans three dimensions: 
Volume, Velocity and Variety
• Volume: 
▫ Enterprises are awash with ever-growing data of 
all types, easily amassing terabytes—even 
petabytes—of information. 
 Turn 12 terabytes of Tweets created each day 
into improved product sentiment analysis
• Velocity:
▫ Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value.
 Analyze 500 million daily call detail records in real time to predict customer churn faster
The latest I have heard is that even a delay of 10 nanoseconds is too much.
• Variety: 
▫ Big data is any type of data - structured and unstructured data 
such as text, sensor data, audio, video, click streams, log files 
and more. New insights are found when analyzing these data 
types together. 
 Monitor 100’s of live video feeds from surveillance cameras 
to target points of interest 
 Exploit the 80% data growth in images, video and documents 
to improve customer satisfaction
Time for thinking 
• What do you do with the data?
▫ Let's take an example:
 “From application developers to video streamers, organizations 
of all sizes face the challenge of capturing, searching, analyzing, 
and leveraging as much as terabytes of data per second—too 
much for the constraints of traditional system capabilities and 
database management tools.”
Finally…
‘Big data’ is similar to ‘small data’, but bigger.
Because the data is bigger, it requires different approaches:
techniques, tools, architecture,
with the aim of solving new problems,
or old problems in a better way.
What to do with these data? 
• Aggregation and Statistics 
▫ Data warehouse and OLAP 
• Indexing, Searching, and Querying 
▫ Keyword based search 
▫ Pattern matching (XML/RDF) 
• Knowledge discovery 
▫ Data Mining 
▫ Statistical Modeling
Data Mining
What is Data mining? 
• Data mining (knowledge discovery from data)
▫ Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
• Alternative names
▫ Knowledge discovery (mining) in databases (KDD), knowledge extraction, business intelligence, etc.
Why Not Traditional Data Analysis? 
• Huge amount of data 
▫ Algorithms must be highly scalable to handle, for example, terabytes of data
• High-dimensionality of data 
▫ Micro-array may have tens of thousands of dimensions 
• High complexity of data 
▫ Data streams and sensor data 
▫ Time-series data, temporal data, sequence data 
▫ Structured data, graphs, social networks and multi-linked data
▫ Heterogeneous databases and legacy databases 
▫ Spatial, spatiotemporal, multimedia, text and Web data 
▫ Software programs, scientific simulations
Multi-Dimensional View of Data Mining 
• Data to be mined 
▫ Relational, data warehouse, transactional, stream, object-oriented/ 
relational, active, spatial, time-series, text, multi-media, 
heterogeneous, legacy, WWW 
• Knowledge to be mined 
▫ Characterization, discrimination, association, classification, clustering, 
trend/deviation, outlier analysis, etc. 
▫ Multiple/integrated functions and mining at multiple levels 
• Techniques utilized 
▫ Database-oriented, data warehouse (OLAP), machine learning, statistics, 
visualization, etc. 
• Applications adapted 
▫ Retail, telecommunication, banking, fraud analysis, bio-data mining, stock 
market analysis, text mining, Web mining, etc.
Data Mining: On What Kinds of Data? 
• Database-oriented data sets and applications 
▫ Relational database, data warehouse, transactional database 
• Advanced data sets and advanced applications 
▫ Data streams and sensor data 
▫ Time-series data, temporal data, sequence data (incl. bio-sequences) 
▫ Structured data, graphs, social networks and multi-linked data
▫ Object-relational databases 
▫ Heterogeneous databases and legacy databases 
▫ Spatial data and spatiotemporal data 
▫ Multimedia database 
▫ Text databases 
▫ The World-Wide Web
Data Mining Functionalities 
• Multidimensional concept description: Characterization and discrimination 
▫ Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet 
regions 
• Frequent patterns, association, correlation vs. causality 
▫ Diaper → Beer [support 0.5%, confidence 75%] (Correlation or causality?)
• Classification and prediction 
▫ Construct models (functions) that describe and distinguish classes or 
concepts for future prediction 
 E.g., classify countries based on (climate), or classify cars based on (gas 
mileage) 
▫ Predict some unknown or missing numerical values
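The bracketed pair on the Diaper/Beer rule is conventionally read as [support, confidence]. A minimal sketch of how the two numbers are computed follows; the eight shopping baskets are hypothetical data invented for illustration, so the support value differs from the slide's 0.5%.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical market baskets (assumed data, not from the slides).
baskets = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "beer", "bread"},
    {"diaper", "milk"},
    {"bread", "milk"},
    {"beer"},
    {"milk"},
    {"bread"},
]

print("support:", support(baskets, {"diaper", "beer"}))
print("confidence:", confidence(baskets, {"diaper"}, {"beer"}))
```

A high confidence only says the items co-occur; as the slide asks, it does not settle whether the relationship is causal.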
Data Mining Functionalities (2) 
• Cluster analysis 
▫ Class label is unknown: Group data to form new classes, e.g., cluster houses 
to find distribution patterns 
▫ Maximizing intra-class similarity & minimizing interclass similarity 
• Outlier analysis 
▫ Outlier: Data object that does not comply with the general behavior of the 
data 
▫ Noise or exception? Useful in fraud detection, rare events analysis 
• Trend and evolution analysis 
▫ Trend and deviation: e.g., regression analysis 
▫ Periodicity analysis 
▫ Similarity-based analysis 
• Other pattern-directed or statistical analyses
Data Mining Functionalities (1) 
Basic Data Mining Tasks 
• Classification maps data into predefined 
groups or classes 
▫ Supervised learning 
▫ Pattern recognition 
▫ Prediction 
• Clustering groups similar data together into 
clusters. 
▫ Unsupervised learning 
▫ Segmentation 
▫ Partitioning
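As one possible illustration of clustering as unsupervised partitioning, here is a minimal k-means sketch on one-dimensional data. k-means is swapped in as a familiar example; the slides do not name a specific algorithm, and the data points and starting centroids below are assumptions.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical measurements forming two visible groups (assumed data).
points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centers, groups = kmeans_1d(points, centroids=[1.0, 2.0])
print(centers)
print(groups)
```

No class labels are supplied: the two groups emerge from the distances alone, which is what distinguishes clustering from classification.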
Data Mining Functionalities (2) 
Basic Data Mining Tasks 
• Summarization maps data into subsets with 
associated simple descriptions. 
▫ Characterization 
▫ Generalization 
• Link Analysis uncovers relationships among 
data. 
▫ Affinity Analysis 
▫ Association Rules 
▫ Sequential Analysis determines sequential patterns.
Architecture: Typical Data Mining System
Layers shown in the diagram, from top to bottom:
• Graphical User Interface
• Pattern Evaluation
• Data Mining Engine
• Database or Data Warehouse Server
• Data cleaning, integration, and selection
• Data sources: Database, Data Warehouse, World-Wide Web, Other Info Repositories
Side component: Knowledge Base
Similarity Measures 
Determine similarity between two objects.
Distance Measures 
Measure dissimilarity between objects
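The formulas from these slides did not survive extraction. The sketch below shows a few commonly used similarity and distance measures; which ones the original slides listed is not known, so this selection is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two numeric vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


print(cosine_similarity([1, 2, 3], [2, 4, 6]))
print(jaccard_similarity({"data", "mining"}, {"data", "science"}))
print(euclidean_distance([0, 0], [3, 4]))
print(manhattan_distance([0, 0], [3, 4]))
```

Similarity measures grow as objects become more alike, while distance measures grow as they become less alike.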
Example: Information Retrieval 
• Information Retrieval (IR): retrieving desired 
information from textual data. 
• Library Science 
• Digital Libraries 
• Web Search Engines 
• Traditionally keyword based 
• Sample query: 
Find all documents about “data mining”. 
DM: Similarity measures; 
Mine text/Web data.
Information Retrieval (cont’d)
• Similarity: measure of how close a query is to a document.
• Documents which are “close enough” are retrieved.
• Metrics:
▫ Precision = |Relevant and Retrieved| / |Retrieved|
▫ Recall = |Relevant and Retrieved| / |Relevant|
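The two metrics can be computed directly from sets of document identifiers. A minimal sketch, in which the document IDs are hypothetical:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and a set of relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant AND retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result of the query "data mining" (assumed document IDs).
retrieved_docs = {"d1", "d2", "d3", "d4"}
relevant_docs = {"d1", "d3", "d5"}

p, r = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found. Retrieving more documents tends to raise recall at the cost of precision.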
Intelligent Systems 
Bio-inspired systems
Biologically inspired computing relies heavily on the fields of biology, computer science and mathematics.
Recommender system
Artificial Immune system (AIS) 
• AIS are adaptive systems, inspired by theoretical 
immunology and observed immune functions, 
principles and models, which are applied to 
problem solving 
• Applications 
▫ Bioinformatics 
▫ Intrusion detection 
▫ Virus detection
Swarm Intelligence
Definition:
- Swarm intelligence (SI) is an artificial intelligence technique based on the study of collective behavior in decentralized, self-organized systems.
- SI systems are typically made up of a population of simple agents interacting locally with one another and with their environment.
Goals:
- performance optimization and robustness
- self-organized control and cooperation (decentralized)
- division of labour and distributed task allocation
Swarm Intelligence Techniques
• Ant Colony Optimization (ACO) 
• Marriage in Honey Bees Optimization (MBO) 
• Particle Swarm Optimization (PSO). 
Fish Swarm school
Ant Colony Optimization 
• Ant Colony Optimization (ACO) is a heuristic method for finding good, often near-optimal, solutions to optimization problems posed on a graph
• It rests on three rules: choosing the next city, updating (depositing) pheromone trails, and pheromone trail decay
• Ant Colony Optimization has been used to find solutions to real-world problems, such as truck routing
Ant Colony Optimization (ACO)
Ant Colony Optimization Cont. 
• Many difficult optimization problems have been tackled by so-called ant algorithms, such as
- The Traveling Salesman Problem
- The Quadratic Assignment Problem
- Other hard optimization problems
• These different approaches all try to take advantage of 
how social insects seem to function.
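The three ingredients named for Ant Colony Optimization (choosing a city, updating pheromone trails, pheromone decay) can be sketched on a tiny Traveling Salesman instance. This is a simplified illustration under assumed parameter values (alpha, beta, decay, number of ants), not a specific published variant; the four cities are hypothetical.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def choose_next_city(current, unvisited, pheromone, dist, alpha, beta, rng):
    # Rule 1: choose a city with probability proportional to
    # pheromone^alpha * (1 / distance)^beta.
    candidates = sorted(unvisited)
    weights = [
        pheromone[current][c] ** alpha * (1.0 / dist[current][c]) ** beta
        for c in candidates
    ]
    return rng.choices(candidates, weights=weights, k=1)[0]


def ant_colony_tsp(dist, n_ants=10, n_iterations=50, alpha=1.0, beta=2.0, decay=0.5, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    pheromone = [[1.0] * n for _ in range(n)]
    best_tour, best_length = None, math.inf
    for _ in range(n_iterations):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                nxt = choose_next_city(tour[-1], unvisited, pheromone, dist, alpha, beta, rng)
                tour.append(nxt)
                unvisited.remove(nxt)
            tours.append((tour_length(tour, dist), tour))
        # Rule 3: pheromone decay (evaporation) on every edge.
        for i in range(n):
            for j in range(n):
                pheromone[i][j] *= 1.0 - decay
        # Rule 2: each ant deposits pheromone along its tour; shorter tours deposit more.
        for length, tour in tours:
            for i in range(n):
                a, b = tour[i], tour[(i + 1) % n]
                pheromone[a][b] += 1.0 / length
                pheromone[b][a] += 1.0 / length
        length, tour = min(tours)
        if length < best_length:
            best_length, best_tour = length, tour
    return best_tour, best_length


# Hypothetical instance: four cities on the corners of a unit square.
cities = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
distances = [[math.dist(a, b) for b in cities] for a in cities]
best_tour, best_length = ant_colony_tsp(distances)
print(best_tour, best_length)
```

On this instance the shortest tour walks around the square. On larger instances ACO gives no guarantee of optimality; it is a heuristic whose quality depends on the parameters.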
Marriage in Honey Bees Optimization (MBO) 
Bees’ Comb
Marriage in Honey Bees Optimization Cont. 
The main processes in MBO are: 
(1) the mating flight of the queen bee with drones 
(2) the creation of new broods by the queen bee 
(3) the improvement of the broods' fitness by workers. 
(4) the adaptation of the workers' fitness 
(5) the replacement of the least fit queen(s) with the fittest brood(s).
Particle Swarm Optimization (PSO). 
• The PSO method is motivated by the simulation of the social
behavior of bird flocking and fish schooling
Particle Swarm Optimization Cont. 
• In PSO, each candidate solution is a "bird" in the search space. We call it a "particle".
• All particles have
▫ fitness values, which are evaluated by the fitness function to be optimized, and
▫ velocities, which direct the flying of the particles.
• The particles fly through the problem space by following the current optimum particles.
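A minimal PSO sketch along these lines is shown below. The inertia weight `w`, the acceleration constants `c1` and `c2`, and the sphere test function are assumptions chosen for illustration; the slides do not specify them.

```python
import random


def sphere(x):
    """Test fitness function to minimize: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso(fitness, dim=2, n_particles=20, n_iterations=200,
        w=0.7, c1=1.5, c2=1.5, bound=5.0, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(n_iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull toward own best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best_position, best_value = pso(sphere)
print(best_position, best_value)
```

Each particle is pulled both toward the best position it has found itself and toward the best position found by the swarm, which is the sense in which particles "follow the current optimum particles".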
Swarm Intelligence Applications
• Swarm robotics
• Crowd simulation
• Ant-based routing
• Telecommunication (routing and congestion problems, intrusion detection)
• Computer animation
• Electronics
• Data mining
• Production control
• Industrial design
Swarm robotics (e.g.: Swarm-bots) 
• Collective task completion 
• No need for overly complex algorithms 
• Adaptable to changing environment
Communication Networks 
• Routing packets to 
destination in shortest time 
• Similar to Shortest Route 
• Statistics kept from prior 
routing (learning from 
experience)
Wedding
Big Data and Data Mining 
Bio-inspired techniques
Support the Egyptian disabled people
Organize several workshops in the Egyptian universities 
More than 35 workshops in Egypt
Organize International conferences
The End of Science 
10^15 bytes
What is covered in this class? 
• Intelligent systems are typically
▫ human-like: they possess human-like expertise within a specific domain,
▫ adaptable: they adapt themselves and learn to do better in a changing environment, and
▫ explanatory: they explain how they make decisions or take actions
What is Soft Computing? 
• Soft Computing is a field that currently includes 
• Fuzzy Logic 
• Neural Networks 
• Probabilistic Reasoning (Genetic Algorithms, BBN), and
• Other related methodologies 
▫ Case-Based Reasoning 
• Soft Computing combines knowledge, techniques, 
and methodologies from the sources above to 
create intelligent systems
Case-Based Reasoning
• A methodology of solving new problems by adapting the solutions of previous similar problems
• Models the way experts reason using their experience
Genetic Algorithms
• An optimization technique
Figure: one step from the current generation to the next (operators shown: Selection, Crossover, Mutation, Elitism)
Current generation    Next generation
10010110              10010110
01100010              01100010
10100100              10100100
10011001              10011101
01111101              01111001
. . .                 . . .
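The four operators named in the figure can be sketched on 8-bit strings like those shown. This is an illustrative sketch only: the fitness function (count of 1 bits), tournament selection, single-point crossover, and all parameter values are assumptions, not taken from the slides.

```python
import random


def fitness(bits):
    """Hypothetical fitness: number of 1 bits (the 'OneMax' toy problem)."""
    return sum(bits)


def tournament(population, rng, k=3):
    """Selection: the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rng, rate):
    """Mutation: flip each bit independently with probability rate."""
    return [1 - b if rng.random() < rate else b for b in bits]


def evolve(n_bits=8, pop_size=10, generations=30, elite=2, rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []  # best fitness per generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # Elitism: copy the best individuals into the next generation unchanged.
        next_gen = [ind[:] for ind in population[:elite]]
        while len(next_gen) < pop_size:
            child = crossover(tournament(population, rng), tournament(population, rng), rng)
            next_gen.append(mutate(child, rng, rate))
        population = next_gen
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best_individual, best_history = evolve()
print("".join(map(str, best_individual)), best_history[-1])
```

Because the elite are carried over unchanged, the best fitness recorded in `best_history` can never decrease from one generation to the next.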
Other Techniques - 
• Bayesian belief networks 
• represent and reason with probabilistic knowledge 
• Decision Trees 
• classification using tree structure 
• Least-squares estimator 
• statistical regression 
• Hybrid approaches 
• use multiple techniques
How does SC Relate to Other Fields?
What is an Expert System (ES)?
Figure: expert system components
• User, exchanging Questions and Responses with the system
• Inference Engine
• KB (knowledge base): rules and facts
• Knowledge Acquisition
• Knowledge Engineer
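To make the roles of the knowledge base (rules and facts) and the inference engine concrete, here is a minimal forward-chaining sketch. Forward chaining is one common inference strategy, chosen here as an assumption; the rule and fact names are hypothetical.

```python
def forward_chain(facts, rules):
    """Apply rules repeatedly until no new fact can be derived.

    Each rule is (conditions, conclusion): if every condition is a known
    fact, the conclusion becomes a known fact too.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and set(conditions) <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical knowledge base: if-then rules.
rules = [
    (("has_feathers",), "is_bird"),
    (("is_bird", "cannot_fly", "swims"), "is_penguin"),
]

# Hypothetical facts gathered from the user's responses.
facts = {"has_feathers", "cannot_fly", "swims"}

print(sorted(forward_chain(facts, rules)))
```

The second rule fires only after the first has added `is_bird`, which shows why the engine must loop until nothing changes.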
Soft Computing Characteristics 
Human Expertise (if-then rules, cases, conventional 
knowledge representations) 
Biologically inspired computing models (NN) 
New optimization techniques (GA, simulated annealing) 
Model-free learning (NN, CBR) 
Fault tolerance (deletion of neuron, rule, or case) 
Real-world applications (large scale with uncertainties)
Introduction
 Remote sensing imagery has
▫ a huge amount of data
▫ different spatial resolutions for panchromatic and multispectral imagery
 To benefit fully from these characteristics, they should be combined in a single image.
 No single system offers both high spatial and high multispectral resolution at the same time.
Introduction
 Image fusion combines information from multiple images into one image that is more suitable for human vision or better suited to further image processing and analysis.
 Recently, image fusion has become one of the focuses of the image processing field.
The Objective
• Introduce a remote sensing image fusion approach based on a modified version of the Brovey transform and wavelets, to reduce the spectral distortion of the Brovey transform and the spatial distortion of the wavelet transform.
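For orientation, the classic (unmodified) Brovey transform scales each multispectral band by the ratio of the panchromatic value to the sum of the bands at that pixel. The sketch below shows only this classic form; the modified version and the wavelet step mentioned in the objective are not described in the slides and are not reproduced here. The small `eps` guard and the pixel values are assumptions.

```python
def brovey_fuse(bands, pan, eps=1e-12):
    """Classic Brovey fusion.

    bands: list of 2-D lists, one per multispectral band, already
           resampled to the panchromatic grid.
    pan:   2-D list holding the panchromatic image.
    """
    rows, cols = len(pan), len(pan[0])
    fused = [[[0.0] * cols for _ in range(rows)] for _ in bands]
    for r in range(rows):
        for c in range(cols):
            total = sum(band[r][c] for band in bands)
            # eps avoids division by zero on all-black pixels (an added safeguard).
            ratio = pan[r][c] / (total + eps)
            for k, band in enumerate(bands):
                fused[k][r][c] = band[r][c] * ratio
    return fused


# Hypothetical 1x2 image with three bands (assumed values).
red = [[10.0, 30.0]]
green = [[20.0, 30.0]]
blue = [[30.0, 30.0]]
pan = [[120.0, 45.0]]

fused = brovey_fuse([red, green, blue], pan)
print(fused)
```

After fusion the bands at each pixel sum to the panchromatic value while keeping their original proportions. Because the intensity is replaced outright, the classic transform can distort the spectral content, which is the problem the slide's modified approach aims to reduce.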

Data are the new oil: Big data, data mining and bio - inspiring techniques

  • 1.
    Professor Aboul EllaHassanien Chair of the Scientific Research Group in Egypt (SRGE) http://www.egyptscience.net Dean of faculty of computers and information – Bein Suef University Web site: http://www.fci.cu.edu.eg/~abo/ Face book: http://www.facebook.com/profile.php?id=100000780092307 Research gate: http://www.researchgate.net/home.Home.html CU scholar: http://scholar.cu.edu.eg/abo
  • 2.
    Agenda • ScientificResearch Group in Egypt (SRGE) • SRGE Trends AND Directions  Information and Network Security (Geospatial Data 2D/3D))  Biomedical Informatics (Biomedical/Bioinformatics)  Intelligent Technology for blind and deaf people  Intelligent Environment and Control System  Data mining, graph mining and Social Networks (image and data Registration/fusion in remote sensing) • Big Data Set and Complex System • Data Mining and Intelligent systems • Open Discussion
  • 3.
  • 4.
    Scientific Research Groupin Egypt (SRGE) Members • 1 Professor • 15 Assistant Professors • 20 Ph.D students • 25 M. Sc. students • 50 International collaborative researchers from 15 countries • 10 undergraduate student 50 45 40 35 30 25 20 15 10 5 0 SRGE member numbers. no. 20 Faculties and institutes
  • 5.
    Scientific Research Groupin Egypt (SRGE) Objective • To encourage and make it easy for the Egyptian young researchers to cooperate and increase their contribution in academic research. • To integrate the various research efforts of the scientific team to be a source of innovation on possible scientific, technological and socio-economic trajectories to mould the future of machine intelligence technologies and applications. • To produce Master/PhD graduates:  Who can conduct high quality academic research,  Who can publish their research in high quality academic journals,  Who can obtain tenure track faculty positions at high ranking research universities,  Who are good teachers, and more generally who are good academics
  • 6.
    Scientific Research Groupin Egypt (SRGE) Research map
  • 7.
    Scientific Research Groupin Egypt (SRGE) Publications (2013-2014( • 2013: more than 100 publications ▫ 32 (ISI) Journal papers  Elsevier AND Springer and other prestigious Journals ▫ 60 International Conferences  IEEE/Springer ▫ Book Chapter  10 book chapters (Springer) ▫ Editing Book  Five (Springer) ▫ Editing Proceeding  One ▫ Special issues  THEE
  • 8.
    Scientific Research Groupin Egypt (SRGE) SRGE research tracks (2013-2014) • Track-(1) Network and information security • Track-(2) Biomedical eng. & Bioinformatics • Track- (3) Intelligent environment and applications • Track- (4) Iintelligent technology for disable people • Track- (5) Chem(o)informatics • Track- (6) Social networks/ Big Data and graph mining
  • 9.
    Scientific Research Groupin Egypt (SRGE) Research tracks • Track-I Network and Information Security ▫ Intrusion Detection System  (Machine Intelligence, Danger theory. AIS) ▫ Cryptanalysis (Evolutionary optimization) ▫ Image Authentication and Applications ▫ Watermarking (vector and raster data) ▫ Digital Signatures ▫ Biometrics  Heart sound recognition  Face and Finger print  Gait processing Problems: - Heart Sound as a biometric - Watermarking (Vector data) - Image authentication - Asymmetric hash function - Multi-Biometric-based
  • 10.
    Network and InformationSecurity Heart Sound Recognition Biometric
  • 11.
    Network and InformationSecurity Blind Source Separation (ICA) Blind Source Separation (BSS) deals with the problem of separating independent sources from their observed mixtures only while both the mixing process and original sources are unknown. Blind Separation of Information from Galaxy Spectra 0 50 100 150 200 250 300 350 1.4 1.2 1 0.8 0.6 0.4 0.2 0 -0.2 Early diagnosis of pathology in fetus
  • 12.
    Network and InformationSecurity Vector geo-spatial data /3D animated Watermarking Geospatial data 3D animated object • Geospatial data or geographic information is the data or information that identifies the geographic location of features and boundaries on Earth, such as natural or constructed features, oceans, and more.
  • 13.
    Scientific Research Groupin Egypt (SRGE) Research tracks • Track-II Biomedical Informatics ▫ Medical image processing  Breast cancer analysis, (sonar, MRI, fMRI, CT)  Liver fibrosis and tumour analysis (biopsy, MRI, CT)  Medical image annotation • Bioinformatics Problems: - Breast Cancer Case - Liver Fibrosis – HCV - Content-based image retrieval - Formal Concept Analysis (visualize (rule based))
  • 14.
    Track-II Biomedical Informatics Hepatitis C Virus in Egypt -HCV*  The World Health Organization has decleared hepatitis C a global health problem, with approximately 3% of the world’s population (roughly 170-200 million people) infected with HCV.  Egypt has one of the highest prevalence rates of the C virus in the world  In Egypt the situation is quite worse. EGYPT: 14.7 % infected with Hepatitis C
  • 15.
    Track-II Biomedical Informatics Liver Fibrosis  Stage 0 No fibrosis (fatty liver)  Stage 1 Portal expansion with fibrosis (<1/3 area)  Stage 2 Bridging fibrosis (>1/3)  Stage 3 Marked bridging fibrosis or early cirrhosis ( no reason for tissue conversion)  Stage 4 Definite cirrhosis (<50% of biopsy fibrosis)  Stage 5 Definite cirrhosis (>50% of biopsy fibrosis) Challenges: distinguish between the late fibrosis stage and tumor Good segmentation techniques/features-based/classifier/
  • 16.
    Scientific Research Groupin Egypt (SRGE) Research tracks • Track-III : Intelligent Environment • Intelligent Water/Air Quality Monitoring • Smart Reading Environments • Intelligent Lighting system • Video Processing  (Video annotation/summarization) Problems: - Monitoring Water/air Pollutions - Climate Change
  • 17.
    Intelligent Environment Track Water Quality Monitoring
  • 18.
    Intelligent environment andapplications track Cattle identification • Identify the origin of each animal; • Trace the path of each animal from location to location; • Trace each animal exposed to disease; • Eradicate or control an animal health threat; • Retrieve information within hours of an outbreak and implement intervention strategies; • Enhance the safety and security of the food chain; • Improve consumer confidence; and, • Facilitate efficient market transactions as it provides assurance to buyers regarding the animals life history.
  • 19.
    Arabian horse Track–III Intelligent environment and applications Arabian Horse identification using Iris pattern The Arabian horse is a breed of horse that originated on the Arabian Peninsula. It is one of the oldest breeds, dating back 4,500 years. Recent developments in iris scanning have led to a new form of equine identification, and research has indicated that the horse's eye could be the most telling identifier.
  • 20.
    Intelligent environment andapplications Cattle Identification Using Muzzle Print Images
  • 21.
    Scientific Research Groupin Egypt (SRGE) Research tracks • Track-IV : The intelligent technology for blind and visual impaired people ▫ Text to speech processing ▫ Document management for blind and visual impairment people ▫ Developing Games for blind and visual impairment people ▫ Mobil applications for blind and visual impairment people ▫ Automatic Sign Language (ASL) Recognition for Deaf-Blind people
  • 22.
    The intelligent technologyfor blind and visual impaired people Track
  • 23.
    • Tongue DriveSystem • For disabled people, technology may do more than just improve their lives - high-tech tools may give them life back. • Researchers from the Gergia Institute of Technology created the latest device, a mouth retainer that allows people with spinal cord ( إصابات في النخاع الشوكي )injuries to operate a computer and move an electric wheelchair with only their tongues
  • 24.
    Thought-Controlled Wheelchair •Users wear a cap that can read brain signals. Those signals are then relayed to a brain scan electroencephalograph (EEG) on the wheelchair which are then analyzed by a computer program and sent to the wheelchair. Toyota said its next goal is to allow users to think about letters in order to spell words. Analysis brain signals?
  • 25.
    Big Data in Complex System Data is the new oil
  • 27.
    Simple to start • What is the maximum file size you have dealt with so far? ▫ Movies/files/streaming video that you have used? ▫ What have you observed? • What is the maximum download speed you get? • Simple computation ▫ How much time does it take just to transfer the data?
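    The "simple computation" can be sketched in a few lines. This is only an illustration: the 1 TB file size and the 100 Mbit/s link speed are assumed example values, not figures from the slides, and protocol overhead is ignored.

```python
def transfer_time_seconds(size_bytes: float, speed_bits_per_second: float) -> float:
    """Ideal time to move size_bytes over a link of the given speed (no overhead)."""
    return size_bytes * 8 / speed_bits_per_second


TB = 10**12    # decimal terabyte, in bytes
MBPS = 10**6   # one megabit per second, in bits per second

# Assumed example: a 1 TB data set over a 100 Mbit/s connection.
seconds = transfer_time_seconds(1 * TB, 100 * MBPS)
hours = seconds / 3600
print(f"{seconds:.0f} s, about {hours:.1f} hours")
```

    Under these assumptions, moving the data alone takes most of a day, before any processing starts.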
  • 28.
    What is big data? • 90% of the data in the world today has been created in the last two years alone. • This data comes from everywhere: ▫ sensors used to gather climate information, ▫ posts to social media sites, ▫ digital pictures and videos, ▫ cell phone GPS signals, to name a few. This data is “big data.”
  • 29.
    Big Data Born • Google, eBay, LinkedIn, and Facebook were built around Big Data from the beginning. • No need to integrate Big Data with more traditional sources of data and the analytics performed upon them • No merging Big Data technologies with their traditional IT infrastructures • Big Data could stand alone, Big Data analytics could be the only focus of analytics
  • 30.
    What is Big Data? • Big Data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. • Big Data sizes are a constantly moving target, currently ranging from a few dozen terabytes to many petabytes of data in a single data set. –Wikipedia, October 2014 (http://en.wikipedia.org/wiki/Big_data)
  • 31.
    Huge amount of data • There are huge volumes of data in the world: + From the beginning of recorded time until 2003, we created 5 billion gigabytes (5 exabytes) of data. + In 2011, the same amount was created every two days. + In 2013, the same amount of data was created every 10 minutes.
  • 32.
    How much data? • Google processes 20 PB a day (2008) • Wayback Machine has 3 PB + 100 TB/month (3/2009) • Facebook has 2.5 PB of user data + 15 TB/day (4/2009) • eBay has 6.5 PB of user data + 50 TB/day (5/2009)
  • 33.
    Big data spans three dimensions: Volume, Velocity and Variety
  • 34.
    Big data spans three dimensions: Volume, Velocity and Variety • Volume: ▫ Enterprises are awash with ever-growing data of all types, easily amassing terabytes, even petabytes, of information. - Turn 12 terabytes of Tweets created each day into improved product sentiment analysis
  • 35.
    Big data spans three dimensions: Volume, Velocity and Variety • Velocity: ▫ Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value. - Analyze 500 million daily call detail records in real time to predict customer churn faster. The latest I have heard is that even a 10-nanosecond delay is too much.
  • 36.
    Big data spans three dimensions: Volume, Velocity and Variety • Variety: ▫ Big data is any type of data: structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together. - Monitor hundreds of live video feeds from surveillance cameras to target points of interest - Exploit the 80% data growth in images, video and documents to improve customer satisfaction
  • 37.
    Time for thinking • What do you do with the data? ▫ Let's take an example: - “From application developers to video streamers, organizations of all sizes face the challenge of capturing, searching, analyzing, and leveraging as much as terabytes of data per second, too much for the constraints of traditional system capabilities and database management tools.”
  • 38.
    Finally… “Big data” is similar to “small data”, but bigger. Because the data is bigger, it requires different approaches: techniques, tools, architecture … with an aim to solve new problems, or old problems in a better way.
  • 39.
    What to do with these data? • Aggregation and Statistics ▫ Data warehouse and OLAP • Indexing, Searching, and Querying ▫ Keyword based search ▫ Pattern matching (XML/RDF) • Knowledge discovery ▫ Data Mining ▫ Statistical Modeling
  • 40.
  • 41.
    What is Data mining? • Data mining (knowledge discovery from data) ▫ Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data • Alternative names ▫ Knowledge discovery (mining) in databases (KDD), knowledge extraction, business intelligence, etc.
  • 42.
    Why Not Traditional Data Analysis? • Huge amount of data ▫ Algorithms must be highly scalable to handle data on the order of terabytes • High dimensionality of data ▫ A micro-array may have tens of thousands of dimensions • High complexity of data ▫ Data streams and sensor data ▫ Time-series data, temporal data, sequence data ▫ Structured data, graphs, social networks and multi-linked data ▫ Heterogeneous databases and legacy databases ▫ Spatial, spatiotemporal, multimedia, text and Web data ▫ Software programs, scientific simulations
  • 43.
    Multi-Dimensional View of Data Mining • Data to be mined ▫ Relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW • Knowledge to be mined ▫ Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc. ▫ Multiple/integrated functions and mining at multiple levels • Techniques utilized ▫ Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, etc. • Applications adapted ▫ Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.
  • 44.
    Data Mining: On What Kinds of Data? • Database-oriented data sets and applications ▫ Relational database, data warehouse, transactional database • Advanced data sets and advanced applications ▫ Data streams and sensor data ▫ Time-series data, temporal data, sequence data (incl. bio-sequences) ▫ Structured data, graphs, social networks and multi-linked data ▫ Object-relational databases ▫ Heterogeneous databases and legacy databases ▫ Spatial data and spatiotemporal data ▫ Multimedia databases ▫ Text databases ▫ The World-Wide Web
  • 45.
    Data Mining Functionalities • Multidimensional concept description: Characterization and discrimination ▫ Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions • Frequent patterns, association, correlation vs. causality ▫ Diaper → Beer [0.5%, 75%] (Correlation or causality?) • Classification and prediction ▫ Construct models (functions) that describe and distinguish classes or concepts for future prediction - E.g., classify countries based on (climate), or classify cars based on (gas mileage) ▫ Predict some unknown or missing numerical values
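    The two numbers attached to an association rule such as Diaper → Beer are commonly read as [support, confidence]. The sketch below shows how both could be computed; the five baskets are invented for illustration and are not data from the slides.

```python
def count_containing(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(transactions, antecedent, consequent):
    """Fraction of all transactions containing antecedent and consequent together."""
    both = set(antecedent) | set(consequent)
    return count_containing(transactions, both) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return count_containing(transactions, both) / count_containing(transactions, antecedent)


# Hypothetical market baskets (illustrative only).
baskets = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "bread"},
    {"milk", "bread"},
    {"diaper", "beer", "bread"},
]

rule_support = support(baskets, {"diaper"}, {"beer"})        # 3 of 5 baskets
rule_confidence = confidence(baskets, {"diaper"}, {"beer"})  # 3 of 4 diaper baskets
print(rule_support, rule_confidence)
```

    A high confidence only says that the items co-occur; it does not by itself answer the slide's question of correlation versus causality.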
  • 46.
    Data Mining Functionalities (2) • Cluster analysis ▫ Class label is unknown: Group data to form new classes, e.g., cluster houses to find distribution patterns ▫ Maximizing intra-class similarity & minimizing interclass similarity • Outlier analysis ▫ Outlier: Data object that does not comply with the general behavior of the data ▫ Noise or exception? Useful in fraud detection, rare events analysis • Trend and evolution analysis ▫ Trend and deviation: e.g., regression analysis ▫ Periodicity analysis ▫ Similarity-based analysis • Other pattern-directed or statistical analyses
  • 47.
    Data Mining Functionalities (1) Basic Data Mining Tasks • Classification maps data into predefined groups or classes ▫ Supervised learning ▫ Pattern recognition ▫ Prediction • Clustering groups similar data together into clusters. ▫ Unsupervised learning ▫ Segmentation ▫ Partitioning
  • 48.
    Data Mining Functionalities (2) Basic Data Mining Tasks • Summarization maps data into subsets with associated simple descriptions. ▫ Characterization ▫ Generalization • Link Analysis uncovers relationships among data. ▫ Affinity Analysis ▫ Association Rules ▫ Sequential Analysis determines sequential patterns.
  • 49.
    Architecture: Typical Data Mining System [Figure: layered architecture, from top to bottom: Graphical User Interface; Pattern Evaluation; Data Mining Engine; Database or Data Warehouse Server; data cleaning, integration, and selection; data sources (Database, Data Warehouse, World-Wide Web, Other Info Repositories); plus a Knowledge Base.]
  • 50.
    Similarity Measures Determine similarity between two objects.
  • 51.
    Similarity Measures Determine similarity between two objects.
  • 52.
    Distance Measures Measure dissimilarity between objects
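    The formulas on the similarity and distance slides did not survive in this transcript. As a stand-in, the sketch below implements four widely used measures (cosine and Jaccard similarity, Euclidean and Manhattan distance); which measures the original slides showed is an assumption.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors (1 = same direction)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def jaccard_similarity(a, b):
    """Size of the intersection over size of the union of two sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


print(euclidean_distance([0, 0], [3, 4]))
print(manhattan_distance([0, 0], [3, 4]))
print(cosine_similarity([1, 0], [0, 1]))
print(jaccard_similarity({"data", "mining"}, {"data", "science"}))
```

    Similarity grows as objects become more alike, while distance grows as they become less alike, so a clustering or retrieval method needs to know which kind of score it is given.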
  • 53.
    Example: Information Retrieval • Information Retrieval (IR): retrieving desired information from textual data. • Library Science • Digital Libraries • Web Search Engines • Traditionally keyword based • Sample query: Find all documents about “data mining”. DM: Similarity measures; Mine text/Web data.
  • 54.
    Information Retrieval (cont’d) • Similarity: measure of how close a query is to a document. • Documents which are “close enough” are retrieved. • Metrics: ▫ Precision = |Relevant and Retrieved| / |Retrieved| ▫ Recall = |Relevant and Retrieved| / |Relevant|
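    The two metrics can be checked on a toy example. The document identifiers below are hypothetical; the functions follow the set definitions on the slide.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and a set of relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # |Relevant and Retrieved|
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result of the sample query "data mining".
retrieved_docs = {"d1", "d2", "d3", "d4"}
relevant_docs = {"d2", "d4", "d5"}

precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

    Here two of the four retrieved documents are relevant (precision 2/4), and two of the three relevant documents were found (recall 2/3).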
  • 55.
    Intelligent Systems Bio-inspired systems: Biologically inspired computing relies heavily on the fields of biology, computer science and mathematics. Recommender systems
  • 56.
    Artificial Immune System (AIS) • AIS are adaptive systems, inspired by theoretical immunology and observed immune functions, principles and models, which are applied to problem solving • Applications ▫ Bioinformatics ▫ Intrusion detection ▫ Virus detection
  • 57.
    Swarm Intelligence Definition: - An artificial intelligence technique based on the study of collective behavior in decentralized, self-organized systems - SI systems are typically made up of a population of simple agents interacting locally with one another and with their environment. Goals: - performance optimization and robustness - self-organized control and cooperation (decentralized) - division of labour and distributed task allocation
  • 58.
    Swarm Intelligence Techniques • Ant Colony Optimization (ACO) • Marriage in Honey Bees Optimization (MBO) • Particle Swarm Optimization (PSO) • Fish swarm (fish school)
  • 59.
    Ant Colony Optimization • Ant Colony Optimization is an efficient method for finding good (near-optimal) solutions to path problems on a graph • Using three steps (choosing the next city, updating pheromone trails, and pheromone trail decay), we can search for a short tour through the graph • Ant Colony Optimization has been used to solve real-world problems, such as truck routing
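    The three steps named above (city choice, pheromone update, pheromone decay) can be sketched for a small travelling-salesman instance. This is a minimal illustration, not the implementation behind the slides: the parameter names (alpha, beta, rho, q), their values, and the four-city example are all assumptions.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def choose_next_city(current, unvisited, pheromone, dist, alpha, beta, rng):
    """Step 1: pick the next city with probability ~ pheromone^alpha * (1/distance)^beta."""
    weights = [(pheromone[current][j] ** alpha) * ((1.0 / dist[current][j]) ** beta)
               for j in unvisited]
    return rng.choices(unvisited, weights=weights, k=1)[0]


def ant_colony_tsp(dist, n_ants=10, n_iters=50, alpha=1.0, beta=2.0,
                   rho=0.5, q=1.0, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    pheromone = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, math.inf
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = [c for c in range(n) if c != tour[0]]
            while unvisited:
                nxt = choose_next_city(tour[-1], unvisited, pheromone, dist, alpha, beta, rng)
                tour.append(nxt)
                unvisited.remove(nxt)
            tours.append((tour, tour_length(tour, dist)))
        # Step 3: pheromone decay (evaporation) on every edge.
        for i in range(n):
            for j in range(n):
                pheromone[i][j] *= (1.0 - rho)
        # Step 2: pheromone update; shorter tours deposit more per edge.
        for tour, length in tours:
            for i in range(n):
                a, b = tour[i], tour[(i + 1) % n]
                pheromone[a][b] += q / length
                pheromone[b][a] += q / length
            if length < best_len:
                best_tour, best_len = tour, length
    return best_tour, best_len


# Hypothetical instance: four cities on the corners of a unit square.
cities = [(0, 0), (0, 1), (1, 1), (1, 0)]
dist = [[math.dist(a, b) for b in cities] for a in cities]
best_tour, best_len = ant_colony_tsp(dist)
print(best_tour, best_len)
```

    ACO is a heuristic: it tends toward short tours as pheromone accumulates on good edges, but it does not guarantee the optimum on larger instances.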
  • 60.
  • 61.
    Ant Colony Optimization Cont. • Many difficult optimization problems have been solved by so-called ant algorithms, such as - The Traveling Salesman Problem - The Quadratic Assignment Problem - Other hard optimization problems. • These different approaches all try to take advantage of how social insects seem to function.
  • 62.
    Marriage in Honey Bees Optimization (MBO) Bees’ Comb
  • 63.
    Marriage in Honey Bees Optimization Cont. The main processes in MBO are: (1) the mating flight of the queen bee with drones; (2) the creation of new broods by the queen bee; (3) the improvement of the broods' fitness by workers; (4) the adaptation of the workers' fitness; (5) the replacement of the least fit queen(s) with the fittest brood(s).
  • 64.
    Particle Swarm Optimization (PSO) • The PSO method is motivated by the simulation of the social behavior of bird flocking and fish schooling
  • 65.
    Particle Swarm Optimization Cont. • In PSO, each single solution is a "bird" in the search space. We call it a "particle". • All particles have ▫ fitness values, which are evaluated by the fitness function to be optimized, and ▫ velocities, which direct the flying of the particles. • The particles fly through the problem space by following the current optimum particles.
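    A minimal sketch of this idea follows, using the common inertia-weight form of the velocity update. The coefficient values (w, c1, c2), the search bounds, and the sphere test function are assumptions chosen for illustration, not settings taken from the slides.

```python
import random


def pso(fitness, dim, n_particles=20, n_iters=100, bounds=(-5.0, 5.0),
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize fitness over a box; returns (best position, best value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull toward own best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and keep it inside the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Simple test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_pos, best_val = pso(sphere, dim=2)
print(best_pos, best_val)
```

    The "current optimum particle" of the slide corresponds to `gbest` here; each particle also remembers its own best position, which keeps the swarm from collapsing too early.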
  • 66.
    Swarm Intelligence Applications • Swarm Robotics • Crowd simulation • Ant-based routing • Telecommunication (routing and congestion problems, intrusion detection) • Computer Animation • Electronics • Data Mining • Production control • Industrial Design
  • 67.
    Swarm robotics (e.g., Swarm-bots) • Collective task completion • No need for overly complex algorithms • Adaptable to changing environment
  • 68.
    Communication Networks • Routing packets to their destination in the shortest time • Similar to the shortest-route problem • Statistics kept from prior routing (learning from experience)
  • 69.
    Weeding Big Data and Data Mining Bio-inspired techniques
  • 71.
    Support the Egyptian disabled people
  • 72.
    Organize several workshops in the Egyptian universities More than 35 workshops in Egypt
  • 73.
  • 74.
    The End of Science 10^15 bytes
  • 76.
    What is covered in this class? • Some components of Intelligent systems are ▫ human-like - they possess human-like expertise within a specific domain, ▫ adaptable - they adapt themselves and learn to do better in a changing environment, and ▫ explanations - they explain how they make decisions or take actions
  • 77.
    What is Soft Computing? • Soft Computing is a field that currently includes • Fuzzy Logic • Neural Networks • Probabilistic Reasoning (Genetic Algorithms, BBN), and • Other related methodologies ▫ Case-Based Reasoning • Soft Computing combines knowledge, techniques, and methodologies from the sources above to create intelligent systems
  • 78.
    Case-Based Reasoning - A methodology of solving new problems by adapting the solutions of previous similar problems Models the way experts reason using their experience
  • 79.
    Genetic Algorithms An optimization technique [Figure: a current generation of bit strings (10010110, 01100010, 10100100, 10011001, 01111101, ...) is turned into the next generation (10010110, 01100010, 10100100, 10011101, 01111001, ...) by Selection, Crossover and Mutation, with Elitism.]
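    The generation step named on this slide (selection, crossover, mutation, elitism) can be sketched on 8-bit strings like those in the figure. The fitness function (count of 1 bits), tournament selection, single-point crossover, and all rates below are assumptions made for the example.

```python
import random


def genetic_algorithm(fitness, n_bits=8, pop_size=20, n_generations=40,
                      crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Maximize fitness over fixed-length bit strings; returns the best individual."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def select():
        # Selection: binary tournament, the fitter of two random individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(n_generations):
        # Elitism: the best individual is copied unchanged into the next generation.
        elite = max(population, key=fitness)
        next_generation = [elite[:]]
        while len(next_generation) < pop_size:
            parent1, parent2 = select(), select()
            # Crossover: splice the parents at a random cut point.
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_bits)
                child = parent1[:cut] + parent2[cut:]
            else:
                child = parent1[:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            next_generation.append(child)
        population = next_generation
    return max(population, key=fitness)


def one_max(bits):
    """Toy fitness: the number of 1 bits in the string."""
    return sum(bits)


best = genetic_algorithm(one_max)
print("".join(map(str, best)), one_max(best))
```

    Because of elitism, the best fitness in the population can never decrease from one generation to the next.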
  • 80.
    Other Techniques - • Bayesian belief networks • represent and reason with probabilistic knowledge • Decision Trees • classification using tree structure • Least-squares estimator • statistical regression • Hybrid approaches • use multiple techniques
  • 81.
    How does SC Relate to Other Fields What is an Expert System (ES)? [Figure: expert system components: User (Questions, Responses), Inference Engine, KB (rules, facts), Knowledge Acquisition, Knowledge Engineer.]
  • 82.
    Soft Computing Characteristics • Human expertise (if-then rules, cases, conventional knowledge representations) • Biologically inspired computing models (NN) • New optimization techniques (GA, simulated annealing) • Model-free learning (NN, CBR) • Fault tolerance (deletion of neuron, rule, or case) • Real-world applications (large scale with uncertainties)
  • 83.
    Introduction • Remote sensing has ▫ a huge amount of data ▫ different spatial resolutions for panchromatic and multispectral imagery • To benefit fully from these characteristics, they should be combined in a single image. • No single system offers both spatial and multispectral resolution at the same time.
  • 84.
    Introduction • Image fusion is used to combine information from multiple images into one image that is more suitable for human vision or better adapted to further image processing and analysis. • Recently, image fusion has become one of the focuses of the image processing field.
  • 85.
    The Objective • Introduce a remote sensing image fusion approach based on a modified version of the Brovey transform and wavelets, to reduce the spectral distortion of the Brovey transform and the spatial distortion of the wavelet transform.
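    For orientation, the standard Brovey transform scales each multispectral band by the ratio of the panchromatic intensity to the sum of the bands. The sketch below shows only that standard form on plain Python lists; the modified version and the wavelet step mentioned above are not described in this part of the slides and are not reproduced. The function names and the `eps` guard are assumptions.

```python
def brovey_pixel(ms_bands, pan, eps=1e-12):
    """Standard Brovey fusion for one pixel: band_k * pan / sum(bands)."""
    total = sum(ms_bands)
    return [band * pan / (total + eps) for band in ms_bands]  # eps avoids division by zero


def brovey_fuse(ms_image, pan_image):
    """Fuse a multispectral image with a panchromatic image of the same size.

    ms_image: rows of pixels, each pixel a list of band values
              (assumed already resampled to the panchromatic grid).
    pan_image: rows of panchromatic intensities.
    """
    return [[brovey_pixel(ms_px, pan_px) for ms_px, pan_px in zip(ms_row, pan_row)]
            for ms_row, pan_row in zip(ms_image, pan_image)]


# Hypothetical 1 x 2 image with three bands per pixel.
ms = [[[10.0, 20.0, 30.0], [5.0, 5.0, 10.0]]]
pan = [[120.0, 40.0]]
fused = brovey_fuse(ms, pan)
print(fused)
```

    After fusion the band values of each pixel sum to the panchromatic intensity, which injects spatial detail but changes the original radiometry; that change is the spectral distortion the objective refers to.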