TITRE DE LA PRÉSENTATION - MENU « INSERTION / EN-TÊTE
ET PIED DE PAGE »
30/01
/2018
!2
ARTIFICIAL INTELLIGENCE AND
DATA STREAM MINING
ALBERT BIFET
15 FEBRUARY 2018
#Futur&Ruptures
CONTENTS
1. ARTIFICIAL INTELLIGENCE
2. ARTIFICIAL INTELLIGENCE CHALLENGES
3. MACHINE LEARNING FOR DATA STREAMS
4. OPEN SOURCE TOOLS
5. SUMMARY
ARTIFICIAL INTELLIGENCE
• Big Data
• Internet of Things
• Data Science
• Artificial Intelligence
Artificial Intelligence is the new Electricity
real time analytics
What is AI?
• “Artificial intelligence (AI) is an area of computer science that emphasizes the creation of intelligent machines”.
ARTIFICIAL INTELLIGENCE CHALLENGES
Artificial Intelligence
European AI
Big Data
• GAFAM: Google, Apple, Facebook, Amazon, Microsoft
• Personal Information
• Google, Facebook, Twitter, LinkedIn, ...
• All personal communications in Europe are managed by non-European companies
AI Systems
• According to Nikola Kasabov, AI systems should exhibit the following characteristics:
• Accommodate new problem-solving rules incrementally
• Adapt online and in real time
• Are able to analyze themselves in terms of behavior, error and success
• Learn and improve through interaction with the environment (embodiment)
• Learn quickly from large amounts of data (Big Data)
• Have memory-based exemplar storage and retrieval capacities
• Have parameters to represent short- and long-term memory, age, forgetting, etc.
AI Challenges
• France's vision must therefore be to develop, at the same time, a greener AI and an AI that serves the ecological transition.
MACHINE LEARNING FOR DATA STREAMS
Machine Learning
• Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed.
• Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data.
Analytic Standard Approach
Finite training sets
Static models
Diagram: Data Set → Classifier Algorithm → builds Model
Data Stream Approach
Infinite training sets
Dynamic models
Diagram: each incoming data item D updates the model M (D → Update Model → M, repeated along the stream)
Importance of O…
• As spam trends change, retrain the model with…
Pain Points
• Need to retrain!
• Things change over time
• How often?
• Data unused until next update!
• Value of data wasted
IoT Stream Mining
• Maintain models online
• Incorporate data on the fly
• Unbounded training sets
• Resource efficient
• Detect changes and adapt
• Dynamic models
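A minimal sketch of what "maintain models online" and "incorporate data on the fly" can look like in code. It uses a test-then-train (prequential) loop: each instance is first used to test the model and then to update it, so no training set is stored. The class `OnlinePerceptron`, the generator `synthetic_stream` and the function `prequential_accuracy` are hypothetical names chosen for this illustration; they are not taken from MOA or any other tool in this deck.

```python
import random


class OnlinePerceptron:
    """Tiny incremental linear classifier: one update per instance, no stored data."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score > 0 else 0

    def learn_one(self, x, y):
        error = y - self.predict(x)  # -1, 0 or +1
        if error:
            self.w = [wi + self.lr * error * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * error


def synthetic_stream(n, seed=0):
    """Hypothetical stream: the label is 1 when x0 + x1 > 1."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.random(), rng.random()]
        yield x, int(x[0] + x[1] > 1.0)


def prequential_accuracy(model, stream):
    """Test-then-train: predict each instance first, then learn from it."""
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)
        model.learn_one(x, y)
        total += 1
    return correct / total


accuracy = prequential_accuracy(OnlinePerceptron(n_features=2), synthetic_stream(5000))
print(f"prequential accuracy: {accuracy:.3f}")
```

Memory use stays constant however long the stream runs, which is the "resource efficient" and "unbounded training sets" part of the list. Change detection is not covered by this sketch.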
Approximation Algorithms
• General idea, good for streaming algorithms
• Small error ε with high probability 1-δ
• True hypothesis H, and learned hypothesis Ĥ
• Pr[ |H - Ĥ| < ε|H| ] > 1-δ
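One common way to obtain an (ε, δ) guarantee of this kind in stream mining is the Hoeffding bound, which gives the name to the Hoeffding trees mentioned later in the deck. The sketch below is an illustration under stated assumptions: it computes the additive (absolute) error ε for the mean of n bounded observations, whereas the slide states a relative error ε|H|. The function names `hoeffding_bound` and `samples_needed` are hypothetical.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """One-sided Hoeffding bound: with probability at least 1 - delta, the true
    mean is no smaller than the observed mean of n samples minus epsilon.
    value_range is the width R of the interval the observations fall in."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def samples_needed(value_range, delta, epsilon):
    """Smallest n for which hoeffding_bound(value_range, delta, n) <= epsilon."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))


# Example: observations in [0, 1], 95% confidence
for n in (100, 1_000, 10_000):
    print(n, round(hoeffding_bound(1.0, 0.05, n), 4))
print("n for epsilon = 0.01:", samples_needed(1.0, 0.05, 0.01))
```

The bound shrinks as 1/√n and does not depend on the data distribution, which is why it suits algorithms that see each instance only once.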
Hoeffding Adaptive Tree
• Replace frequency counters by estimators
• No need for window of instances
• Sufficient statistics kept by estimators separately
• Parameter-free change detector + estimator with theoretical guarantees for subtree swap (ADWIN)
• Keeps sliding window consistent with “no-change hypothesis”
A. Bifet, R. Gavaldà: “Adaptive Parameter-free Learning from Evolving Data Streams”. IDA (2009)
A. Bifet, R. Gavaldà: “Learning from Time-Changing Data with Adaptive Windowing”. SDM (2007)
ADWIN
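A simplified sketch of the adaptive windowing idea: keep a window of recent values, and whenever two sub-windows have means that differ by more than a Hoeffding-style threshold, drop the oldest elements until every split is again consistent with the "no-change hypothesis". This is an illustration only. The class `SimpleADWIN` is a hypothetical name, it stores every value and checks every split, and it assumes inputs in [0, 1]. The published algorithm compresses the window into buckets to use far less memory and time, which this sketch does not attempt.

```python
import math
import random


class SimpleADWIN:
    """Naive adaptive window: linear memory, every split point checked."""

    def __init__(self, delta=0.002):
        self.delta = delta   # confidence parameter
        self.window = []     # oldest value first

    def _cut_threshold(self, n0, n1):
        n = n0 + n1
        m = 1.0 / (1.0 / n0 + 1.0 / n1)   # combines the two sub-window sizes
        delta_prime = self.delta / n
        return math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))

    def update(self, value):
        """Add a value in [0, 1]; return True if the window was shrunk."""
        self.window.append(value)
        changed = False
        shrinking = True
        while shrinking and len(self.window) > 1:
            shrinking = False
            total = sum(self.window)
            head_sum = 0.0
            for i in range(1, len(self.window)):
                head_sum += self.window[i - 1]
                n0, n1 = i, len(self.window) - i
                mean0 = head_sum / n0
                mean1 = (total - head_sum) / n1
                if abs(mean0 - mean1) > self._cut_threshold(n0, n1):
                    del self.window[0]   # drop the oldest value, then re-check
                    changed = shrinking = True
                    break
        return changed

    def mean(self):
        return sum(self.window) / len(self.window) if self.window else 0.0


# Bernoulli stream whose mean jumps from 0.2 to 0.8 at t = 1000
rng = random.Random(42)
detector = SimpleADWIN(delta=0.002)
detections = []
for t in range(2000):
    p = 0.2 if t < 1000 else 0.8
    bit = 1.0 if rng.random() < p else 0.0
    if detector.update(bit):
        detections.append(t)
print(len(detector.window), round(detector.mean(), 3), detections[:3])
```

The window grows while the data is stationary and shrinks after a change, so the estimate `mean()` follows the current distribution without a user-chosen window size.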
Adaptive Random Forest
• Why Random Forests?
• Off-the-shelf learner
• Good learning performance
Gomes, H. M.; Bifet, A.; Read, J.; Barddal, J. P.; Enembreck, F.; Pfahringer, B.; Holmes, G.; Abdessalem, T.: “Adaptive random forests for evolving data stream classification”. Machine Learning, Springer (2017)
• Based on the original Random Forest by Breiman
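One ingredient that lets a Random Forest work on a stream is online resampling: instead of drawing a bootstrap sample from a stored data set, each ensemble member trains on every incoming instance k times, with k drawn from a Poisson distribution. The sketch below shows only that ingredient, assuming λ = 6 as reported for Adaptive Random Forest. The names `poisson`, `MajorityClass` and `OnlineBagging` are hypothetical, and the base learner is a deliberate placeholder. Random feature subsets, Hoeffding tree base learners and per-tree drift detection are all omitted.

```python
import math
import random


def poisson(lam, rng):
    """Sample a Poisson(lam) integer with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


class MajorityClass:
    """Placeholder base learner that predicts the heaviest class seen so far."""

    def __init__(self):
        self.counts = {}

    def learn_one(self, x, y, weight=1):
        self.counts[y] = self.counts.get(y, 0) + weight

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None


class OnlineBagging:
    """Ensemble in which each member sees each instance Poisson(lam) times."""

    def __init__(self, n_models=10, lam=6.0, seed=1):
        self.models = [MajorityClass() for _ in range(n_models)]
        self.lam = lam
        self.rng = random.Random(seed)

    def learn_one(self, x, y):
        for model in self.models:
            k = poisson(self.lam, self.rng)
            if k > 0:
                model.learn_one(x, y, weight=k)

    def predict(self, x):
        votes = {}
        for model in self.models:
            label = model.predict(x)
            if label is not None:
                votes[label] = votes.get(label, 0) + 1
        return max(votes, key=votes.get) if votes else None


ensemble = OnlineBagging()
for label in ["a"] * 30 + ["b"] * 5:
    ensemble.learn_one(None, label)   # features are unused by the placeholder learner
print(ensemble.predict(None))
```

Because the weights differ from member to member, the models diverge even though they all see the same stream, which is the source of ensemble diversity in this scheme.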
OPEN SOURCE TOOLS
MOA
• Massive Online Analysis (MOA) is a framework for online learning from data streams.
• It is closely related to WEKA
• It includes a collection of offline and online methods as well as tools for evaluation:
• classification, regression
• clustering, frequent pattern mining
• Easy to extend, design and run experiments
APACHE SAMOA
http://samoa-project.net
Data Mining
  Distributed
    Batch: Hadoop, Mahout
    Stream: Storm, S4, Samza, SAMOA
  Non Distributed
    Batch: R, WEKA, …
    Stream: MOA
G. De Francisci Morales, A. Bifet: “SAMOA: Scalable Advanced Massive Online Analysis”. JMLR (2014)
SAMOA ARCHITECTURE
An adapter for integrating Apache Flink into Apache SAMOA was implemented in the scope of this master thesis, with the main parts of its implementation being addressed in this section. With the use of our adapter, ML algorithms can be executed on top of Apache Flink. The implemented adapter will be used for the evaluation of the ML pipelines and HT algorithm variations.
Figure 20: Apache SAMOA’s high level architecture.
StreamDM
http://huawei-noah.github.io/streamDM
scikit-multiflow
SUMMARY
INTERNET OF THINGS
IoT: sensors and actuators connected by networks to computing systems.
• Gartner predicts 20.8 billion IoT devices by 2020.
• IDC projects 32 billion IoT devices by 2020.
IoT versus Big Data
Applications IoT Analytics
IOT AND INDUSTRY 4.0
Interoperability: IoT
Information transparency: virtual copy of the physical world
Technical assistance: support human decisions
Decentralized decisions: make decisions on their own
Data, Intelligence and Graphs (DIG)
Thanks!
@abifet