MOA: Massive Online Analysis, a Framework for Stream
Classification and Clustering
Albert Bifet, Geoff Holmes, Bernhard Pfahringer,
Philipp Kranen, Hardy Kremer, Timm Jansen and Thomas Seidl
University of Waikato
Hamilton, New Zealand
Data Management and Data Exploration Group
RWTH Aachen University, Germany
Cumberland Lodge, 2 September 2010
Workshop on Applications of Pattern Analysis 2010
Mining Massive Data
2007
Digital Universe: 281 exabytes (billion gigabytes)
The amount of information created exceeded available
storage for the first time
Eric Schmidt, August 2010
Every two days now we create as much information as we did
from the dawn of civilization up until 2003.
5 exabytes of data
Twitter
106 million registered users
3 billion requests a day via its API.
Efficient Algorithms
Evolving Data Streams
Extract information from
a potentially infinite sequence of data
possibly varying over time
using few resources
Stream Mining Algorithms
Fast methods that do not store the whole dataset in memory
Traditional methods do not deal with these resource restrictions
What is MOA?
Massive Online Analysis (MOA) is a framework for online learning
from data streams.
It is closely related to WEKA
It includes a collection of offline and online methods as well as tools
for evaluation:
classification
clustering
Easy to extend
Easy to design and run experiments
WEKA
Waikato Environment for Knowledge Analysis
Collection of state-of-the-art machine learning algorithms
and data processing tools implemented in Java
Released under the GPL
Support for the whole process of experimental data mining
Preparation of input data
Statistical evaluation of learning schemes
Visualization of input data and the result of learning
Used for education, research and applications
Complements “Data Mining” by Witten & Frank
WEKA: the bird
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the
Weka, but also extinct.
Data stream learning cycle
1 Process an example at a time, and inspect it only once (at most)
2 Use a limited amount of memory
3 Work in a limited amount of time
4 Be ready to predict at any point
Classification Experimental setting
Evaluation procedures for Data Streams
Holdout
Interleaved Test-Then-Train or Prequential
Environments
Sensor Network: 100 KB of memory
Handheld Computer: 32 MB of memory
Server: 400 MB of memory
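The Interleaved Test-Then-Train (Prequential) procedure named above can be sketched in plain Java. This is a minimal illustration, not MOA's own implementation: the class PrequentialSketch, the MajorityClass learner and the synthetic label stream (class 1 about 70% of the time) are all assumptions made for the example. Each example is first used to test the model and only then used to train it, so no separate holdout set is needed.

```java
import java.util.Random;

/** Interleaved test-then-train (prequential) evaluation on a toy stream. */
final class PrequentialSketch {

    /** Hypothetical toy learner: always predicts the majority class seen so far. */
    static final class MajorityClass {
        private final long[] counts;

        MajorityClass(int numClasses) {
            counts = new long[numClasses];
        }

        int predict() {
            int best = 0;
            for (int c = 1; c < counts.length; c++) {
                if (counts[c] > counts[best]) {
                    best = c;
                }
            }
            return best;
        }

        void train(int label) {
            counts[label]++;
        }
    }

    /** Returns the prequential accuracy over n examples of a synthetic label stream. */
    static double run(long n, long seed) {
        Random rnd = new Random(seed);
        MajorityClass learner = new MajorityClass(2);
        long correct = 0;
        for (long i = 0; i < n; i++) {
            // Synthetic stream (assumption): class 1 appears about 70% of the time.
            int label = rnd.nextDouble() < 0.7 ? 1 : 0;
            if (learner.predict() == label) {
                correct++;            // 1. test on the example first
            }
            learner.train(label);     // 2. then train on it; the example is not stored
        }
        return (double) correct / n;
    }

    public static void main(String[] args) {
        System.out.printf("prequential accuracy: %.3f%n", run(100_000, 42L));
    }
}
```

Because every example is seen exactly once and then discarded, the loop also respects the four requirements of the data stream learning cycle.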
Data Sources
Random Tree Generator
Random RBF Generator
LED Generator
Waveform Generator
Hyperplane
SEA Generator
STAGGER Generator
Classifiers
Naive Bayes
Decision stumps
Hoeffding Tree
Hoeffding Option Tree
Bagging and Boosting
ADWIN Bagging and Leveraging Bagging
Prediction strategies
Majority class
Naive Bayes Leaves
Adaptive Hybrid
Clustering Experimental setting
Internal measures                 External measures
Gamma                             Rand statistic
C Index                           Jaccard coefficient
Point-Biserial                    Fowlkes and Mallows index
Log Likelihood                    Hubert Γ statistics
Dunn's Index                      Minkowski score
Tau                               Purity
Tau A                             van Dongen criterion
Tau C                             V-measure
Somers' Gamma                     Completeness
Ratio of Repetition               Homogeneity
Modified Ratio of Repetition      Variation of information
Adjusted Ratio of Clustering      Mutual information
Fagan's Index                     Class-based entropy
Deviation Index                   Cluster-based entropy
Z-Score Index                     Precision
D Index                           Recall
Silhouette coefficient            F-measure
Table: Internal and external clustering evaluation measures.
Clusterers
StreamKM++
CluStream
ClusTree
Den-Stream
D-Stream
CobWeb
Web
http://www.moa.cs.waikato.ac.nz
GUI
java -cp .:moa.jar:weka.jar
-javaagent:sizeofag.jar moa.gui.GUI
Command Line
EvaluatePeriodicHeldOutTest
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar
moa.DoTask "EvaluatePeriodicHeldOutTest
-l DecisionStump -s generators.WaveformGenerator
-n 100000 -i 100000000 -f 1000000" > dsresult.csv
This command creates a comma-separated values (CSV) file by
training the DecisionStump classifier on the WaveformGenerator data,
using the first 100 thousand examples for testing (-n),
training on a total of 100 million examples (-i), and
testing every one million examples (-f).
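The periodic held-out scheme that this command describes can be sketched in plain Java as follows. This is only an illustration under stated assumptions: PeriodicHeldOutSketch, its parameter names, the majority-class stand-in model, the synthetic label stream and the two CSV columns are all hypothetical and do not reproduce MOA's actual task or its output format. The three arguments play the roles of -n, -i and -f in the command above, scaled down so the sketch runs quickly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hold out the first examples for testing, then train and test periodically. */
final class PeriodicHeldOutSketch {

    /** Returns one "examplesTrained,accuracy" CSV row per evaluation point. */
    static List<String> run(int testSize, int trainSize, int testFrequency, long seed) {
        Random rnd = new Random(seed);

        // The first testSize examples of the stream are reserved for testing only.
        int[] heldOut = new int[testSize];
        for (int i = 0; i < testSize; i++) {
            heldOut[i] = nextLabel(rnd);
        }

        long[] counts = new long[2]; // majority-class stand-in for a real classifier
        List<String> rows = new ArrayList<>();
        for (int trained = 1; trained <= trainSize; trained++) {
            counts[nextLabel(rnd)]++; // train on the next stream example
            if (trained % testFrequency == 0) {
                // Periodic test: score the current model on the held-out examples.
                int prediction = counts[1] > counts[0] ? 1 : 0;
                int correct = 0;
                for (int label : heldOut) {
                    if (label == prediction) {
                        correct++;
                    }
                }
                rows.add(trained + "," + (double) correct / testSize);
            }
        }
        return rows;
    }

    /** Synthetic two-class stream (assumption): class 1 about 70% of the time. */
    static int nextLabel(Random rnd) {
        return rnd.nextDouble() < 0.7 ? 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println("examples trained,accuracy");
        run(1_000, 100_000, 10_000, 7L).forEach(System.out::println);
    }
}
```

With these scaled-down values the sketch emits ten rows, one after every 10,000 training examples.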
Easy Design of a MOA classifier
void resetLearningImpl ()
void trainOnInstanceImpl (Instance inst)
double[] getVotesForInstance (Instance i)
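To show how little logic the three methods need, here is a minimal majority-class learner written against stand-in types. It is a sketch, not a working MOA plug-in: the nested Instance class is a hypothetical simplification of MOA's instance type, the class does not extend any MOA base class, and a real MOA classifier may need further methods that this slide does not list.

```java
/** Majority-class learner exposing the three methods named on the slide. */
final class MajorityClassSketch {

    /** Stand-in for MOA's Instance (assumption): features plus a class index. */
    static final class Instance {
        final double[] features;
        final int classValue;

        Instance(double[] features, int classValue) {
            this.features = features;
            this.classValue = classValue;
        }
    }

    private final int numClasses;
    private double[] classCounts;

    MajorityClassSketch(int numClasses) {
        this.numClasses = numClasses;
        resetLearningImpl();
    }

    /** Forget everything learned so far. */
    void resetLearningImpl() {
        classCounts = new double[numClasses];
    }

    /** Update the model with one example; the example itself is not stored. */
    void trainOnInstanceImpl(Instance inst) {
        classCounts[inst.classValue] += 1.0;
    }

    /** One vote per class; the largest entry is the predicted class. */
    double[] getVotesForInstance(Instance i) {
        return classCounts.clone();
    }

    public static void main(String[] args) {
        MajorityClassSketch learner = new MajorityClassSketch(2);
        learner.trainOnInstanceImpl(new Instance(new double[] {0.1}, 1));
        learner.trainOnInstanceImpl(new Instance(new double[] {0.9}, 1));
        learner.trainOnInstanceImpl(new Instance(new double[] {0.5}, 0));
        double[] votes = learner.getVotesForInstance(new Instance(new double[] {0.3}, 0));
        System.out.println(java.util.Arrays.toString(votes)); // prints [1.0, 2.0]
    }
}
```

The model state is a fixed-size array of counts, so memory use does not grow with the length of the stream.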
Easy Design of a MOA clusterer
void resetLearningImpl ()
void trainOnInstanceImpl (Instance inst)
Clustering getClusteringResult()
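The same three-method shape can be illustrated for clustering with sequential (online) k-means. This is a swapped-in textbook technique, not one of the clusterers shipped with MOA, and the types are simplified assumptions: points are plain double[] arrays instead of MOA instances, and the clustering result is returned as an array of centres rather than a MOA Clustering object.

```java
/** Sequential k-means: each point nudges its nearest centre, then is discarded. */
final class OnlineKMeansSketch {

    private final int k;
    private final int dims;
    private double[][] centers;
    private long[] counts;
    private int initialised;

    OnlineKMeansSketch(int k, int dims) {
        this.k = k;
        this.dims = dims;
        resetLearningImpl();
    }

    /** Forget all centres. */
    void resetLearningImpl() {
        centers = new double[k][dims];
        counts = new long[k];
        initialised = 0;
    }

    /** Process one point; simplified to double[] instead of a MOA Instance. */
    void trainOnInstanceImpl(double[] point) {
        if (initialised < k) {
            // Seed the centres with the first k points of the stream.
            centers[initialised] = point.clone();
            counts[initialised] = 1;
            initialised++;
            return;
        }
        int nearest = 0;
        double best = Double.MAX_VALUE;
        for (int c = 0; c < k; c++) {
            double dist = 0.0;
            for (int j = 0; j < dims; j++) {
                double diff = point[j] - centers[c][j];
                dist += diff * diff;
            }
            if (dist < best) {
                best = dist;
                nearest = c;
            }
        }
        // Incremental mean update: move the centre towards the point by 1/count.
        counts[nearest]++;
        for (int j = 0; j < dims; j++) {
            centers[nearest][j] += (point[j] - centers[nearest][j]) / counts[nearest];
        }
    }

    /** Current centres; a stand-in for MOA's Clustering result type. */
    double[][] getClusteringResult() {
        double[][] copy = new double[initialised][];
        for (int c = 0; c < initialised; c++) {
            copy[c] = centers[c].clone();
        }
        return copy;
    }

    public static void main(String[] args) {
        OnlineKMeansSketch clusterer = new OnlineKMeansSketch(2, 1);
        for (double x : new double[] {0.0, 10.0, 2.0, 12.0}) {
            clusterer.trainOnInstanceImpl(new double[] {x});
        }
        // prints [[1.0], [11.0]]
        System.out.println(java.util.Arrays.deepToString(clusterer.getClusteringResult()));
    }
}
```

The result can be requested at any point in the stream, which matches the "be ready to predict at any point" requirement.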
Extensions of MOA
Multi-label Classification
Itemset Pattern Mining
Sequence Pattern Mining
Summary
Massive Online Analysis (MOA) is a framework for online learning
from data streams.
http://www.moa.cs.waikato.ac.nz
It is closely related to WEKA
It includes a collection of offline and online methods as well as tools
for evaluation:
classification
clustering
MOA deals with evolving data streams
MOA is easy to use and extend

More Related Content

What's hot

Electronic payment systems - Presentation by IrfanAnsari.com
Electronic payment systems - Presentation by IrfanAnsari.comElectronic payment systems - Presentation by IrfanAnsari.com
Electronic payment systems - Presentation by IrfanAnsari.comLearnInUrdu.com & Ustaadjee.com
 
Software for atm manufacturer
Software for atm manufacturerSoftware for atm manufacturer
Software for atm manufacturerhandryjames
 
Bitcoin 101: The Currency, The Network, The Community
Bitcoin 101: The Currency, The Network, The CommunityBitcoin 101: The Currency, The Network, The Community
Bitcoin 101: The Currency, The Network, The CommunityEarthsite
 
Payment Gateway
Payment GatewayPayment Gateway
Payment GatewayShujaShah
 
Online voting system
Online voting systemOnline voting system
Online voting systemArti Gupta
 
Payment System History
Payment System History Payment System History
Payment System History ARRhaman
 
Ethereum Blockchain explained
Ethereum Blockchain explainedEthereum Blockchain explained
Ethereum Blockchain explainedEthWorks
 
Bank management system
Bank management systemBank management system
Bank management systemsumanadas37
 
Smart Contracts Programming Tutorial | Solidity Programming Language | Solidi...
Smart Contracts Programming Tutorial | Solidity Programming Language | Solidi...Smart Contracts Programming Tutorial | Solidity Programming Language | Solidi...
Smart Contracts Programming Tutorial | Solidity Programming Language | Solidi...Edureka!
 

What's hot (20)

Electronic payment systems - Presentation by IrfanAnsari.com
Electronic payment systems - Presentation by IrfanAnsari.comElectronic payment systems - Presentation by IrfanAnsari.com
Electronic payment systems - Presentation by IrfanAnsari.com
 
Atm transaction
Atm transactionAtm transaction
Atm transaction
 
One-Time Password
One-Time PasswordOne-Time Password
One-Time Password
 
Software for atm manufacturer
Software for atm manufacturerSoftware for atm manufacturer
Software for atm manufacturer
 
Emv Explained in few words
Emv Explained in few words Emv Explained in few words
Emv Explained in few words
 
Bitcoin 101: The Currency, The Network, The Community
Bitcoin 101: The Currency, The Network, The CommunityBitcoin 101: The Currency, The Network, The Community
Bitcoin 101: The Currency, The Network, The Community
 
Atm software
Atm softwareAtm software
Atm software
 
ATM
ATMATM
ATM
 
Atm System
Atm SystemAtm System
Atm System
 
E-money Payment System
E-money Payment SystemE-money Payment System
E-money Payment System
 
Payment Gateway
Payment GatewayPayment Gateway
Payment Gateway
 
Online voting system
Online voting systemOnline voting system
Online voting system
 
EMV Overview
EMV OverviewEMV Overview
EMV Overview
 
Payment System History
Payment System History Payment System History
Payment System History
 
Ethereum Blockchain explained
Ethereum Blockchain explainedEthereum Blockchain explained
Ethereum Blockchain explained
 
Security features of atm
Security features of atmSecurity features of atm
Security features of atm
 
Bank management system
Bank management systemBank management system
Bank management system
 
ATM Banking
ATM BankingATM Banking
ATM Banking
 
EMV chip cards
EMV chip cardsEMV chip cards
EMV chip cards
 
Smart Contracts Programming Tutorial | Solidity Programming Language | Solidi...
Smart Contracts Programming Tutorial | Solidity Programming Language | Solidi...Smart Contracts Programming Tutorial | Solidity Programming Language | Solidi...
Smart Contracts Programming Tutorial | Solidity Programming Language | Solidi...
 

Viewers also liked

Efficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream ClassifiersEfficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream ClassifiersAlbert Bifet
 
A Short Course in Data Stream Mining
A Short Course in Data Stream MiningA Short Course in Data Stream Mining
A Short Course in Data Stream MiningAlbert Bifet
 
MOA : Massive Online Analysis
MOA : Massive Online AnalysisMOA : Massive Online Analysis
MOA : Massive Online AnalysisAlbert Bifet
 
Real-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsReal-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsAlbert Bifet
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streamsKrish_ver2
 
Machine Learning at Progressive with H2O
Machine Learning at Progressive with H2OMachine Learning at Progressive with H2O
Machine Learning at Progressive with H2OSri Ambati
 
Pitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themPitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themAlbert Bifet
 
Adaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAdaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAlbert Bifet
 
Implementation of adaptive stft algorithm for lfm signals
Implementation of adaptive stft algorithm for lfm signalsImplementation of adaptive stft algorithm for lfm signals
Implementation of adaptive stft algorithm for lfm signalseSAT Journals
 
Data Stream Management
Data Stream ManagementData Stream Management
Data Stream Managementk_tauhid
 
Learnersourcing: Improving Learning with Collective Learner Activity
Learnersourcing: Improving Learning with Collective Learner ActivityLearnersourcing: Improving Learning with Collective Learner Activity
Learnersourcing: Improving Learning with Collective Learner ActivityJuho Kim
 
Ph.D. Research Update: Year#4 Annual Progress and Planned Activities
Ph.D. Research Update: Year#4 Annual Progress and Planned ActivitiesPh.D. Research Update: Year#4 Annual Progress and Planned Activities
Ph.D. Research Update: Year#4 Annual Progress and Planned ActivitiesLighton Phiri
 
The Technical Debt Trap - AgileIndy 2013
The Technical Debt Trap - AgileIndy 2013The Technical Debt Trap - AgileIndy 2013
The Technical Debt Trap - AgileIndy 2013Doc Norton
 
Summary of the Stream Reasoning workshop at ISWC 2016
Summary of the Stream Reasoning workshop at ISWC 2016Summary of the Stream Reasoning workshop at ISWC 2016
Summary of the Stream Reasoning workshop at ISWC 2016Daniele Dell'Aglio
 
WEKA:Algorithms The Basic Methods
WEKA:Algorithms The Basic MethodsWEKA:Algorithms The Basic Methods
WEKA:Algorithms The Basic Methodsweka Content
 
Operational Tips for Deploying Spark
Operational Tips for Deploying SparkOperational Tips for Deploying Spark
Operational Tips for Deploying SparkDatabricks
 
Nonparametric Density Estimation
Nonparametric Density EstimationNonparametric Density Estimation
Nonparametric Density Estimationjachno
 
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...Beniamino Murgante
 
Let's Start An Epidemic
Let's Start An EpidemicLet's Start An Epidemic
Let's Start An EpidemicDoc Norton
 

Viewers also liked (20)

Efficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream ClassifiersEfficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream Classifiers
 
A Short Course in Data Stream Mining
A Short Course in Data Stream MiningA Short Course in Data Stream Mining
A Short Course in Data Stream Mining
 
MOA : Massive Online Analysis
MOA : Massive Online AnalysisMOA : Massive Online Analysis
MOA : Massive Online Analysis
 
Real-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsReal-Time Big Data Stream Analytics
Real-Time Big Data Stream Analytics
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 
18 Data Streams
18 Data Streams18 Data Streams
18 Data Streams
 
Machine Learning at Progressive with H2O
Machine Learning at Progressive with H2OMachine Learning at Progressive with H2O
Machine Learning at Progressive with H2O
 
Pitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themPitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid them
 
Adaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAdaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent Patterns
 
Implementation of adaptive stft algorithm for lfm signals
Implementation of adaptive stft algorithm for lfm signalsImplementation of adaptive stft algorithm for lfm signals
Implementation of adaptive stft algorithm for lfm signals
 
Data Stream Management
Data Stream ManagementData Stream Management
Data Stream Management
 
Learnersourcing: Improving Learning with Collective Learner Activity
Learnersourcing: Improving Learning with Collective Learner ActivityLearnersourcing: Improving Learning with Collective Learner Activity
Learnersourcing: Improving Learning with Collective Learner Activity
 
Ph.D. Research Update: Year#4 Annual Progress and Planned Activities
Ph.D. Research Update: Year#4 Annual Progress and Planned ActivitiesPh.D. Research Update: Year#4 Annual Progress and Planned Activities
Ph.D. Research Update: Year#4 Annual Progress and Planned Activities
 
The Technical Debt Trap - AgileIndy 2013
The Technical Debt Trap - AgileIndy 2013The Technical Debt Trap - AgileIndy 2013
The Technical Debt Trap - AgileIndy 2013
 
Summary of the Stream Reasoning workshop at ISWC 2016
Summary of the Stream Reasoning workshop at ISWC 2016Summary of the Stream Reasoning workshop at ISWC 2016
Summary of the Stream Reasoning workshop at ISWC 2016
 
WEKA:Algorithms The Basic Methods
WEKA:Algorithms The Basic MethodsWEKA:Algorithms The Basic Methods
WEKA:Algorithms The Basic Methods
 
Operational Tips for Deploying Spark
Operational Tips for Deploying SparkOperational Tips for Deploying Spark
Operational Tips for Deploying Spark
 
Nonparametric Density Estimation
Nonparametric Density EstimationNonparametric Density Estimation
Nonparametric Density Estimation
 
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...
 
Let's Start An Epidemic
Let's Start An EpidemicLet's Start An Epidemic
Let's Start An Epidemic
 

Similar to Moa: Real Time Analytics for Data Streams

Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...Paolo Missier
 
Sentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataSentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataAlbert Bifet
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataJames Sirota
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data SciencePaolo Missier
 
Fast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsFast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsAlbert Bifet
 
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...Gilles Fedak
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDataWorks Summit
 
Semantics in Sensor Networks
Semantics in Sensor NetworksSemantics in Sensor Networks
Semantics in Sensor NetworksOscar Corcho
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Raja Chiky
 
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...WSO2
 
WSO2 Big Data Platform and Applications
WSO2 Big Data Platform and ApplicationsWSO2 Big Data Platform and Applications
WSO2 Big Data Platform and ApplicationsSrinath Perera
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Paolo Missier
 
Grid Projects In The US July 2008
Grid Projects In The US July 2008Grid Projects In The US July 2008
Grid Projects In The US July 2008Ian Foster
 
Big data at experimental facilities
Big data at experimental facilitiesBig data at experimental facilities
Big data at experimental facilitiesIan Foster
 
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...DataWorks Summit/Hadoop Summit
 
Internet data mining 2006
Internet data mining   2006Internet data mining   2006
Internet data mining 2006raj_vij
 
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...Raffaele Montella
 
Hw09 Fingerpointing Sourcing Performance Issues
Hw09   Fingerpointing  Sourcing Performance IssuesHw09   Fingerpointing  Sourcing Performance Issues
Hw09 Fingerpointing Sourcing Performance IssuesCloudera, Inc.
 

Similar to Moa: Real Time Analytics for Data Streams (20)

Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...
 
Sentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataSentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming Data
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking Data
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data Science
 
Fast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsFast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data Streams
 
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking Data
 
Semantics in Sensor Networks
Semantics in Sensor NetworksSemantics in Sensor Networks
Semantics in Sensor Networks
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014
 
Data mining weka
Data mining wekaData mining weka
Data mining weka
 
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
 
WSO2 Big Data Platform and Applications
WSO2 Big Data Platform and ApplicationsWSO2 Big Data Platform and Applications
WSO2 Big Data Platform and Applications
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
 
Grid Projects In The US July 2008
Grid Projects In The US July 2008Grid Projects In The US July 2008
Grid Projects In The US July 2008
 
Big data at experimental facilities
Big data at experimental facilitiesBig data at experimental facilities
Big data at experimental facilities
 
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
 
Internet data mining 2006
Internet data mining   2006Internet data mining   2006
Internet data mining 2006
 
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
 
2015 genome-center
2015 genome-center2015 genome-center
2015 genome-center
 
Hw09 Fingerpointing Sourcing Performance Issues
Hw09   Fingerpointing  Sourcing Performance IssuesHw09   Fingerpointing  Sourcing Performance Issues
Hw09 Fingerpointing Sourcing Performance Issues
 

More from Albert Bifet

Artificial intelligence and data stream mining
Artificial intelligence and data stream miningArtificial intelligence and data stream mining
Artificial intelligence and data stream miningAlbert Bifet
 
MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 Albert Bifet
 
Mining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAMining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAAlbert Bifet
 
Apache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkApache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkAlbert Bifet
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data ScienceAlbert Bifet
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataAlbert Bifet
 
Internet of Things Data Science
Internet of Things Data ScienceInternet of Things Data Science
Internet of Things Data ScienceAlbert Bifet
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data ManagementAlbert Bifet
 
Multi-label Classification with Meta-labels
Multi-label Classification with Meta-labelsMulti-label Classification with Meta-labels
Multi-label Classification with Meta-labelsAlbert Bifet
 
STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.Albert Bifet
 
Efficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsEfficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsAlbert Bifet
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real TimeAlbert Bifet
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real TimeAlbert Bifet
 
Mining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data StreamsMining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data StreamsAlbert Bifet
 
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsPAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsAlbert Bifet
 
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsLeveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsAlbert Bifet
 
New ensemble methods for evolving data streams
New ensemble methods for evolving data streamsNew ensemble methods for evolving data streams
New ensemble methods for evolving data streamsAlbert Bifet
 
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Albert Bifet
 
Adaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAdaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAlbert Bifet
 
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data StreamsMining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data StreamsAlbert Bifet
 

More from Albert Bifet (20)

Artificial intelligence and data stream mining
Artificial intelligence and data stream miningArtificial intelligence and data stream mining
Artificial intelligence and data stream mining
 
MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016
 
Mining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAMining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOA
 
Apache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkApache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache Flink
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data Science
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Internet of Things Data Science
Real Time Big Data Management
Multi-label Classification with Meta-labels
STRIP: stream learning of influence probabilities.
Efficient Data Stream Classification via Probabilistic Adaptive Windows
Mining Big Data in Real Time
Mining Frequent Closed Graphs on Evolving Data Streams
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
Leveraging Bagging for Evolving Data Streams
New ensemble methods for evolving data streams
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Adaptive XML Tree Mining on Evolving Data Streams
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams


Moa: Real Time Analytics for Data Streams

  • 1. MOA: Massive Online Analysis, a Framework for Stream Classification and Clustering
      Albert Bifet, Geoff Holmes, Bernhard Pfahringer, Philipp Kranen, Hardy Kremer, Timm Jansen and Thomas Seidl
      University of Waikato, Hamilton, New Zealand
      Data Management and Data Exploration Group, RWTH Aachen University, Germany
      Workshop on Applications of Pattern Analysis 2010, Cumberland Lodge, 2 September 2010
  • 2. Mining Massive Data
      2007: Digital Universe of 281 exabytes (billion gigabytes). The amount of information created exceeded available storage for the first time.
      Eric Schmidt, August 2010: “Every two days now we create as much information as we did from the dawn of civilization up until 2003” (5 exabytes of data).
      Twitter: 106 million registered users; 3 billion requests a day via its API.
  • 3. Efficient Algorithms
      Evolving data streams: extract information from a potentially infinite sequence of data, possibly varying over time, using few resources.
      Stream mining algorithms: fast methods that do not store the whole dataset in memory. Traditional methods do not deal with these restrictions.
  • 4. What is MOA?
      {M}assive {O}nline {A}nalysis is a framework for online learning from data streams.
      It is closely related to WEKA.
      It includes a collection of offline and online methods, as well as tools for evaluation, for classification and clustering.
      Easy to extend.
      Easy to design and run experiments.
  • 5. WEKA
      Waikato Environment for Knowledge Analysis
      Collection of state-of-the-art machine learning algorithms and data processing tools implemented in Java
      Released under the GPL
      Support for the whole process of experimental data mining: preparation of input data, statistical evaluation of learning schemes, visualization of input data and the result of learning
      Used for education, research and applications
      Complements “Data Mining” by Witten & Frank
  • 7. MOA: the bird
      The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct.
  • 10. Data stream learning cycle
      1. Process an example at a time, and inspect it only once (at most)
      2. Use a limited amount of memory
      3. Work in a limited amount of time
      4. Be ready to predict at any point
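The four requirements of the learning cycle can be illustrated with a very small learner. The sketch below is an assumption for illustration only: `MajorityClassLearner` and `LearningCycleDemo` are hypothetical names, not MOA classes, and the "stream" is a short in-memory array standing in for a potentially unbounded source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical learner (not a MOA class): predicts the most frequent label seen so far.
final class MajorityClassLearner {
    // Requirement 2: memory grows with the number of distinct labels, not with the stream length.
    private final Map<String, Long> counts = new HashMap<>();

    // Requirement 4: a prediction can be requested at any point, even before any training.
    String predict() {
        String best = null;
        long bestCount = -1;
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best; // null until the first example arrives
    }

    // Requirements 1 and 3: each example is seen once and handled in constant expected time.
    void train(String label) {
        counts.merge(label, 1L, Long::sum);
    }
}

final class LearningCycleDemo {
    public static void main(String[] args) {
        String[] stream = {"a", "b", "a", "a", "b"};
        MajorityClassLearner learner = new MajorityClassLearner();
        for (String label : stream) {
            learner.train(label); // the example is not stored after this call
        }
        System.out.println(learner.predict());
    }
}
```

Only the per-label counters survive between examples, which is what lets the same loop run over a stream of any length.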
  • 12. Classification Experimental setting
      Evaluation procedures for data streams: Holdout; Interleaved Test-Then-Train (Prequential)
      Environments: Sensor Network (100 KB), Handheld Computer (32 MB), Server (400 MB)
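Interleaved test-then-train (prequential) evaluation uses every example first to test the current model and then to train it, so no separate holdout set is needed. The following is a minimal sketch under stated assumptions: labels are binary integers, the learner is a stand-in majority-class counter, and `PrequentialDemo` is a hypothetical name rather than part of MOA.

```java
final class PrequentialDemo {
    // Returns the fraction of examples predicted correctly under test-then-train.
    // Labels are assumed to be 0 or 1.
    static double prequentialAccuracy(int[] labels) {
        int[] counts = new int[2]; // stand-in learner: majority class over labels 0 and 1
        int correct = 0;
        for (int label : labels) {
            // Test first: predict with the model as it was before seeing this example.
            int prediction = counts[1] > counts[0] ? 1 : 0;
            if (prediction == label) {
                correct++;
            }
            // Then train: update the model with the same example.
            counts[label]++;
        }
        return labels.length == 0 ? 0.0 : (double) correct / labels.length;
    }

    public static void main(String[] args) {
        System.out.println(prequentialAccuracy(new int[] {1, 1, 0, 1, 1}));
    }
}
```

A holdout evaluation would instead keep a fixed set of examples aside and test on it periodically, as the EvaluatePeriodicHeldOutTest task does.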
  • 13. Classification Experimental setting
      Data sources: Random Tree Generator, Random RBF Generator, LED Generator, Waveform Generator, Hyperplane, SEA Generator, STAGGER Generator
  • 14. Classification Experimental setting
      Classifiers: Naive Bayes, Decision stumps, Hoeffding Tree, Hoeffding Option Tree, Bagging and Boosting, ADWIN Bagging and Leveraging Bagging
      Prediction strategies: Majority class, Naive Bayes Leaves, Adaptive Hybrid
  • 16. Clustering Experimental setting
      Table: Internal and external clustering evaluation measures.
      Internal measures: Gamma, C Index, Point-Biserial, Log Likelihood, Dunn’s Index, Tau, Tau A, Tau C, Somers’ Gamma, Ratio of Repetition, Modified Ratio of Repetition, Adjusted Ratio of Clustering, Fagan’s Index, Deviation Index, Z-Score Index, D Index, Silhouette coefficient
      External measures: Rand statistic, Jaccard coefficient, Fowlkes and Mallows Index, Hubert Γ statistics, Minkowski score, Purity, van Dongen criterion, V-measure, Completeness, Homogeneity, Variation of information, Mutual information, Class-based entropy, Cluster-based entropy, Precision, Recall, F-measure
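As an illustration of an external measure, purity can be computed from cluster assignments and ground-truth classes: each cluster is credited with its most frequent class, and the credited points are divided by the total. This is a sketch of the standard definition, not MOA's implementation; `PurityDemo` and its parameters are hypothetical, and both cluster and class identifiers are assumed to be 0-based integers.

```java
final class PurityDemo {
    // clusterIds[i] and classIds[i] are the cluster and the true class of point i.
    static double purity(int[] clusterIds, int[] classIds, int numClusters, int numClasses) {
        if (clusterIds.length != classIds.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        if (clusterIds.length == 0) {
            return 0.0;
        }
        // counts[k][j] = number of points of class j assigned to cluster k
        int[][] counts = new int[numClusters][numClasses];
        for (int i = 0; i < clusterIds.length; i++) {
            counts[clusterIds[i]][classIds[i]]++;
        }
        // Credit each cluster with the size of its dominant class.
        int matched = 0;
        for (int[] row : counts) {
            int max = 0;
            for (int c : row) {
                max = Math.max(max, c);
            }
            matched += max;
        }
        return (double) matched / clusterIds.length;
    }

    public static void main(String[] args) {
        int[] clusters = {0, 0, 0, 1, 1, 1};
        int[] classes = {0, 0, 1, 1, 1, 0};
        System.out.println(purity(clusters, classes, 2, 2));
    }
}
```

In the usage example each cluster has two points of its dominant class out of three, so four of the six points are credited.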
  • 20. Command Line: EvaluatePeriodicHeldOutTest
      java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask "EvaluatePeriodicHeldOutTest -l DecisionStump -s generators.WaveformGenerator -n 100000 -i 100000000 -f 1000000" > dsresult.csv
      This command creates a comma-separated values file by training the DecisionStump classifier on WaveformGenerator data, using the first 100 thousand examples for testing, training on a total of 100 million examples, and testing every one million examples.
  • 21. Easy Design of a MOA classifier
      void resetLearningImpl()
      void trainOnInstanceImpl(Instance inst)
      double[] getVotesForInstance(Instance i)
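The three methods listed above can be sketched with a majority-class classifier. To keep the example self-contained, `Instance` and `SketchClassifier` below are hypothetical stand-ins, not the real MOA or WEKA types; a real MOA classifier extends a framework base class and may need further methods.

```java
// Stand-in for the framework's instance type (NOT the real MOA/WEKA Instance).
final class Instance {
    final double[] attributes;
    final int classValue;

    Instance(double[] attributes, int classValue) {
        this.attributes = attributes;
        this.classValue = classValue;
    }
}

// Stand-in base class exposing only the three methods named on the slide.
abstract class SketchClassifier {
    abstract void resetLearningImpl();
    abstract void trainOnInstanceImpl(Instance inst);
    abstract double[] getVotesForInstance(Instance i);
}

// Majority-class classifier: votes are the class counts observed so far.
final class MajorityClassSketch extends SketchClassifier {
    private final int numClasses; // assumed to be known in advance
    private double[] classCounts;

    MajorityClassSketch(int numClasses) {
        this.numClasses = numClasses;
        resetLearningImpl();
    }

    @Override
    void resetLearningImpl() {
        // Forget everything learned so far.
        classCounts = new double[numClasses];
    }

    @Override
    void trainOnInstanceImpl(Instance inst) {
        // Incremental update from a single example.
        classCounts[inst.classValue]++;
    }

    @Override
    double[] getVotesForInstance(Instance i) {
        // The attributes are ignored; a copy protects the internal state.
        return classCounts.clone();
    }
}
```

The index of the largest vote is the predicted class, so the caller can turn the returned array into a prediction at any point in the stream.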
  • 22. Easy Design of a MOA clusterer
      void resetLearningImpl()
      void trainOnInstanceImpl(Instance inst)
      Clustering getClusteringResult()
  • 23. Extensions of MOA
      Multi-label Classification
      Itemset Pattern Mining
      Sequence Pattern Mining
  • 24. Summary
      {M}assive {O}nline {A}nalysis is a framework for online learning from data streams.
      http://www.moa.cs.waikato.ac.nz
      It is closely related to WEKA.
      It includes a collection of offline and online methods, as well as tools for evaluation, for classification and clustering.
      MOA deals with evolving data streams.
      MOA is easy to use and extend.