SlideShare a Scribd company logo
MOA: Massive Online Analysis, a Framework for Stream
Classification and Clustering
Albert Bifet, Geoff Holmes, Bernhard Pfahringer,
Philipp Kranen, Hardy Kremer, Timm Jansen and Thomas Seidl
University of Waikato
Hamilton, New Zealand
Data Management and Data Exploration Group
RWTH Aachen University, Germany
Cumberland Lodge, 2 September 2010
Workshop on Applications of Pattern Analysis 2010
Mining Massive Data
2007
Digital Universe: 281 exabytes (billion gigabytes)
The amount of information created exceeded available
storage for the first time
Eric Schmidt, August 2010
Every two days now we create as much information as we did
from the dawn of civilization up until 2003.
5 exabytes of data
Twitter
106 million registered users
3 billion requests a day via its API.
2 / 21
Efficient Algorithms
Evolving Data Streams
Extract information from
potentially infinite sequence of data
possibly varying over time
using few resources
Stream Mining Algorithms
Fast methods without storing all dataset in memory
Traditional methods don’t deal with restrictions
3 / 21
What is MOA?
{M}assive {O}nline {A}nalysis is a framework for online learning
from data streams.
It is closely related to WEKA
It includes a collection of offline and online as well as tools
for evaluation:
classification
clustering
Easy to extend
Easy to design and run experiments
4 / 21
WEKA
Waikato Environment for Knowledge Analysis
Collection of state-of-the-art machine learning algorithms
and data processing tools implemented in Java
Released under the GPL
Support for the whole process of experimental data mining
Preparation of input data
Statistical evaluation of learning schemes
Visualization of input data and the result of learning
Used for education, research and applications
Complements “Data Mining” by Witten & Frank
5 / 21
WEKA: the bird
6 / 21
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the
Weka, but also extinct.
7 / 21
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the
Weka, but also extinct.
7 / 21
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the
Weka, but also extinct.
7 / 21
Data stream learning cycle
1 Process an example at a time,
and inspect it only once (at
most)
2 Use a limited amount of
memory
3 Work in a limited amount of
time
4 Be ready to predict at any
point
8 / 21
Classification Experimental setting
9 / 21
Classification Experimental setting
Evaluation procedures for Data
Streams
Holdout
Interleaved Test-Then-Train or
Prequential
Environments
Sensor Network: 100Kb
Handheld Computer: 32 Mb
Server: 400 Mb
10 / 21
Classification Experimental setting
Data Sources
Random Tree Generator
Random RBF Generator
LED Generator
Waveform Generator
Hyperplane
SEA Generator
STAGGER Generator
10 / 21
Classification Experimental setting
Classifiers
Naive Bayes
Decision stumps
Hoeffding Tree
Hoeffding Option Tree
Bagging and Boosting
ADWIN Bagging and
Leveraging Bagging
Prediction strategies
Majority class
Naive Bayes Leaves
Adaptive Hybrid
10 / 21
Clustering Experimental setting
11 / 21
Clustering Experimental setting
Internal measures External measures
Gamma Rand statistic
C Index Jaccard coefficient
Point-Biserial Folkes and Mallow Index
Log Likelihood Hubert Γ statistics
Dunn’s Index Minkowski score
Tau Purity
Tau A van Dongen criterion
Tau C V-measure
Somer’s Gamma Completeness
Ratio of Repetition Homogeneity
Modified Ratio of Repetition Variation of information
Adjusted Ratio of Clustering Mutual information
Fagan’s Index Class-based entropy
Deviation Index Cluster-based entropy
Z-Score Index Precision
D Index Recall
Silhouette coefficient F-measure
Table: Internal and external clustering evaluation measures.
12 / 21
Clustering Experimental setting
Clusterers
StreamKM++
CluStream
ClusTree
Den-Stream
D-Stream
CobWeb
13 / 21
Web
http://www.moa.cs.waikato.ac.nz
14 / 21
GUI
java -cp .:moa.jar:weka.jar
-javaagent:sizeofag.jar moa.gui.GUI
15 / 21
Command Line
EvaluatePeriodicHeldOutTest
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar
moa.DoTask "EvaluatePeriodicHeldOutTest
-l DecisionStump -s generators.WaveformGenerator
-n 100000 -i 100000000 -f 1000000" > dsresult.csv
This command creates a comma separated values file:
training the DecisionStump classifier on the WaveformGenerator
data,
using the first 100 thousand examples for testing,
training on a total of 100 million examples, and
testing every one million examples:
16 / 21
Easy Design of a MOA classifier
void resetLearningImpl ()
void trainOnInstanceImpl (Instance inst)
double[] getVotesForInstance (Instance i)
17 / 21
Easy Design of a MOA clusterer
void resetLearningImpl ()
void trainOnInstanceImpl (Instance inst)
Clustering getClusteringResult()
18 / 21
Extensions of MOA
Multi-label Classification
Itemset Pattern Mining
Sequence Pattern Mining
19 / 21
Summary
{M}assive {O}nline {A}nalysis is a framework for online learning
from data streams.
http://www.moa.cs.waikato.ac.nz
It is closely related to WEKA
It includes a collection of offline and online as well as tools
for evaluation:
classification
clustering
MOA deals with evolving data streams
MOA is easy to use and extend
20 / 21
21 / 21

More Related Content

What's hot

OSS についてあれこれ
OSS についてあれこれOSS についてあれこれ
OSS についてあれこれ
Takuto Wada
 
難易度ボラタリティグラフという分析手法
難易度ボラタリティグラフという分析手法難易度ボラタリティグラフという分析手法
難易度ボラタリティグラフという分析手法
Tokoroten Nakayama
 
地震すまっぽん! 全国クラウド活用大賞 エントリー in 福津市
 地震すまっぽん! 全国クラウド活用大賞 エントリー in 福津市  地震すまっぽん! 全国クラウド活用大賞 エントリー in 福津市
地震すまっぽん! 全国クラウド活用大賞 エントリー in 福津市
Yoshiaki Hirai
 
テストとリファクタリングに関する深い方法論 #wewlc_jp
テストとリファクタリングに関する深い方法論 #wewlc_jpテストとリファクタリングに関する深い方法論 #wewlc_jp
テストとリファクタリングに関する深い方法論 #wewlc_jp
kyon mm
 
OSS+AWSでここまでできるDevSecOps (Security-JAWS第24回)
OSS+AWSでここまでできるDevSecOps (Security-JAWS第24回)OSS+AWSでここまでできるDevSecOps (Security-JAWS第24回)
OSS+AWSでここまでできるDevSecOps (Security-JAWS第24回)
Masaya Tahara
 
事業のグロースを支えるDataOpsの現場 #DataOps #DevSumi #デブサミ
事業のグロースを支えるDataOpsの現場 #DataOps #DevSumi #デブサミ事業のグロースを支えるDataOpsの現場 #DataOps #DevSumi #デブサミ
事業のグロースを支えるDataOpsの現場 #DataOps #DevSumi #デブサミ
@yuzutas0 Yokoyama
 
運用してわかったLookerの本質的メリット : Data Engineering Study #8
運用してわかったLookerの本質的メリット : Data Engineering Study #8運用してわかったLookerの本質的メリット : Data Engineering Study #8
運用してわかったLookerの本質的メリット : Data Engineering Study #8
Masatoshi Abe
 
Lv1から始めるWebサービスのインフラ構築
Lv1から始めるWebサービスのインフラ構築Lv1から始めるWebサービスのインフラ構築
Lv1から始めるWebサービスのインフラ構築
伊藤 祐策
 
SREチームとしてSREしてみた話
SREチームとしてSREしてみた話SREチームとしてSREしてみた話
SREチームとしてSREしてみた話
Yahoo!デベロッパーネットワーク
 
Raspberry Pi I/O控制與感測器讀取
Raspberry Pi I/O控制與感測器讀取Raspberry Pi I/O控制與感測器讀取
Raspberry Pi I/O控制與感測器讀取
艾鍗科技
 
CODT2020 ビジネスプラットフォームを支えるCI/CDパイプライン ~エンタープライズのDevOpsを加速させる運用改善Tips~
CODT2020 ビジネスプラットフォームを支えるCI/CDパイプライン ~エンタープライズのDevOpsを加速させる運用改善Tips~CODT2020 ビジネスプラットフォームを支えるCI/CDパイプライン ~エンタープライズのDevOpsを加速させる運用改善Tips~
CODT2020 ビジネスプラットフォームを支えるCI/CDパイプライン ~エンタープライズのDevOpsを加速させる運用改善Tips~
Yuki Ando
 
「品質ダッシュボード」と「データによる意思決定」
「品質ダッシュボード」と「データによる意思決定」「品質ダッシュボード」と「データによる意思決定」
「品質ダッシュボード」と「データによる意思決定」
Kohei Tomita
 
「いい人がいない」のメカニズム
「いい人がいない」のメカニズム「いい人がいない」のメカニズム
「いい人がいない」のメカニズム
林 要
 
品質を落とさずにウォーターフォール開発から徐々にアジャイル開発へとシフトしてみる
品質を落とさずにウォーターフォール開発から徐々にアジャイル開発へとシフトしてみる品質を落とさずにウォーターフォール開発から徐々にアジャイル開発へとシフトしてみる
品質を落とさずにウォーターフォール開発から徐々にアジャイル開発へとシフトしてみる
JumpeiIto2
 
持続的なサービス提供のための計測と分析
持続的なサービス提供のための計測と分析持続的なサービス提供のための計測と分析
持続的なサービス提供のための計測と分析
Miyabi Goji
 
【DELPHI / C++BUILDER STARTER チュートリアルシリーズ】 シーズン2 Delphi の部 第5回 「配列 と レコード 」
【DELPHI / C++BUILDER STARTER チュートリアルシリーズ】 シーズン2 Delphi の部 第5回 「配列 と レコード 」【DELPHI / C++BUILDER STARTER チュートリアルシリーズ】 シーズン2 Delphi の部 第5回 「配列 と レコード 」
【DELPHI / C++BUILDER STARTER チュートリアルシリーズ】 シーズン2 Delphi の部 第5回 「配列 と レコード 」
Kaz Aiso
 
アドテクを支える技術 〜1日40億リクエストを捌くには〜
アドテクを支える技術 〜1日40億リクエストを捌くには〜アドテクを支える技術 〜1日40億リクエストを捌くには〜
アドテクを支える技術 〜1日40億リクエストを捌くには〜
MicroAd, Inc.(Engineer)
 
リテラル文字列型までの道
リテラル文字列型までの道リテラル文字列型までの道
リテラル文字列型までの道Satoshi Sato
 
カスタマーサポートのことは嫌いでも、カスタマーサクセスは嫌いにならないでください
カスタマーサポートのことは嫌いでも、カスタマーサクセスは嫌いにならないでくださいカスタマーサポートのことは嫌いでも、カスタマーサクセスは嫌いにならないでください
カスタマーサポートのことは嫌いでも、カスタマーサクセスは嫌いにならないでください
Takaaki Umada
 
さくらのVPSに来る悪い人を観察する その2
さくらのVPSに来る悪い人を観察する その2さくらのVPSに来る悪い人を観察する その2
さくらのVPSに来る悪い人を観察する その2
ozuma5119
 

What's hot (20)

OSS についてあれこれ
OSS についてあれこれOSS についてあれこれ
OSS についてあれこれ
 
難易度ボラタリティグラフという分析手法
難易度ボラタリティグラフという分析手法難易度ボラタリティグラフという分析手法
難易度ボラタリティグラフという分析手法
 
地震すまっぽん! 全国クラウド活用大賞 エントリー in 福津市
 地震すまっぽん! 全国クラウド活用大賞 エントリー in 福津市  地震すまっぽん! 全国クラウド活用大賞 エントリー in 福津市
地震すまっぽん! 全国クラウド活用大賞 エントリー in 福津市
 
テストとリファクタリングに関する深い方法論 #wewlc_jp
テストとリファクタリングに関する深い方法論 #wewlc_jpテストとリファクタリングに関する深い方法論 #wewlc_jp
テストとリファクタリングに関する深い方法論 #wewlc_jp
 
OSS+AWSでここまでできるDevSecOps (Security-JAWS第24回)
OSS+AWSでここまでできるDevSecOps (Security-JAWS第24回)OSS+AWSでここまでできるDevSecOps (Security-JAWS第24回)
OSS+AWSでここまでできるDevSecOps (Security-JAWS第24回)
 
事業のグロースを支えるDataOpsの現場 #DataOps #DevSumi #デブサミ
事業のグロースを支えるDataOpsの現場 #DataOps #DevSumi #デブサミ事業のグロースを支えるDataOpsの現場 #DataOps #DevSumi #デブサミ
事業のグロースを支えるDataOpsの現場 #DataOps #DevSumi #デブサミ
 
運用してわかったLookerの本質的メリット : Data Engineering Study #8
運用してわかったLookerの本質的メリット : Data Engineering Study #8運用してわかったLookerの本質的メリット : Data Engineering Study #8
運用してわかったLookerの本質的メリット : Data Engineering Study #8
 
Lv1から始めるWebサービスのインフラ構築
Lv1から始めるWebサービスのインフラ構築Lv1から始めるWebサービスのインフラ構築
Lv1から始めるWebサービスのインフラ構築
 
SREチームとしてSREしてみた話
SREチームとしてSREしてみた話SREチームとしてSREしてみた話
SREチームとしてSREしてみた話
 
Raspberry Pi I/O控制與感測器讀取
Raspberry Pi I/O控制與感測器讀取Raspberry Pi I/O控制與感測器讀取
Raspberry Pi I/O控制與感測器讀取
 
CODT2020 ビジネスプラットフォームを支えるCI/CDパイプライン ~エンタープライズのDevOpsを加速させる運用改善Tips~
CODT2020 ビジネスプラットフォームを支えるCI/CDパイプライン ~エンタープライズのDevOpsを加速させる運用改善Tips~CODT2020 ビジネスプラットフォームを支えるCI/CDパイプライン ~エンタープライズのDevOpsを加速させる運用改善Tips~
CODT2020 ビジネスプラットフォームを支えるCI/CDパイプライン ~エンタープライズのDevOpsを加速させる運用改善Tips~
 
「品質ダッシュボード」と「データによる意思決定」
「品質ダッシュボード」と「データによる意思決定」「品質ダッシュボード」と「データによる意思決定」
「品質ダッシュボード」と「データによる意思決定」
 
「いい人がいない」のメカニズム
「いい人がいない」のメカニズム「いい人がいない」のメカニズム
「いい人がいない」のメカニズム
 
品質を落とさずにウォーターフォール開発から徐々にアジャイル開発へとシフトしてみる
品質を落とさずにウォーターフォール開発から徐々にアジャイル開発へとシフトしてみる品質を落とさずにウォーターフォール開発から徐々にアジャイル開発へとシフトしてみる
品質を落とさずにウォーターフォール開発から徐々にアジャイル開発へとシフトしてみる
 
持続的なサービス提供のための計測と分析
持続的なサービス提供のための計測と分析持続的なサービス提供のための計測と分析
持続的なサービス提供のための計測と分析
 
【DELPHI / C++BUILDER STARTER チュートリアルシリーズ】 シーズン2 Delphi の部 第5回 「配列 と レコード 」
【DELPHI / C++BUILDER STARTER チュートリアルシリーズ】 シーズン2 Delphi の部 第5回 「配列 と レコード 」【DELPHI / C++BUILDER STARTER チュートリアルシリーズ】 シーズン2 Delphi の部 第5回 「配列 と レコード 」
【DELPHI / C++BUILDER STARTER チュートリアルシリーズ】 シーズン2 Delphi の部 第5回 「配列 と レコード 」
 
アドテクを支える技術 〜1日40億リクエストを捌くには〜
アドテクを支える技術 〜1日40億リクエストを捌くには〜アドテクを支える技術 〜1日40億リクエストを捌くには〜
アドテクを支える技術 〜1日40億リクエストを捌くには〜
 
リテラル文字列型までの道
リテラル文字列型までの道リテラル文字列型までの道
リテラル文字列型までの道
 
カスタマーサポートのことは嫌いでも、カスタマーサクセスは嫌いにならないでください
カスタマーサポートのことは嫌いでも、カスタマーサクセスは嫌いにならないでくださいカスタマーサポートのことは嫌いでも、カスタマーサクセスは嫌いにならないでください
カスタマーサポートのことは嫌いでも、カスタマーサクセスは嫌いにならないでください
 
さくらのVPSに来る悪い人を観察する その2
さくらのVPSに来る悪い人を観察する その2さくらのVPSに来る悪い人を観察する その2
さくらのVPSに来る悪い人を観察する その2
 

Viewers also liked

Efficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream ClassifiersEfficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream Classifiers
Albert Bifet
 
A Short Course in Data Stream Mining
A Short Course in Data Stream MiningA Short Course in Data Stream Mining
A Short Course in Data Stream Mining
Albert Bifet
 
MOA : Massive Online Analysis
MOA : Massive Online AnalysisMOA : Massive Online Analysis
MOA : Massive Online Analysis
Albert Bifet
 
Real-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsReal-Time Big Data Stream Analytics
Real-Time Big Data Stream Analytics
Albert Bifet
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
Krish_ver2
 
18 Data Streams
18 Data Streams18 Data Streams
18 Data Streams
Pier Luca Lanzi
 
Machine Learning at Progressive with H2O
Machine Learning at Progressive with H2OMachine Learning at Progressive with H2O
Machine Learning at Progressive with H2O
Sri Ambati
 
Pitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themPitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid them
Albert Bifet
 
Adaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAdaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent Patterns
Albert Bifet
 
Implementation of adaptive stft algorithm for lfm signals
Implementation of adaptive stft algorithm for lfm signalsImplementation of adaptive stft algorithm for lfm signals
Implementation of adaptive stft algorithm for lfm signals
eSAT Journals
 
Data Stream Management
Data Stream ManagementData Stream Management
Data Stream Management
k_tauhid
 
Learnersourcing: Improving Learning with Collective Learner Activity
Learnersourcing: Improving Learning with Collective Learner ActivityLearnersourcing: Improving Learning with Collective Learner Activity
Learnersourcing: Improving Learning with Collective Learner Activity
Juho Kim
 
Ph.D. Research Update: Year#4 Annual Progress and Planned Activities
Ph.D. Research Update: Year#4 Annual Progress and Planned ActivitiesPh.D. Research Update: Year#4 Annual Progress and Planned Activities
Ph.D. Research Update: Year#4 Annual Progress and Planned Activities
Lighton Phiri
 
The Technical Debt Trap - AgileIndy 2013
The Technical Debt Trap - AgileIndy 2013The Technical Debt Trap - AgileIndy 2013
The Technical Debt Trap - AgileIndy 2013
Doc Norton
 
Summary of the Stream Reasoning workshop at ISWC 2016
Summary of the Stream Reasoning workshop at ISWC 2016Summary of the Stream Reasoning workshop at ISWC 2016
Summary of the Stream Reasoning workshop at ISWC 2016
Daniele Dell'Aglio
 
WEKA:Algorithms The Basic Methods
WEKA:Algorithms The Basic MethodsWEKA:Algorithms The Basic Methods
WEKA:Algorithms The Basic Methods
weka Content
 
Operational Tips for Deploying Spark
Operational Tips for Deploying SparkOperational Tips for Deploying Spark
Operational Tips for Deploying Spark
Databricks
 
Nonparametric Density Estimation
Nonparametric Density EstimationNonparametric Density Estimation
Nonparametric Density Estimation
jachno
 
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...
Beniamino Murgante
 
Let's Start An Epidemic
Let's Start An EpidemicLet's Start An Epidemic
Let's Start An Epidemic
Doc Norton
 

Viewers also liked (20)

Efficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream ClassifiersEfficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream Classifiers
 
A Short Course in Data Stream Mining
A Short Course in Data Stream MiningA Short Course in Data Stream Mining
A Short Course in Data Stream Mining
 
MOA : Massive Online Analysis
MOA : Massive Online AnalysisMOA : Massive Online Analysis
MOA : Massive Online Analysis
 
Real-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsReal-Time Big Data Stream Analytics
Real-Time Big Data Stream Analytics
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 
18 Data Streams
18 Data Streams18 Data Streams
18 Data Streams
 
Machine Learning at Progressive with H2O
Machine Learning at Progressive with H2OMachine Learning at Progressive with H2O
Machine Learning at Progressive with H2O
 
Pitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themPitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid them
 
Adaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAdaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent Patterns
 
Implementation of adaptive stft algorithm for lfm signals
Implementation of adaptive stft algorithm for lfm signalsImplementation of adaptive stft algorithm for lfm signals
Implementation of adaptive stft algorithm for lfm signals
 
Data Stream Management
Data Stream ManagementData Stream Management
Data Stream Management
 
Learnersourcing: Improving Learning with Collective Learner Activity
Learnersourcing: Improving Learning with Collective Learner ActivityLearnersourcing: Improving Learning with Collective Learner Activity
Learnersourcing: Improving Learning with Collective Learner Activity
 
Ph.D. Research Update: Year#4 Annual Progress and Planned Activities
Ph.D. Research Update: Year#4 Annual Progress and Planned ActivitiesPh.D. Research Update: Year#4 Annual Progress and Planned Activities
Ph.D. Research Update: Year#4 Annual Progress and Planned Activities
 
The Technical Debt Trap - AgileIndy 2013
The Technical Debt Trap - AgileIndy 2013The Technical Debt Trap - AgileIndy 2013
The Technical Debt Trap - AgileIndy 2013
 
Summary of the Stream Reasoning workshop at ISWC 2016
Summary of the Stream Reasoning workshop at ISWC 2016Summary of the Stream Reasoning workshop at ISWC 2016
Summary of the Stream Reasoning workshop at ISWC 2016
 
WEKA:Algorithms The Basic Methods
WEKA:Algorithms The Basic MethodsWEKA:Algorithms The Basic Methods
WEKA:Algorithms The Basic Methods
 
Operational Tips for Deploying Spark
Operational Tips for Deploying SparkOperational Tips for Deploying Spark
Operational Tips for Deploying Spark
 
Nonparametric Density Estimation
Nonparametric Density EstimationNonparametric Density Estimation
Nonparametric Density Estimation
 
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...
 
Let's Start An Epidemic
Let's Start An EpidemicLet's Start An Epidemic
Let's Start An Epidemic
 

Similar to Moa: Real Time Analytics for Data Streams

Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...
Paolo Missier
 
Sentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataSentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming Data
Albert Bifet
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking Data
James Sirota
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data Science
Paolo Missier
 
Fast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsFast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data Streams
Albert Bifet
 
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
Gilles Fedak
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking Data
DataWorks Summit
 
Semantics in Sensor Networks
Semantics in Sensor NetworksSemantics in Sensor Networks
Semantics in Sensor Networks
Oscar Corcho
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014
Raja Chiky
 
Data mining weka
Data mining wekaData mining weka
Data mining weka
prashant 100702007
 
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
WSO2
 
WSO2 Big Data Platform and Applications
WSO2 Big Data Platform and ApplicationsWSO2 Big Data Platform and Applications
WSO2 Big Data Platform and Applications
Srinath Perera
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Paolo Missier
 
Grid Projects In The US July 2008
Grid Projects In The US July 2008Grid Projects In The US July 2008
Grid Projects In The US July 2008
Ian Foster
 
Big data at experimental facilities
Big data at experimental facilitiesBig data at experimental facilities
Big data at experimental facilities
Ian Foster
 
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
DataWorks Summit/Hadoop Summit
 
Internet data mining 2006
Internet data mining   2006Internet data mining   2006
Internet data mining 2006
raj_vij
 
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
Raffaele Montella
 
2015 genome-center
2015 genome-center2015 genome-center
2015 genome-center
c.titus.brown
 
Hw09 Fingerpointing Sourcing Performance Issues
Hw09   Fingerpointing  Sourcing Performance IssuesHw09   Fingerpointing  Sourcing Performance Issues
Hw09 Fingerpointing Sourcing Performance Issues
Cloudera, Inc.
 

Similar to Moa: Real Time Analytics for Data Streams (20)

Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...
 
Sentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataSentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming Data
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking Data
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data Science
 
Fast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsFast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data Streams
 
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking Data
 
Semantics in Sensor Networks
Semantics in Sensor NetworksSemantics in Sensor Networks
Semantics in Sensor Networks
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014
 
Data mining weka
Data mining wekaData mining weka
Data mining weka
 
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
 
WSO2 Big Data Platform and Applications
WSO2 Big Data Platform and ApplicationsWSO2 Big Data Platform and Applications
WSO2 Big Data Platform and Applications
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
 
Grid Projects In The US July 2008
Grid Projects In The US July 2008Grid Projects In The US July 2008
Grid Projects In The US July 2008
 
Big data at experimental facilities
Big data at experimental facilitiesBig data at experimental facilities
Big data at experimental facilities
 
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
 
Internet data mining 2006
Internet data mining   2006Internet data mining   2006
Internet data mining 2006
 
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
 
2015 genome-center
2015 genome-center2015 genome-center
2015 genome-center
 
Hw09 Fingerpointing Sourcing Performance Issues
Hw09   Fingerpointing  Sourcing Performance IssuesHw09   Fingerpointing  Sourcing Performance Issues
Hw09 Fingerpointing Sourcing Performance Issues
 

More from Albert Bifet

Artificial intelligence and data stream mining
Artificial intelligence and data stream miningArtificial intelligence and data stream mining
Artificial intelligence and data stream mining
Albert Bifet
 
MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016
Albert Bifet
 
Mining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAMining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOA
Albert Bifet
 
Apache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkApache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache Flink
Albert Bifet
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data Science
Albert Bifet
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
Albert Bifet
 
Internet of Things Data Science
Internet of Things Data ScienceInternet of Things Data Science
Internet of Things Data Science
Albert Bifet
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data Management
Albert Bifet
 
Multi-label Classification with Meta-labels
Multi-label Classification with Meta-labelsMulti-label Classification with Meta-labels
Multi-label Classification with Meta-labels
Albert Bifet
 
STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.
Albert Bifet
 
Efficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsEfficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive Windows
Albert Bifet
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real Time
Albert Bifet
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real Time
Albert Bifet
 
Mining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data StreamsMining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data Streams
Albert Bifet
 
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsPAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
Albert Bifet
 
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsLeveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data Streams
Albert Bifet
 
New ensemble methods for evolving data streams
New ensemble methods for evolving data streamsNew ensemble methods for evolving data streams
New ensemble methods for evolving data streams
Albert Bifet
 
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Albert Bifet
 
Adaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAdaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data Streams
Albert Bifet
 
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data StreamsMining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
Albert Bifet
 

More from Albert Bifet (20)

Artificial intelligence and data stream mining
Artificial intelligence and data stream miningArtificial intelligence and data stream mining
Artificial intelligence and data stream mining
 
MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016
 
Mining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAMining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOA
 
Apache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkApache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache Flink
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data Science
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Internet of Things Data Science
Internet of Things Data ScienceInternet of Things Data Science
Internet of Things Data Science
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data Management
 
Multi-label Classification with Meta-labels
Multi-label Classification with Meta-labelsMulti-label Classification with Meta-labels
Multi-label Classification with Meta-labels
 
STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.
 
Efficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsEfficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive Windows
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real Time
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real Time
 
Mining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data StreamsMining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data Streams
 
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsPAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
 
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsLeveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data Streams
 
New ensemble methods for evolving data streams
New ensemble methods for evolving data streamsNew ensemble methods for evolving data streams
New ensemble methods for evolving data streams
 
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
 
Adaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAdaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data Streams
 
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data StreamsMining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
 

Recently uploaded

GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 

Recently uploaded (20)

GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 

Moa: Real Time Analytics for Data Streams

  • 1. MOA: Massive Online Analysis, a Framework for Stream Classification and Clustering Albert Bifet, Geoff Holmes, Bernhard Pfahringer, Philipp Kranen, Hardy Kremer, Timm Jansen and Thomas Seidl University of Waikato Hamilton, New Zealand Data Management and Data Exploration Group RWTH Aachen University, Germany Cumberland Lodge, 2 September 2010 Workshop on Applications of Pattern Analysis 2010
  • 2. Mining Massive Data 2007 Digital Universe: 281 exabytes (billion gigabytes) The amount of information created exceeded available storage for the first time Eric Schmidt, August 2010 Every two days now we create as much information as we did from the dawn of civilization up until 2003. 5 exabytes of data Twitter 106 million registered users 3 billion requests a day via its API. 2 / 21
  • 3. Efficient Algorithms Evolving Data Streams Extract information from potentially infinite sequence of data possibly varying over time using few resources Stream Mining Algorithms Fast methods without storing all dataset in memory Traditional methods don’t deal with restrictions 3 / 21
  • 4. What is MOA? {M}assive {O}nline {A}nalysis is a framework for online learning from data streams. It is closely related to WEKA It includes a collection of offline and online as well as tools for evaluation: classification clustering Easy to extend Easy to design and run experiments 4 / 21
  • 5. WEKA Waikato Environment for Knowledge Analysis Collection of state-of-the-art machine learning algorithms and data processing tools implemented in Java Released under the GPL Support for the whole process of experimental data mining Preparation of input data Statistical evaluation of learning schemes Visualization of input data and the result of learning Used for education, research and applications Complements “Data Mining” by Witten & Frank 5 / 21
  • 7. MOA: the bird The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct. 7 / 21
  • 8. MOA: the bird The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct. 7 / 21
  • 9. MOA: the bird The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct. 7 / 21
  • 10. Data stream learning cycle 1 Process an example at a time, and inspect it only once (at most) 2 Use a limited amount of memory 3 Work in a limited amount of time 4 Be ready to predict at any point 8 / 21
  • 12. Classification Experimental setting Evaluation procedures for Data Streams Holdout Interleaved Test-Then-Train or Prequential Environments Sensor Network: 100Kb Handheld Computer: 32 Mb Server: 400 Mb 10 / 21
  • 13. Classification Experimental setting Data Sources Random Tree Generator Random RBF Generator LED Generator Waveform Generator Hyperplane SEA Generator STAGGER Generator 10 / 21
  • 14. Classification Experimental setting Classifiers Naive Bayes Decision stumps Hoeffding Tree Hoeffding Option Tree Bagging and Boosting ADWIN Bagging and Leveraging Bagging Prediction strategies Majority class Naive Bayes Leaves Adaptive Hybrid 10 / 21
  • 16. Clustering Experimental setting Internal measures External measures Gamma Rand statistic C Index Jaccard coefficient Point-Biserial Folkes and Mallow Index Log Likelihood Hubert Γ statistics Dunn’s Index Minkowski score Tau Purity Tau A van Dongen criterion Tau C V-measure Somer’s Gamma Completeness Ratio of Repetition Homogeneity Modified Ratio of Repetition Variation of information Adjusted Ratio of Clustering Mutual information Fagan’s Index Class-based entropy Deviation Index Cluster-based entropy Z-Score Index Precision D Index Recall Silhouette coefficient F-measure Table: Internal and external clustering evaluation measures. 12 / 21
  • 20. Command Line EvaluatePeriodicHeldOutTest java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask "EvaluatePeriodicHeldOutTest -l DecisionStump -s generators.WaveformGenerator -n 100000 -i 100000000 -f 1000000" > dsresult.csv This command creates a comma separated values file: training the DecisionStump classifier on the WaveformGenerator data, using the first 100 thousand examples for testing, training on a total of 100 million examples, and testing every one million examples: 16 / 21
  • 21. Easy Design of a MOA classifier void resetLearningImpl () void trainOnInstanceImpl (Instance inst) double[] getVotesForInstance (Instance i) 17 / 21
  • 22. Easy Design of a MOA clusterer void resetLearningImpl () void trainOnInstanceImpl (Instance inst) Clustering getClusteringResult() 18 / 21
  • 23. Extensions of MOA Multi-label Classification Itemset Pattern Mining Sequence Pattern Mining 19 / 21
  • 24. Summary {M}assive {O}nline {A}nalysis is a framework for online learning from data streams. http://www.moa.cs.waikato.ac.nz It is closely related to WEKA It includes a collection of offline and online as well as tools for evaluation: classification clustering MOA deals with evolving data streams MOA is easy to use and extend 20 / 21