SlideShare a Scribd company logo
1 of 26
Download to read offline
Streaming Outlier Analysis for Fun and Scalability
Casey Stella
2016
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Table of Contents
Streaming Analytics
Framework
Demos
Questions
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Introduction
Hi, Iā€™m Casey Stella!
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Streaming Analytics
ā€¢ The future involves non-trivial analytics done on streaming data
ā€¢ Itā€™s not just IoT
ā€¢ There is a need for insights to keep pace with the velocity of your data
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Streaming Analytics
ā€¢ The Good: Much of the data can be coerced into timeseries
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Streaming Analytics
ā€¢ The Good: Much of the data can be coerced into timeseries
ā€¢ The Bad: There is a lot of data and it comes at you fast
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Streaming Analytics
ā€¢ The Good: Much of the data can be coerced into timeseries
ā€¢ The Bad: There is a lot of data and it comes at you fast
ā€¢ The Good: Outlier analysis or anomaly detection is a killer-app
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Streaming Analytics
ā€¢ The Good: Much of the data can be coerced into timeseries
ā€¢ The Bad: There is a lot of data and it comes at you fast
ā€¢ The Good: Outlier analysis or anomaly detection is a killer-app
ā€¢ The Bad: Outlier analysis can be computationally intensive
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Streaming Analytics
ā€¢ The Good: Much of the data can be coerced into timeseries
ā€¢ The Bad: There is a lot of data and it comes at you fast
ā€¢ The Good: Outlier analysis or anomaly detection is a killer-app
ā€¢ The Bad: Outlier analysis can be computationally intensive
ā€¢ The Good: There is no shortage of computational frameworks to handle streaming
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Streaming Analytics
ā€¢ The Good: Much of the data can be coerced into timeseries
ā€¢ The Bad: There is a lot of data and it comes at you fast
ā€¢ The Good: Outlier analysis or anomaly detection is a killer-app
ā€¢ The Bad: Outlier analysis can be computationally intensive
ā€¢ The Good: There is no shortage of computational frameworks to handle streaming
ā€¢ The Bad: There are not an overabundance of high-quality outlier analysis
frameworks
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Outlier Analysis
Outlier analysis or anomaly detection is the analytical technique by which ā€œinterestingā€
points are diļ¬€erentiated from ā€œnormalā€ points. Often ā€œinterestingā€ implies some sort of
error or state which should be researched further.
1
http://arxiv.org/pdf/1603.00567v1.pdf
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Outlier Analysis
Outlier analysis or anomaly detection is the analytical technique by which ā€œinterestingā€
points are diļ¬€erentiated from ā€œnormalā€ points. Often ā€œinterestingā€ implies some sort of
error or state which should be researched further.
Macrobase1, an outlier analysis system built for IoT by MIT and Stanford and
Cambridge Mobile Telematics, noted several properties of IoT data:
ā€¢ Data produced by IoT applications often have come from some ā€œordinaryā€
distribution
ā€¢ IoT anomalies are often systemic
ā€¢ They are often fairly rare
1
http://arxiv.org/pdf/1603.00567v1.pdf
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Outlier Analysis: A Hybrid Approach
In order to function at scale, a two-phase approach is taken
ā€¢ For every data point
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Outlier Analysis: A Hybrid Approach
In order to function at scale, a two-phase approach is taken
ā€¢ For every data point
ā—¦ Detect outlier candidates using a robust estimator of variability (e.g. median absolute
deviation) that uses distributional sketching (e.g. Q-trees)
ā—¦ Gather a biased sample (biased by recency)
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Outlier Analysis: A Hybrid Approach
In order to function at scale, a two-phase approach is taken
ā€¢ For every data point
ā—¦ Detect outlier candidates using a robust estimator of variability (e.g. median absolute
deviation) that uses distributional sketching (e.g. Q-trees)
ā—¦ Gather a biased sample (biased by recency)
ā—¦ Extremely deterministic in space and cheap in computation
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Outlier Analysis: A Hybrid Approach
In order to function at scale, a two-phase approach is taken
ā€¢ For every data point
ā—¦ Detect outlier candidates using a robust estimator of variability (e.g. median absolute
deviation) that uses distributional sketching (e.g. Q-trees)
ā—¦ Gather a biased sample (biased by recency)
ā—¦ Extremely deterministic in space and cheap in computation
ā€¢ For every outlier candidate
ā—¦ Use traditional, more computationally complex approaches to outlier analysis (e.g.
Robust PCA) on the biased sample
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Outlier Analysis: A Hybrid Approach
In order to function at scale, a two-phase approach is taken
ā€¢ For every data point
ā—¦ Detect outlier candidates using a robust estimator of variability (e.g. median absolute
deviation) that uses distributional sketching (e.g. Q-trees)
ā—¦ Gather a biased sample (biased by recency)
ā—¦ Extremely deterministic in space and cheap in computation
ā€¢ For every outlier candidate
ā—¦ Use traditional, more computationally complex approaches to outlier analysis (e.g.
Robust PCA) on the biased sample
ā—¦ Expensive computationally, but run infrequently
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Outlier Analysis: A Hybrid Approach
In order to function at scale, a two-phase approach is taken
ā€¢ For every data point
ā—¦ Detect outlier candidates using a robust estimator of variability (e.g. median absolute
deviation) that uses distributional sketching (e.g. Q-trees)
ā—¦ Gather a biased sample (biased by recency)
ā—¦ Extremely deterministic in space and cheap in computation
ā€¢ For every outlier candidate
ā—¦ Use traditional, more computationally complex approaches to outlier analysis (e.g.
Robust PCA) on the biased sample
ā—¦ Expensive computationally, but run infrequently
This becomes a data ļ¬lter which can be attached to a timeseries data stream
within a distributed computational framework (i.e. Storm, Spark, Flink, NiFi)
to detect outliers.
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Sketchy Outlier Estimator: Median Absolute Deviation
ā€¢ Median absolute deviation (or MAD) is a robust statistic
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Sketchy Outlier Estimator: Median Absolute Deviation
ā€¢ Median absolute deviation (or MAD) is a robust statistic
ā—¦ Robust statistics are statistics with good performance for data drawn from a wide range
of non-normally distributed probability distributions
ā—¦ Unlike the standard mean/standard deviation combo, MAD is not sensitive to the
presence of outliers.
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Sketchy Outlier Estimator: Median Absolute Deviation
ā€¢ Median absolute deviation (or MAD) is a robust statistic
ā—¦ Robust statistics are statistics with good performance for data drawn from a wide range
of non-normally distributed probability distributions
ā—¦ Unlike the standard mean/standard deviation combo, MAD is not sensitive to the
presence of outliers.
ā€¢ The median absolute deviation is deļ¬ned for a series of univariate samples X with
Ėœx =median(X), MAD(X)=median({āˆ€xi āˆˆ X||xi āˆ’ Ėœx|}).
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Sketchy Outlier Estimator: Median Absolute Deviation
ā€¢ Median absolute deviation (or MAD) is a robust statistic
ā—¦ Robust statistics are statistics with good performance for data drawn from a wide range
of non-normally distributed probability distributions
ā—¦ Unlike the standard mean/standard deviation combo, MAD is not sensitive to the
presence of outliers.
ā€¢ The median absolute deviation is deļ¬ned for a series of univariate samples X with
Ėœx =median(X), MAD(X)=median({āˆ€xi āˆˆ X||xi āˆ’ Ėœx|}).
ā€¢ A point is considered an outlier if its distance from the current window median,
scaled by the MAD for the previous window, is above a threshold.
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Sketchy Outlier Estimator: Median Absolute Deviation
ā€¢ Median absolute deviation (or MAD) is a robust statistic
ā—¦ Robust statistics are statistics with good performance for data drawn from a wide range
of non-normally distributed probability distributions
ā—¦ Unlike the standard mean/standard deviation combo, MAD is not sensitive to the
presence of outliers.
ā€¢ The median absolute deviation is deļ¬ned for a series of univariate samples X with
Ėœx =median(X), MAD(X)=median({āˆ€xi āˆˆ X||xi āˆ’ Ėœx|}).
ā€¢ A point is considered an outlier if its distance from the current window median,
scaled by the MAD for the previous window, is above a threshold.
tl;dr: A formal way to encode our intuition: If a point is far away from the
ā€œcentralā€ point of our window, then itā€™s likely an outlier.
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Architecture
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Demos
Demos
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
Questions
Thanks for your attention! Questions?
ā€¢ Code & scripts for this talk available at
http://github.com/cestella/streaming_outliers
ā€¢ Find me at http://caseystella.com
ā€¢ Twitter handle: @casey_stella
ā€¢ Email address: cstella@hortonworks.com
Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016

More Related Content

What's hot

Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsPaul Groth
Ā 
Opportunistic Persistent Data Storage
Opportunistic Persistent Data StorageOpportunistic Persistent Data Storage
Opportunistic Persistent Data StorageLuke Weerasooriya
Ā 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicinePaul Groth
Ā 
Diversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsDiversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsPaul Groth
Ā 
NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better ScienceNC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better ScienceSusanna-Assunta Sansone
Ā 
Crisis of confidence, p-hacking and the future of psychology
Crisis of confidence, p-hacking and the future of psychologyCrisis of confidence, p-hacking and the future of psychology
Crisis of confidence, p-hacking and the future of psychologyMatti Heino
Ā 
Swapnil soni Thesis_Presentation
Swapnil soni Thesis_PresentationSwapnil soni Thesis_Presentation
Swapnil soni Thesis_PresentationSwapnil Soni
Ā 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataPaul Groth
Ā 
Workshop on Systematic Searching (Oslo)
Workshop on Systematic Searching (Oslo)Workshop on Systematic Searching (Oslo)
Workshop on Systematic Searching (Oslo)jstaaks
Ā 
On community-standards, data curation and scholarly communication - BITS, Ita...
On community-standards, data curation and scholarly communication - BITS, Ita...On community-standards, data curation and scholarly communication - BITS, Ita...
On community-standards, data curation and scholarly communication - BITS, Ita...Susanna-Assunta Sansone
Ā 
Introduction to Systematic Reviews (Oslo)
Introduction to Systematic Reviews (Oslo)Introduction to Systematic Reviews (Oslo)
Introduction to Systematic Reviews (Oslo)jstaaks
Ā 
Answering More Questions with Provenance and Query Patterns
Answering More Questions with Provenance and Query PatternsAnswering More Questions with Provenance and Query Patterns
Answering More Questions with Provenance and Query PatternsBertram LudƤscher
Ā 
Program theory evaluation
Program theory evaluationProgram theory evaluation
Program theory evaluationMatti Heino
Ā 
Using the search engine as recommendation engine
Using the search engine as recommendation engineUsing the search engine as recommendation engine
Using the search engine as recommendation engineLars Marius Garshol
Ā 
Dynamic Search Using Semantics & Statistics
Dynamic Search Using Semantics & StatisticsDynamic Search Using Semantics & Statistics
Dynamic Search Using Semantics & StatisticsPaul Hofmann
Ā 
Differential privacy (ź°œģøģ •ė³“ ģ°Øė“±ė³“ķ˜ø)
Differential privacy (ź°œģøģ •ė³“ ģ°Øė“±ė³“ķ˜ø)Differential privacy (ź°œģøģ •ė³“ ģ°Øė“±ė³“ķ˜ø)
Differential privacy (ź°œģøģ •ė³“ ģ°Øė“±ė³“ķ˜ø)Young-Geun Choi
Ā 
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014Susanna-Assunta Sansone
Ā 
Public PhD Defense - Ben De Meester
Public PhD Defense - Ben De MeesterPublic PhD Defense - Ben De Meester
Public PhD Defense - Ben De MeesterBen De Meester
Ā 
On community-standards, data curation and scholarly communication" Stanford M...
On community-standards, data curation and scholarly communication" Stanford M...On community-standards, data curation and scholarly communication" Stanford M...
On community-standards, data curation and scholarly communication" Stanford M...Susanna-Assunta Sansone
Ā 

What's hot (19)

Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Ā 
Opportunistic Persistent Data Storage
Opportunistic Persistent Data StorageOpportunistic Persistent Data Storage
Opportunistic Persistent Data Storage
Ā 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicine
Ā 
Diversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsDiversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domains
Ā 
NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better ScienceNC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
Ā 
Crisis of confidence, p-hacking and the future of psychology
Crisis of confidence, p-hacking and the future of psychologyCrisis of confidence, p-hacking and the future of psychology
Crisis of confidence, p-hacking and the future of psychology
Ā 
Swapnil soni Thesis_Presentation
Swapnil soni Thesis_PresentationSwapnil soni Thesis_Presentation
Swapnil soni Thesis_Presentation
Ā 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture Data
Ā 
Workshop on Systematic Searching (Oslo)
Workshop on Systematic Searching (Oslo)Workshop on Systematic Searching (Oslo)
Workshop on Systematic Searching (Oslo)
Ā 
On community-standards, data curation and scholarly communication - BITS, Ita...
On community-standards, data curation and scholarly communication - BITS, Ita...On community-standards, data curation and scholarly communication - BITS, Ita...
On community-standards, data curation and scholarly communication - BITS, Ita...
Ā 
Introduction to Systematic Reviews (Oslo)
Introduction to Systematic Reviews (Oslo)Introduction to Systematic Reviews (Oslo)
Introduction to Systematic Reviews (Oslo)
Ā 
Answering More Questions with Provenance and Query Patterns
Answering More Questions with Provenance and Query PatternsAnswering More Questions with Provenance and Query Patterns
Answering More Questions with Provenance and Query Patterns
Ā 
Program theory evaluation
Program theory evaluationProgram theory evaluation
Program theory evaluation
Ā 
Using the search engine as recommendation engine
Using the search engine as recommendation engineUsing the search engine as recommendation engine
Using the search engine as recommendation engine
Ā 
Dynamic Search Using Semantics & Statistics
Dynamic Search Using Semantics & StatisticsDynamic Search Using Semantics & Statistics
Dynamic Search Using Semantics & Statistics
Ā 
Differential privacy (ź°œģøģ •ė³“ ģ°Øė“±ė³“ķ˜ø)
Differential privacy (ź°œģøģ •ė³“ ģ°Øė“±ė³“ķ˜ø)Differential privacy (ź°œģøģ •ė³“ ģ°Øė“±ė³“ķ˜ø)
Differential privacy (ź°œģøģ •ė³“ ģ°Øė“±ė³“ķ˜ø)
Ā 
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Ā 
Public PhD Defense - Ben De Meester
Public PhD Defense - Ben De MeesterPublic PhD Defense - Ben De Meester
Public PhD Defense - Ben De Meester
Ā 
On community-standards, data curation and scholarly communication" Stanford M...
On community-standards, data curation and scholarly communication" Stanford M...On community-standards, data curation and scholarly communication" Stanford M...
On community-standards, data curation and scholarly communication" Stanford M...
Ā 

Viewers also liked

Real time applications using the R Language
Real time applications using the R LanguageReal time applications using the R Language
Real time applications using the R LanguageLou Bajuk
Ā 
Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache SparkCloudera, Inc.
Ā 
Data Mining: Outlier analysis
Data Mining: Outlier analysisData Mining: Outlier analysis
Data Mining: Outlier analysisDataminingTools Inc
Ā 
Streaming Data in R
Streaming Data in RStreaming Data in R
Streaming Data in RRory Winston
Ā 
Real-TIme Market Data in R
Real-TIme Market Data in RReal-TIme Market Data in R
Real-TIme Market Data in RRory Winston
Ā 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014P. Taylor Goetz
Ā 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationnathanmarz
Ā 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopDataWorks Summit
Ā 
Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignMichael Noll
Ā 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureP. Taylor Goetz
Ā 

Viewers also liked (12)

Real time applications using the R Language
Real time applications using the R LanguageReal time applications using the R Language
Real time applications using the R Language
Ā 
Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache Spark
Ā 
Data Mining: Outlier analysis
Data Mining: Outlier analysisData Mining: Outlier analysis
Data Mining: Outlier analysis
Ā 
Streaming Data in R
Streaming Data in RStreaming Data in R
Streaming Data in R
Ā 
Real-TIme Market Data in R
Real-TIme Market Data in RReal-TIme Market Data in R
Real-TIme Market Data in R
Ā 
Resource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache StormResource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache Storm
Ā 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014
Ā 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computation
Ā 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and Hadoop
Ā 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
Ā 
Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - Verisign
Ā 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm Architecture
Ā 

Similar to Streaming Outlier Analysis for Fun and Scalability

Slides for "Do Deep Generative Models Know What They Don't know?"
Slides for "Do Deep Generative Models Know What They Don't know?"Slides for "Do Deep Generative Models Know What They Don't know?"
Slides for "Do Deep Generative Models Know What They Don't know?"Julius Hietala
Ā 
Big Data Real Time Training in Chennai
Big Data Real Time Training in ChennaiBig Data Real Time Training in Chennai
Big Data Real Time Training in ChennaiVijay Susheedran C G
Ā 
Big Data 101 - An introduction
Big Data 101 - An introductionBig Data 101 - An introduction
Big Data 101 - An introductionNeeraj Tewari
Ā 
Linked science presentation 25
Linked science presentation 25Linked science presentation 25
Linked science presentation 25Francesco Osborne
Ā 
Velocity Europe 2013: Beyond Pretty Charts: Analytics for the cloud infrastru...
Velocity Europe 2013: Beyond Pretty Charts: Analytics for the cloud infrastru...Velocity Europe 2013: Beyond Pretty Charts: Analytics for the cloud infrastru...
Velocity Europe 2013: Beyond Pretty Charts: Analytics for the cloud infrastru...tboubez
Ā 
Stance classification. Presentation at QMUL 11 Nov 2020
Stance classification. Presentation at QMUL 11 Nov 2020Stance classification. Presentation at QMUL 11 Nov 2020
Stance classification. Presentation at QMUL 11 Nov 2020Weverify
Ā 
Spsshelp 100608163328-phpapp01
Spsshelp 100608163328-phpapp01Spsshelp 100608163328-phpapp01
Spsshelp 100608163328-phpapp01Henock Beyene
Ā 
Outlier analysis and anomaly detection
Outlier analysis and anomaly detectionOutlier analysis and anomaly detection
Outlier analysis and anomaly detectionShantanuDeosthale
Ā 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron
MaaS (Model as a Service): Modern Streaming Data Science with Apache MetronMaaS (Model as a Service): Modern Streaming Data Science with Apache Metron
MaaS (Model as a Service): Modern Streaming Data Science with Apache MetronDataWorks Summit
Ā 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...DataWorks Summit
Ā 
Stance classification. Uni Cambridge 22 Jan 2021
Stance classification. Uni Cambridge 22 Jan 2021Stance classification. Uni Cambridge 22 Jan 2021
Stance classification. Uni Cambridge 22 Jan 2021Weverify
Ā 
Big Data Analytics - Best of the Worst : Anti-patterns & Antidotes
Big Data Analytics - Best of the Worst : Anti-patterns & AntidotesBig Data Analytics - Best of the Worst : Anti-patterns & Antidotes
Big Data Analytics - Best of the Worst : Anti-patterns & AntidotesKrishna Sankar
Ā 
Automatic Visualization
Automatic VisualizationAutomatic Visualization
Automatic VisualizationSri Ambati
Ā 
MRMW N America 2016 presentation kelly and zanutto naxion
MRMW N America 2016 presentation kelly and zanutto naxionMRMW N America 2016 presentation kelly and zanutto naxion
MRMW N America 2016 presentation kelly and zanutto naxionMichael Kelly
Ā 
Data Wrangling_1.pptx
Data Wrangling_1.pptxData Wrangling_1.pptx
Data Wrangling_1.pptxPallabiSahoo5
Ā 

Similar to Streaming Outlier Analysis for Fun and Scalability (20)

Data Preparation for Data Science
Data Preparation for Data ScienceData Preparation for Data Science
Data Preparation for Data Science
Ā 
Data Preparation of Data Science
Data Preparation of Data ScienceData Preparation of Data Science
Data Preparation of Data Science
Ā 
Slides for "Do Deep Generative Models Know What They Don't know?"
Slides for "Do Deep Generative Models Know What They Don't know?"Slides for "Do Deep Generative Models Know What They Don't know?"
Slides for "Do Deep Generative Models Know What They Don't know?"
Ā 
Big Data Real Time Training in Chennai
Big Data Real Time Training in ChennaiBig Data Real Time Training in Chennai
Big Data Real Time Training in Chennai
Ā 
Big Data 101 - An introduction
Big Data 101 - An introductionBig Data 101 - An introduction
Big Data 101 - An introduction
Ā 
Week_2_Lecture.pdf
Week_2_Lecture.pdfWeek_2_Lecture.pdf
Week_2_Lecture.pdf
Ā 
Statistics
StatisticsStatistics
Statistics
Ā 
Linked science presentation 25
Linked science presentation 25Linked science presentation 25
Linked science presentation 25
Ā 
Velocity Europe 2013: Beyond Pretty Charts: Analytics for the cloud infrastru...
Velocity Europe 2013: Beyond Pretty Charts: Analytics for the cloud infrastru...Velocity Europe 2013: Beyond Pretty Charts: Analytics for the cloud infrastru...
Velocity Europe 2013: Beyond Pretty Charts: Analytics for the cloud infrastru...
Ā 
Stance classification. Presentation at QMUL 11 Nov 2020
Stance classification. Presentation at QMUL 11 Nov 2020Stance classification. Presentation at QMUL 11 Nov 2020
Stance classification. Presentation at QMUL 11 Nov 2020
Ā 
Spsshelp 100608163328-phpapp01
Spsshelp 100608163328-phpapp01Spsshelp 100608163328-phpapp01
Spsshelp 100608163328-phpapp01
Ā 
Outlier analysis and anomaly detection
Outlier analysis and anomaly detectionOutlier analysis and anomaly detection
Outlier analysis and anomaly detection
Ā 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron
MaaS (Model as a Service): Modern Streaming Data Science with Apache MetronMaaS (Model as a Service): Modern Streaming Data Science with Apache Metron
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron
Ā 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
Ā 
Stance classification. Uni Cambridge 22 Jan 2021
Stance classification. Uni Cambridge 22 Jan 2021Stance classification. Uni Cambridge 22 Jan 2021
Stance classification. Uni Cambridge 22 Jan 2021
Ā 
Big Data Analytics - Best of the Worst : Anti-patterns & Antidotes
Big Data Analytics - Best of the Worst : Anti-patterns & AntidotesBig Data Analytics - Best of the Worst : Anti-patterns & Antidotes
Big Data Analytics - Best of the Worst : Anti-patterns & Antidotes
Ā 
Automatic Visualization
Automatic VisualizationAutomatic Visualization
Automatic Visualization
Ā 
MRMW N America 2016 presentation kelly and zanutto naxion
MRMW N America 2016 presentation kelly and zanutto naxionMRMW N America 2016 presentation kelly and zanutto naxion
MRMW N America 2016 presentation kelly and zanutto naxion
Ā 
Declarative data analysis
Declarative data analysisDeclarative data analysis
Declarative data analysis
Ā 
Data Wrangling_1.pptx
Data Wrangling_1.pptxData Wrangling_1.pptx
Data Wrangling_1.pptx
Ā 

More from DataWorks Summit/Hadoop Summit

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionDataWorks Summit/Hadoop Summit
Ā 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinDataWorks Summit/Hadoop Summit
Ā 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
Ā 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
Ā 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinDataWorks Summit/Hadoop Summit
Ā 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
Ā 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
Ā 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
Ā 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
Ā 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient DataWorks Summit/Hadoop Summit
Ā 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
Ā 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopDataWorks Summit/Hadoop Summit
Ā 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
Ā 

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
Ā 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
Ā 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
Ā 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
Ā 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
Ā 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
Ā 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
Ā 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
Ā 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
Ā 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
Ā 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
Ā 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Ā 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Ā 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
Ā 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
Ā 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
Ā 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
Ā 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Ā 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
Ā 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
Ā 

Recently uploaded

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
Ā 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
Ā 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
Ā 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
Ā 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
Ā 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
Ā 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
Ā 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
Ā 
#StandardsGoals for 2024: Whatā€™s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: Whatā€™s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: Whatā€™s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: Whatā€™s new for BISAC - Tech Forum 2024BookNet Canada
Ā 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
Ā 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
Ā 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
Ā 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
Ā 
Bun (KitWorks Team Study ė…øė³„ė§ˆė£Ø ė°œķ‘œ 2024.4.22)
Bun (KitWorks Team Study ė…øė³„ė§ˆė£Ø ė°œķ‘œ 2024.4.22)Bun (KitWorks Team Study ė…øė³„ė§ˆė£Ø ė°œķ‘œ 2024.4.22)
Bun (KitWorks Team Study ė…øė³„ė§ˆė£Ø ė°œķ‘œ 2024.4.22)Wonjun Hwang
Ā 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
Ā 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
Ā 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
Ā 
SIEMENS: RAPUNZEL ā€“ A Tale About Knowledge Graph
SIEMENS: RAPUNZEL ā€“ A Tale About Knowledge GraphSIEMENS: RAPUNZEL ā€“ A Tale About Knowledge Graph
SIEMENS: RAPUNZEL ā€“ A Tale About Knowledge GraphNeo4j
Ā 

Recently uploaded (20)

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Ā 
Hot Sexy call girls in Panjabi Bagh šŸ” 9953056974 šŸ” Delhi escort Service
Hot Sexy call girls in Panjabi Bagh šŸ” 9953056974 šŸ” Delhi escort ServiceHot Sexy call girls in Panjabi Bagh šŸ” 9953056974 šŸ” Delhi escort Service
Hot Sexy call girls in Panjabi Bagh šŸ” 9953056974 šŸ” Delhi escort Service
Ā 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
Ā 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Ā 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Ā 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
Ā 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Ā 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Ā 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
Ā 
#StandardsGoals for 2024: Whatā€™s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: Whatā€™s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: Whatā€™s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: Whatā€™s new for BISAC - Tech Forum 2024
Ā 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
Ā 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Ā 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
Ā 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
Ā 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
Ā 
Bun (KitWorks Team Study ė…øė³„ė§ˆė£Ø ė°œķ‘œ 2024.4.22)
Bun (KitWorks Team Study ė…øė³„ė§ˆė£Ø ė°œķ‘œ 2024.4.22)Bun (KitWorks Team Study ė…øė³„ė§ˆė£Ø ė°œķ‘œ 2024.4.22)
Bun (KitWorks Team Study ė…øė³„ė§ˆė£Ø ė°œķ‘œ 2024.4.22)
Ā 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
Ā 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
Ā 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
Ā 
SIEMENS: RAPUNZEL ā€“ A Tale About Knowledge Graph
SIEMENS: RAPUNZEL ā€“ A Tale About Knowledge GraphSIEMENS: RAPUNZEL ā€“ A Tale About Knowledge Graph
SIEMENS: RAPUNZEL ā€“ A Tale About Knowledge Graph
Ā 

Streaming Outlier Analysis for Fun and Scalability

  • 1. Streaming Outlier Analysis for Fun and Scalability Casey Stella 2016 Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 2. Table of Contents Streaming Analytics Framework Demos Questions Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 3. Introduction Hi, Iā€™m Casey Stella! Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 4. Streaming Analytics ā€¢ The future involves non-trivial analytics done on streaming data ā€¢ Itā€™s not just IoT ā€¢ There is a need for insights to keep pace with the velocity of your data Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 5. Streaming Analytics ā€¢ The Good: Much of the data can be coerced into timeseries Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 6. Streaming Analytics ā€¢ The Good: Much of the data can be coerced into timeseries ā€¢ The Bad: There is a lot of data and it comes at you fast Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 7. Streaming Analytics ā€¢ The Good: Much of the data can be coerced into timeseries ā€¢ The Bad: There is a lot of data and it comes at you fast ā€¢ The Good: Outlier analysis or anomaly detection is a killer-app Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 8. Streaming Analytics ā€¢ The Good: Much of the data can be coerced into timeseries ā€¢ The Bad: There is a lot of data and it comes at you fast ā€¢ The Good: Outlier analysis or anomaly detection is a killer-app ā€¢ The Bad: Outlier analysis can be computationally intensive Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 9. Streaming Analytics ā€¢ The Good: Much of the data can be coerced into timeseries ā€¢ The Bad: There is a lot of data and it comes at you fast ā€¢ The Good: Outlier analysis or anomaly detection is a killer-app ā€¢ The Bad: Outlier analysis can be computationally intensive ā€¢ The Good: There is no shortage of computational frameworks to handle streaming Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 10. Streaming Analytics ā€¢ The Good: Much of the data can be coerced into timeseries ā€¢ The Bad: There is a lot of data and it comes at you fast ā€¢ The Good: Outlier analysis or anomaly detection is a killer-app ā€¢ The Bad: Outlier analysis can be computationally intensive ā€¢ The Good: There is no shortage of computational frameworks to handle streaming ā€¢ The Bad: There are not an overabundance of high-quality outlier analysis frameworks Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 11. Outlier Analysis Outlier analysis or anomaly detection is the analytical technique by which ā€œinterestingā€ points are diļ¬€erentiated from ā€œnormalā€ points. Often ā€œinterestingā€ implies some sort of error or state which should be researched further. 1 http://arxiv.org/pdf/1603.00567v1.pdf Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 12. Outlier Analysis Outlier analysis or anomaly detection is the analytical technique by which ā€œinterestingā€ points are diļ¬€erentiated from ā€œnormalā€ points. Often ā€œinterestingā€ implies some sort of error or state which should be researched further. Macrobase1, an outlier analysis system built for IoT by MIT and Stanford and Cambridge Mobile Telematics, noted several properties of IoT data: ā€¢ Data produced by IoT applications often have come from some ā€œordinaryā€ distribution ā€¢ IoT anomalies are often systemic ā€¢ They are often fairly rare 1 http://arxiv.org/pdf/1603.00567v1.pdf Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 13. Outlier Analysis: A Hybrid Approach In order to function at scale, a two-phase approach is taken ā€¢ For every data point Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 14. Outlier Analysis: A Hybrid Approach In order to function at scale, a two-phase approach is taken ā€¢ For every data point ā—¦ Detect outlier candidates using a robust estimator of variability (e.g. median absolute deviation) that uses distributional sketching (e.g. Q-trees) ā—¦ Gather a biased sample (biased by recency) Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 15. Outlier Analysis: A Hybrid Approach In order to function at scale, a two-phase approach is taken ā€¢ For every data point ā—¦ Detect outlier candidates using a robust estimator of variability (e.g. median absolute deviation) that uses distributional sketching (e.g. Q-trees) ā—¦ Gather a biased sample (biased by recency) ā—¦ Extremely deterministic in space and cheap in computation Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 16. Outlier Analysis: A Hybrid Approach In order to function at scale, a two-phase approach is taken ā€¢ For every data point ā—¦ Detect outlier candidates using a robust estimator of variability (e.g. median absolute deviation) that uses distributional sketching (e.g. Q-trees) ā—¦ Gather a biased sample (biased by recency) ā—¦ Extremely deterministic in space and cheap in computation ā€¢ For every outlier candidate ā—¦ Use traditional, more computationally complex approaches to outlier analysis (e.g. Robust PCA) on the biased sample Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 17. Outlier Analysis: A Hybrid Approach In order to function at scale, a two-phase approach is taken ā€¢ For every data point ā—¦ Detect outlier candidates using a robust estimator of variability (e.g. median absolute deviation) that uses distributional sketching (e.g. Q-trees) ā—¦ Gather a biased sample (biased by recency) ā—¦ Extremely deterministic in space and cheap in computation ā€¢ For every outlier candidate ā—¦ Use traditional, more computationally complex approaches to outlier analysis (e.g. Robust PCA) on the biased sample ā—¦ Expensive computationally, but run infrequently Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 18. Outlier Analysis: A Hybrid Approach In order to function at scale, a two-phase approach is taken ā€¢ For every data point ā—¦ Detect outlier candidates using a robust estimator of variability (e.g. median absolute deviation) that uses distributional sketching (e.g. Q-trees) ā—¦ Gather a biased sample (biased by recency) ā—¦ Extremely deterministic in space and cheap in computation ā€¢ For every outlier candidate ā—¦ Use traditional, more computationally complex approaches to outlier analysis (e.g. Robust PCA) on the biased sample ā—¦ Expensive computationally, but run infrequently This becomes a data ļ¬lter which can be attached to a timeseries data stream within a distributed computational framework (i.e. Storm, Spark, Flink, NiFi) to detect outliers. Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 19. Sketchy Outlier Estimator: Median Absolute Deviation ā€¢ Median absolute deviation (or MAD) is a robust statistic Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 20. Sketchy Outlier Estimator: Median Absolute Deviation ā€¢ Median absolute deviation (or MAD) is a robust statistic ā—¦ Robust statistics are statistics with good performance for data drawn from a wide range of non-normally distributed probability distributions ā—¦ Unlike the standard mean/standard deviation combo, MAD is not sensitive to the presence of outliers. Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 21. Sketchy Outlier Estimator: Median Absolute Deviation ā€¢ Median absolute deviation (or MAD) is a robust statistic ā—¦ Robust statistics are statistics with good performance for data drawn from a wide range of non-normally distributed probability distributions ā—¦ Unlike the standard mean/standard deviation combo, MAD is not sensitive to the presence of outliers. ā€¢ The median absolute deviation is deļ¬ned for a series of univariate samples X with Ėœx =median(X), MAD(X)=median({āˆ€xi āˆˆ X||xi āˆ’ Ėœx|}). Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 22. Sketchy Outlier Estimator: Median Absolute Deviation ā€¢ Median absolute deviation (or MAD) is a robust statistic ā—¦ Robust statistics are statistics with good performance for data drawn from a wide range of non-normally distributed probability distributions ā—¦ Unlike the standard mean/standard deviation combo, MAD is not sensitive to the presence of outliers. ā€¢ The median absolute deviation is deļ¬ned for a series of univariate samples X with Ėœx =median(X), MAD(X)=median({āˆ€xi āˆˆ X||xi āˆ’ Ėœx|}). ā€¢ A point is considered an outlier if its distance from the current window median, scaled by the MAD for the previous window, is above a threshold. Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 23. Sketchy Outlier Estimator: Median Absolute Deviation ā€¢ Median absolute deviation (or MAD) is a robust statistic ā—¦ Robust statistics are statistics with good performance for data drawn from a wide range of non-normally distributed probability distributions ā—¦ Unlike the standard mean/standard deviation combo, MAD is not sensitive to the presence of outliers. ā€¢ The median absolute deviation is deļ¬ned for a series of univariate samples X with Ėœx =median(X), MAD(X)=median({āˆ€xi āˆˆ X||xi āˆ’ Ėœx|}). ā€¢ A point is considered an outlier if its distance from the current window median, scaled by the MAD for the previous window, is above a threshold. tl;dr: A formal way to encode our intuition: If a point is far away from the ā€œcentralā€ point of our window, then itā€™s likely an outlier. Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 24. Architecture Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 25. Demos Demos Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016
  • 26. Questions Thanks for your attention! Questions? ā€¢ Code & scripts for this talk available at http://github.com/cestella/streaming_outliers ā€¢ Find me at http://caseystella.com ā€¢ Twitter handle: @casey_stella ā€¢ Email address: cstella@hortonworks.com Casey Stella (Hortonworks) Streaming Outlier Analysis for Fun and Scalability 2016