Device anomaly detection using Spark k-means
9/3/2014
Introduction
Detect device anomalies from device information, represented as a feature vector of device properties:
● battery %
● CPU %
● RAM %
● Wi-Fi signal strength
● build number (as a numeric value)
● exception
● charging
● GPS longitude
● GPS latitude
● bundle version
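To cluster these properties, each device report must become a fixed-length numeric vector. The sketch below is a hypothetical Scala record for one report, not the author's schema. Several choices in it are assumptions: the field names, the 1.0/0.0 encoding of `charging`, and the premise that the build number, exception field, and bundle version are already numeric.

```scala
// Hypothetical record for one device status report. Field names and encodings
// are illustrative assumptions, not the deck's actual schema.
final case class DeviceStatus(
    batteryPct: Double,
    cpuPct: Double,
    ramPct: Double,
    wifiStrength: Double,
    buildNumber: Double,   // build number, assumed already converted to a number
    exception: Double,     // exception measure, assumed numeric
    charging: Boolean,
    gpsLong: Double,
    gpsLat: Double,
    bundleVersion: Double  // assumed already encoded as a number
) {
  // Flatten into the numeric feature vector that k-means consumes;
  // the boolean charging flag becomes 1.0 or 0.0.
  def features: Array[Double] = Array(
    batteryPct, cpuPct, ramPct, wifiStrength, buildNumber, exception,
    if (charging) 1.0 else 0.0, gpsLong, gpsLat, bundleVersion
  )
}
```

Percentages, GPS coordinates, and build numbers sit on very different scales. Vectors like this are therefore usually standardized before k-means, so that no single feature dominates the Euclidean distance.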
K-means clustering
Clustering is an unsupervised learning problem in which we aim to group subsets of entities with one another based on some notion of similarity. Clustering is often used for exploratory analysis and/or as a component of a hierarchical supervised learning pipeline (in which distinct classifiers or regression models are trained for each cluster).
MLlib supports k-means clustering, one of the most commonly used clustering algorithms, which partitions the data points into a predefined number k of clusters.
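The description above gives k-means only in outline. Below is a minimal single-machine sketch of Lloyd's iteration, the assign-then-average loop at the core of k-means. `LocalKMeans` is a hypothetical helper written for illustration, not MLlib code. MLlib runs the same kind of loop distributed over an RDD and, by default, chooses its initial centers with k-means|| instead of taking them from the caller.

```scala
// Minimal single-machine sketch of Lloyd's k-means iteration, for illustration only.
object LocalKMeans {
  type Point = Array[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Point, b: Point): Double =
    a.indices.map(i => (a(i) - b(i)) * (a(i) - b(i))).sum

  // Index of the center nearest to p.
  def nearest(centers: Seq[Point], p: Point): Int =
    centers.indices.minBy(j => sqDist(centers(j), p))

  // Component-wise mean of a non-empty group of points.
  def mean(ps: Seq[Point]): Point =
    Array.tabulate(ps.head.length)(i => ps.map(_(i)).sum / ps.size)

  // Alternate assignment and update steps, starting from the given centers,
  // until no center moves or maxIter iterations have run.
  def train(points: Seq[Point], init: Seq[Point], maxIter: Int): Seq[Point] = {
    var centers = init
    var iter = 0
    var moved = true
    while (moved && iter < maxIter) {
      // Assignment step: group every point under its nearest center.
      val groups = points.groupBy(p => nearest(centers, p))
      // Update step: move each center to the mean of its group (an empty group keeps its center).
      val updated = centers.indices.map(j => groups.get(j).map(mean).getOrElse(centers(j)))
      moved = updated.zip(centers).exists { case (a, b) => sqDist(a, b) > 1e-12 }
      centers = updated
      iter += 1
    }
    centers
  }
}
```

For example, take the points (0, 0), (0, 1), (10, 10) and (10, 11), and start from the first and third points. The loop stops after two iterations with centers (0.0, 0.5) and (10.0, 10.5).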
Example Data
battery  cpu    RAM    wifi   exception  charging count
70.00    15.00  70.00  89.00  3.00       3
75.00    16.00  68.00  90.00  4.00       0
60.00    19.00  67.00  90.00  3.00       0
65.00    19.00  67.00  90.00  3.00       0
67.00    17.00  67.00  90.00  3.00       0
68.00    19.00  69.00  90.00  3.00       0
68.00    19.00  69.00  90.00  3.00       0
68.00    19.00  69.00  90.00  3.00       0
68.00    19.00  89.00  80.00  4.00       0
33.00    49.00  79.00  90.00  3.00       0
33.00    49.00  79.00  90.00  3.00       0
33.00    49.00  79.00  98.00  3.00       0
43.00    49.00  79.00  90.00  3.00       0
53.00    49.00  78.00  90.00  3.00       0
38.00    49.00  79.00  90.00  3.00       0
38.00    49.00  89.00  90.00  3.00       0
68.00    19.00  69.00  90.00  3.00       0
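The Result slide shows five-element vectors, which suggests that only the first five columns (battery through exception) feed the clustering. Below is a sketch of a parser for rows like those above, which holds the charging count apart. `RowParser` and `DataRow` are hypothetical names, and the six-field layout is inferred from the sample rows.

```scala
// Sketch: split one sample row into the five clustering features and the charging count.
object RowParser {
  final case class DataRow(features: Array[Double], chargingCount: Int)

  // Split on whitespace and/or commas, so "3.00, 3" and "3.00  3" both parse.
  def parse(line: String): DataRow = {
    val fields = line.trim.split("[\\s,]+")
    require(fields.length == 6, s"expected 6 fields, got ${fields.length}")
    // The count is parsed via Double so that "3" and "3.00" are both accepted.
    DataRow(fields.take(5).map(_.toDouble), fields(5).toDouble.toInt)
  }
}
```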
Example Scala code 
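// Run in spark-shell, where sc (the SparkContext) is predefined.
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Keep the first five fields of each line (battery, cpu, RAM, wifi, exception);
// a trailing charging count is dropped, matching the five-element vectors in Result.
val data = sc.textFile("data/device_anomaly.txt").map { line =>
  Vectors.dense(line.trim.split("[\\s,]+").take(5).map(_.toDouble))
}.cache()

val K = 3               // number of clusters
val maxIterations = 20  // maximum Lloyd iterations per run
val runs = 20           // random restarts, best model kept (Spark 1.x API; deprecated later)
val clusters = KMeans.train(data, K, maxIterations, runs)

// Pair each input vector with the index of its nearest cluster center.
val vectorsAndClusterIdx = data.map { point =>
  val prediction = clusters.predict(point)
  (point.toString, prediction)
}

// collect() brings the pairs to the driver; RDD.foreach would print on the executors.
vectorsAndClusterIdx.collect().foreach(println)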
Normalize 
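import org.apache.spark.mllib.linalg.Vectors

// MLlib Vectors have no zip(), so compute column statistics on plain arrays.
val dataAsArray = data.map(_.toArray)
val numCols = dataAsArray.first().length
val n = dataAsArray.count()
val sums = dataAsArray.reduce((a, b) => a.zip(b).map(t => t._1 + t._2))
// Square first, then add: fold() would reuse its function to merge partition
// results and square the partial sums a second time.
val sumSquares = dataAsArray.map(_.map(x => x * x)).reduce((a, b) => a.zip(b).map(t => t._1 + t._2))
val stdevs = sumSquares.zip(sums).map { case (sumSq, sum) => math.sqrt(math.max(0.0, n * sumSq - sum * sum)) / n }
val means = sums.map(_ / n)
// Standardize each column to z-scores; a constant column (stdev 0) is only centered.
val normalizedData = dataAsArray.map { row =>
  Vectors.dense(Array.tabulate(numCols) { i =>
    if (stdevs(i) <= 0) row(i) - means(i) else (row(i) - means(i)) / stdevs(i)
  })
}.cache()
data.unpersist()  // the raw vectors are no longer needed
// clusteringScore(data, k) is not defined on these slides; it presumably trains a model
// with k clusters and returns its cost. k from 50 to 120 assumes far more than 17 rows.
val kScores = (50 to 120 by 10).par.map(k => (k, clusteringScore(normalizedData, k)))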
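The `stdevs` line uses a one-pass formula built only from column sums. For one column, let S_1 be the sum of the x_i and S_2 the sum of the x_i squared, over n rows. The population variance then expands as shown below. This gives the expression in the code, together with the z-score applied in `normalizedData`:

```latex
\mu = \frac{S_1}{n}, \qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2
         = \frac{S_2}{n} - \mu^2
         = \frac{n S_2 - S_1^2}{n^2}
\;\Longrightarrow\;
\sigma = \frac{\sqrt{n S_2 - S_1^2}}{n}, \qquad
z_i = \begin{cases} (x_i - \mu)/\sigma & \sigma > 0 \\ x_i - \mu & \sigma = 0 \end{cases}
```

This divides by n, so it is the population standard deviation. The subtraction n S_2 - S_1^2 can lose precision, or even come out slightly negative in floating point, when a column's spread is tiny relative to its magnitude.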
Result (input vector, assigned cluster index)
([70.0,15.0,70.0,89.0,3.0],2) 
([33.0,49.0,79.0,90.0,3.0],1) 
([75.0,16.0,68.0,90.0,4.0],2) 
([33.0,49.0,79.0,90.0,3.0],1) 
([60.0,19.0,67.0,90.0,3.0],2) 
([33.0,49.0,79.0,98.0,3.0],1) 
([65.0,19.0,67.0,90.0,3.0],2) 
([43.0,49.0,79.0,90.0,3.0],1) 
([67.0,17.0,67.0,90.0,3.0],2) 
([53.0,49.0,78.0,90.0,3.0],1) 
([68.0,19.0,69.0,90.0,3.0],2) 
([38.0,49.0,79.0,90.0,3.0],1) 
([68.0,19.0,69.0,90.0,3.0],2) 
([38.0,49.0,89.0,90.0,3.0],1) 
([68.0,19.0,69.0,90.0,3.0],2) 
([68.0,19.0,69.0,90.0,3.0],2) 
([68.0,19.0,89.0,80.0,4.0],0)
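In the Result above, cluster 0 contains a single vector, [68.0,19.0,89.0,80.0,4.0], while clusters 1 and 2 hold 7 and 9 vectors. The slides do not state the rule used to call a device anomalous. One simple possibility, sketched here in plain Scala, is to flag every member of a cluster smaller than a chosen size. `SmallClusterFlagger` and `minClusterSize` are hypothetical names, and the threshold is an assumption to tune.

```scala
// Sketch: flag members of unusually small clusters as anomaly candidates.
// The rule and minClusterSize are hypothetical; the deck does not specify a threshold.
object SmallClusterFlagger {
  // assignments: (feature vector, cluster index) pairs, as printed on the Result slide.
  def flag(assignments: Seq[(Seq[Double], Int)], minClusterSize: Int): Seq[Seq[Double]] = {
    // Number of vectors assigned to each cluster index.
    val sizes: Map[Int, Int] =
      assignments.groupBy(_._2).map { case (cluster, members) => cluster -> members.size }
    // Keep the vectors whose cluster has fewer than minClusterSize members.
    assignments.collect { case (vector, cluster) if sizes(cluster) < minClusterSize => vector }
  }
}
```

Applied to all 17 Result pairs with minClusterSize = 2, this rule would flag only [68.0,19.0,89.0,80.0,4.0]. Another common criterion is each point's distance to its assigned center, which can be computed from `KMeansModel.clusterCenters`.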
Heatmap (sample)
Venue      MAC             Time              CPU  Battery
Sneakers   0A-94-05-F7-93  9/1/2014 7:30:20  89%  13%
McCoverys  0A-94-05-F7-76  9/3/2014 5:30:20  73%  10%
...
