Discussion for Anomaly & Prediction Engine

HisashiOsanai

These slides were prepared for the OpenStack Monasca Mitaka Mid-Cycle.

Discussion for Anomaly & Prediction Engine
04 Feb. 2016
Hisashi Osanai
Copyright 2016 FUJITSU LIMITED

Agenda
• POC Introduction
  • POC Demo
  • System Configuration
  • Parallel distributed processing platform
  • Ex. Batch process / Stream process
• Findings/Problems from POC
• Why I’m interested in Monasca …
• Current Concerns and Approach

POC Demo

Demo System Configuration
[Figure: system configuration diagram. Labels recoverable from the diagram:]
• Visualization server: OS, JDK, ElasticSearch, Apache (httpd), Kibana
• Master server: OS, JDK, Hadoop, Spark, RabbitMQ, fluentd (collection/store definition), and the parallel distributed processing platform (process definition, stream process with SparkStreaming/SparkSQL, batch process, data converter, task controller)
• Slave servers #1 to #3: OS, JDK, Hadoop, Spark
• Target servers #1 to #n (data collection targets): OS, fluentd (collection definition)

Parallel distributed processing platform
[Figure: platform architecture. Components: Apache Spark (Core) with SparkSQL (SQL query) and SparkStreaming (event stream processing); Job Definition (XML); RabbitMQ (message broker); Fluentd (data collector); HDFS (distributed file system); ElasticSearch (real-time search engine); Kibana (data visualization).]
Ex. “stream data analysis” in the anomaly detection process: stream data reception → data processing with SQL → creation of time-series data → analysis process
• Can execute both stream processes and batch processes
• Fast-acting data conversion based on an XML-based Job Definition
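
The slides do not show the Job Definition schema, so the following is only a minimal sketch of how an XML job definition could be turned into an ordered task list. Every element name, attribute name, and value here (`job`, `task`, `mode`, `type`, the SQL text) is a hypothetical stand-in, and the sketch uses plain Python rather than the platform's own loader.

```python
import xml.etree.ElementTree as ET

# Hypothetical job definition: the element and attribute names are invented
# for illustration; the slides do not show the real schema.
JOB_XML = """
<job name="cpu-stream" mode="stream">
  <task id="2" type="sql" query="SELECT host, AVG(cpu) FROM metrics GROUP BY host"/>
  <task id="1" type="receive" source="rabbitmq"/>
  <task id="3" type="store" target="hdfs"/>
</job>
"""


def load_job(xml_text):
    """Parse a job definition into a dict with its tasks sorted by id."""
    root = ET.fromstring(xml_text)
    tasks = []
    for element in root.findall("task"):
        attrs = dict(element.attrib)
        task_id = int(attrs.pop("id"))
        task_type = attrs.pop("type")
        # Whatever attributes remain are treated as task parameters.
        tasks.append({"id": task_id, "type": task_type, "params": attrs})
    tasks.sort(key=lambda task: task["id"])
    return {"name": root.get("name"), "mode": root.get("mode"), "tasks": tasks}


job = load_job(JOB_XML)
for task in job["tasks"]:
    print(task["id"], task["type"], task["params"])
```

Keeping the pipeline in a declarative file like this is one way a conversion could be changed without rebuilding the application, which may be what "fast-acting" refers to.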

Ex. Batch process
[Figure: a SparkBatch application on the parallel distributed processing platform runs the tasks of a Job definition (XML) on a Spark cluster, reading from and writing to HDFS, to analyze the Web access log.]
  TASK:1 Read “master data”
  TASK:2 Read “Web access log”
  TASK:3 Query and Save
• Analyzes a large volume of Web access logs on the file system
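
The three batch tasks (read master data, read the Web access log, query and save) can be sketched in plain Python as below. This is an illustration of the data flow only, not the POC's Spark code: the in-memory inputs stand in for files on HDFS, and the master-data mapping, log lines, and CSV layout are all assumptions.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical inputs standing in for files on HDFS.
MASTER_DATA = {"/index.html": "Top page", "/cart": "Shopping cart"}
ACCESS_LOG = [
    '10.0.0.1 - - [04/Feb/2016:10:00:00 +0900] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [04/Feb/2016:10:00:01 +0900] "GET /cart HTTP/1.1" 200 128',
    '10.0.0.1 - - [04/Feb/2016:10:00:02 +0900] "GET /index.html HTTP/1.1" 304 0',
    "malformed line",
]

# Captures the request path and the status code of a Common Log Format line.
LOG_PATTERN = re.compile(r'"(?:GET|POST) (\S+) [^"]*" (\d{3})')


def count_hits(log_lines, master):
    """TASK 1 + 2: join each log line to the master data and count hits."""
    hits = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # skip lines that are not valid access-log entries
        hits[master.get(match.group(1), "unknown")] += 1
    return hits


def save_csv(hits, out):
    """TASK 3: write the query result as CSV, sorted by page name."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["page", "hits"])
    for page, count in sorted(hits.items()):
        writer.writerow([page, count])


hits = count_hits(ACCESS_LOG, MASTER_DATA)
buf = io.StringIO()
save_csv(hits, buf)
print(buf.getvalue())
```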

Ex. Stream process
[Figure: target servers send data through RabbitMQ to a RabbitMQ receiver in a Spark Streaming application on the parallel distributed processing platform, which runs the tasks of a Job definition (XML), stores results in HDFS, and feeds the analysis.]
  TASK:1 Process and store the CPU information
  TASK:2 Process and store the MEM information
• Analyzes statistics information (CPU/MEM) in real time
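
As a rough illustration of analyzing CPU/MEM statistics as they arrive, the sketch below keeps a short rolling window per metric and flags a sample that deviates strongly from it. The z-score rule, window size, threshold, and sample values are assumptions chosen for the example; the slides do not say which detection method the POC used, and this is plain Python rather than Spark Streaming.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev


class RollingDetector:
    """Flags a sample whose z-score against a rolling window is too large."""

    def __init__(self, window=5, threshold=3.0):
        self.window = window
        self.threshold = threshold
        # One bounded history per metric name, e.g. "cpu" and "mem".
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, metric, value):
        past = self.history[metric]
        anomalous = False
        # Only judge a sample once the window is full.
        if len(past) == self.window:
            mu = mean(past)
            sigma = pstdev(past)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.threshold
        past.append(value)
        return anomalous


detector = RollingDetector(window=5, threshold=3.0)
stream = [("cpu", v) for v in (20, 22, 21, 19, 20, 21, 95)]
flags = [detector.observe(metric, value) for metric, value in stream]
print(flags)
```

Only the final sample (95) is flagged here; the first five samples just fill the window.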

Findings/Problems from POC
• Needs manpower for data collection on target servers
  • We have to discuss with each customer which data to collect and then configure the fluentd agents (so the number of POCs is limited)
• Difficult to accumulate experience in IT analytics
  • The data and its format differ for each customer, so the suitable anomaly detection libraries also differ
• Difficult to keep up with anomaly detection libraries
  • Machine Learning technology such as MLlib, TensorFlow, and CNTK is evolving rapidly

Why I’m interested in Monasca…
• Seems to solve two problems from the POC
  • Needs manpower for data collection on target servers
    • Monasca provides agents for OpenStack environments, so we can just use them.
  • Difficult to accumulate experience in IT analytics
    • The data comes from Monasca agents and its format is stable, so we can use it as stable input and look for the libraries that suit an environment monitored by Monasca.
• Add a catching function to Monasca
  • Boosts Monasca sales
    • A lot of our customers are interested in IT analytics
    • Fujitsu sells a Monasca-based product

Current Concerns & Approach
• Current Concerns
  • Performance of real-time anomaly detection (Storm vs. Spark Streaming)
  • Rapid tech evolution in Machine Learning (needs a plugin architecture for the libraries)
• Approach (a base for discussion)
  • How to move Anomaly & Prediction Engine (APE) development ahead?
  • Idea
    • First, rebase the current prototype on Monasca master (if possible, I would like to do this with Roland’s help)
    • Then use it to find problems
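
One possible shape for the plugin architecture mentioned under Current Concerns is a small registry that lets detection libraries be swapped by name. The sketch below is an assumption for discussion, not a Monasca or APE interface: the `Detector` base class, the `register` decorator, and the two toy detectors are all hypothetical.

```python
from abc import ABC, abstractmethod
from statistics import mean, median

# Registry mapping a plugin name to its detector class.
DETECTORS = {}


def register(name):
    """Class decorator that adds a detector to the registry under `name`."""
    def wrap(cls):
        DETECTORS[name] = cls
        return cls
    return wrap


class Detector(ABC):
    """Hypothetical interface each library wrapper would implement."""

    @abstractmethod
    def fit(self, values):
        """Learn a baseline from historical values."""

    @abstractmethod
    def is_anomaly(self, value):
        """Return True if `value` deviates from the learned baseline."""


@register("mean-band")
class MeanBand(Detector):
    def __init__(self, tolerance=0.5):
        self.tolerance = tolerance
        self.center = 0.0

    def fit(self, values):
        self.center = mean(values)

    def is_anomaly(self, value):
        # Anomalous when outside center +/- tolerance * |center|.
        return abs(value - self.center) > self.tolerance * abs(self.center)


@register("median-band")
class MedianBand(MeanBand):
    def fit(self, values):
        self.center = median(values)


def create(name, **kwargs):
    """Instantiate a registered detector by its plugin name."""
    return DETECTORS[name](**kwargs)


detector = create("median-band", tolerance=0.5)
detector.fit([20, 22, 21, 19, 20])
print(detector.is_anomaly(95), detector.is_anomaly(21))
```

With this layout, a wrapper around MLlib or TensorFlow would only need to subclass `Detector` and register itself; the calling code selects it by name.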
