Intelligent Monitoring

        Denis A. Vieira Jr.
       Ricardo Clemente
Intelligent Monitoring

Summary:


 Motivation
 Where are we?
 Where are we going?
 Action Plan
 Event Correlation
Intelligent Monitoring


Motivation:

    Only point-in-time monitoring available

    Decrease time to repair incidents

    Proactive monitoring

    Realistic view of the live environment
Intelligent Monitoring


Motivation:

    Learn (identify patterns)

    Automation

    Store historical data with no loss

    Improve credibility and situational awareness
Intelligent Monitoring


Where are we?

    Lots of information (1200 servers with more than 14000 monitors)
     and more than 40000 graphs being plotted

    Many monitoring tools running (SME, IPMonitor, Cricket,
     SiteScope, SiteSeer, Logs)

    Difficulties with specific customizations, performance and cost

    Alarms have little credibility (too many emails), though this is
     much better than before.
Intelligent Monitoring


Where are we going?

    Use of events, e.g. appenders for log frameworks to integrate
     information from applications

    Knowledge to anticipate undesired situations

    Unified interface for monitoring

    Root cause detection
Intelligent Monitoring


Action Plan:

    Unify the monitoring tools with Nagios (scalability and integration)

    Integrate Nagios with the correlation system using a NEB (Nagios Event
     Broker) module, available at:
         code.google.com/p/neb2activemq

    Map the events and systems to correlate
     (a manual, analytical task)
Intelligent Monitoring

Summary:


 Motivation
 Where are we?
 Where are we going?
 Action Plan
 Event Correlation
    Overview and system architecture
    Event Bus
    Correlation technique
    Correlation engine
    Visualization
    Machine Learning
    Project
Overview and system architecture

 Modular and event-driven architecture

         COLLECTOR          CORRELATION ENGINE
             |                      |
   ==================== EVENT BUS ====================
             |                      |
       MACHINE LEARNING       VISUALIZATION
Overview and system architecture
What is the system architecture?

 Single bus for all message exchange
 Modules are separate operating system processes and can run on
  different machines
 Modules can publish or subscribe to queues and topics on the bus

Why an event-driven architecture?

 Loosely coupled and distributed
    Less intrusive for monitored systems
    Modules are independent
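
To make the queue / topic distinction concrete, here is a minimal in-process sketch. It is not the project's bus (that is ActiveMQ); the `EventBus` class and the channel name `alerts` are hypothetical, and a real broker would also persist queued messages instead of dropping them.

```python
from collections import defaultdict, deque


class EventBus:
    """Toy in-process stand-in for a message bus (illustration only)."""

    def __init__(self):
        self.topic_subscribers = defaultdict(list)  # topic name -> handlers
        self.queue_consumers = defaultdict(deque)   # queue name -> handlers

    def subscribe(self, channel_type, name, handler):
        if channel_type == "topic":
            self.topic_subscribers[name].append(handler)
        elif channel_type == "queue":
            self.queue_consumers[name].append(handler)
        else:
            raise ValueError("channel_type must be 'queue' or 'topic'")

    def publish(self, channel_type, name, event):
        if channel_type == "topic":
            # Topic: every subscriber receives the event.
            for handler in list(self.topic_subscribers[name]):
                handler(event)
        elif channel_type == "queue":
            # Queue: exactly one consumer receives the event (round robin).
            consumers = self.queue_consumers[name]
            if consumers:  # this toy drops the event when nobody is listening
                handler = consumers[0]
                consumers.rotate(-1)
                handler(event)
        else:
            raise ValueError("channel_type must be 'queue' or 'topic'")


bus = EventBus()

# Two independent modules subscribe to the same topic.
correlation_seen, visualization_seen = [], []
bus.subscribe("topic", "CPU", correlation_seen.append)
bus.subscribe("topic", "CPU", visualization_seen.append)
bus.publish("topic", "CPU", {"host": "ws122", "idle": 70})

# Two workers share one queue; each event goes to only one of them.
worker_a, worker_b = [], []
bus.subscribe("queue", "alerts", worker_a.append)
bus.subscribe("queue", "alerts", worker_b.append)
for n in range(3):
    bus.publish("queue", "alerts", n)
```

Publishers never reference subscribers directly, which is what keeps the modules independent.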
Event bus
Open source project

We chose Apache ActiveMQ:
 Stable
 Performance
 Active community
 Connectivity
     JMS
     STOMP
     REST
     XMPP (...)
Event Bus
Message format

 JSON (not XML)
     Simplicity
 Structure
     Header: channel type (queue or topic) and event type
     Body: data



 $ curl -d 'type=queue&body={"idle": 70, "sys": 20, "usr": 10, "host": "ws122"}&eventtype=CPU' \
     http://barramento/message/events
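
The same message can be assembled programmatically. The sketch below only builds the form payload and does not send it; the field names `type`, `eventtype` and `body` come from the curl example, while the helper name `build_event_form` is hypothetical.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_event_form(channel_type, event_type, data):
    """Return the URL-encoded form: header fields plus a JSON body."""
    if channel_type not in ("queue", "topic"):
        raise ValueError("channel_type must be 'queue' or 'topic'")
    return urlencode({
        "type": channel_type,      # header: channel type
        "eventtype": event_type,   # header: event type
        "body": json.dumps(data),  # body: the event data as JSON
    })


cpu_event = {"idle": 70, "sys": 20, "usr": 10, "host": "ws122"}
form = build_event_form("queue", "CPU", cpu_event)

# Round trip: what a receiver would see after decoding the form.
decoded = parse_qs(form)
received_body = json.loads(decoded["body"][0])
```

Using `urlencode` escapes the braces and quotes of the JSON body, which a hand-written query string can easily get wrong.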
Correlation Technique

CEP (Complex Event Processing)
 Technology for processing multiple events in real time with
  the goal of identifying meaningful events
 Based on rules or queries ("SQL-like")
 Queries can be created at run time

History
 In 1995, Professor David Luckham of Stanford coined the term CEP
  while working on the Rapide project
 Related database research topic: Data Stream Management Systems (DSMS)
Correlation Technique
                 "Upside-down database"

 [Figure: a traditional database keeps data in persistent relations, and
  each query is processed once to return an answer. A stream processor
  keeps continuous queries in memory, and the data stream flows through
  them to produce continuous answers.]
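
The "upside-down" idea can be sketched in a few lines: the queries are what is stored, and the data is what passes through. This is an illustration only, not how Esper is implemented; `ContinuousQueryEngine` and the `low_idle` query are hypothetical names.

```python
class ContinuousQueryEngine:
    """Stores queries and runs every incoming event through them."""

    def __init__(self):
        self.queries = {}  # query name -> (predicate, listener)

    def register(self, name, predicate, listener):
        # Queries can be added while events are already flowing.
        self.queries[name] = (predicate, listener)

    def unregister(self, name):
        self.queries.pop(name, None)

    def push(self, event):
        # The event is not stored; it is only matched against the queries.
        for predicate, listener in list(self.queries.values()):
            if predicate(event):
                listener(event)


engine = ContinuousQueryEngine()
low_idle = []
engine.register("low_idle", lambda e: e["idle"] < 30, low_idle.append)

engine.push({"host": "ws122", "idle": 70})  # no match
engine.push({"host": "ws123", "idle": 12})  # match, listener is called
engine.unregister("low_idle")
engine.push({"host": "ws124", "idle": 5})   # query is gone, no match
```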
Correlation Technique
 Marketing
 Trend (buzz)
  CEP market estimated at 460 million dollars by 2010 (source: IEEE
   Computer Society, April 2009)

 Useful wherever there are data streams and a need to extract
   information from them in real time
  Financial markets
  Logistics processes (RFID)
  Airport control
  ICUs
  Datacenters
Correlation Technique
 Big Players
Correlation Technique
 Open Source Players
 Academic projects:
  STREAM – Stanford – 2003 (officially deprecated)
  TelegraphCQ – Berkeley – 2003
      Based on PostgreSQL 7.3.2
      No activity
  Cayuga – Cornell

 From the industry:
 Esper, a Codehaus project with a complete feature set
  Compact and flexible syntax
  Excellent documentation
  Performance
  Our choice!
Correlation Engine
 Application




                     If sessions rose 10% in the
                     last 3 min, the average server
                     CPU did not rise 5%, and
                     MySQL slow queries are above
                     10, then a database bottleneck
                     is causing users to queue
Correlation Engine
 Application

   Stream    Attribute     Window
   Vip       session       t - 3 min to t
   Server    cpu_usr       t - 3 min to t
   Mysql     slow_query    t
Correlation Engine
 Application

 SELECT Server.host, Server.cpu_usr, Server_PAST.cpu_usr,
        Vip.session, Vip_PAST.session, Mysql.slow_query
   FROM
          Server.win:time(1 min) AS Server,
          Server.win:ext_timed(current_timestamp(), 3 min) AS Server_PAST,
          Vip.win:time(1 min) AS Vip,
          Vip.win:ext_timed(current_timestamp(), 3 min) AS Vip_PAST,
          Mysql.win:time(1 min) AS Mysql
   HAVING
          Vip.session > Vip_PAST.session * 1.10 AND
          avg(Server.cpu_usr) < avg(Server_PAST.cpu_usr) * 1.05 AND
          Mysql.slow_query > 10
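
For readers unfamiliar with EPL, the three conditions of the rule can be restated as a plain function. This sketch assumes the windowed values have already been collected by the caller; the function name `database_bottleneck` and its parameters are hypothetical, and the windowing is not reproduced here.

```python
from statistics import mean


def database_bottleneck(sessions_now, sessions_past,
                        cpu_now, cpu_past, slow_queries):
    """Return True when all three conditions of the rule hold."""
    sessions_rose = sessions_now > sessions_past * 1.10   # sessions up 10%
    cpu_flat = mean(cpu_now) < mean(cpu_past) * 1.05      # CPU not up 5%
    many_slow_queries = slow_queries > 10                 # MySQL slow queries
    return sessions_rose and cpu_flat and many_slow_queries


# Sessions jumped from 1000 to 1200 while CPU stayed flat: rule fires.
fires = database_bottleneck(1200, 1000, [40, 42], [41, 39], 15)

# Sessions only rose 5%: rule stays quiet.
quiet = database_bottleneck(1050, 1000, [40, 42], [41, 39], 15)
```

The point of the rule is the combination: more users without more CPU work suggests requests are waiting on something else, here the database.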
Correlation Engine
 Identifying an outlier
     select host, free, avg(free)
     from Memory.win:time(240 sec) group by host
     having free < avg(free)
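
A rough Python equivalent of this outlier check, assuming events arrive in timestamp order. It approximates the idea of a 240-second window per host; Esper's exact window expiry and output behavior are not reproduced, and the class name `FreeMemoryOutlierDetector` is hypothetical.

```python
from collections import defaultdict, deque


class FreeMemoryOutlierDetector:
    """Flags a sample whose free memory is below its host's window average."""

    def __init__(self, window_seconds=240):
        self.window_seconds = window_seconds
        self.samples = defaultdict(deque)  # host -> deque of (timestamp, free)

    def observe(self, host, timestamp, free):
        window = self.samples[host]
        window.append((timestamp, free))
        # Drop samples that have fallen out of the time window.
        while window[0][0] <= timestamp - self.window_seconds:
            window.popleft()
        average = sum(value for _, value in window) / len(window)
        return free < average


detector = FreeMemoryOutlierDetector()
flags = [
    detector.observe("ws122", 0, 100),    # only sample, equals the average
    detector.observe("ws122", 60, 100),   # still equal to the average
    detector.observe("ws122", 120, 40),   # average is 80, so 40 is flagged
    detector.observe("ws122", 400, 40),   # older samples expired, not flagged
]
```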

 Event sequences
    select * from
       pattern [every Memory(free < 10) ->
            (timer:interval(60 sec) and Log(text like '%OutOfMemory%'))]

 Scheduling and extensions
     select idle
     from pattern [every timer:at(*, [16:22], *, [0,3], *)].win:time(30 sec),
          CPU.win:time(30)
     where idle < 30 AND Filter.isInNode(id, "Sports.BigFarm")
Correlation Engine
 Esper performance

      Item                     Specification
      Esper server hardware    2 x Intel Xeon 5130 2GHz (4 cores total), 16GB RAM
      VM config                -Xms2g -Xmx2g -Xns128m -Xgc:gencon

  Query:
      select '$' as ticker from Market(ticker='$').win:length(1000)
        .stat:weighted_avg('price', 'volume') output last every 30 seconds

  # queries   evt/s      Latency          Average latency   Note
  1000        519 728    99.66% < 10us    2.8us             CPU at 85%, 70 Mbit/s


Source: Esper Performance - http://docs.codehaus.org/display/ESPER/Esper+performance
Correlation engine
 Processing inside the correlation engine
Visualization – Console
Querying the live environment
Visualization – Troubleshooting
Anticipating and solving incidents faster
Visualization – Dashboard
Consolidated view of the environment
What about unseen problems?
Machine Learning

We chose unsupervised, incremental algorithms

Incremental PCA
 Transforms a number of possibly correlated variables into a smaller
  number of uncorrelated ones, the principal components
 A change in the principal components signals a broken correlation, or
  anomaly
 Can be used for data compression
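
To illustrate how a principal component can be tracked one sample at a time, here is a sketch of a well-known streaming update with a forgetting factor. The talk does not say which update rule was implemented, so this choice is an assumption; the class name and the values 0.96 and 0.01 are illustrative, and the inputs are assumed to be already centered and scaled.

```python
import math


class PrincipalDirectionTracker:
    """Tracks the first principal direction of a stream incrementally."""

    def __init__(self, n_signals, forgetting=0.96):
        self.w = [1.0] + [0.0] * (n_signals - 1)  # arbitrary start direction
        self.energy = 0.01                        # small positive start value
        self.forgetting = forgetting              # < 1 lets old samples fade

    def project(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def residual(self, x):
        # Size of the part of x that the tracked direction cannot explain.
        y = self.project(x)
        return math.sqrt(sum((xi - y * wi) ** 2 for wi, xi in zip(self.w, x)))

    def update(self, x):
        y = self.project(x)
        self.energy = self.forgetting * self.energy + y * y
        error = [xi - y * wi for wi, xi in zip(self.w, x)]
        # Move the direction toward the part of x it failed to reconstruct.
        self.w = [wi + (y / self.energy) * ei for wi, ei in zip(self.w, error)]
        return y


tracker = PrincipalDirectionTracker(2)
for t in range(1, 301):
    s = math.sin(0.37 * t)
    tracker.update([s, s])  # two perfectly correlated signals

normal = tracker.residual([1.0, 1.0])   # respects the learned correlation
broken = tracker.residual([1.0, -1.0])  # breaks it
```

A sample that follows the learned correlation leaves almost no residual, while one that breaks it leaves a large one, which is the anomaly signal described above.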

Inspired by a paper from Carnegie Mellon University (Hoke et al. 2006)
Source: http://www.pdl.cmu.edu/PDL-FTP/SelfStar/osr_sub.pdf


Implementation had two main challenges: measurements with missing values
  and different scales
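
One common way to handle both challenges in a streaming setting is to standardize each signal with running statistics and to impute a missing sample with the running mean. The talk does not describe the approach actually used, so this is only a sketch under that assumption; `RunningScaler` is a hypothetical name and the statistics use Welford's online algorithm.

```python
import math


class RunningScaler:
    """Online z-score for one signal; None marks a missing sample."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford)

    def update(self, value):
        if value is None:
            # Assumption: a missing sample is treated as the running mean,
            # which corresponds to a z-score of 0 and leaves the stats alone.
            return 0.0
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        if self.count < 2 or self.m2 == 0.0:
            return 0.0  # not enough spread yet to scale
        std = math.sqrt(self.m2 / (self.count - 1))
        return (value - self.mean) / std


scaler = RunningScaler()
scaled = [scaler.update(v) for v in [10, None, 20, 30]]
```

After scaling, signals measured in different units (percent CPU, megabytes free, session counts) contribute comparably to the principal components.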
Machine Learning

60 input signals
Machine Learning

Summarized in 1 principal component + generation matrix
Machine Learning

[Figure: second principal component, with the sensitivity setting and three anomalies marked]
Project

Status

 All functionality developed

 Algorithms being validated through tests with
  RRDs and meetings with the operations team

 Performance tests ongoing

 System running in the live environment with reduced scope
Project at Globo.com – Next challenges


Scale
    Event "sharding"
    Rule balancing
    Cache

Optimize the algorithm
    Adaptive control of the memory and sensitivity parameters
    Add a supervised layer
    Other algorithms to cooperate
Intelligent Monitoring

      Final considerations
References




       http://delicious.com/fisl10
Questions

 Contacts
   Denis A. Vieira Jr
   denis@corp.globo.com (www.globo.com)
   Ricardo Clemente
   ricardo@intelie.com.br (www.intelie.com.br)

 Globo.com stand
    This afternoon

 Raise your hand!

Intelligent Monitoring

  • 1. Intelligent Monitoring Denis A. Vieira Jr. Ricardo Clemente
  • 2. Intelligent Monitoring Summary:  Motivation  Where are we?  Where are we going?  Action Plan  Event Correlation
  • 3. Intelligent Monitoring Summary:  Motivation  Where are we?  Where are we going?  Action Plan  Event Correlation
  • 4. Intelligent Monitoring Motivation:  Only ponctual monitoring available  Decrease time to repair incidents  Proactive monitoring  Realistic view from live environment
  • 5. Intelligent Monitoring Motivation:  Learn (identify patterns )  Automation  Store historical data with no loss  Improve credibility and Situational Awareness
  • 6. Intelligent Monitoring Summary:  Motivation  Where are we?  Where are we going?  Action Plan  Event Correlation
  • 7. Intelligent Monitoring Where are we?:  Lots of information (1200 servers with more than 14000 monitors) – more than 40000 graphs being plot  Lots of tools for monitoring running (SME, IPMonitor, Cricket, SiteScope, SiteSeer, Logs)  Difficulties with specific customizations, performance and cost  No credibility (lots of emails) with alarms. But much better than before.
  • 9. Intelligent Monitoring. Where are we going?
      - Use of events, e.g. appenders for log frameworks to integrate information from applications
      - Knowledge to anticipate undesired situations
      - Unified interface for monitoring
      - Root cause detection
  • 11. Intelligent Monitoring. Action Plan:
      - Unify the monitoring tools with Nagios (scalability and integration)
      - Integrate Nagios with the correlation system using NEB (Nagios Event Broker), available at: code.google.com/p/neb2activemq
      - Map events and systems to correlate (manual and analytic task)
  • 12. Intelligent Monitoring. Summary:
      - Motivation
      - Where are we?
      - Where are we going?
      - Action Plan
      - Event Correlation
          - Overview and system architecture
          - Event Bus
          - Correlation technique
          - Correlation engine
          - Visualization
          - Machine Learning
          - Project
  • 13. Overview and system architecture. Modular and event-driven architecture. [Diagram: Collector, Correlation Engine, Machine Learning and Visualization modules connected through the Event Bus]
  • 14. Overview and system architecture
      What is the system architecture?
      - A single bus for message exchange
      - Modules are separate operating system processes and can run on different machines
      - Modules can publish / subscribe to queues / topics on the bus
      Why an event-driven architecture?
      - Loosely coupled and distributed
      - Less intrusive for monitored systems
      - Modules are independent
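The publish / subscribe idea on this slide can be illustrated with a minimal in-process sketch. It is not the deck's implementation (the talk uses a real message broker and separate processes); the `EventBus` class, the topic name `events.cpu` and the handler names are all hypothetical and only show how independent modules stay decoupled behind a bus.

```python
from collections import defaultdict


class EventBus:
    """Toy in-process bus with topic semantics: every subscriber gets each event."""

    def __init__(self):
        # topic name -> list of handler callables (hypothetical structure)
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Deliver a copy of the event to each subscriber of the topic.
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(dict(event))
        return len(handlers)


# Two independent "modules" subscribe; a collector publishes without knowing them.
bus = EventBus()
seen_by_correlation = []
seen_by_visualization = []
bus.subscribe("events.cpu", seen_by_correlation.append)
bus.subscribe("events.cpu", seen_by_visualization.append)

delivered = bus.publish("events.cpu", {"host": "ws122", "idle": 70})
```

The publisher only knows the topic name, which is what lets modules be added or removed without touching the monitored systems.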
  • 15. Event Bus. Open source project chosen: Apache ActiveMQ
      - Stable
      - Performance
      - Active community
      - Connectivity: JMS, STOMP, REST, XMPP (...)
  • 16. Event Bus. Message format:
      - JSON (not XML)
      - Simplicity
      - Structure
          - Header: channel type (queue or topic) and event type
          - Body: data
      $ curl -d 'type=queue&eventtype=CPU' --data-urlencode 'body={"idle": 70, "sys": 20, "usr": 10, "host": "ws122"}' http://barramento/message/events
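As a rough sketch of the message structure on this slide, the form payload can be assembled with the Python standard library. The field names `type`, `eventtype` and `body` are taken from the curl example; the function name `build_event_message` is hypothetical, and how the bus endpoint actually parses the payload is an assumption.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_event_message(channel_type, event_type, data):
    """Return a URL-encoded payload: header fields plus a JSON body."""
    if channel_type not in ("queue", "topic"):
        raise ValueError("channel type must be 'queue' or 'topic'")
    return urlencode({
        "type": channel_type,      # header: channel type
        "eventtype": event_type,   # header: event type
        "body": json.dumps(data),  # body: the event data as JSON
    })


cpu_sample = {"idle": 70, "sys": 20, "usr": 10, "host": "ws122"}
payload = build_event_message("queue", "CPU", cpu_sample)

# Decoding the payload again recovers the header fields and the JSON body.
decoded = parse_qs(payload)
decoded_body = json.loads(decoded["body"][0])
```

Encoding the body with `urlencode` avoids the ambiguity of raw `=` and `&` characters inside the JSON text.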
  • 17. Correlation Technique. CEP (Complex Event Processing)
      - Technology for processing multiple events in real time in order to identify meaningful events
      - Based on rules or queries ("SQL-like")
      - Queries are created at execution time
      History
      - In 1995, Professor David Luckham of Stanford, working on the Rapide project, coined the term CEP
      - Database research topic: Data Stream Management Systems (DSMS)
  • 18. Correlation technique: the "upside-down database". [Diagram comparing a traditional database (a query goes in and an answer comes out; query processing runs over persistent relations) with stream processing (a data stream goes in and continuous answers come out; queries are held in memory)]
  • 19. Correlation Technique. Market trend (buzz)
      - The CEP market is estimated at 460 million dollars by 2010 (source: IEEE Computer Society, April 2009)
      Useful wherever there are data streams and a need to extract information from that data in real time:
      - Financial markets
      - Logistics processes (RFID)
      - Airport control
      - ICUs
      - Datacenters
  • 21. Correlation Technique. Open source players
      Academic projects:
      - STREAM (Stanford, 2003), officially deprecated
      - TelegraphCQ (Berkeley, 2003), based on PostgreSQL 7.3.2, no activity
      - Cayuga (Cornell)
      From the industry: Esper, a Codehaus project, complete in terms of features
      - Compact and flexible syntax
      - Excellent documentation
      - Performance
      - Our choice!
  • 22. Correlation Engine. Application: if sessions rose 10% in the last 3 min, the average server CPU usage did not rise 5%, and MySQL slow queries are above 10, then a database bottleneck is causing users to queue
  • 23. Correlation Engine. Application. [Timeline diagram: Vip session and Server cpu_usr are each compared between t - 3 min and t; Mysql slow_query is read at t]
  • 24. Correlation Engine. Application:
      SELECT Server.host, Server.cpu_usr, Server_PAST.cpu_usr,
             Vip.session, Vip_PAST.session, Mysql.slow_query
      FROM Server.win:time(1 min) as Server,
           Server.win:ext_timed(current_timestamp(), 3 min) as Server_PAST,
           Vip.win:time(1 min) as Vip,
           Vip.win:ext_timed(current_timestamp(), 3 min) as Vip_PAST,
           Mysql.win:time(1 min) as Mysql
      HAVING Vip.session > Vip_PAST.session * 1.10
         AND avg(Server.cpu_usr) < avg(Server_PAST.cpu_usr) * 1.05
         AND Mysql.slow_query > 10
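The condition in the query above can be restated as a plain Python predicate, which may help when reading the HAVING clause. This is only a sketch of the rule's logic, not of how Esper evaluates windows; the function name `database_bottleneck` and its parameters are hypothetical, and the caller is assumed to have already collected the current and past samples.

```python
from statistics import mean


def database_bottleneck(vip_sessions_now, vip_sessions_past,
                        cpu_usr_now, cpu_usr_past, slow_queries):
    """Return True when the three conditions of the rule hold together."""
    # Sessions rose by more than 10% compared with the earlier window.
    sessions_rose = vip_sessions_now > vip_sessions_past * 1.10
    # Average server CPU did not rise by 5% over the same period.
    cpu_stayed_flat = mean(cpu_usr_now) < mean(cpu_usr_past) * 1.05
    # MySQL reports more than 10 slow queries.
    many_slow_queries = slow_queries > 10
    return sessions_rose and cpu_stayed_flat and many_slow_queries


# Sessions up 20%, CPU roughly flat, 15 slow queries: the rule fires.
fires = database_bottleneck(1200, 1000, [40, 42], [40, 41], 15)
# Same traffic, but CPU also rose sharply: the rule does not fire.
does_not_fire = database_bottleneck(1200, 1000, [60, 62], [40, 41], 15)
```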
  • 25. Correlation Engine
      Identifying an outlier:
        select host, free, avg(free)
        from Memory.win:time(240 sec)
        group by host
        having free < avg(free)
      Event sequence:
        select * from pattern [every Memory(free < 10) -> (timer:interval(60 sec) and Log(text like '%OutOfMemory%'))]
      Schedule and extensions:
        select idle
        from pattern [every timer:at(*, [16:22], *, [0,3], *)].win:time(30 sec), CPU.win:time(30)
        where idle < 30 AND Filter.isInNode(id, "Sports.BigFarm")
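The outlier query on this slide (free memory below the host's own average over a 240-second window) can be approximated in plain Python. This is a sketch under assumptions: the class name `MemoryOutlierDetector` is hypothetical, timestamps are plain seconds supplied by the caller, and the exact eviction and output timing of an Esper time window is not reproduced.

```python
from collections import defaultdict, deque


class MemoryOutlierDetector:
    """Per-host sliding time window over free-memory samples."""

    def __init__(self, window_seconds=240):
        self.window_seconds = window_seconds
        # host -> deque of (timestamp, free) pairs, oldest first
        self._samples = defaultdict(deque)

    def observe(self, host, timestamp, free):
        """Add a sample and return (is_outlier, window_average) for that host."""
        window = self._samples[host]
        window.append((timestamp, free))
        # Drop samples that have fallen out of the time window.
        while window[0][0] <= timestamp - self.window_seconds:
            window.popleft()
        average = sum(value for _, value in window) / len(window)
        return free < average, average


detector = MemoryOutlierDetector()
first = detector.observe("ws122", 0, 100)
second = detector.observe("ws122", 60, 100)
drop = detector.observe("ws122", 120, 40)      # average of 100, 100, 40 is 80
after_gap = detector.observe("ws122", 400, 50)  # older samples have expired
```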
  • 26. Correlation Engine. Esper performance
      - Hardware (Esper server): 2 x Intel Xeon 5130 2GHz (4 cores total), 16GB RAM
      - VM config: -Xms2g -Xmx2g -Xns128m -Xgc:gencon
      - Query: select '$' as ticker from Market(ticker='$').win:length(1000).stat:weighted_avg('price', 'volume') output last every 30 seconds
      - Number of queries: 1000
      - Throughput: 519 728 events/s
      - Average latency: 2.8 us
      - Latency: 99.66% < 10 us
      - Note: CPU at 85%, 70 Mbit/s
      Source: Esper Performance - http://docs.codehaus.org/display/ESPER/Esper+performance
  • 27. Correlation engine. Process inside the correlation engine
  • 28. Visualization – Console. Querying the live environment
  • 29. Visualization – Troubleshooting. Anticipating and solving incidents quicker
  • 31. What about unseen problems?
  • 32. Machine Learning. Choice of unsupervised and incremental algorithms
      Incremental PCA
      - Transforms a number of possibly correlated variables into a smaller number of uncorrelated variables, the principal components
      - A change in the principal components means a broken correlation, or anomaly
      - Can be used for data compression
      Inspired by a paper from Carnegie Mellon University (Hoke et al. 2006). Source: http://www.pdl.cmu.edu/PDL-FTP/SelfStar/osr_sub.pdf
      The implementation had two main challenges: measures with missing values and different scales
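To make the incremental idea concrete, here is a minimal sketch that tracks the first principal component one sample at a time using Oja's rule, a classic online estimator. It is not necessarily the algorithm used in the project or in the cited paper. The function names are hypothetical, samples are assumed to be already centered and on comparable scales, missing values are not handled, and using the reconstruction error as the anomaly signal is an assumption of this sketch.

```python
import math


def oja_update(weights, sample, learning_rate=0.05):
    """One online step of Oja's rule towards the first principal direction."""
    projection = sum(w * x for w, x in zip(weights, sample))
    return [w + learning_rate * projection * (x - projection * w)
            for w, x in zip(weights, sample)]


def reconstruction_error(weights, sample):
    """Distance between a sample and its projection on the learned direction."""
    norm = math.sqrt(sum(w * w for w in weights))
    unit = [w / norm for w in weights]
    projection = sum(u * x for u, x in zip(unit, sample))
    return math.sqrt(sum((x - projection * u) ** 2
                         for u, x in zip(unit, sample)))


# Two correlated metrics: the second is always twice the first.
weights = [1.0, 0.0]
for _ in range(500):
    for scale in (-1.0, -0.5, 0.5, 1.0):
        weights = oja_update(weights, (scale, 2.0 * scale))

normal_error = reconstruction_error(weights, (1.0, 2.0))   # follows the correlation
broken_error = reconstruction_error(weights, (1.0, -2.0))  # breaks the correlation
```

A sample that respects the learned correlation is reconstructed almost exactly from one component, while a sample that breaks it leaves a large residual, which is the kind of signal the slide describes as a broken correlation.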
  • 34. Machine Learning. Summarized in 1 principal component + generation matrix
  • 35. Machine Learning. Second principal component: sensitivity, three anomalies
  • 36. Project. Status
      - All functionalities developed
      - Algorithms being validated through tests with RRDs and meetings with the operations team
      - Performance tests ongoing
      - System in the live environment with reduced scope
  • 37. Project at Globo.com. Next challenges
      Scale
      - Event "sharding"
      - Rule balancing
      - Cache
      Optimize algorithm
      - Adaptive control of memory and sensitivity parameters
      - Insert a supervised layer
      - Other algorithms to cooperate
  • 38. Intelligent Monitoring Final considerations
  • 39. References http://delicious.com/fisl10
  • 40. Questions. Contacts:
      - Denis A. Vieira Jr: denis@corp.globo.com (www.globo.com)
      - Ricardo Clemente: ricardo@intelie.com.br (www.intelie.com.br)
      - Globo.com stand, this afternoon
      - Raise your hand!