Gain New Insights by Analyzing Machine Logs using Machine Data Analytics and BigInsights.
Half of Fortune 500 companies experience more than 80 hours of system downtime annually. Spread evenly over a year, that amounts to approximately 13 minutes every day. As a consumer, the thought of online bank operations being inaccessible so frequently is disturbing. As a business owner, when systems go down, all processes come to a stop. Work in progress is destroyed, and failure to meet SLAs and contractual obligations can result in expensive fees, adverse publicity, and loss of current and potential future customers. Ultimately, the inability to provide a reliable and stable system results in lost revenue. While the failure of these systems is inevitable, the ability to predict failures in time and intercept them before they occur is now a requirement.
A possible solution to the problem can be found in the huge volumes of diagnostic big data generated at the hardware, firmware, middleware, application, storage and management layers, indicating failures or errors. Machine analysis and understanding of this data is becoming an important part of debugging, performance analysis, root cause analysis and business analysis. In addition to preventing outages, machine data analysis can also provide insights for fraud detection, customer retention and other important use cases.
2. Relevant Operations Data is Huge
A typical enterprise of 5,000 servers with 125 applications across 2 or 3 data centers generates in excess of 1.4 TB of data per day.
Operational data growing 15-20% per year.
Daily Metric Output:
• 250 MB of event data from 125,000 events
• 125 MB of endpoint mgmt data from 5K servers
• 12 GB of performance data for 5,000 servers
• 1 GB of performance data for 5,000 virtual machines
• 8 GB of application middleware data
  − Assumptions: 40% of servers running monitored middleware
  − Average 60 metrics each, collected every 15 minutes
  − Average PMDB insert 1000 bytes, 40 inserts/server
• 9 GB storage data per day: 175K fiber ports
  − 175 fiber ports, 10 metrics per port, collected every 5 minutes, .5KB per port
  − 25K volumes, 10 metrics per volume, .5KB per volume
  − .5KB*(65K ports and volumes)*12*24 = 9.3 GB/day
• 500 MB application transaction tracking data for 125 applications
• 1 TB log file data per day
  − 200 MB average per server (some will be smaller, some larger)
  − Example: WAS instances typically produce 400MB-750MB logs/day
• 2 GB network performance data for data center networks (not access networks)
  − 180x64 port switches and 4 routers to manage physical network
• .35 TB security data collected per day
Data flow of approximately 1 TB unstructured data and .4 TB metric data per day.
Scaled to 20K servers: approx. 4 TB unstructured and 1.6 TB metric data.
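The storage and log estimates on this slide can be re-derived with a few lines of arithmetic. The sketch below assumes decimal units (1 KB = 1,000 bytes), a sample size of 0.5 KB per port or volume, and one sample every 5 minutes; the function names are illustrative and not part of any product.

```python
# Back-of-the-envelope check of the storage and log volumes quoted on the slide.
# Assumption: decimal units throughout (1 KB = 1,000 bytes, 1 GB = 10**9 bytes).
KB = 1_000
MB = 1_000_000
GB = 1_000_000_000
TB = 1_000 * GB


def storage_bytes_per_day(entities, kb_per_sample=0.5, interval_minutes=5):
    """Bytes per day for `entities` ports/volumes sampled at a fixed interval."""
    samples_per_day = (60 // interval_minutes) * 24  # 12 per hour * 24 hours
    return entities * kb_per_sample * KB * samples_per_day


def log_bytes_per_day(servers, mb_per_server=200):
    """Bytes per day of log data at an average volume per server."""
    return servers * mb_per_server * MB


storage_gb = storage_bytes_per_day(65_000) / GB
logs_tb = log_bytes_per_day(5_000) / TB
print(f"storage: {storage_gb:.2f} GB/day, logs: {logs_tb:.1f} TB/day")
```

With 65K ports and volumes this gives roughly 9.4 GB/day of storage data, and 5,000 servers at 200 MB each give 1 TB/day of logs, in line with the figures above.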
3. Shifting market for IT Operations
Operational Visibility
APM Digest survey* of Senior IT Ops @ Fortune 500
− 50% growing dissatisfaction with traditional performance management solutions for Production IT
− Inability to adapt to rapidly changing applications & workloads
− 30% of them believe that they do not have a way to proactively detect problems
− Looking to operate on raw data and gain actionable insights
IT Overwhelmed by data
IT Analytics solutions can predict, detect and help solve problems by churning through piles of data and translating this to understandable, relevant information, and actionable insights.
* Source: APMDigest: http://apmdigest.com/it-analytics-emerging-as-dissatisfaction-grows-with-apm-and-bsm-tools
4. Exploiting IBM’s breadth of Analytics Initiatives
Proactively mitigate risk, attain insights to optimize actions, and reduce cost of ownership across Business, IT Operations, Asset Management, and more.
Leveraging analytics for IT Operations
• Notice problems sooner and more accurately
• Reduce false alerts to lower management costs
• Self-learning capabilities to automatically adapt to change
• Automated threshold setting for quicker deployment
• Performance trending to plan for growth
• Detect capacity issues prior to business impact
InfoSphere BigInsights
• Simple ad-hoc and scheduled reporting to enable comparison of multiple metrics and data sources
• Streaming data analytics to provide real-time information and process Big Data volumes easily
• Predictive analytics enables forecasting and trending to provide foresight in resource demand, capacity & availability and clarify potential risks
• Provide holistic and accurate diagnosis by using guiding technology with behavioral learning capabilities
• Advanced correlation and pattern recognition to identify and resolve complex and undetectable events in real time
5. IT Operations needs analytics to predict, to search and to optimize
Predict
• How can we get early warning of failures in my critical retail applications?
• Can we predict/project failure occurrences for specific asset types?
• Can I predict which KPIs are going to cause application issues without manually configuring thresholds? I have 100s of thousands of KPIs.
• I want to predict my online banking outages and take corrective actions before customers hit them.
Search
• What is driving my high maintenance costs and what can I do to address this?
• How do we make sense out of the terabytes of metric and log data that is generated by our applications and the infrastructure on which they run to isolate problems and reduce downtime?
• How can I reduce reserved material inventory due to work order backlog?
• Can I use analysis of my channel traffic to achieve improved customer insight and intelligence?
Optimize
• “What if” we change our preventive maintenance strategy?
• Help me track capacity and performance of applications & services in cloud / virtual environments. When do I need to add more capacity?
• Show me how to reduce the cost of running my virtual infrastructure and make it more compliant with best practices.
• How should I plan maintenance to efficiently keep my assets operational, given what I know today about my six-month resource availability?
6. How the Big Data Platform Can Help?
Raghuram Velega - IBM Software Architect
(Big Data Analytics)
7. IBM Provides a Holistic and Integrated Approach to Big Data and Analytics
CONSULTING and IMPLEMENTATION SERVICES
SOLUTIONS
Sales | Marketing | Finance | Operations | IT | Risk | HR | Industry
ANALYTICS
Performance Management | Content Analytics | Decision Management | Risk Analytics | Business Intelligence and Predictive Analytics
BIG DATA PLATFORM
Content Management | Hadoop System | Stream Computing | Data Warehouse | Information Integration and Governance
SECURITY, SYSTEMS, STORAGE AND CLOUD
Enabling organizations to
• Assemble and combine relevant mix of information
• Discover and explore with smart visualizations
• Analyze, predict and automate for more accurate answers
• Take action and automate processes
• Optimize analytical performance and IT costs
• Reduce infrastructure complexity and cost
• Manage, govern and secure information
8. The Platform for New Insight and Applications
BIG DATA PLATFORM
Systems Management | Application Development | Discovery
Accelerators
Hadoop System | Stream Computing | Data Warehouse
Information Integration & Governance
Data | Media | Content | Machine | Social
• InfoSphere Data Explorer: Discover, understand, search, and navigate federated sources of big data
• InfoSphere BigInsights: Cost-effectively analyze petabytes of unstructured and structured data
• InfoSphere Streams: Analyze streaming data and large data bursts for real-time insights
9. The 5 High Value Big Data Use Cases
• Big Data Exploration: Find, visualize, understand all big data to improve business knowledge
• Enhanced 360° View of the Customer: Achieve a true unified view, incorporating internal and external sources
• Security/Intelligence Extension: Lower risk, detect fraud and monitor cyber security in real time
• Operations Analysis: Analyze a variety of machine data for improved business results
• Data Warehouse Augmentation: Integrate big data and data warehouse capabilities to increase operational efficiency
10. Observed Big Data Use Cases
[Bar chart: count of observed use cases by category, axis 0 to 200]
Machine Data Analysis                      197
Customer behavior/Social analysis          143
Database Offload, reporting, mining        139
Text Analytics                              71
Telco Apps                                  32
Audio, Video, Image Analysis                29
Analytic Apps                               24
Cyber Security                              23
Geospatial Location/Space exploration       22
Statistical/predictive Analysis             20
Financial Apps Algo Trading                 19
Fraud/Risk                                  18
Real Time Processing                        14
Environmental Sensor apps                   13
Smart Grid Apps                             13
Event Processing                            10
File storage or ECM offload                  8
Medical/Transcriptional Profiling            8
Transportation/SCM                           5
BigInsights as NoSQL store                   4
Source: Multiple websites, n=933, available data for n=812; count of use cases is not mutually exclusive
12/11/2013
11. Big Data Creates A Challenge – And an Opportunity
What If You Could...
• Leverage all of the data captured
• Reduce effort required to leverage data
• Let data lead the way, and continuously explore
• Leverage data as it is captured, in motion
[Figure: Traditional approach compared with Big Data approach]
13. Machine Data Analytics: Customer Example
• Intelligent Infrastructure Management: log analytics, energy bill forecasting, energy consumption optimization, anomalous energy usage detection, presence-aware energy management
• Optimized building energy consumption with centralized monitoring; automated preventive and corrective maintenance
• Utilized InfoSphere Streams, InfoSphere BigInsights, IBM Cognos
Would Operations Analysis benefit you?
• Do you deal with large volumes of machine data?
• How do you access and search that data?
• How do you perform root cause analysis?
• How do you perform complex real-time analysis to correlate across different data sets?
• How do you monitor and visualize streaming data in real time and generate alerts?
Product Starting Point: InfoSphere BigInsights, InfoSphere Streams
14. BigInsights : Machine Data Analytics
[Diagram: Raw Logs and Machine Data feed the Machine Data Accelerator]
• Indexing, Search
• Statistical Modeling
• Root Cause Analysis
• Real-time Analysis
• Federated Navigation & Discovery
• Only store what is needed
15. Taking Full Advantage of Machine Data Requires New Thinking
Machine Data Characteristics
From a variety of complex systems with complex formats, with no standards
May not always have context
Structured and unstructured data
Extremely large volumes of data
Streaming data as well as data at rest
Time sensitive - agile in interpretation and ability to respond
Requires sophisticated text analysis
Adaptive/dynamic algorithms to efficiently process data
Large scale indexing
16. Taking Full Advantage of Machine Data Requires New Thinking
Correlation across different data sets and/or different environments
Data may need to be enriched or transformed to provide proper context
Causal analysis (if there is a problem on Tuesday, what happened on Monday to cause it?)
Pattern analysis
Time and spatial based analysis
Unique visualization/UI needs based on data type and industry/application
Sophisticated search capabilities
17. Customer Usage Pattern of Log Analysis with MDA
Step 1:
− “What is happening in my systems?”
Step 2:
− “Let me try to use my experience to correlate the events and sequence”
Step 3:
− “I need a tool to do Step 2 – I have too many systems and too many logs”
Step 4:
− “I need to combine with my system KPI data and monitor / report in a dashboard. Provide possible solutions to the problem / anomaly”
Step 5:
− “I need to predict the behavior when I make changes, add error codes, or add new systems”
18. Step 1: What is happening in my system?
This is accomplished by collecting all the log data, then extracting, parsing, and indexing it so that it can be searched through a faceted interface.
This is also the phase where basic event-level metrics (max, min, counts, built-in range metrics, alerts when KPIs are not in range) are desired and tested.
Dashboards that are dynamic and actionable in sync with the searches are highly desirable.
The MDA provides the Faceted Search interface.
KEY TECHNOLOGIES – Text Analytics, Faceted Search, BI
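The basic event-level metrics mentioned for Step 1 (counts, min, max, and alerts when a KPI leaves its range) can be illustrated in plain Python. This is a minimal sketch and not the MDA implementation; the event tuple layout, the KPI names and the range table are assumptions made for the example.

```python
from collections import Counter


def event_metrics(events, kpi_ranges):
    """Compute counts per event type, min/max per KPI, and out-of-range alerts.

    events: iterable of (event_type, kpi_name, value) tuples (hypothetical layout).
    kpi_ranges: dict mapping kpi_name -> (low, high) allowed range.
    """
    counts = Counter()
    stats = {}
    alerts = []
    for event_type, kpi, value in events:
        counts[event_type] += 1
        low, high = stats.get(kpi, (value, value))
        stats[kpi] = (min(low, value), max(high, value))
        allowed = kpi_ranges.get(kpi)
        if allowed is not None and not (allowed[0] <= value <= allowed[1]):
            alerts.append((event_type, kpi, value))
    return counts, stats, alerts


sample = [
    ("request", "latency_ms", 120),
    ("request", "latency_ms", 950),
    ("gc", "pause_ms", 40),
]
counts, stats, alerts = event_metrics(sample, {"latency_ms": (0, 500)})
print(counts, stats, alerts)
```

Only KPIs that have an entry in the range table can raise alerts, so thresholds can be introduced gradually while the counts and min/max values are already being collected.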
19. Step 2: Let me correlate events
In this phase, the customer performs searches and endeavors to make sense of the events and sequences.
− We usually work side by side with the customer in this stage.
− We extract the vital tribal knowledge of the domain and its applications.
− We log their “experiential” notions of event sequences and correlations. This is essential to verify results when the user wants to go to Step 3.
KEY TECHNOLOGIES – BigSheets
20. Step 3: I have too many systems and logs to correlate
In this phase, the customer essentially wants to find relationships and patterns of occurrence between log events across systems and applications.
The MDA uses its sessionization and sequence mining capabilities to accomplish this step.
KEY TECHNOLOGIES – Text Analytics, Machine Learning
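To make the terms concrete: sessionization groups events that belong together (here, events from the same source separated by less than an idle gap), and sequence mining then looks for event orderings that recur across sessions. The sketch below is a simplified stand-in for those two ideas in plain Python, not the MDA algorithms; the 300-second gap, the tuple layout and the restriction to adjacent pairs are assumptions.

```python
from collections import Counter


def sessionize(events, gap_seconds=300):
    """Split (timestamp, key, event) tuples into per-key sessions on idle gaps."""
    sessions, last_seen, current = [], {}, {}
    for ts, key, event in sorted(events):
        # Close the open session for this key if it has been idle too long.
        if key in current and ts - last_seen[key] > gap_seconds:
            sessions.append(current.pop(key))
        current.setdefault(key, []).append(event)
        last_seen[key] = ts
    sessions.extend(current.values())
    return sessions


def frequent_pairs(sessions, min_support=2):
    """Return ordered adjacent event pairs that occur in at least min_support sessions."""
    counts = Counter()
    for session in sessions:
        counts.update(set(zip(session, session[1:])))  # count once per session
    return {pair: n for pair, n in counts.items() if n >= min_support}


events = [
    (0, "srvA", "disk_warn"), (10, "srvA", "io_error"), (20, "srvA", "restart"),
    (1000, "srvA", "disk_warn"), (1010, "srvA", "io_error"),
    (5, "srvB", "login"), (15, "srvB", "logout"),
]
sessions = sessionize(events)
patterns = frequent_pairs(sessions)
print(patterns)
```

In this sample, "disk_warn" followed by "io_error" is the only pair seen in two sessions, which is the kind of recurring sequence a customer would then check against their own experience from Step 2.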
21. Step 4: Combine with my KPI, Topology data
Once Step 3 is completed, integration with the KPI, topology, and monitoring data is possible.
This step allows us to expose the capabilities to the network operator and end user.
KEY TECHNOLOGIES – Data Joins, SQL/JAQL, BigSheets, Reporting Dashboards
22. Step 5: Predict events based on patterns
The more advanced customers and network operators would like to build predictive models based on the patterns they see in the events in log data.
Customers want to build models that help with meeting enterprise SLAs for systems.
Downtime scheduling for systems is a complex problem for most data centers.
KEY TECHNOLOGIES – Machine Learning (R, SPSS, System ML)
24. Import
What
– Copy the logs from the machines where they are generated into HDFS.
How
– BigInsights Distributed Copy app + MDA extensions
Advantages
• Use the ftp/sftp protocols supported by the Distributed Copy app
• MDA extensions allow batch incremental processing and batch replacement
• MDA extensions associate metadata, such as server names, with the logs so that it is available to downstream analysis
25. Extract
What
– Identify log record boundaries
– Extract information from log records in text and XML
How
– BigInsights Text Analytics
Advantages
– Robust text extraction using SQL like language
• Avoid ‘brittle’ custom parsers
– Library of extractors for common log files
• Syslog, WebSphere, web access, DataPower, CSV, generic
– Extensive tooling for custom extractor development and app customization
• Eclipse based IDE
26. The Extract Stage: Text analytics applied to log files
Raw Log Files (HDFS/GPFS)
→ Record Splitting (AQL)
→ Log Records (text)
→ Field and Entity Extraction (AQL)
→ Semi-Structured Data (JSON)
→ To Transform Stage
AQL extractors are available for many common formats [syslog, websphere, csv, ...]. BigInsights ships with tools for creating new extractors.
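The two extraction steps, splitting raw text into records and then pulling fields out of each record, can be sketched without AQL. The example below uses Python regular expressions purely as a stand-in to show the idea; it is not AQL and not how the BigInsights extractors are implemented. The log layout (a timestamp and level at the start of each record, with continuation lines such as stack traces) is a hypothetical format chosen for the illustration.

```python
import re

# Hypothetical layout: a record starts with "YYYY-MM-DD HH:MM:SS LEVEL message".
# Lines that do not match are treated as continuations of the previous record.
RECORD_START = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<level>[A-Z]+) (?P<msg>.*)$"
)


def split_records(lines):
    """Identify record boundaries and extract timestamp, level and message."""
    records = []
    for line in lines:
        match = RECORD_START.match(line)
        if match:
            records.append(match.groupdict())
        elif records:
            records[-1]["msg"] += "\n" + line  # continuation, e.g. a stack trace
    return records


raw = [
    "2013-12-11 10:00:01 ERROR Connection refused",
    "    at com.example.Db.connect",
    "2013-12-11 10:00:05 INFO Retry succeeded",
]
records = split_records(raw)
print(records)
```

Each record comes out as a dictionary that serializes directly to JSON, which matches the semi-structured output that the Extract stage hands to the Transform stage.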
27. Index
What
− Index and facet extracted records and fields so that they are available for searching via the faceted search user interface
How
− BigInsights BigIndex
Advantages
− Find correlated log entries based on time through an interactive UI
− Add/inject other data (e.g. Excel) to enrich log context
− Allow operations staff to quickly find log entries based on search terms such as web service name, server name, exception code, transaction ID, etc.
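Faceting means keeping, for each indexed field value, the set of records that carry it, so that a search can be narrowed by field and the UI can show counts per value. The following is a toy in-memory sketch of that idea; it is not BigIndex, and the class and field names are made up for the example.

```python
from collections import defaultdict


class FacetIndex:
    """Toy faceted index: maps (field, value) pairs to matching record ids."""

    def __init__(self, facet_fields):
        self.facet_fields = facet_fields
        self.records = []
        self.postings = defaultdict(set)

    def add(self, record):
        record_id = len(self.records)
        self.records.append(record)
        for field in self.facet_fields:
            if field in record:
                self.postings[(field, record[field])].add(record_id)

    def search(self, **filters):
        """Return records matching every field=value filter."""
        ids = set(range(len(self.records)))
        for field, value in filters.items():
            ids &= self.postings.get((field, value), set())
        return [self.records[i] for i in sorted(ids)]

    def facet_counts(self, field):
        """Return the number of records per value of the given field."""
        return {v: len(ids) for (f, v), ids in self.postings.items() if f == field}


index = FacetIndex(["server", "level"])
index.add({"server": "web01", "level": "ERROR", "msg": "timeout"})
index.add({"server": "web01", "level": "INFO", "msg": "started"})
index.add({"server": "db01", "level": "ERROR", "msg": "deadlock"})
errors_on_web01 = index.search(server="web01", level="ERROR")
print(errors_on_web01, index.facet_counts("server"))
```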
28. Transform
What
– Link and enrich log information from different entities
• Find relationships between log records
• Integrate structured data with log data
– network configuration, user account information…
How
– JAQL
Advantages
– High level language that is Big Data aware
– Out of the box transformers
– Extensive tooling for application customization
• Eclipse IDE
29. The Transform Stage: Linking logs and other information from varied sources
Input: Parsed log records, additional structured data
• Text files, raw logs (HDFS/GPFS): Web log, Network log, MQ log, Transaction log, Server log
• Performance and fault data: Performance Data, Fault data
• Structured data from non-log sources
Link logs corresponding to
1. IT logs of a single business activity or transaction
– Up & down the IT stack
2. Log of an activity across one layer of IT stack (e.g. OS layer)
– Messages flowing through a sequence of routers
3. …
Output: Individual log records, from different IT entities, linked and enriched (HDFS/GPFS)
Downstream: Correlations, Predictive Models, Outlier Detection
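As an illustration of what "linked and enriched" means, the sketch below groups parsed log records by a shared transaction identifier and merges in structured attributes for each server. It is written in Python rather than JAQL and is only a conceptual example; the field names (`txn_id`, `server`, `ts`) and the server attribute table are assumptions.

```python
from collections import defaultdict


def link_and_enrich(records, server_info):
    """Group records by transaction id and add per-server attributes to each one."""
    linked = defaultdict(list)
    for record in records:
        # Enrich: merge structured, non-log data about the originating server.
        enriched = {**record, **server_info.get(record["server"], {})}
        linked[record["txn_id"]].append(enriched)
    for group in linked.values():
        group.sort(key=lambda r: r["ts"])  # order each transaction by time
    return dict(linked)


records = [
    {"txn_id": "t1", "server": "web01", "ts": 2, "msg": "HTTP 500"},
    {"txn_id": "t1", "server": "db01", "ts": 1, "msg": "deadlock"},
    {"txn_id": "t2", "server": "web01", "ts": 3, "msg": "HTTP 200"},
]
server_info = {
    "web01": {"layer": "web", "datacenter": "dc1"},
    "db01": {"layer": "database", "datacenter": "dc2"},
}
linked = link_and_enrich(records, server_info)
print(linked["t1"])
```

Reading the "t1" group top to bottom shows the database deadlock preceding the web-tier HTTP 500, which is the up-and-down-the-stack view of a single transaction.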
30. Analyze
What
– Correlate across fields
– Find frequently occurring sequences and combinations of events
– Potential for predictive modeling in the future
How
– System ML
Advantages
– Scalable to perform analytics on Big Data
– Flexible and customizable
– Easy to plug into applications via a JAQL/Java interface
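Finding frequently occurring combinations of events can be pictured as counting which event types show up together in the same time window. The sketch below does this in plain Python for pairs only; it is a simplified illustration and not System ML, and the 60-second window and the support threshold are arbitrary assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_combinations(events, window_seconds=60, min_support=2):
    """Return unordered event pairs that co-occur in at least min_support windows.

    events: iterable of (timestamp_seconds, event_type) tuples.
    """
    windows = {}
    for ts, event in events:
        windows.setdefault(ts // window_seconds, set()).add(event)
    counts = Counter()
    for seen in windows.values():
        counts.update(combinations(sorted(seen), 2))
    return {combo: n for combo, n in counts.items() if n >= min_support}


events = [
    (0, "cpu_high"), (10, "gc_pause"),
    (70, "cpu_high"), (80, "gc_pause"), (90, "timeout"),
    (130, "timeout"),
]
combos = frequent_combinations(events)
print(combos)
```

Fixed windows are the simplest choice; events that straddle a window boundary are missed, so a production approach would more likely use sliding windows or sessions.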
32. Machine Data Adapters
What are Adapters?
− Adapt a variety of inputs to a standard output
Why do we need Machine Data Adapters?
− To handle different ‘machine data’ formats
34. Adapter Functions
Create
− Enter Adapter-Name, LogType, ‘sample machine data’ and the first ‘timestamp’ in the ‘sample machine data’
− Check the recommended ‘DataTime Format’ and ‘preTimeStamp Regex’ and select defaults like ‘timezone’, ‘year’ and ‘month’
− Verify the extracted output and save it if it looks correct
− If the extracted output is bad, go back and edit the parameters ‘Data Source Type’, ‘DataTime Format’ and ‘preTimeStamp Regex’
Edit
View
Apply
Delete
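The role of a date-time format and a "pre-timestamp" regular expression can be shown with a small Python function: the regex skips whatever precedes the timestamp, the format parses the timestamp itself, and a default year fills in what the log line omits. This is a hypothetical sketch of the concept, not the MDA adapter code; the parameter names and the use of Python `strptime` patterns (rather than whatever pattern syntax the product uses) are assumptions.

```python
import re
from datetime import datetime


def make_adapter(pre_timestamp_regex, datetime_format, default_year=None):
    """Build a function that turns a raw line into {"timestamp", "message"}.

    Assumes a fixed-width datetime_format so the timestamp length is known.
    """
    prefix = re.compile(pre_timestamp_regex)
    width = len(datetime(2000, 1, 1).strftime(datetime_format))

    def adapt(line):
        match = prefix.match(line)
        if match is None:
            return None
        start = match.end()
        try:
            ts = datetime.strptime(line[start:start + width], datetime_format)
        except ValueError:
            return None  # the line does not carry a timestamp at this position
        if default_year is not None and "%Y" not in datetime_format:
            ts = ts.replace(year=default_year)
        return {"timestamp": ts.isoformat(), "message": line[start + width:].strip()}

    return adapt


adapter = make_adapter(r"\S+ ", "%Y-%m-%d %H:%M:%S")
parsed = adapter("host42 2013-12-11 10:00:01 ERROR disk full")
no_year = make_adapter(r"", "%m-%d %H:%M:%S", default_year=2013)
print(parsed, no_year("12-11 10:00:01 restart"))
```

Returning `None` for lines that do not parse mirrors the verify step described above: if the sample data produces no output, the format or the regex needs to be edited.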
46. Data Explorer for Indexing Application
Data Explorer Index Configuration File to support generic schema for extracted
machine data.
Parallelizing data pushing to Data Explorer Indexer.
Run Data Explorer Index Application
47. Data Explorer Index Configuration File
The Data Explorer index config file specifies which fields to index, which field contains the record ID, and the Data Explorer index field definitions: field name, type, searchable, retrievable, filterable and sortable.
A default index configuration file is provided.
Example:
{ "source": {
    "dateFormat": "MMM dd yyyy HH:mm:ss.SSS Z",
    "fieldName": "LogDatetime[].normalized_text",
    "suppress": false },
  "target": {
    "deFieldName": "LogDatetime",
    "retrievable": true,
    "filterable": true,
    "searchable": true,
    "isRecordID": false,
    "sortable": true,
    "type": "Date" }}
48. Parallelizing data pushing to Data Explorer Indexer
The application uses an Oozie Jaql action to parallelize the job into multiple tasks.
[Diagram: Indexing app (BI platform/IDE)]
• HDFS feeds Jaql Hadoop Task 1 … Jaql Hadoop Task M
• Each task pushes records through BigSearch
• BigSearch locates shards through the Zookeeper cluster
• Records are written to the DE Backend, Shard 1 … Shard N
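The general pattern in this diagram, splitting the input across M tasks and routing each record to one of N shards, can be sketched as follows. This is only an illustration of the partitioning idea in Python; the actual shard lookup goes through Zookeeper and the Data Explorer backend, and the CRC-based routing rule used here is an assumption, not the product's behavior.

```python
import zlib
from collections import defaultdict


def shard_for(record_id, num_shards):
    """Pick a shard deterministically from the record id (assumed routing rule)."""
    return zlib.crc32(record_id.encode("utf-8")) % num_shards


def split_for_tasks(records, num_tasks):
    """Deal records round-robin into num_tasks work lists, one per parallel task."""
    return [records[i::num_tasks] for i in range(num_tasks)]


def batches_by_shard(records, num_shards):
    """Group one task's records into per-shard batches before pushing them."""
    batches = defaultdict(list)
    for record in records:
        batches[shard_for(record["id"], num_shards)].append(record)
    return dict(batches)


records = [{"id": f"rec-{i}", "msg": f"event {i}"} for i in range(10)]
tasks = split_for_tasks(records, num_tasks=3)
per_task_batches = [batches_by_shard(task, num_shards=4) for task in tasks]
print([len(t) for t in tasks])
```

Because the shard is a pure function of the record ID, every task sends a given record to the same shard regardless of which task happens to process it.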
51. BI Log Monitoring and Analysis
• Ingest BigInsights logs in HBase in real time.
• Create a Log Monitoring Extraction application that extracts log records from HBase.
• Create an Index Management application to delete old index log records from DFS.
• Embed the MDA Search UI within the BigInsights Dashboard for BigInsights log search.
52. Ingesting BigInsights Logs into HBase
Chukwa agents are set up on the Name Node and each of the Data Nodes.
Adapters are programmatically installed and removed depending on user configuration.
A custom Chukwa writer class was created to add logs into HBase in real time.
The Log4j interface streams logs to the adapters, which stream logs to HBase.
Different log types are concurrently recorded in HBase in a single table.
53. Data Collection Diagram
HDFS with Apache MapReduce: logs from the following components are collected into HBase
• Name Node: Hadoop Name Node, Hadoop Secondary Name Node, Hadoop Jobtracker
• Data Node 1: Hadoop Data Node, Hadoop Task Tracker, Hadoop Task Attempt
• Data Node 2: Hadoop Data Node, Hadoop Task Tracker, Hadoop Task Attempt
• Data Node 3: Hadoop Data Node, Hadoop Task Tracker, Hadoop Task Attempt
For HDFS with Symphony MapReduce installation: Hadoop Data Node, Hadoop Name Node and Hadoop Secondary Name Node logs are supported.
For GPFS with Apache MapReduce installation: Hadoop Job Tracker, Hadoop Task Tracker and Hadoop Task Attempt logs are supported.
For GPFS with Symphony MapReduce installation: only Hadoop Task Attempt logs are supported.
54. BigInsights Dashboard
The user starts the BigInsights log collection from the LogCollection app.
The user can stop the BigInsights log collection from the LogCollection app or by turning off monitoring.
The MDA Search UI is wrapped by a frame in the BigInsights Dashboard.
57. BigInsights Log Monitoring Application
It is a BigInsights chained application.
It contains the Log Monitoring Extraction application and the Index application.
It assumes that the Log Monitoring Extraction application is running in schedule mode.
The BigInsights Logs workflow is assumed to be selected for the Index application.
Any configuration files are assumed to be the default configuration files installed with MDA.
The “Index Only New Logs” check box in the Index application is assumed to be unchecked.