Big Data Meetup
Machine Data Analytics
Raghuram Velega
IBM Software Architect
Big Data Analytics

© 2013 IBM Corporation
Relevant Operations Data is Huge
A Typical Enterprise of 5000 servers with 125 applications across 2 or 3
data centers generates in excess of 1.4 TB of data per day
Operational data growing 15-20% per year.

Daily Metric Output:
• 250 MB of event data from 125,000 events
• 125 MB of endpoint management data from 5K servers
• 12 GB of performance data for 5000 servers
• 1 GB of performance data for 5000 virtual machines
• 8 GB of application middleware data
  – Assumptions: 40% of servers running monitored middleware
  – Average 60 metrics each, collected every 15 minutes
  – Average PMDB insert 1000 bytes, 40 inserts/server
• 500 MB of application transaction tracking data for 125 applications
• 1 TB of log file data per day
  – 200 MB average per server (some will be smaller, some larger)
  – Example: WAS instances typically produce 400 MB-750 MB of logs/day
• 0.35 TB of security data collected per day
• 9 GB of storage data per day: 175K fiber ports
  – 175 fiber ports, 10 metrics per port, collected every 5 minutes, 0.5 KB per port
  – 25K volumes, 10 metrics per volume, 0.5 KB per volume
  – 0.5 KB * (65K ports and volumes) * 12 * 24 = 9.3 GB/day
• 2 GB of network performance data for data center networks (not access networks)
  – 180 x 64-port switches and 4 routers to manage the physical network

Data flow of approximately 1TB unstructured data, and .4TB metric data per day,
Scaled to 20K servers, approx 4TB unstructured, 1.6TB metric data
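
The storage estimate and the scaling to 20K servers can be checked with a few lines of arithmetic. This is a minimal sketch: the function name and the decimal convention (1 GB = 1,000,000 KB) are assumptions made for illustration, while the input figures (65K ports and volumes, .5KB per sample, a sample every 5 minutes, 1 TB and .4 TB per day at 5000 servers) come from the slide.

```python
def daily_volume_gb(entities, kb_per_sample, samples_per_hour, hours=24):
    """Daily data volume in decimal GB for periodically sampled entities."""
    total_kb = entities * kb_per_sample * samples_per_hour * hours
    return total_kb / 1_000_000  # assumption: 1 GB = 1,000,000 KB

# 65K ports and volumes, 0.5 KB each, sampled every 5 minutes (12 per hour)
storage_gb = daily_volume_gb(65_000, 0.5, 12)

# Linear scaling of the daily totals from 5,000 to 20,000 servers
scale = 20_000 / 5_000
unstructured_tb = 1.0 * scale
metric_tb = 0.4 * scale
```

The result for storage is about 9.36 GB/day, in line with the 9.3 GB/day on the slide, and the linear scaling reproduces the 4 TB and 1.6 TB figures.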
Shifting market for IT Operations
Operational Visibility

APM Digest survey* of Senior IT Ops @ Fortune 500
− 50% growing dissatisfaction with traditional performance management solutions for Production IT
− Inability to adapt to rapidly changing applications & workloads
− 30% of them believe that they do not have a way to proactively detect problems
− Looking to operate on raw data and gain actionable insights

IT Overwhelmed by data

IT Analytics solutions can predict, detect and
help solve problems by churning through piles of
data and translating this to understandable,
relevant information, and actionable insights.
* Source: APMDigest:
http://apmdigest.com/it-analytics-emerging-as-dissatisfaction-grows-with-apm-and-bsm-tools
Exploiting IBM’s breadth of Analytics Initiatives
Proactively mitigate risk, attain insights to optimize actions, and reduce cost of
ownership across Business, IT Operations, Asset Management, and more….
Simple ad-hoc and scheduled reporting to enable comparison of multiple metrics and data sources

Leveraging analytics for IT Operations
• Self-learning capabilities to automatically adapt to change
• Reduce false alerts to lower management costs
• Notice problems sooner and more accurately
• Performance trending to plan for growth
• Automated threshold setting for quicker deployment
• Detect capacity issues prior to business impact

Streaming data analytics to provide real-time information and process Big Data volumes easily
Predictive Analytics enables forecasting and trending to provide foresight in resource demand, capacity & availability and clarify potential risks.
Provide holistic and accurate diagnosis by using guiding technology with behavioral learning capabilities.
Advanced correlation and pattern recognition to identify and resolve complex and undetectable events in real time.
InfoSphere
BigInsights
IT Operations needs analytics to predict,
to search and to optimize
• How can we get early warning of failures in my critical retail applications?

Predict

• Can we predict/project failure occurrences for specific asset types?

• Can I predict which KPIs are going to cause application issues without manually configuring
thresholds? I have 100s of thousands of KPIs.
• I want to predict my online banking outages and take corrective actions before customers hit
them.

• What is driving my high maintenance costs and what can I do to address this?

• How do we make sense out of the terabytes of metric and log data that is generated by
our applications and the infrastructure on which they run to isolate problems and reduce
downtime?

Search

• How can I reduce reserved material inventory due to work order backlog?

• Can I use analysis of my channel traffic to achieve improved customer insight
and intelligence?

• “What-if” we change our preventive maintenance strategy?
• Help me track capacity and performance of applications & services in cloud / virtual
environments, when do I need to add more capacity?
• Show me how to reduce cost of running my virtual infrastructure & making it more
compliant with best practices.

Optimize

• How should I plan maintenance to efficiently keep my assets operational, given what
I know today about my six-month resource availability?
How the Big Data Platform Can Help?

Raghuram Velega - IBM Software Architect
(Big Data Analytics)
IBM Provides a Holistic and Integrated Approach to Big Data and Analytics

CONSULTING and IMPLEMENTATION SERVICES
SOLUTIONS: Sales | Marketing | Finance | Operations | IT | Risk | HR | Industry
ANALYTICS: Performance Management, Risk Analytics, Decision Management, Content Analytics, Business Intelligence and Predictive Analytics
BIG DATA PLATFORM: Content Management, Hadoop System, Stream Computing, Data Warehouse, Information Integration and Governance
SECURITY, SYSTEMS, STORAGE AND CLOUD

Enabling organizations to
• Assemble and combine relevant mix of information
• Discover and explore with smart visualizations
• Analyze, predict and automate for more accurate answers
• Take action and automate processes
• Optimize analytical performance and IT costs
• Reduce infrastructure complexity and cost
• Manage, govern and secure information
The Platform for New Insight and Applications

BIG DATA PLATFORM: Systems Management, Application Development, Discovery, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Information Integration & Governance

• InfoSphere Data Explorer: Discover, understand, search, and navigate federated sources of big data
• InfoSphere BigInsights: Cost-effectively analyze petabytes of unstructured and structured data
• InfoSphere Streams: Analyze streaming data and large data bursts for real-time insights

Data | Media | Content | Machine | Social
The 5 High Value Big Data Use Cases

• Big Data Exploration: Find, visualize, understand all big data to improve business knowledge
• Enhanced 360° View of the Customer: Achieve a true unified view, incorporating internal and external sources
• Security/Intelligence Extension: Lower risk, detect fraud and monitor cyber security in real time
• Operations Analysis: Analyze a variety of machine data for improved business results
• Data Warehouse Augmentation: Integrate big data and data warehouse capabilities to increase operational efficiency
Observed Big Data Use Cases

Use case                                   Count
Machine Data Analysis                        197
Customer behavior/Social analysis            143
Database Offload, reporting, mining          139
Text Analytics                                71
Telco Apps                                    32
Audio, Video, Image Analysis                  29
Analytic Apps                                 24
Cyber Security                                23
Geospatial Location/Space exploration         22
Statistical/predictive Analysis               20
Financial Apps Algo Trading                   19
Fraud / Risk                                  18
Real Time Processing                          14
Environmental Sensor apps                     13
Smart Grid Apps                               13
Event Processing                              10
File storage or ECM offload                    8
Medical/Transcriptional Profiling              8
Transportation/SCM                             5
BigInsights as NoSQL store                     4

Source: Multiple websites, n=933, available data for n=812; count of use cases is not mutually exclusive
12/11/2013
Big Data Creates A Challenge – And an Opportunity
What If You Could...
(Traditional vs. Big Data Approach)
• Leverage All of the Data Captured
• Reduce Effort Required to Leverage Data
• Let Data Lead The Way, and continuously explore
• Leverage data as it is captured – In Motion
IBM Infosphere BigInsights :
Machine Data Analytics
Machine Data Analytics: Customer Example

• Intelligent Infrastructure Management: log analytics, energy bill
forecasting, energy consumption optimization, anomalous energy
usage detection, presence-aware energy management
• Optimized building energy consumption with centralized monitoring;
Automated preventive and corrective maintenance
• Utilized InfoSphere Streams, InfoSphere BigInsights, IBM Cognos
Would Operations Analysis benefit you?
Do you deal with large volumes of machine data?
How do you access and search that data?
How do you perform root cause analysis?

How do you perform complex real-time analysis to
correlate across different data sets?
How do you monitor and visualize streaming data
in real time and generate alerts?

Product Starting Point: InfoSphere BigInsights, InfoSphere Streams
BigInsights : Machine Data Analytics

Raw Logs and Machine Data
Machine Data Accelerator
• Only store what is needed
• Indexing, Search
• Statistical Modeling
• Root Cause Analysis
• Real-time Analysis
• Federated Navigation & Discovery
Taking Full Advantage of Machine Data Requires New Thinking
Machine Data Characteristics
From variety of complex systems with complex formats – no
standards
May not always have context
Structured and unstructured data
Extremely large volumes of data
Streaming data as well as data at rest
Time sensitive - agile in interpretation and ability to respond

Requires sophisticated text analysis
Adaptive/dynamic algorithms to efficiently process data
Large scale indexing
Taking Full Advantage of Machine Data Requires New Thinking

Correlation across different data sets and/or different
environments
Data may need to be enriched or transformed to provide proper
context
Causal analysis (if problem on Tuesday, what happened on
Monday to cause this)
Pattern analysis
Time and spatial based analysis
Unique Visualization/UI needs based on data type and
industry/application
Sophisticated search capabilities.
Customer Usage Pattern of Log Analysis with MDA
Step 1: “What is happening in my systems?”
Step 2: “Let me try to use my experience to correlate the events and sequence”
Step 3: “I need a tool to do Step 2 – I have too many systems and too many logs”
Step 4: “I need to combine with my system KPI data and monitor / report in a dashboard. Provide possible solutions to the problem / anomaly”
Step 5: “I need to predict the behavior when I make changes, add error codes, or add new systems”
Step 1: What is happening in my system?
This is accomplished by collecting all the log data, then extracting, parsing, and indexing it
so that it can be searched through a faceted interface.
This is also the phase where basic event-level metrics (max, min, counts, built-in range
metrics, alerts when KPIs are not in range) are desired and tested.
Dashboards that are dynamic, actionable, and in sync with the searches are highly desirable.
The MDA provides the Faceted Search interface.
KEY TECHNOLOGIES – Text Analytics, Faceted Search, BI
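
The basic event-level metrics mentioned in this step (counts, min, max, and alerts when a KPI is out of range) could be sketched as follows. This is an illustration in plain Python, not the MDA implementation; the function names, event names, and range values are all hypothetical.

```python
from collections import defaultdict

def event_metrics(events):
    """Compute count, min, and max of a numeric value per event type.

    `events` is an iterable of (event_type, value) pairs.
    """
    stats = defaultdict(lambda: {"count": 0, "min": None, "max": None})
    for event_type, value in events:
        s = stats[event_type]
        s["count"] += 1
        s["min"] = value if s["min"] is None else min(s["min"], value)
        s["max"] = value if s["max"] is None else max(s["max"], value)
    return dict(stats)

def out_of_range(stats, ranges):
    """Return the event types whose observed min or max leaves (low, high)."""
    alerts = []
    for event_type, (low, high) in ranges.items():
        s = stats.get(event_type)
        if s and (s["min"] < low or s["max"] > high):
            alerts.append(event_type)
    return sorted(alerts)

# Hypothetical sample data and KPI ranges
events = [("response_ms", 120), ("response_ms", 950),
          ("queue_depth", 4), ("queue_depth", 7)]
stats = event_metrics(events)
alerts = out_of_range(stats, {"response_ms": (0, 500), "queue_depth": (0, 10)})
```

Here only `response_ms` is flagged, because its observed maximum exceeds the configured upper bound.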
Step 2: Let me correlate events
In this phase, the customer performs searches and endeavors
to make sense of the events and sequences.
− We usually work side by side with the customer in this stage.
− We extract the vital tribal knowledge and applications in the domain.
− We log their “experiential” notions of event sequences and correlations; this is essential to verify results when the user wants to go to Step 3.

KEY TECHNOLOGIES – BigSheets
Step 3: I have too many systems and logs to
correlate
In this phase, the customer essentially wants to find
relationships and patterns of occurrence between log events
across systems and applications.
The MDA uses its sessionization and sequence mining
capabilities to accomplish this step.
KEY TECHNOLOGIES – Text Analytics, Machine Learning
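
To make the two terms concrete, here is a small sketch of sessionization followed by a very simple form of sequence mining. It is a stand-in written for illustration and not the MDA algorithm: sessions are cut by an assumed inactivity gap per server, and "sequence mining" is reduced to counting consecutive event pairs that occur in at least `min_support` sessions. All names and sample events are hypothetical.

```python
from collections import Counter

def sessionize(events, gap_seconds=300):
    """Group (timestamp, key, event) tuples into per-key sessions.

    A new session starts when the gap between consecutive events for the
    same key exceeds gap_seconds.
    """
    sessions = []
    last_seen = {}  # key -> (last timestamp, index of its open session)
    for ts, key, event in sorted(events):
        if key in last_seen and ts - last_seen[key][0] <= gap_seconds:
            idx = last_seen[key][1]
            sessions[idx].append(event)
        else:
            sessions.append([event])
            idx = len(sessions) - 1
        last_seen[key] = (ts, idx)
    return sessions

def frequent_pairs(sessions, min_support=2):
    """Count consecutive event pairs; support = number of sessions containing the pair."""
    counts = Counter()
    for session in sessions:
        counts.update(set(zip(session, session[1:])))
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical events: (seconds, server, event name)
events = [
    (0, "srv1", "DISK_WARN"), (60, "srv1", "IO_TIMEOUT"), (90, "srv1", "APP_ERROR"),
    (10, "srv2", "DISK_WARN"), (70, "srv2", "IO_TIMEOUT"),
    (1000, "srv1", "LOGIN"),
]
sessions = sessionize(events)
patterns = frequent_pairs(sessions)
```

With this sample, the pair DISK_WARN followed by IO_TIMEOUT appears in two sessions and is the only pattern that meets the support threshold.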
Step 4: Combine with my KPI, Topology data
Once Step 3 is completed, the integration with the KPI,
topology, and monitoring data is possible.
This step allows us to expose the capabilities to the Network
Operator and end user.
KEY TECHNOLOGIES – Data Joins, SQL/JAQL, BigSheets,
Reporting Dashboards
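
As an illustration of the kind of data join this step involves, the sketch below attaches a KPI sample to each log event from the same server and time bucket. It is written in Python rather than SQL/JAQL, and the field names (`server`, `ts`, `cpu_pct`) and the 15-minute bucket are assumptions chosen for the example.

```python
def join_events_with_kpis(events, kpis, bucket_seconds=900):
    """Attach to each event the KPI sample from the same server and time bucket."""
    kpi_index = {(k["server"], k["ts"] // bucket_seconds): k for k in kpis}
    joined = []
    for e in events:
        kpi = kpi_index.get((e["server"], e["ts"] // bucket_seconds))
        # cpu_pct is None when no KPI sample exists for that bucket
        joined.append({**e, "cpu_pct": kpi["cpu_pct"] if kpi else None})
    return joined

# Hypothetical inputs: timestamps are seconds since some start time
events = [
    {"ts": 1000, "server": "web01", "event": "IO_TIMEOUT"},
    {"ts": 2000, "server": "web01", "event": "APP_ERROR"},
]
kpis = [{"ts": 900, "server": "web01", "cpu_pct": 97}]
joined = join_events_with_kpis(events, kpis)
```

The joined rows are what a reporting dashboard would then chart: each event side by side with the KPI value observed around the same time.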
Step 5: Predict events based on patterns
The more advanced customers and network operators would
like to build predictive models based on the patterns they see in
the events in log data.
Customers want to build models that help with meeting
enterprise SLAs for systems
Downtime scheduling for systems is a complex problem for
most data centers.
KEY TECHNOLOGIES – Machine Learning (R, SPSS, System
ML)
High-Level Workflow

Apply
Adapter
Import
What
– Copy the logs from the machines where they are generated into HDFS.

How
– BigInsights Distributed Copy app + MDA extensions

Advantages
• Use the ftp/sftp protocols supported by the Distributed Copy app
• MDA extensions allow batch incremental processing and batch replacement
• MDA extensions associate metadata, such as server names, with the logs so that
it is available to downstream analysis
Extract
What
– Identify log record boundaries
– Extract information from log records in text and XML

How
– BigInsights Text Analytics

Advantages
– Robust text extraction using a SQL-like language
• Avoid ‘brittle’ custom parsers
– Library of extractors for common log files
• Syslogs, WebSphere, web access, DataPower, CSV, generic
– Extensive tooling for custom extractor development and app customization
• Eclipse based IDE
The Extract Stage: Text analytics applied to log files

Raw Log Files (HDFS/GPFS) → Record Splitting (AQL) → Log Records (text) → Field and Entity Extraction (AQL) → Semi-Structured Data (JSON) → To Transform Stage

AQL extractors available for many common formats [syslog, websphere, csv, ...].
BigInsights ships with tools for creating new extractors.
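
The two extract steps, record splitting and field extraction, can be illustrated with regular expressions. This sketch uses Python in place of AQL and assumes a made-up log layout in which every record starts with a `YYYY-MM-DD HH:MM:SS LEVEL` prefix; the pattern names and sample lines are hypothetical.

```python
import json
import re

# Assumed layout: each record starts with "YYYY-MM-DD HH:MM:SS " at line start.
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ", re.MULTILINE)
FIELDS = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)$",
    re.DOTALL,
)

def split_records(raw):
    """Split raw text into records; continuation lines stay with their record."""
    starts = [m.start() for m in RECORD_START.finditer(raw)]
    ends = starts[1:] + [len(raw)]
    return [raw[s:e].rstrip("\n") for s, e in zip(starts, ends)]

def extract(record):
    """Extract named fields from one record as a dict, or None if it does not match."""
    m = FIELDS.match(record)
    return m.groupdict() if m else None

raw = (
    "2013-12-11 10:00:01 ERROR Connection refused\n"
    "  at com.example.Client.connect\n"
    "2013-12-11 10:00:05 INFO Retry succeeded\n"
)
records = [extract(r) for r in split_records(raw)]
as_json = [json.dumps(r) for r in records]  # semi-structured output
```

The multi-line stack trace stays attached to the first record, which is the point of finding record boundaries before extracting fields.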
Index
What
− Index and facet extracted records and fields so they are available for searching via the
faceted search user interface

How
− BigInsights BigIndex

Advantages
− Find correlated log entries based on time through an interactive UI
− Add/inject other data (e.g., Excel) to enrich log context
− Allow operations staff to quickly find log entries based on search terms such as web
service name, server name, exception code, transaction ID, etc.
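
A faceted search combines a free-text match with counts of field values over the matching records, so that the user can narrow the result by clicking a value. The sketch below shows the idea on an in-memory list; it is not BigIndex, and the record fields and function name are hypothetical.

```python
from collections import Counter

def faceted_search(records, term=None, filters=None, facet_fields=()):
    """Filter records by a search term and exact-match facet filters,
    then count the values of each facet field over the matching records."""
    filters = filters or {}
    hits = []
    for rec in records:
        if term and term.lower() not in rec.get("message", "").lower():
            continue
        if any(rec.get(f) != v for f, v in filters.items()):
            continue
        hits.append(rec)
    facets = {f: Counter(rec[f] for rec in hits if f in rec) for f in facet_fields}
    return hits, facets

# Hypothetical extracted log records
records = [
    {"server": "web01", "code": "E500", "message": "Timeout calling OrderService"},
    {"server": "web02", "code": "E500", "message": "Timeout calling OrderService"},
    {"server": "web01", "code": "E404", "message": "Resource not found"},
]
hits, facets = faceted_search(records, term="timeout", facet_fields=("server", "code"))
narrowed, _ = faceted_search(records, term="timeout", filters={"server": "web02"})
```

The first call returns the two timeout records with facet counts per server and exception code; the second call shows how selecting a facet value narrows the result.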
Transform
What
– Link and enrich log information from different entities
• Find relationships between log records
• Integrate structured data with log data
– network configuration, user account information…

How
– JAQL

Advantages
– High level language that is Big Data aware
– Out of the box transformers
– Extensive tooling for application customization
• Eclipse IDE
The Transform Stage: Linking logs and other information from varied sources

Inputs
– Raw logs as text files (HDFS/GPFS): web log, network log, MQ log, transaction log, server log
– Structured data from non-log sources (HDFS/GPFS): performance data, fault data

Link logs corresponding to
1. IT logs of a single business activity or transaction
   – Up & down the IT stack
2. Log of an activity across one layer of the IT stack (e.g. OS layer)
   – Messages flowing through a sequence of routers
3. …

Results
– Correlations, predictive models
– Outlier detection

Input: Parsed log records, additional structured data
Output: Individual log records, from different IT entities, linked and enriched
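
A minimal sketch of linking and enriching, assuming the parsed records carry a transaction ID: records from different logs are grouped by that ID, ordered by time, and enriched with structured data about the server. It is written in Python rather than JAQL, and every field name here (`txn_id`, `server`, `ts`, `layer`) is a hypothetical choice for the example.

```python
from collections import defaultdict

def link_and_enrich(log_records, server_info):
    """Group parsed log records by transaction id and attach server metadata."""
    linked = defaultdict(list)
    for rec in log_records:
        enriched = dict(rec)
        enriched.update(server_info.get(rec["server"], {}))
        linked[rec["txn_id"]].append(enriched)
    for recs in linked.values():
        recs.sort(key=lambda r: r["ts"])  # order each transaction by time
    return dict(linked)

# Hypothetical parsed records from two different logs
logs = [
    {"ts": 3, "txn_id": "T1", "server": "db01", "source": "server log", "msg": "commit"},
    {"ts": 1, "txn_id": "T1", "server": "web01", "source": "web log", "msg": "POST /order"},
    {"ts": 2, "txn_id": "T2", "server": "web01", "source": "web log", "msg": "GET /status"},
]
# Hypothetical structured data from a non-log source
server_info = {
    "web01": {"layer": "web", "datacenter": "DC1"},
    "db01": {"layer": "database", "datacenter": "DC2"},
}
linked = link_and_enrich(logs, server_info)
```

For transaction T1 the output follows the request up and down the stack, from the web layer to the database layer, with the data center attached to each record.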
Analyze
What
– Correlate across fields
– Find frequently occurring sequences and combinations of events
– Potential for predictive modeling in the future

How
– System ML

Advantages
– Scalable to perform analytics on Big Data
– Flexible and customizable
– Easy to plugin into applications via a JAQL/Java interface
Agenda
Introduction
High Level Workflow
Some Highlights
Demo
Machine Data Adapters

What are Adapters
− Adapt a variety of inputs to a standard output

Why do we need Machine Data Adapters
− To handle different ‘machine data’ formats
Adapters in High-Level Workflow

Apply
Adapter
Adapter Functions

Create
− Enter the Adapter-Name, LogType, ‘sample machine data’, and the first ‘timestamp’ in the ‘sample machine data’.
− Check the recommended ‘DataTime Format’ and ‘preTimeStamp Regex’ and select defaults like ‘timezone’, ‘year’ and ‘month’.
− Verify the extracted output and save it if it looks correct.
− If the extracted output is bad, go back and edit the parameters ‘Data Source Type’, ‘DataTime Format’ and ‘preTimeStamp Regex’.

Edit
View
Apply
Delete
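
The Create function recommends a date-time format and a pre-timestamp regex from a sample line and its first timestamp. How MDA derives these is not described here, so the following is only a guess at the idea: try a short list of candidate formats against the timestamp, and escape whatever text precedes it in the sample. The candidate list uses Python `strptime` syntax, and the function and key names are hypothetical.

```python
import re
from datetime import datetime

# Illustrative candidates only, in Python strptime syntax.
CANDIDATE_FORMATS = [
    "%Y-%m-%d %H:%M:%S",
    "%d/%b/%Y:%H:%M:%S",
]

def recommend(sample_line, first_timestamp):
    """Suggest a datetime format and a regex for the text preceding the timestamp."""
    fmt = None
    for candidate in CANDIDATE_FORMATS:
        try:
            datetime.strptime(first_timestamp, candidate)
            fmt = candidate
            break
        except ValueError:
            continue
    pos = sample_line.find(first_timestamp)
    if fmt is None or pos < 0:
        return None  # nothing to recommend; the user must enter values manually
    pre_regex = "^" + re.escape(sample_line[:pos])
    return {"datetime_format": fmt, "pre_timestamp_regex": pre_regex}

result = recommend("[2013-12-11 10:00:01] ERROR disk full", "2013-12-11 10:00:01")
no_match = recommend("no time here", "12:00")
```

When no candidate fits, the sketch returns nothing, which corresponds to the case where the user goes back and edits the parameters by hand.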
Create Machine Data Adapter – Step-1
Create Machine Data Adapter – Step-2
Create Machine Data Adapter – Step-3
Display Machine Data Adapter
Edit Machine Data Adapter – Step-1
Edit Machine Data Adapter – Step-2
Edit Machine Data Adapter – Step-3
Display Machine Data Adapter
Apply Machine Data Adapter
Verify the Adapter (metadata.json)
Delete
Data Explorer for Indexing Application
Data Explorer Index Configuration File to support generic schema for extracted
machine data.
Parallelizing data pushing to Data Explorer Indexer.
Run Data Explorer Index Application
Data Explorer Index Configuration File
The Data Explorer index config file specifies which fields to index, which field
contains record ID as well as Data Explorer index field definitions: field name,
type, searchable, retrievable, filterable and sortable.
Example:
{ "source": {
    "fieldName": "LogDatetime[].normalized_text",
    "dateFormat": "MMM dd yyyy HH:mm:ss.SSS Z",
    "suppress": false },
  "target": {
    "deFieldName": "LogDatetime",
    "type": "Date",
    "searchable": true,
    "retrievable": true,
    "filterable": true,
    "sortable": true,
    "isRecordID": false }}

A default Index Configuration file is provided.
Parallelizing data pushing to Data Explorer Indexer
The application uses an Oozie Jaql action to parallelize the job across multiple tasks.

Components shown in the diagram:
– Indexing app (BI platform/IDE)
– HDFS
– Jaql Hadoop Task 1 … Jaql Hadoop Task M
– BigSearch
– Zookeeper Cluster (locate shards)
– DE Backend Shard 1 … DE Backend Shard N
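
The pattern in this slide, several tasks pushing records to several index shards in parallel, can be sketched in a few lines. This is not the Oozie/Jaql application: threads stand in for Hadoop tasks, a stable hash of the record ID stands in for shard lookup through Zookeeper, and `push_batch` is a placeholder for the real call to an index backend. All of these are assumptions made for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def shard_for(record_id, num_shards):
    """Pick a shard with a stable hash of the record id (assumed routing rule)."""
    return zlib.crc32(record_id.encode("utf-8")) % num_shards

def push_batch(shard, batch):
    """Placeholder for sending one batch to an index shard; returns the batch size."""
    return shard, len(batch)

def parallel_index(records, num_shards, num_tasks):
    """Partition records by shard, then push the batches using num_tasks workers."""
    batches = {s: [] for s in range(num_shards)}
    for rec in records:
        batches[shard_for(rec["id"], num_shards)].append(rec)
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        results = pool.map(lambda item: push_batch(*item), batches.items())
        return dict(results)

records = [{"id": f"rec-{i}"} for i in range(100)]
sent = parallel_index(records, num_shards=4, num_tasks=2)
```

Because the hash is stable, a record with a given ID always lands on the same shard, so re-running the job does not scatter duplicates across shards.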
Run Data Explorer index Application
Basic Facet Search UI on Application Builder
BI Log Monitoring and Analysis
• Ingest BigInsights logs in HBase in real time.
• Create Log Monitoring Extraction application that extracts log records from HBase.
• Create Index Management application to delete old index log records from DFS.
• Embed the MDA Search UI within the BigInsights Dashboard for BigInsights log search.
Ingesting BigInsights Logs into HBase
Chukwa agents are set up on the Name Node and each of the Data Nodes.
Adapters are programmatically installed and removed depending on user
configuration.
A custom Chukwa writer class was created to add logs into HBase in real time.
The Log4j interface streams logs to the adapters, which stream logs to HBase.
Different log types are concurrently recorded in HBase in a single table.
Data Collection Diagram
HDFS with Apache MapReduce
– Name Node: Hadoop Name Node, Hadoop Secondary Name Node, Hadoop Jobtracker
– Data Node 1: Hadoop Data Node, Hadoop Task Tracker, Hadoop Task Attempt
– Data Node 2: Hadoop Data Node, Hadoop Task Tracker, Hadoop Task Attempt
– Data Node 3: Hadoop Data Node, Hadoop Task Tracker, Hadoop Task Attempt
– HBase

For HDFS with Symphony MapReduce Installation: Hadoop Data Node, Hadoop
Name Node and Hadoop Secondary Name Node logs are supported
For GPFS with Apache MapReduce Installation: Hadoop Job Tracker, Hadoop
Task Tracker and Hadoop Task Attempt logs are supported
For GPFS with Symphony MapReduce Installation: Only Hadoop Task Attempt
logs are supported
BigInsights Dashboard
User starts the BigInsights log collection from the LogCollection app.
User is able to stop the BigInsights log collection from the LogCollection app or
by turning off the Monitoring.
The MDA Search UI is wrapped by a frame in BigInsights Dashboard.
Dashboard
LogCollection app.
BigInsights Log Monitoring Application
Is a BigInsights chained application.
Contains the Log Monitoring Extraction application and the Index application.
Assumes that the Log Monitoring Extraction application is running in scheduled
mode.
The BigInsights Logs workflow is assumed to be selected for the Index application.
Any configuration files are assumed to be the default configuration files installed
with MDA.
The “Index Only New Logs” check-box in the Index application is assumed to be
unchecked.
BigInsights Log Monitoring Application
Agenda
Introduction
High Level Workflow
New Features in MDA 2.1
Demo

More Related Content

What's hot

Empower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
Empower Splunk and other SIEMs with the Databricks Lakehouse for CybersecurityEmpower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
Empower Splunk and other SIEMs with the Databricks Lakehouse for CybersecurityDatabricks
 
Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solrboorad
 
Necessity of Data Lakes in the Financial Services Sector
Necessity of Data Lakes in the Financial Services SectorNecessity of Data Lakes in the Financial Services Sector
Necessity of Data Lakes in the Financial Services SectorDataWorks Summit
 
Data saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overviewData saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overviewRiccardo Zamana
 
Applying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real ProblemsApplying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real ProblemsDataWorks Summit
 
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...Data Con LA
 
Building Data Analytics pipelines in the cloud using serverless technology
Building Data Analytics pipelines in the cloud using serverless technologyBuilding Data Analytics pipelines in the cloud using serverless technology
Building Data Analytics pipelines in the cloud using serverless technologyDomino Data Lab
 
Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»Anna Shymchenko
 
Definitive Guide to Select Right Data Warehouse (2020)
Definitive Guide to Select Right Data Warehouse (2020)Definitive Guide to Select Right Data Warehouse (2020)
Definitive Guide to Select Right Data Warehouse (2020)Sprinkle Data Inc
 
Threat Detection and Response at Scale with Dominique Brezinski
Threat Detection and Response at Scale with Dominique BrezinskiThreat Detection and Response at Scale with Dominique Brezinski
Threat Detection and Response at Scale with Dominique BrezinskiDatabricks
 
The Yellowbrick Impact for MicroStrategy
The Yellowbrick Impact for MicroStrategyThe Yellowbrick Impact for MicroStrategy
The Yellowbrick Impact for MicroStrategyYellowbrick Data
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystemmagda3695
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013boorad
 
Managed Cluster Services
Managed Cluster ServicesManaged Cluster Services
Managed Cluster ServicesAdam Doyle
 
Seeing at the Speed of Thought: Empowering Others Through Data Exploration
Seeing at the Speed of Thought: Empowering Others Through Data ExplorationSeeing at the Speed of Thought: Empowering Others Through Data Exploration
Seeing at the Speed of Thought: Empowering Others Through Data ExplorationGreg Goltsov
 

What's hot (20)

Empower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
Empower Splunk and other SIEMs with the Databricks Lakehouse for CybersecurityEmpower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
Empower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
 
Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solr
 
Big Data Tech Stack
Big Data Tech StackBig Data Tech Stack
Big Data Tech Stack
 
Necessity of Data Lakes in the Financial Services Sector
Necessity of Data Lakes in the Financial Services SectorNecessity of Data Lakes in the Financial Services Sector
Necessity of Data Lakes in the Financial Services Sector
 
Data saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overviewData saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overview
 
Applying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real ProblemsApplying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real Problems
 
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
BigData
BigDataBigData
BigData
 
Building Data Analytics pipelines in the cloud using serverless technology
Building Data Analytics pipelines in the cloud using serverless technologyBuilding Data Analytics pipelines in the cloud using serverless technology
Building Data Analytics pipelines in the cloud using serverless technology
 
Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»
 
BDaas- BigData as a service
BDaas- BigData as a service  BDaas- BigData as a service
BDaas- BigData as a service
 
Big Data, Baby Steps
Big Data, Baby StepsBig Data, Baby Steps
Big Data, Baby Steps
 
Definitive Guide to Select Right Data Warehouse (2020)
Definitive Guide to Select Right Data Warehouse (2020)Definitive Guide to Select Right Data Warehouse (2020)
Definitive Guide to Select Right Data Warehouse (2020)
 
Threat Detection and Response at Scale with Dominique Brezinski
Threat Detection and Response at Scale with Dominique BrezinskiThreat Detection and Response at Scale with Dominique Brezinski
Threat Detection and Response at Scale with Dominique Brezinski
 
The Yellowbrick Impact for MicroStrategy
The Yellowbrick Impact for MicroStrategyThe Yellowbrick Impact for MicroStrategy
The Yellowbrick Impact for MicroStrategy
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystem
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013
 
Managed Cluster Services
Managed Cluster ServicesManaged Cluster Services
Managed Cluster Services
 
Seeing at the Speed of Thought: Empowering Others Through Data Exploration
Seeing at the Speed of Thought: Empowering Others Through Data ExplorationSeeing at the Speed of Thought: Empowering Others Through Data Exploration
Seeing at the Speed of Thought: Empowering Others Through Data Exploration
 

Viewers also liked

Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningLars Marius Garshol
 
InfoSphere BigInsights
InfoSphere BigInsightsInfoSphere BigInsights
InfoSphere BigInsightsWilfried Hoge
 
Log Mining: Beyond Log Analysis
Log Mining: Beyond Log AnalysisLog Mining: Beyond Log Analysis
Log Mining: Beyond Log AnalysisAnton Chuvakin
 
Smart Innovation Platform Flier - Grindstaff
Smart Innovation Platform Flier - GrindstaffSmart Innovation Platform Flier - Grindstaff
Smart Innovation Platform Flier - GrindstaffJohn Nixon
 
Credit insurance Solutions
Credit insurance SolutionsCredit insurance Solutions
Credit insurance SolutionsZayd Soobedar
 
Community Insurance by The Goat trust
Community Insurance by The Goat trustCommunity Insurance by The Goat trust
Community Insurance by The Goat trustSanjeev Kumar
 
Want to work for The Insurance Barn
Want to work for The Insurance BarnWant to work for The Insurance Barn
Want to work for The Insurance BarnTim Barnes Clu
 
Efficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud LibraryEfficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud LibraryCSCJournals
 

Viewers also liked (10)

Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 
InfoSphere BigInsights
InfoSphere BigInsightsInfoSphere BigInsights
InfoSphere BigInsights
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Qualitative data analysis
Qualitative data analysisQualitative data analysis
Qualitative data analysis
 
Log Mining: Beyond Log Analysis
Log Mining: Beyond Log AnalysisLog Mining: Beyond Log Analysis
Log Mining: Beyond Log Analysis
 
Smart Innovation Platform Flier - Grindstaff
Smart Innovation Platform Flier - GrindstaffSmart Innovation Platform Flier - Grindstaff
Smart Innovation Platform Flier - Grindstaff
 
Credit insurance Solutions
Credit insurance SolutionsCredit insurance Solutions
Credit insurance Solutions
 
Community Insurance by The Goat trust
Community Insurance by The Goat trustCommunity Insurance by The Goat trust
Community Insurance by The Goat trust
 
Want to work for The Insurance Barn
Want to work for The Insurance BarnWant to work for The Insurance Barn
Want to work for The Insurance Barn
 
Efficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud LibraryEfficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud Library
 


Machine Data Analytics

  • 1. Big Data Meetup: Machine Data Analytics. Raghuram Velega, IBM Software Architect, Big Data Analytics. © 2013 IBM Corporation
  • 2. Relevant Operations Data is Huge
    A typical enterprise of 5,000 servers with 125 applications across 2 or 3 data centers generates in excess of 1.4 TB of data per day. Operational data is growing 15-20% per year.
    Daily metric output:
      • 250 MB of event data from 125,000 events
      • 125 MB of endpoint management data from 5K servers
      • 12 GB of performance data for 5,000 servers
      • 1 GB of performance data for 5,000 virtual machines
      • 8 GB of application middleware data
        Assumptions: 40% of servers running monitored middleware; average 60 metrics each, collected every 15 minutes; average PMDB insert 1000 bytes, 40 inserts/server
      • 9 GB of storage data per day
        175K fiber ports, 10 metrics per port, collected every 5 minutes, .5KB per port
        25K volumes, 10 metrics per volume, .5KB per volume
        .5KB*(65K ports and volumes)*12*24 = 9.3 GB/day
      • 500 MB of application transaction tracking data for 125 applications
      • 1 TB of log file data per day
        200 MB average per server (some will be smaller, some larger)
        Example: WAS instances typically produce 400MB-750MB logs/day
      • 2 GB of network performance data for data center networks (not access networks)
        180x64 port switches and 4 routers to manage the physical network
      • .35 TB of security data collected per day
    Data flow of approximately 1 TB unstructured data and .4 TB metric data per day. Scaled to 20K servers: approximately 4 TB unstructured and 1.6 TB metric data.
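The sizing arithmetic on this slide can be checked with a short sketch. It assumes decimal units (1 GB = 1,000,000 KB) and the 0.5 KB per-sample size that the slide gives for ports and volumes; with that value the storage product comes out at about 9.36 GB/day, in line with the slide's 9.3 GB figure. The variable names are illustrative only.

```python
# Storage metric volume per day, using the assumptions stated on the slide.
KB_PER_SAMPLE = 0.5          # ".5KB per port" / ".5KB per volume"
MONITORED_OBJECTS = 65_000   # "65K ports and volumes"
SAMPLES_PER_HOUR = 12        # collected every 5 minutes
HOURS_PER_DAY = 24

storage_kb_per_day = (
    KB_PER_SAMPLE * MONITORED_OBJECTS * SAMPLES_PER_HOUR * HOURS_PER_DAY
)
storage_gb_per_day = storage_kb_per_day / 1_000_000  # decimal units assumed

# Per-source daily figures from the slide, all converted to GB.
daily_sources_gb = {
    "events": 0.25,
    "endpoint_management": 0.125,
    "server_performance": 12,
    "vm_performance": 1,
    "middleware": 8,
    "storage": 9,
    "transaction_tracking": 0.5,
    "log_files": 1000,
    "network_performance": 2,
    "security": 350,
}
total_tb = sum(daily_sources_gb.values()) / 1000

print(f"storage: {storage_gb_per_day:.2f} GB/day")
print(f"total:   {total_tb:.2f} TB/day")
```

Summing the listed sources gives roughly 1.38 TB/day, which the slide rounds to its 1.4 TB headline: about 1 TB of logs plus roughly 0.4 TB of everything else.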
  • 3. Shifting market for IT Operations: Operational Visibility
    IT is overwhelmed by data. APM Digest survey* of senior IT Ops at Fortune 500 companies:
      − 50%: growing dissatisfaction with traditional performance management solutions for production IT
      − Inability to adapt to rapidly changing applications & workloads
      − 30% of them believe that they do not have a way to proactively detect problems
      − Looking to operate on raw data and gain actionable insights
    IT Analytics solutions can predict, detect and help solve problems by churning through piles of data and translating this to understandable, relevant information, and actionable insights.
    * Source: APMDigest: http://apmdigest.com/it-analytics-emerging-as-dissatisfaction-grows-with-apm-and-bsm-tools
  • 4. Exploiting IBM’s breadth of Analytics Initiatives
    Proactively mitigate risk, attain insights to optimize actions, and reduce cost of ownership across Business, IT Operations, Asset Management, and more.
    Leveraging analytics for IT Operations:
      • Simple ad-hoc and scheduled reporting to enable comparison of multiple metrics and data sources
      • Self-learning capabilities to automatically adapt to change
      • Reduce false alerts to lower management costs
      • Notice problems sooner and more accurately
      • Performance trending to plan for growth
      • Automated threshold setting for quicker deployment
      • Detect capacity issues prior to business impact
      • Streaming data analytics to provide real-time information and process Big Data volumes easily
      • Predictive analytics enables forecasting and trending to provide foresight in resource demand, capacity & availability and clarify potential risks
      • Provide holistic and accurate diagnosis by using guiding technology with behavioral learning capabilities
      • Advanced correlation and pattern recognition to identify and resolve complex and undetectable events in real time
    InfoSphere BigInsights
  • 5. IT Operations needs analytics to predict, to search and to optimize
    Question categories on the slide: Predict, Search, Optimize
      • How can we get early warning of failures in my critical retail applications?
      • Can we predict/project failure occurrences for specific asset types?
      • Can I predict which KPIs are going to cause application issues without manually configuring thresholds? I have 100s of thousands of KPIs.
      • I want to predict my online banking outages and take corrective actions before customers hit them.
      • What is driving my high maintenance costs and what can I do to address this?
      • How do we make sense out of the terabytes of metric and log data that is generated by our applications and the infrastructure on which they run to isolate problems and reduce downtime?
      • How can I reduce reserved material inventory due to work order backlog?
      • Can I use analysis of my channel traffic to achieve improved customer insight and intelligence?
      • “What-if” we change our preventive maintenance strategy?
      • Help me track capacity and performance of applications & services in cloud / virtual environments. When do I need to add more capacity?
      • Show me how to reduce the cost of running my virtual infrastructure & make it more compliant with best practices.
      • How should I plan maintenance to efficiently keep my assets operational, given what I know today about my six-month resource availability?
  • 6. How Can the Big Data Platform Help? Raghuram Velega - IBM Software Architect (Big Data Analytics)
  • 7. IBM Provides a Holistic and Integrated Approach to Big Data and Analytics
    Layers shown on the slide:
      • CONSULTING and IMPLEMENTATION SERVICES
      • SOLUTIONS: Sales | Marketing | Finance | Operations | IT | Risk | HR | Industry
      • ANALYTICS: Performance Management, Risk Analytics, Decision Management, Content Analytics, Business Intelligence and Predictive Analytics
      • BIG DATA PLATFORM: Content Management, Hadoop System, Stream Computing, Data Warehouse, Information Integration and Governance
      • SECURITY, SYSTEMS, STORAGE AND CLOUD
    Enabling organizations to:
      • Assemble and combine relevant mix of information
      • Discover and explore with smart visualizations
      • Analyze, predict and automate for more accurate answers
      • Take action and automate processes
      • Optimize analytical performance and IT costs
      • Reduce infrastructure complexity and cost
      • Manage, govern and secure information
  • 8. The Platform for New Insight and Applications
    BIG DATA PLATFORM components: Systems Management, Application Development, Discovery, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Information Integration & Governance
    Data types: Data, Media, Content, Machine, Social
      • InfoSphere Data Explorer: Discover, understand, search, and navigate federated sources of big data
      • InfoSphere BigInsights: Cost-effectively analyze petabytes of unstructured and structured data
      • InfoSphere Streams: Analyze streaming data and large data bursts for real-time insights
  • 9. The 5 High Value Big Data Use Cases
      • Big Data Exploration: Find, visualize, understand all big data to improve business knowledge
      • Enhanced 360° View of the Customer: Achieve a true unified view, incorporating internal and external sources
      • Security/Intelligence Extension: Lower risk, detect fraud and monitor cyber security in real-time
      • Operations Analysis: Analyze a variety of machine data for improved business results
      • Data Warehouse Augmentation: Integrate big data and data warehouse capabilities to increase operational efficiency
  • 10. Observed Big Data Use Cases
    [Bar chart: count of observed use cases per category]
    Categories shown: Machine Data Analysis; Customer behavior/Social analysis; Database Offload, reporting, mining; Text Analytics; Telco Apps; Audio, Video, Image Analysis; Analytic Apps; Cyber Security; Geospatial Location/Space exploration; Statistical/predictive Analysis; Financial Apps; Algo Trading; Fraud / Risk; Real Time Processing; Environmental Sensor apps; Smart Grid Apps; Event Processing; File storage or ECM offload; Medical/Transcriptional Profiling; Transportation/SCM; BigInsights as NoSQL store
    Bar values shown: 197, 143, 139, 71, 32, 29, 24, 23, 22, 20, 19, 18, 14, 13, 13, 10, 8, 8, 5, 4
    Source: Multiple websites, n=933, available data for n=812; count of use cases is not mutually exclusive
  • 11. Big Data Creates A Challenge – And an Opportunity
    Traditional Approach versus Big Data Approach. What if you could...
      • Leverage all of the data captured
      • Reduce effort required to leverage data
      • Let data lead the way, and continuously explore
      • Leverage data as it is captured – in motion
  • 12. IBM InfoSphere BigInsights: Machine Data Analytics
  • 13. Machine Data Analytics: Customer Example
      • Intelligent Infrastructure Management: log analytics, energy bill forecasting, energy consumption optimization, anomalous energy usage detection, presence-aware energy management
      • Optimized building energy consumption with centralized monitoring; automated preventive and corrective maintenance
      • Utilized InfoSphere Streams, InfoSphere BigInsights, IBM Cognos
    Would Operations Analysis benefit you?
      • Do you deal with large volumes of machine data?
      • How do you access and search that data?
      • How do you perform root cause analysis?
      • How do you perform complex real-time analysis to correlate across different data sets?
      • How do you monitor and visualize streaming data in real time and generate alerts?
    Product Starting Point: InfoSphere BigInsights, InfoSphere Streams
  • 14. BigInsights: Machine Data Analytics
    Raw logs and machine data feed the Machine Data Accelerator, which supports:
      • Indexing, search
      • Only store what is needed
      • Statistical modeling
      • Root cause analysis
      • Real-time analysis
      • Federated navigation & discovery
  • 15. Taking Full Advantage of Machine Data Requires New Thinking
    Machine data characteristics:
      • From a variety of complex systems with complex formats – no standards
      • May not always have context
      • Structured and unstructured data
      • Extremely large volumes of data
      • Streaming data as well as data at rest
      • Time sensitive – agile in interpretation and ability to respond
      • Requires sophisticated text analysis
      • Adaptive/dynamic algorithms to efficiently process data
      • Large scale indexing
  • 16. Taking Full Advantage of Machine Data Requires New Thinking
      • Correlation across different data sets and/or different environments
      • Data may need to be enriched or transformed to provide proper context
      • Causal analysis (if there is a problem on Tuesday, what happened on Monday to cause it?)
      • Pattern analysis
      • Time and spatial based analysis
      • Unique visualization/UI needs based on data type and industry/application
      • Sophisticated search capabilities
  • 17. Customer Usage Pattern of Log Analysis with MDA
      Step 1: − “What is happening in my systems?”
      Step 2: − “Let me try to use my experience to correlate the events and sequence”
      Step 3: − “I need a tool to do Step 2 – I have too many systems and too many logs”
      Step 4: − “I need to combine with my system KPI data and monitor / report in a dashboard. Provide possible solutions to the problem / anomaly”
      Step 5: − “I need to predict the behavior when I make changes, add error codes, or add new systems”
  • 18. Step 1: What is happening in my system?
    This is accomplished by collecting all the log data, then extracting, parsing, indexing, and searching it through a faceted interface. This is also the phase where basic event-level metrics (max, min, counts, built-in range metrics, alerts when KPIs are not in range) are desired and tested. Dashboards that are dynamic, actionable, and in sync with the searches are highly desirable. The MDA provides the Faceted Search interface.
    KEY TECHNOLOGIES – Text Analytics, Faceted Search, BI
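The basic event-level metrics named in Step 1 (max, min, counts, and alerts when a KPI leaves its range) can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how the MDA computes them; the KPI names, sample values, and acceptable ranges are all hypothetical.

```python
# Hypothetical parsed events as (kpi_name, value) pairs.
events = [
    ("response_ms", 120.0),
    ("response_ms", 480.0),
    ("response_ms", 95.0),
    ("queue_depth", 7.0),
    ("queue_depth", 42.0),
]

# Hypothetical acceptable (low, high) range per KPI.
kpi_ranges = {"response_ms": (0.0, 400.0), "queue_depth": (0.0, 50.0)}


def summarize(events):
    """Return count, min, and max for each KPI."""
    stats = {}
    for name, value in events:
        entry = stats.setdefault(name, {"count": 0, "min": value, "max": value})
        entry["count"] += 1
        entry["min"] = min(entry["min"], value)
        entry["max"] = max(entry["max"], value)
    return stats


def out_of_range(events, ranges):
    """Return the (kpi_name, value) pairs that fall outside their range."""
    alerts = []
    for name, value in events:
        low, high = ranges[name]
        if not low <= value <= high:
            alerts.append((name, value))
    return alerts


print(summarize(events))
print(out_of_range(events, kpi_ranges))
```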
  • 19. Step 2: Let me correlate events
    In this phase, the customer performs searches and endeavors to make sense of the events and sequences.
      − We usually work side by side with the customer in this stage
      − We extract the vital tribal knowledge and applications in the domain
      − We log their “experiential” notions of event sequences and correlations – this is essential to verify results when the user wants to go to Step 3
    KEY TECHNOLOGIES – BigSheets
  • 20. Step 3: I have too many systems and logs to correlate
    In this phase, the customer essentially wants to find relationships and patterns of occurrence between log events across systems and applications. The MDA uses its sessionization and sequence mining capabilities to accomplish this step.
    KEY TECHNOLOGIES – Text Analytics, Machine Learning
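To make sessionization and sequence mining concrete, here is a minimal sketch in plain Python. It is a simplified stand-in, not the MDA's algorithm: events are grouped per host into sessions whenever the gap between consecutive events exceeds an assumed timeout, and then ordered adjacent event pairs are counted across sessions. The event data, the 300-second gap, and the support threshold are all hypothetical.

```python
from collections import Counter
from itertools import groupby

SESSION_GAP_SECONDS = 300  # assumed inactivity gap that closes a session

# Hypothetical events: (timestamp_seconds, host, event_type).
events = [
    (0, "app1", "DB_TIMEOUT"),
    (20, "app1", "RETRY"),
    (45, "app1", "HTTP_500"),
    (1000, "app1", "DB_TIMEOUT"),
    (1030, "app1", "RETRY"),
    (10, "app2", "GC_PAUSE"),
    (40, "app2", "HTTP_500"),
]


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Split each host's time-ordered events into sessions of event types."""
    sessions = []
    ordered = sorted(events, key=lambda e: (e[1], e[0]))
    for _host, host_events in groupby(ordered, key=lambda e: e[1]):
        current = []
        last_ts = None
        for ts, _, event_type in host_events:
            if last_ts is not None and ts - last_ts > gap:
                sessions.append(current)
                current = []
            current.append(event_type)
            last_ts = ts
        sessions.append(current)
    return sessions


def frequent_pairs(sessions, min_support=2):
    """Count ordered adjacent pairs, at most once per session."""
    counts = Counter()
    for session in sessions:
        counts.update(set(zip(session, session[1:])))
    return {pair: n for pair, n in counts.items() if n >= min_support}


sessions = sessionize(events)
print(sessions)
print(frequent_pairs(sessions))
```

A pair that recurs across sessions, such as a timeout followed by a retry, is the kind of pattern that can then be checked against the operators' experience recorded in Step 2.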
  • 21. Step 4: Combine with my KPI, Topology data
    Once Step 3 is completed, the integration with the KPI, topology, and monitoring data is possible. This step allows us to expose the capabilities to the Network Operator and end user.
    KEY TECHNOLOGIES – Data Joins, SQL/JAQL, BigSheets, Reporting Dashboards
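The join described in Step 4 can be illustrated as follows. The slide names SQL/JAQL as the tools for this; the sketch below uses plain Python instead, purely to show the shape of the operation: log events are matched to KPI samples on host and time bucket, then enriched with a topology lookup. All field names and records are hypothetical.

```python
# Hypothetical extracted log events, bucketed by minute.
log_events = [
    {"host": "app1", "minute": 10, "event": "HTTP_500"},
    {"host": "app2", "minute": 11, "event": "GC_PAUSE"},
    {"host": "app3", "minute": 12, "event": "DISK_FULL"},
]

# Hypothetical KPI samples from a monitoring system.
kpi_samples = [
    {"host": "app1", "minute": 10, "cpu_pct": 97.0},
    {"host": "app2", "minute": 11, "cpu_pct": 35.0},
]

# Hypothetical topology: which service each host belongs to.
topology = {"app1": "checkout-service", "app2": "search-service"}


def join_events_with_kpis(log_events, kpi_samples, topology):
    """Inner-join events to KPI samples on (host, minute), then add the service."""
    kpi_by_key = {(k["host"], k["minute"]): k for k in kpi_samples}
    joined = []
    for event in log_events:
        kpi = kpi_by_key.get((event["host"], event["minute"]))
        if kpi is None:
            continue  # no KPI sample for this host and minute
        joined.append({
            **event,
            "cpu_pct": kpi["cpu_pct"],
            "service": topology.get(event["host"], "unknown"),
        })
    return joined


for row in join_events_with_kpis(log_events, kpi_samples, topology):
    print(row)
```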
  • 22. Step 5: Predict events based on patterns The more advanced customers and network operators would like to build predictive models based on the patterns they see in the events in log data. Customers want to build models that help with meeting enterprise SLAs for systems. Downtime scheduling for systems is a complex problem for most data centers. KEY TECHNOLOGIES – Machine Learning (R, SPSS, System ML)
  • 24. Import What – Copy the logs from the machines where they are generated into HDFS. How – BigInsights Distributed Copy App + MDA extensions Advantages • Use ftp/sftp protocols supported by the Distributed Copy App • MDA extensions allow batch incremental processing and batch replacement • MDA extensions associate metadata, such as server names, with each batch, making it available to downstream analysis
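The idea of batch incremental import with attached metadata can be sketched as follows. This does not use or mirror the Distributed Copy App; the function `select_new_files`, the `(name, mtime)` listing, and the metadata keys are hypothetical.

```python
def select_new_files(listing, processed, metadata):
    """Return files that are new or changed since the last batch, tagged with batch metadata.

    listing   -- iterable of (file name, modification time) pairs on the source machine
    processed -- dict of file name -> modification time already imported
    metadata  -- dict attached to every selected file (e.g. the server name)
    """
    batch = []
    for name, mtime in listing:
        if processed.get(name) != mtime:
            batch.append({"path": name, "mtime": mtime, **metadata})
    return batch


# Hypothetical source listing and import history.
listing = [("a.log", 1), ("b.log", 2)]
processed = {"a.log": 1}
batch = select_new_files(listing, processed, {"server": "web01"})
```

Only `b.log` is selected, and the `server` value travels with it so later stages can filter or group by it.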
  • 25. Extract What – Identify log record boundaries – Extract information from log records in text and XML How – BigInsights Text Analytics Advantages – Robust text extraction using a SQL-like language • Avoid ‘brittle’ custom parsers – Library of extractors for common log files • Syslog, WebSphere, web access, DataPower, CSV, generic – Extensive tooling for custom extractor development and app customization • Eclipse-based IDE
  • 26. The Extract Stage: Text analytics applied to log files Diagram: Raw Log Files (HDFS/GPFS) → Record Splitting (AQL) → Log Records (text) → Field and Entity Extraction (AQL) → Semi-Structured Data (JSON) → To Transform Stage. AQL extractors are available for many common formats [syslog, WebSphere, CSV, ...]. BigInsights ships with tools for creating new extractors.
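The two Extract steps, record splitting and field extraction into JSON, can be mimicked in a few lines of Python. This is a stand-in for illustration only and is not AQL; the log layout (`YYYY-MM-DD HH:MM:SS LEVEL message`) and the sample text are assumptions.

```python
import json
import re

# Assumed layout: a record starts with "YYYY-MM-DD HH:MM:SS LEVEL message";
# lines without that prefix (e.g. stack traces) continue the previous record.
RECORD_START = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$")


def split_records(text):
    """Find record boundaries and extract timestamp, level and message fields."""
    records = []
    for line in text.splitlines():
        m = RECORD_START.match(line)
        if m:
            records.append({"timestamp": m.group(1),
                            "level": m.group(2),
                            "message": m.group(3)})
        elif records:
            records[-1]["message"] += "\n" + line
    return records


raw = (
    "2013-05-01 10:00:00 INFO Server started\n"
    "2013-05-01 10:00:05 ERROR NullPointerException\n"
    "    at com.example.Foo.bar(Foo.java:42)\n"
)
records = split_records(raw)
as_json = [json.dumps(r) for r in records]  # semi-structured output, one JSON object per record
```

A hand-written regular expression like this is exactly the kind of "brittle" parser the deck warns about; it is shown only to make the record-boundary problem concrete.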
  • 27. Index What − Index and facet extracted records and fields so that they are available for searching via the faceted search user interface How − BigInsights BigIndex Advantages − Find correlated log entries based on time through an interactive UI. − Add/inject other data (e.g. Excel) to enrich log context. − Allow operations staff to quickly find log entries based on search terms such as web service name, server name, exception code, transaction id, etc.
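To make the notion of faceted search concrete, the sketch below filters records by a search term and facet filters, then counts the remaining values per facet field. It is an in-memory illustration, not BigIndex; the record fields and the function `faceted_search` are assumptions.

```python
from collections import Counter


def faceted_search(records, term, filters, facet_fields):
    """Return matching records plus value counts for each facet field."""
    hits = [
        r for r in records
        if term.lower() in r["message"].lower()
        and all(r.get(field) == value for field, value in filters.items())
    ]
    facets = {f: Counter(r.get(f) for r in hits) for f in facet_fields}
    return hits, facets


# Hypothetical indexed records.
records = [
    {"server": "web01", "code": "E500", "message": "Timeout calling OrderService"},
    {"server": "web02", "code": "E500", "message": "Timeout calling OrderService"},
    {"server": "web01", "code": "E404", "message": "Resource not found"},
]
hits, facets = faceted_search(records, "timeout", {"code": "E500"}, ["server"])
```

The facet counts (`facets["server"]`) are what a UI would show beside the result list so an operator can narrow the search with one click.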
  • 28. Transform What – Link and enrich log information from different entities • Find relationships between log records • Integrate structured data with log data – network configuration, user account information… How – JAQL Advantages – High level language that is Big Data aware – Out of the box transformers – Extensive tooling for application customization • Eclipse IDE
  • 29. The Transform Stage: Linking logs and other information from varied sources Diagram inputs (HDFS/GPFS): raw logs as text files (web log, network log, MQ log, transaction log, server log), performance data, fault data, and structured data from non-log sources. Link logs corresponding to 1. IT logs of a single business activity or transaction – Up & down the IT stack 2. Log of an activity across one layer of the IT stack (e.g. OS layer) – Messages flowing through a sequence of routers 3.… Diagram outputs feed: Correlations, Predictive Models, Outlier Detection. Input: Parsed log records, additional structured data Output: Individual log records, from different IT entities, linked and enriched
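The linking and enriching described for the Transform stage can be sketched in Python as a group-by on a shared transaction id followed by a lookup join against structured data. The deck does this in JAQL; the fields `txn_id`, `server`, `ts` and the lookup table below are assumptions.

```python
from collections import defaultdict


def link_and_enrich(log_records, server_info):
    """Group records by transaction id (time-ordered) and merge in per-server attributes."""
    linked = defaultdict(list)
    for r in sorted(log_records, key=lambda r: r["ts"]):
        # Attributes from the structured source are added to each log record;
        # on a key clash the structured value wins.
        enriched = {**r, **server_info.get(r["server"], {})}
        linked[r["txn_id"]].append(enriched)
    return dict(linked)


# Hypothetical parsed records from different layers of the IT stack.
log_records = [
    {"txn_id": "T1", "server": "db01", "ts": 12, "source": "transaction log"},
    {"txn_id": "T1", "server": "web01", "ts": 10, "source": "web log"},
    {"txn_id": "T2", "server": "web01", "ts": 11, "source": "web log"},
]
server_info = {"web01": {"datacenter": "DC-A"}, "db01": {"datacenter": "DC-B"}}
linked = link_and_enrich(log_records, server_info)
```

Each value in `linked` is one transaction's path up and down the stack, already carrying the network-configuration style attributes the slide mentions.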
  • 30. Analyze What – Correlate across fields – Find frequently occurring sequences and combinations of events – Potential for predictive modeling in the future How – System ML Advantages – Scalable to perform analytics on Big Data – Flexible and customizable – Easy to plug into applications via a JAQL/Java interface
  • 32. Machine Data Adapters What are Adapters − Adapt a variety of inputs to a standard output Why do we need Machine Data Adapters − To handle different ‘machine data’ formats
  • 33. Adapters in High-Level Workflow Apply Adapter
  • 34. Adapter Functions Create − Enter Adapter-Name, LogType, ‘sample machine data’ and the first ‘timestamp’ in the ‘sample machine data’ − Check the recommended ‘DataTime Format’ and ‘preTimeStamp Regex’ and select defaults like ‘timezone’, ‘year’ and ‘month’. − Verify the extracted output and save it if it looks correct − If the extracted output is bad, go back and edit the parameters ‘Data Source Type’, ‘DataTime Format’ and ‘preTimeStamp Regex’ Other functions: Edit, View, Apply, Delete
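How a ‘preTimeStamp Regex’, a date-time format, and a default ‘year’ might work together can be sketched as below. This is a guess at the mechanics, not the adapter's implementation: the function `extract_timestamp` is hypothetical, it uses Python `strptime` codes rather than the product's format syntax, and it assumes English month names.

```python
import re
from datetime import datetime, timezone


def extract_timestamp(line, pre_regex, dt_format, default_year=None, tz=timezone.utc):
    """Skip the text matched by pre_regex, then parse the timestamp that follows it."""
    m = re.match(pre_regex, line)
    if m is None:
        return None
    # Take as many whitespace-separated tokens as the format has.
    n_tokens = len(dt_format.split())
    text = " ".join(line[m.end():].split()[:n_tokens])
    if default_year is not None:
        # Logs such as syslog omit the year, so a default is supplied.
        text, dt_format = f"{default_year} {text}", "%Y " + dt_format
    try:
        return datetime.strptime(text, dt_format).replace(tzinfo=tz)
    except ValueError:
        return None


# Hypothetical syslog-style sample line.
sample = "<34>May  1 10:00:05 web01 sshd[42]: Accepted password"
ts = extract_timestamp(sample, r"<\d+>", "%b %d %H:%M:%S", default_year=2013)
```

Returning `None` on a mismatch mirrors the verify step on the slide: if the extracted output is bad, the regex or format needs editing.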
  • 35. Create Machine Data Adapter – Step-1
  • 36. Create Machine Data Adapter – Step-2
  • 37. Create Machine Data Adapter – Step-3
  • 39. Edit Machine Data Adapter – Step-1
  • 40. Edit Machine Data Adapter – Step-2
  • 41. Edit Machine Data Adapter – Step-3
  • 44. Verify the Adapter (metadata.json)
  • 46. Data Explorer for Indexing Application − Data Explorer Index Configuration File to support a generic schema for extracted machine data. − Parallelizing data pushing to the Data Explorer Indexer. − Run the Data Explorer Index Application.
  • 47. Data Explorer Index Configuration File The Data Explorer index config file specifies which fields to index, which field contains the record ID, and the Data Explorer index field definitions: field name, type, searchable, retrievable, filterable and sortable. A default Index Configuration file is provided. Example: { "source": { "dateFormat": "MMM dd yyyy HH:mm:ss.SSS Z", "fieldName": "LogDatetime[].normalized_text", "suppress": false }, "target": { "deFieldName": "LogDatetime", "filterable": true, "isRecordID": false, "retrievable": true, "searchable": true, "sortable": true, "type": "Date" }}
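A short Python sketch shows how one entry of such a config file could be read and summarized. It assumes the `source`/`target` layout of the example above; the helper names `describe` and `record_id_fields` are hypothetical, and the real file may contain further keys.

```python
import json

# One config entry, assuming the source/target layout of the slide's example.
ENTRY = json.loads("""
{
  "source": {"fieldName": "LogDatetime[].normalized_text",
             "dateFormat": "MMM dd yyyy HH:mm:ss.SSS Z",
             "suppress": false},
  "target": {"deFieldName": "LogDatetime", "type": "Date", "isRecordID": false,
             "searchable": true, "retrievable": true,
             "filterable": true, "sortable": true}
}
""")

FLAGS = ("searchable", "retrievable", "filterable", "sortable")


def describe(entry):
    """Summarize how one extracted field maps onto an index field."""
    target = entry["target"]
    return {
        "from": entry["source"]["fieldName"],
        "to": target["deFieldName"],
        "type": target["type"],
        "record_id": target.get("isRecordID", False),
        "flags": [f for f in FLAGS if target.get(f)],
    }


def record_id_fields(entries):
    """List the index fields marked as the record ID."""
    return [e["target"]["deFieldName"] for e in entries if e["target"].get("isRecordID")]


summary = describe(ENTRY)
```

A check such as `record_id_fields(entries)` returning exactly one name would be a reasonable sanity test before running the index application.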
  • 48. Parallelizing data pushing to Data Explorer Indexer The application uses an Oozie Jaql action to parallelize the job across multiple tasks. Diagram components: Indexing app (BI platform/IDE); HDFS; Jaql Hadoop Task 1 … Jaql Hadoop Task M, each with a BigSearch client; Zookeeper Cluster (locate shards); DE Backend Shard 1 … DE Backend Shard N.
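The fan-out from M tasks to N shards can be imitated in plain Python. This is only an analogy: threads stand in for the Jaql Hadoop tasks, and hash-based routing is an assumption, since the slide only says that Zookeeper is used to locate shards.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def shard_for(record_id, n_shards):
    """Pick a shard with a stable hash (assumed routing rule, not the product's)."""
    return zlib.crc32(record_id.encode("utf-8")) % n_shards


def push_partition(partition, n_shards):
    """One task: route every record of its partition to a shard."""
    out = {}
    for rec in partition:
        out.setdefault(shard_for(rec["id"], n_shards), []).append(rec["id"])
    return out


def parallel_push(records, n_tasks, n_shards):
    """Split records into n_tasks partitions and push them concurrently."""
    partitions = [records[i::n_tasks] for i in range(n_tasks)]
    shards = {s: [] for s in range(n_shards)}
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        for result in pool.map(push_partition, partitions, [n_shards] * n_tasks):
            for shard, ids in result.items():
                shards[shard].extend(ids)
    return shards


# Hypothetical extracted records.
records = [{"id": f"rec-{i}"} for i in range(20)]
shards = parallel_push(records, n_tasks=4, n_shards=3)
```

Because the routing depends only on the record id, any task can push any record and it still lands on the same shard.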
  • 49. Run Data Explorer index Application
  • 50. Basic Facet Search UI on Application Builder
  • 51. BI Log Monitoring and Analysis • Ingest BigInsights logs in HBase in real time. • Create Log Monitoring Extraction application that extracts log records from HBase. • Create Index Management application to delete old index log records from DFS. • Embed the MDA Search UI within the BigInsights Dashboard for BigInsights log search.
  • 52. Ingesting BigInsights Logs into HBase − Chukwa agents are set up on the Name Node and on each of the Data Nodes. − Adapters are programmatically installed and removed depending on user configuration. − A custom Chukwa writer class adds logs to HBase in real time. − The Log4j interface streams logs to the adapters, which stream them to HBase. − Different log types are recorded concurrently in a single HBase table.
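Storing several log types in one table usually depends on how the row key is composed. The sketch below uses a plain dictionary in place of HBase and a hypothetical key layout (`log type|host|timestamp|sequence`); the deck does not state the actual schema.

```python
def row_key(log_type, host, epoch_ms, seq):
    """Compose a sortable row key; zero-padding keeps lexical order equal to time order."""
    return f"{log_type}|{host}|{epoch_ms:013d}|{seq:06d}"


def put(table, log_type, host, epoch_ms, seq, line):
    """Write one log line into the shared table."""
    table[row_key(log_type, host, epoch_ms, seq)] = line


def scan_prefix(table, prefix):
    """Return values whose key starts with prefix, in key order (like a prefix scan)."""
    return [table[k] for k in sorted(table) if k.startswith(prefix)]


# In-memory stand-in for the single table holding all log types.
table = {}
put(table, "datanode", "dn1", 1367402405000, 0, "Receiving block")
put(table, "tasktracker", "dn1", 1367402406000, 0, "Task launched")
put(table, "datanode", "dn1", 1367402400000, 0, "DataNode started")
datanode_logs = scan_prefix(table, "datanode|dn1|")
```

Leading with the log type lets one scan pull a single type for a host in time order while all types still share the table.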
  • 53. Data Collection Diagram Diagram (HDFS with Apache MapReduce): the Name Node runs Hadoop Name Node, Hadoop Secondary Name Node and Hadoop Jobtracker; Data Nodes 1, 2 and 3 each run Hadoop Data Node, Hadoop Task Tracker and Hadoop Task Attempt; their logs are collected into HBase. − For HDFS with Symphony MapReduce installation: Hadoop Data Node, Hadoop Name Node and Hadoop Secondary Name Node logs are supported. − For GPFS with Apache MapReduce installation: Hadoop Job Tracker, Hadoop Task Tracker and Hadoop Task Attempt logs are supported. − For GPFS with Symphony MapReduce installation: only Hadoop Task Attempt logs are supported.
  • 54. BigInsights Dashboard − The user starts BigInsights log collection from the LogCollection app. − The user can stop BigInsights log collection from the LogCollection app or by turning off Monitoring. − The MDA Search UI is wrapped by a frame in the BigInsights Dashboard.
  • 57. BigInsights Log Monitoring Application − Is a BigInsights chained application. − Contains the Log Monitoring Extraction application and the Index application. − Assumes that the Log Monitoring Extraction application is running in schedule mode. − The BigInsights Logs workflow is assumed to be selected for the Index application. − Any configuration files are assumed to be the default configuration files installed with MDA. − The “Index Only New Logs” check-box in the Index application is assumed to be unchecked.