IBM's big data solutions like InfoSphere BigInsights and InfoSphere Streams can help organizations gain insights from large, diverse data. BigInsights provides an enhanced Hadoop platform for analyzing structured and unstructured data at scale. Streams enables real-time analysis of high-volume streaming data. The document discusses how these solutions helped clients like Vestas optimize investments using 3 petabytes of data and an Asian telco reduce costs and improve customer experience from 5 billion daily records.
Use dependency injection to get Hadoop *out* of your application code (DataWorks Summit)
Hadoop MapReduce provides transparent parallelization but often results in specialized code bases that interact with low-level data formats. We present a means of using dependency injection to manage data flows in MapReduce which in turn supports reusable, Hadoop-agnostic application code that interacts with high-level business domain objects. An example is provided that applies Dependency Injection to the Hadoop WordCount example and shows how the same code invoked from the WordCount MapReduce job can be reused in a real-time context. We then discuss Opower’s application of this pattern to employ the same core calculations in both batch processing and in servicing real-time requests from end users. This topic will be of interest to those interested in reusing core batch calculations in real-time contexts. It also provides a means forward for organizations moving to Hadoop that have existing code components that they would like to employ in batch MapReduce computations.
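The pattern described above can be sketched in plain Java. All class and interface names below are hypothetical illustrations, not code from the actual talk: the word-counting logic depends only on an injected sink interface, so the same class can be driven either by a MapReduce mapper (with a sink that forwards to `context.write(...)`) or by an in-memory, real-time caller.

```java
import java.util.HashMap;
import java.util.Map;

// The only dependency the business logic knows about: where counts go.
interface CountSink {
    void emit(String word, long count);
}

// Hadoop-agnostic application code: no MapReduce imports anywhere.
class WordCounter {
    private final CountSink sink;

    WordCounter(CountSink sink) {   // dependency injected via constructor
        this.sink = sink;
    }

    void count(String line) {
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                sink.emit(token, 1L);
            }
        }
    }
}

public class Demo {
    public static void main(String[] args) {
        // Real-time context: inject an in-memory sink instead of a Hadoop Context.
        Map<String, Long> totals = new HashMap<>();
        CountSink inMemory = (word, n) -> totals.merge(word, n, Long::sum);

        WordCounter counter = new WordCounter(inMemory);
        counter.count("to be or not to be");

        System.out.println(totals.get("to"));  // 2
        System.out.println(totals.get("be"));  // 2
    }
}
```

In a batch context, a thin mapper would construct the same `WordCounter` with a sink wrapping the Hadoop `Context`, keeping all MapReduce types out of the application code.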
MongoDB IoT City Tour STUTTGART: Hadoop and Future Data Management (Cloudera, MongoDB)
Bernard Doering, Senior Sales Director DACH, Cloudera.
Hadoop and the Future of Data Management. As Hadoop takes the data management market by storm, organisations are evolving the role it plays in the modern data centre. Explore how this disruptive technology is quickly transforming an industry and how you can leverage it today, in combination with MongoDB, to drive meaningful change in your business.
Sudhir Saxena: Hadoop and Data Warehousing resume
Overall 5.8 years of professional IT experience in Data Warehousing and Business Intelligence.
One year of onsite experience in Mexico and the USA with the USAA client.
Good knowledge of Big Data, Hadoop, Pig, Hive, Sqoop, HBase, Python, Spark, and Scala.
Semantic Data Management: a research area of the Chair for Technologies and Management of Digital Transformation at the University of Wuppertal, Germany.
For more information, see here: https://www.tmdt.uni-wuppertal.de/de
Big Data & Analytics continues to redefine business. Data has transitioned from an underused asset to the lifeblood of the organisation, and a critical component of business intelligence, insight and strategy.
Big Data Scotland is the largest annual data analytics conference held in Scotland: it is supported by ScotlandIS and The Data Lab and free for delegates to attend. The conference is geared towards senior technologists and business leaders and aims to provide a unique forum for knowledge exchange, discussion and cross-pollination.
The programme will explore the evolution of data analytics; looking at key tools and techniques and how these can be applied to deliver practical insight and value. Presentations will span a wide array of topics from Data Wrangling and Visualisation to AI, Chatbots and Industry 4.0.
Key Topics
• Tools and techniques
• Corporate data culture, business processes, digital transformation
• Business intelligence, trends, decision making
• AI, Real-time Analytics, IoT, Industry 4.0, Robotics
• Security, regulation, privacy, consent, anonymization
• Data visualisation, interpretation and communication
• CRM and Personalisation
Extracting value from Big Data is not easy. The field of technologies and vendors is fragmented and rapidly evolving. End-to-end, general purpose solutions that work out of the box don’t exist yet, and Hadoop is no exception. And most companies lack Big Data specialists. The key to unlocking real value lies with thinking smart and hard about the business requirements for a Big Data solution. There is a long list of crucial questions to think about. Is Hadoop really the best solution for all Big Data needs? Should companies run a Hadoop cluster on expensive enterprise-grade storage, or use cheap commodity servers? Should the chosen infrastructure be bare metal or virtualized? The picture becomes even more confusing at the analysis and visualization layer. The answer to Big Data ROI lies somewhere between the herd and nerd mentality. Thinking hard and being smart about each use case as early as possible avoids costly mistakes in choosing hardware and software. This talk will illustrate how Deutsche Telekom follows this segmentation approach to make sure every individual use case drives architecture design and the selection of technologies and vendors.
Driven by Data - Why We Need a Modern Enterprise Data Analytics Platform (Arne Roßmann)
In order to turn data into opportunities, you need to build a modern data analytics platform. But because the technology landscape changes so fast, built-in flexibility is paramount.
This presentation covers:
- how to leverage all your data to generate insights
- the capabilities needed to build a flexible platform
- how to incorporate sustainability requirements
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S... (Capgemini)
Rip and replace isn't a good approach to IT change. When looking at Hadoop, MPP, in-memory and predictive analytics the challenge is making them co-exist with current solutions.
Learn how Capgemini’s Pivotal CoE utilizes Cloud Foundry and PivotalOne to help businesses adopt new technologies without losing the value of current investments.
Presented by Michael Wood of Pivotal and Steve Jones, Global Director, Strategy, Big Data and Analytics, Capgemini, at EMC World 2014.
Operational Analytics Using Spark and NoSQL Data Stores (DATAVERSITY)
NoSQL data stores have emerged for scalable capture and real-time analysis of data. Apache Spark and Hadoop provide additional scalable analytics processing. This session looks at these technologies and how they can be used to support operational analytics to improve operational effectiveness. It also looks at an example of how operational analytics can be implemented in NoSQL environments using the Basho Data Platform with Apache Spark:
•The emergence of NoSQL, Hadoop and Apache Spark
•NoSQL Use Cases
•The need for operational analytics
•Types of operational analysis
•Key requirements for operational analytics
•Operational analytics using the Basho Data Platform with Apache Spark.
Agile, Automated, Aware: How to Model for Success (Inside Analysis)
The Briefing Room with David Loshin and Embarcadero
Live Webcast October 27, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/onstage/g.php?MTID=eea9877b71c653c499c809c5693eae8fe
Data management teams face some tough challenges these days. Organizations need business-driven visibility that enables understanding and awareness of enterprise data assets – without worrying about definitions and change management. But with information architectures evolving into a hybrid mix of data objects and data services built over relational databases as well as big data stores, serving up accurately defined, reusable data can become a complex issue.
Register for this episode of The Briefing Room to learn from veteran Analyst David Loshin as he explains the importance of agile, automated workflows in today’s enterprise. He’ll be briefed by Ron Huizenga of Embarcadero, who will discuss how his company’s ER/Studio suite approaches data modeling and management from a modern architecture standpoint. He will explain that unifying the way information is represented can not only eliminate the need for costly workarounds, but also foster collaboration between data architects, developers and business users.
Visit InsideAnalysis.com for more information.
Hadoop 2.0 - Solving the Data Quality Challenge (Inside Analysis)
The Briefing Room with Dr. Claudia Imhoff and RedPoint Global
Live Webcast on July 22, 2014
Watch the archive:
https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=7bb4cbc33402c3b5f649343052cb9a6d
Whether data is big or small, quality remains the critical characteristic. While traditional approaches to cleansing data have made strides, data quality remains a serious hurdle for all organizations. This is especially true for identity resolution in customer data, but also for a range of other data sets, including social, supply chain, financial and other domains. One of the most promising approaches for solving this decades-old challenge incorporates the power of massively parallel processing, à la Hadoop.
Register for this episode of The Briefing Room to learn from veteran Analyst Dr. Claudia Imhoff, who will explain how Hadoop 2.0 and its YARN architecture can make a serious impact on the previously intractable problem of data quality. She’ll be briefed by George Corugedo of RedPoint Global, who will show how his company’s platform can serve as a super-charged marshaling area for accessing, cleansing and delivering high-quality data. He’ll explain how RedPoint was one of the first applications to be certified for running on YARN, which is the latest rendition of the now-ubiquitous Hadoop.
Visit InsideAnalysis.com for more information.
Organizations have been collecting, storing, and accessing data from the beginning of computerization. Insights gained from analyzing the data enable them to identify new opportunities, improve core processes, enable continuous learning and differentiation, remain competitive, and thrive in an increasingly challenging business environment.
The well-established data architecture, consisting of a data warehouse, fed from multiple operational data stores, and fronted by BI tools, has served most organizations well. However, over the last two decades, with the explosion of internet-scale data, and the advent of new approaches to data and computational processing, this tried-and-true data architecture has come under strain, and has created both challenges and opportunities for organizations.
In this green paper, we will discuss modern approaches to data architecture that have evolved to address these challenges and provide a framework for companies to build a data architecture and better adapt to increasing demands of the modern business environment. This discussion of data architecture will be tied to the Data Maturity Journey introduced in EQengineered’s June 2021 green paper on Data Modernization.
Microsoft and Hortonworks Deliver the Modern Data Architecture for Big Data (Hortonworks)
Joint webinar with Microsoft and Hortonworks on the power of combining the Hortonworks Data Platform with Microsoft’s ubiquitous Windows, Office, SQL Server, Parallel Data Warehouse, and Azure platform to build the Modern Data Architecture for Big Data.
Big Data International Keynote Speaker Mark van Rijmenam shared his vision on Hadoop Data Lakes during a Zaloni Webinar. What are the Hadoop Data Lake trends for 2016, what are the data lake challenges and how can organizations benefit from data lakes.
Introduces Big Data characteristics and the big-data process flow/architecture, then walks through an EKG solution as an example to explain why big data issues arise and how to build up a big-data server farm architecture. From there, you can form a more concrete point of view of what Big Data is.
Exploring the Wider World of Big Data - Vasalis Kapsalis (NetAppUK)
Every second of every day, electronic systems create ever-increasing quantities of data. Systems in markets such as finance, media, healthcare, government and scientific research feature strongly in the Big Data processing conversation, and extracting business value from Big Data is forecast to bring customer and competitive advantages. In this session, hear Vas Kapsalis, NetApp Big Data Business Development Manager, discuss his views and experience on the wider world of Big Data.
Red Hat Open Source Day 2017, Milan - "From Mainframe to Container, a Cloud s... (Kiratech)
The presentation, given by Giulio Covassi, CEO of Kiratech, offers a historical excursus from the invention of the first computers to the mainframe, from the beginnings of modern computing with Unix and Linux to cloud computing and virtualization, through to containers, artificial intelligence and machine learning.
One key area of Oracle OpenWorld 2016 was data in its various shapes: Big Data, streaming data and traditional transactional data; the power of SQL to access and unleash all data, even data in NoSQL databases; the advent of the citizen data scientist; real-time streaming analysis of vast and fast data; data discovery; and the new Oracle Database 12cR2 release, along with Forms, APEX, SQL and PL/SQL.
Presentation for Taxonomy Bootcamp 2015 by Naomi Oorbeck & Jessica DuVerneay. Covers how taxonomy improved digital products, when to use a lightweight approach, planning & scoping lightweight work, and an overview of key skills and approaches to taxonomy development.
Reference Architecture: EMC Hybrid Cloud with VMware (EMC)
This Reference Architecture introduces an EMC Enterprise Private Cloud solution for an on-premises infrastructure as a service (IaaS) offering that enables IT to deliver private cloud-based services to their business. It describes the main features and functionality of the solution and the solution architecture and key components.
We introduce a platform that connects everything to form an optimal infrastructure base, builds big data storage and analytics on top of it, and adds artificial intelligence, enabling it to compete with leaders in the global market.
Built on a multi-vendor concept and backed by the major financial company S, it is already collecting IoT data from many systems and major enterprises and running big data analysis; with S's robotics support, future drone control, and machine learning on the data, it is already advancing toward the global market.
We would appreciate your interest and cooperation.
Inquiries and partnerships: contact@littleworld.net
This is an event that ITEC organized in cooperation with the AWS Vietnam community.
Speakers: Bùi Kiên Cường and Quân Phương.
Time: 22-09-2016, at Hatch!Nest Hanoi.
Rio Cloud Computing Meetup 25/01/2017 - AWS re:Invent 2016 Launches (Filipe Barretto)
Talk given at the Rio Cloud Computing Meetup, presenting the main launches from AWS re:Invent 2016, announced in the keynotes by Andy Jassy, CEO of AWS, and Werner Vogels, CTO of Amazon.
2017 GRESB Real Estate Results - The Netherlands (GRESB)
2017 GRESB Real Estate Results presentation for The Netherlands, presented on 12 October in De Bilt, hosted by Sweco Capital Consultants
Slides 3-16: Sweco
Performance monitoring, configuration analysis, security and compression for Azure or on-premises environments, through log analysis.
A Strategic View of Enterprise Reporting and Analytics: The Data Funnel (Inside Analysis)
The Briefing Room with Colin White and Jaspersoft
Slides from the Live Webcast on June 12, 2012
As the corporate appetite for analytics and reporting grows, companies must find a way to secure a strategic view of their information architecture. End users with varying degrees of expertise need a wide range of data and reports delivered in a timely fashion. As the audience for analytics expands, that puts pressure on IT infrastructure and staff. And now with the promise of Hadoop and MapReduce, the organization's desire for business insight becomes even more significant.
In this episode of The Briefing Room, veteran Analyst Colin White of BI Research will explain the value of being strategic with enterprise reporting. White will be briefed by Karl Van den Bergh of Jaspersoft, who will tout his company's “data funnel” concept, which is designed to strategically manage an organization's information architecture. By aligning information assets along this funnel, IT can effectively address the spectrum of analytical needs – from simple reporting to complex, ad hoc analysis – without over-taxing personnel and system resources.
Startup pitch presented by co-founder and CEO Jaco Els. Cubitic offers a predictive analytics platform that allows developers to build custom solutions for analytics and visualisation on top of a machine learning engine.
Corporate Senior Vice President, Noriyuki Toyoki, shares Fujitsu’s vision of the increasingly prevalent role technology takes in our daily lives. Everything you ever wanted to know about big data, smart grids, supercomputing and how they can support society through disaster recovery, healthcare ICT and food production - to create a human centric intelligent society.
About ActuateOne for Utility Analytics
Water and Energy Utilities are under tremendous pressure to demonstrate progress in asset optimization, grid optimization and performance gains across traditional business drivers such as customers, revenue protection, utility regulatory compliance and financials. ActuateOne for Utility Analytics provides a comprehensive portfolio of software and utility analytics industry expertise to ensure today’s utility leaders and customers always have access to the right information, insight and collaborative capabilities for accurate and informed decisions. Delivered through a single platform, ActuateOne for Utility Analytics ignites any utility or grid Analytics initiative with integrated asset optimization dashboards, grid optimization dashboards, utility compliance reports as well as Transformer Management Scorecards, Substation & Equipment Management Scorecards and Utility KPI Dashboards which help today’s Utility enhance performance and maximize grid performance.
On May 19, 2020, analyst firm IDC; vendors Intel, MemVerge, NetApp and Penguin Computing; and end-user Credit Suisse introduced a new category called Big Memory. Big Memory hardware and software together transform scarcity and volatility into abundance, persistence, and high availability.
AI for Manufacturing (Machine Vision, Edge AI, Federated Learning) (byteLAKE)
This is the extended presentation about byteLAKE's and Lenovo's Artificial Intelligence solutions for Manufacturing.
Topics covered: AI strategy for manufacturing, Edge AI, Federated Learning and Machine Vision.
It's the first publication in the upcoming series: AI for Manufacturing. Highlights: AI-assisted quality monitoring automation, AI-assisted production line monitoring and issues detection, AI-assisted measurements, Intelligent Cameras and many more. Reach out to us to learn more: welcome@byteLAKE.com.
Presented during the world's first Federated Learning conference (Jun'20). Recording: https://youtu.be/IMqRIi45dDA
Related articles:
- Revolution in factories: Industry 4.0.
https://medium.com/@marcrojek/revolution-in-factories-industry-4-0-conference-made-in-wroclaw-2020-translation-ae96e5e14d55
- Cognitive Automation helps where RPAs fall short.
https://medium.com/@marcrojek/cognitive-automation-helps-where-rpas-fall-short-a1c5a01a66f8
- Machine Vision, how AI brings value to industries.
https://medium.com/@marcrojek/machine-vision-how-ai-brings-value-to-industries-e6a4f8e56f42
Learn more:
- https://www.bytelake.com/en/cognitive-services/
- https://www.lenovo.com/ai
- https://federatedlearningconference.com/
This presentation on Open Source and Cloud Technologies was given by Vizuri SVP Joe Dickman at the 2012 Destination Marketing Technology Forum in Raleigh, NC. For more information please visit our website at www.vizuri.com or email solutions@vizuri.com.
2. Please note
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
3. IBM Montpellier Client Center
Our Client Center partners with clients to meet their IT infrastructure goals and improve their overall business by demonstrating the capabilities of IBM solutions.
Talk & Teach: System Briefings (Energy, Cities, Cloud, Water), Software Briefings, Demonstrations, Advanced Technical Skills, Industry Showcases; BP, ISV & CSI support; WW Financial Services CoE
Design: Smarter Computing Design, Enterprise Architecture Design, Business Resilience, Key Workload Initiatives; ISV Solution Centers (SAP, Oracle, Siebel)
Prove: Benchmarks & Proofs of Concept (PureSystems, System z, Power Systems, HPC, System x & Blade, Storage), Solution Testing, WW GDPS Solution Testing, Software zTEC, New Technology Introduction
4. Innovation Lab – Resources & Skills
- Smarter Cities innovation through R&D collaborative projects supported by CAS France, in partnership with labs and clients (funded by governments or the European Commission), plus client projects
- Big Data / BAO & Smarter Cities offerings
- Customer briefings and workshops: architecture, design sessions, proofs of concept
- Presales technical support: RFPs, sizing, pilot support, architecture showcases
- Smarter Energy & Cities: innovation aimed at improving energy consumption through the use of IT and BAO, with universities and companies
- Montpellier Water Management CoE: numerical simulation (HPC) for water management, such as Deep Thunder from IBM Research, first implemented in Europe by the IBM Montpellier team
Team: Xavier Vasques (Manager), Virginie Radisson (Business Leader), Marie Angèle Grilli (Project Manager), Olivier Hess (CTO, Smarter Cities), Christophe Menichetti (BAO/Big Data Specialist), Elsa Fabres (Data Analytics & IOC Specialist), Colin Dumontier (HPC/Water Specialist), Romain Chailan (PhD Student), Saniya Ben Hassen (BAO IT Architect), Jean-Philippe Durney (BAO/Big Data IT Architect), Denis Gras (Smarter Cities IT Architect)
Mission: promote and develop innovative assets around data applied to Smarter Planet/Cities issues in order to engage customers and collaborative R&D projects.
5. AGENDA
Big Data Challenges
> Why is the interest growing?
Big Data Technologies
> What is Big Data?
IBM Big Data Solutions
> IBM BigInsights and IBM Streams
Big Data in action
> Our Customer Center Showcase experience
6. AGENDA
Big Data Challenges
> Why is the interest growing?
Big Data Technologies
> What is Big Data?
IBM Big Data Solutions
> IBM BigInsights and IBM Streams
Big Data in action
> Our Customer Center Showcase experience
8. Our data-rich world is exploding…
- IT: logs and transactions
- 30 billion RFID tags today (1.3 billion in 2005)
- 4.6 billion camera phones worldwide
- Twitter processes 7 TB of data every day
- Facebook processes 10 TB of data every day
- The World Data Centre for Climate keeps 220 TB of web data and 9 PB of auxiliary supporting data
- 900 million GPS devices sold annually by 2013
- 2 billion people on the web by 2011
- Capital market data volumes grew 1,750% from 2003 to 2006
- 76 million smart meters in 2009, 200 million by 2014
- Text, blogs, weblogs
9. The Big Data Opportunity
Extracting insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible.
- Variety: manage the complexity of multiple relational and non-relational data types and schemas
- Velocity: streaming data and large-volume data movement
- Volume: scale from terabytes to zettabytes (a zettabyte is a billion terabytes)
10. Bring Together a Large Volume and Variety of Data to Find New Insights
- Multi-channel customer sentiment and experience analysis
- Detect life-threatening conditions at hospitals in time to intervene
- Predict weather patterns to plan optimal wind turbine usage and optimize capital expenditure on asset placement
- Make risk decisions based on real-time transactional data
- Identify criminals and threats from disparate video, audio, and data feeds
11. AGENDA
Big Data Challenges
> Why is the interest growing?
Big Data Technologies
> What is Big Data?
IBM Big Data Solutions
> IBM BigInsights and IBM Streams
Big Data in action
> Our Customer Center Showcase experience
12. Big Data: why is it possible now?
Traditional approach (Data to Function):
- The application server and the database server are separate; data can sit on multiple servers
- The analysis program can run on multiple application servers
- The network sits in the middle: the query goes to the database, the data has to travel across the network to be processed, and the result is sent back afterwards
Big Data approach (Function to Data):
- The analysis program runs where the data is: on the data nodes
- Only the analysis program travels across the network; the master node sends the function to the data nodes, which process locally and send back a consolidated result
- The analysis program needs to be MapReduce-aware
- Highly scalable: thousands of nodes, petabytes of data and more
13. Big Data: why is it possible now?
Example: how many hours does Clint Eastwood appear on screen across all the movies he has made? Every movie has to be parsed to find Clint's face.
- Traditional approach (Data to Function): all the movies are sent through the network to the application server for processing.
- Big Data approach (Function to Data): only the analysis program and Clint's picture are sent through the network; each data node parses the movies it stores locally and returns its partial result to the master node.
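The function-to-data idea above can be sketched in a few lines of plain Java. This is a toy, single-process illustration (not Hadoop itself): each simulated data node applies the analysis function to the partition it holds locally and sends back only a small partial result for the master to consolidate.

```java
import java.util.*;

// Toy illustration of "function to data": the data stays partitioned
// across simulated nodes; only the function and small partial results
// cross the (simulated) network.
public class FunctionToData {

    // Split the data set into one partition per "data node".
    static List<List<String>> partitions(List<String> lines, int nodes) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < nodes; i++) parts.add(new ArrayList<>());
        for (int i = 0; i < lines.size(); i++) parts.get(i % nodes).add(lines.get(i));
        return parts;
    }

    // "Send the function to the data": each node counts words in its own
    // partition and returns only the count, never the raw lines.
    static long totalWords(List<String> lines, int nodes) {
        long total = 0;                                   // master consolidates
        for (List<String> part : partitions(lines, nodes)) {
            long partial = 0;                             // computed "on the node"
            for (String line : part) partial += line.split("\\s+").length;
            total += partial;                             // only the summary travels
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList("to be or not to be", "that is the question");
        System.out.println(totalWords(corpus, 3)); // prints 10
    }
}
```

The same shape holds at cluster scale: shipping a kilobyte of code beats shipping petabytes of data.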
14. Merging the Traditional and Big Data Approaches
Traditional approach (structured and repeatable analysis): business users determine what question to ask, and IT structures the data to answer that question. Examples: monthly sales reports, profitability analysis, customer surveys.
Big Data approach (iterative and exploratory analysis): IT delivers a platform to enable creative discovery, and business users explore what questions could be asked. Examples: brand sentiment, product strategy, maximum asset utilization.
15. AGENDA
Big Data Challenges
> Why is the interest growing?
Big Data Technologies
> What is Big Data?
IBM Big Data Solutions
> IBM BigInsights and IBM Streams
Big Data in action
> Our Customer Center Showcase experience
16. IBM Big Data Platform
Analytic applications sit on top: Cognos BI reporting, SPSS predictive analytics, content analytics, exploration/visualization, and functional and industry apps. The platform itself provides visualization & discovery, application development and systems management over four engines (Hadoop System, Stream Computing, Data Warehouse, Content Management), with accelerators and Information Integration & Governance underneath, on cloud, mobile and security foundations.
- Unlock Big Data: InfoSphere Data Explorer gathers, extracts and explores data
- Analyze raw data and reduce costs with Hadoop: InfoSphere BigInsights (with Platform Computing and GPFS) cost-effectively analyzes petabytes of structured and unstructured information
- Analyze structured and unstructured Big Data: create reports on BigInsights and analyze in Streams with best-of-breed visualization
- Analyze streaming data: InfoSphere Streams analyzes streaming data and large data bursts for real-time insights
- Simplify your warehouse and deliver deep insight: PureData for Analytics and PureData for Operational Analytics provide in-database analytics and operational analytics
- Index Big Data: Data Explorer and Content Analytics index for contextual, collaborative insights
- Speed time to value: analytic and application accelerators
- Manage Big Data: Guardium and Information Server govern data quality and manage the information lifecycle
17. AGENDA
Big Data Challenges
> Why is the interest growing?
Big Data Technologies
> What is Big Data?
IBM Big Data Solutions
> IBM BigInsights and IBM Streams
> InfoSphere BigInsights
18. What’s so Special About Open Source Hadoop?
- Storage: distributed, reliable, on commodity gear
- MapReduce: parallel programming, fault tolerant
- Scalable: new nodes can be added on the fly
- Affordable: massively parallel computing on commodity servers
- Flexible: Hadoop is schema-less and can absorb any type of data
- Fault tolerant: through the MapReduce software framework
19. Basic Hadoop principles: HDFS and MapReduce
Hadoop Distributed File System (HDFS): where Hadoop stores the data
– This file system spans all the nodes in a cluster
Hadoop computation model
– Data stored in a distributed file system spanning many inexpensive computers
– Bring function to the data
– Distribute the application to the compute resources where the data is stored
– Scalable to thousands of nodes and petabytes of data

The WordCount mapper and reducer:

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text val, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(val.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);   // emit (word, 1) for each token
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> val, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : val) {
        sum += v.get();             // sum the 1s emitted for this word
      }
      result.set(sum);
      context.write(key, result);
    }
  }

A MapReduce job executes in three steps across the Hadoop data nodes:
1. Map phase: break the job into small parts, distributing map tasks to the cluster
2. Shuffle: transfer interim output for final processing
3. Reduce phase: boil all output down to a single result set, returned to the client
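The three phases above can be mirrored in a single process with plain Java collections. This is an illustrative sketch of the map/shuffle/reduce flow, not the Hadoop API; the class and method names are invented for the example.

```java
import java.util.*;

// Local, single-process sketch of the three MapReduce phases.
public class LocalMapReduce {

    // 1. Map phase: break each line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : line.toLowerCase().split("\\s+"))
            pairs.add(new AbstractMap.SimpleEntry<>(w, 1));
        return pairs;
    }

    // 2. Shuffle: group the interim (word, 1) pairs by word.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs)
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        return grouped;
    }

    // 3. Reduce phase: sum each word's counts into a single result set.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> interim = new ArrayList<>();
        for (String line : lines) interim.addAll(map(line));
        Map<String, Integer> result = new TreeMap<>();
        shuffle(interim).forEach((word, ones) ->
            result.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("to be or not to be")));
        // prints {be=2, not=1, or=1, to=2}
    }
}
```

In the real framework the map tasks run in parallel on the data nodes and the shuffle moves interim pairs between machines; the per-phase logic is the same.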
20. InfoSphere BigInsights
A platform for volume, variety and velocity (the three Vs), built on an enhanced Hadoop foundation.
Basic Edition (free download): integrated install, online InfoCenter, BigData University, Apache Hadoop.
Enterprise Edition (licensed) adds, in increasing breadth of capabilities:
- Analytics for the three Vs: text analytics and tooling, spreadsheet-style analysis tool
- Usability: integrated web-based console, ready-made business process accelerators ("apps"), Eclipse-based tooling
- Enterprise class: storage, security and cluster management; RDBMS and warehouse connectivity; flexible job scheduler; performance enhancements; LDAP authentication
- Integration: connectivity to DB2, Netezza, JDBC databases, SPSS, Cognos, Unica, Coremetrics, Streams, DataStage
21. InfoSphere BigInsights – A Full Hadoop Stack
Open Source Components IBM Specific Components
22. Vestas optimizes capital investments based on 3 petabytes of information
Capabilities utilized: InfoSphere BigInsights, InfoSphere Warehouse
- Model the weather to optimize placement of turbines, maximizing power generation and longevity
- Reduce the time required to identify turbine placement from weeks to hours
- Incorporate 3 PB of structured and semi-structured information flows
- Data volume expected to grow to 6 PB
23. AGENDA
Big Data Challenges
> Why is the interest growing?
Big Data Technologies
> What is Big Data?
IBM Big Data Solutions
> IBM BigInsights and IBM Streams
> InfoSphere Streams
24. IBM InfoSphere Streams, for companies that need to…
- Deal with terabytes of data each second
- Work with application, sensor and internet data, including video and audio
- Deliver insight in microseconds to analytical applications
- Support complex scenarios using C++ or Java code
- Integrate with existing analytics and data warehousing investments
Real-time delivery: millions of events per second at microsecond latency, from traditional and non-traditional data sources. Example workloads: ICU monitoring, environment monitoring, algo trading, telco churn prediction, cyber security, smart grid, government and law enforcement.
25. Stream Computing – Analyze Data in Motion
Traditional computing versus stream computing:
- Historical fact finding versus current fact finding
- Find and analyze information stored on disk versus analyze data in motion, before it is stored
- Batch paradigm and pull model versus low-latency paradigm and push model
- Query-driven (submit queries to static data) versus data-driven (bring the data to the query)
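The push model in the right-hand column can be illustrated with a minimal sliding-window average in plain Java. This is an illustrative sketch, not the InfoSphere Streams API: each arriving event updates the analysis immediately, before anything is stored.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Data-driven push model: analysis runs as each event arrives,
// instead of querying data already at rest.
public class SlidingAverage {
    private final Deque<Double> window = new ArrayDeque<>();
    private final int size;
    private double sum = 0;

    public SlidingAverage(int size) { this.size = size; }

    // Push one event; return the average over the last `size` events.
    public double push(double value) {
        window.addLast(value);
        sum += value;
        if (window.size() > size) sum -= window.removeFirst();
        return sum / window.size();
    }

    public static void main(String[] args) {
        SlidingAverage avg = new SlidingAverage(3);
        for (double v : new double[]{1, 2, 3, 4})
            System.out.println(avg.push(v)); // 1.0, 1.5, 2.0, 3.0
    }
}
```

A batch system would instead land all the events on disk first and compute the same average later with a query; the streaming version answers while the data is still in motion.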
26. Big Data in Real-Time with InfoSphere Streams
Streams operators process data in flight: filter/sample, modify, annotate, fuse and classify.
27. Asian telco reduces billing costs and improves customer satisfaction
Capabilities: Stream Computing, Analytic Accelerators
- Real-time mediation and analysis of 5 billion CDRs per day
- Data processing time reduced from 12 hours to 1 minute
- Hardware cost reduced to one eighth
- Proactively address issues impacting customer satisfaction (e.g. dropped calls)
28. Most Use Cases Combine Technologies
InfoSphere Streams filters incoming in-motion data (velocity). InfoSphere BigInsights combines non-traditional and internet data with traditional data (variety and volume). The IBM data warehouse holds persistent traditional data, and analytic models are reused across Streams, BigInsights and the warehouse.
29. Big Data Patterns
Common Big Data and warehouse patterns:
- Separate unstructured and structured analysis: unstructured data flows into BigInsights and structured data into the warehouse, each with its own app/BI, visualization and exploration layer.
- Common analysis of structured and unstructured data: BigInsights and the warehouse both feed a shared app/BI, visualization and exploration layer.
- Warehouse and BigInsights partitioning: structured data is split between the warehouse and BigInsights, each feeding its own analysis stack.
- Warehouse batch offload: structured data lands in BigInsights, which takes over batch processing and feeds the warehouse.
30. Big Data Patterns
Common Big Data and warehouse patterns (continued):
- In-motion, at-rest analysis with BigInsights: streaming data flows through Streams into BigInsights, which feeds a real-time app/BI.
- In-motion and at-rest applications: Streams feeds a real-time app, while BigInsights and the warehouse feed an analytic app/BI.
- In-motion, at-rest analysis of structured and unstructured data: Streams serves real-time apps while streaming, unstructured and structured data are analyzed across BigInsights and the warehouse.
- In-motion, structured at-rest analysis: Streams serves a real-time app, and structured streaming data lands in the warehouse for analytic apps/BI.
31. AGENDA
Big Data Challenges
> Why is the interest growing?
Big Data Technologies
> What is Big Data?
IBM Big Data Solutions
> IBM BigInsights and IBM Streams
Big Data in action
> Our Customer Center Showcase experience
32. Big Data Use Cases and Customer Outcomes
Findings from the research collaboration of the IBM Institute for Business Value and Saïd Business School, University of Oxford.
Big data objectives: customer-centric outcomes, operational optimization, risk/financial management, new business models, employee collaboration. These are the top functional objectives identified by organizations with active big data pilots or implementations; responses were weighted and aggregated.
Big data sources: respondents were asked which data sources are currently being collected and analyzed as part of active big data efforts within their organization.
33. Operations / Performance Data is Exploding
A typical enterprise with 5,000 servers, running 125 applications across 2 to 3 data centers, generates in excess of 1.3 TB of data per day.
- Only 3% of the data generated is operations-oriented metric data
- 97% is unstructured or semi-structured data
- Workloads run on heterogeneous platforms
34. Log Analysis: Problem Characteristics
- Several thousand log files collected daily; data collected over several years
- Sources: infrastructure (servers, networks, storage), middleware (app server, web server, database server, messaging server), and applications
- There is value in collocating and co-analyzing the above data
- Millions of files, petabytes of data in total, terabytes produced per day
- The relationships between logs have to be discovered
- A large percentage of storage in an enterprise is for log data
Analysis of log data has many challenges:
- Collection and parsing of data
- Interpretation of logs
- SMEs flooded with common bugs
- Lack of a joined-up view
- Reactive rather than proactive operation
Example failure cascade: one replica stops responding, causing a fraction of database calls to time out, which leads to intermittent failures in the application.
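The first challenge above, collection and parsing, comes down to turning heterogeneous log lines into structured fields that can be co-analyzed. A minimal sketch in plain Java, using an invented log format for illustration:

```java
import java.util.*;
import java.util.regex.*;

// Toy log parser: extract the severity field from lines shaped like
// "2013-01-03 10:02:11 [ERROR] message" (format invented for this sketch).
public class LogParser {
    static final Pattern LINE =
        Pattern.compile("^(\\S+ \\S+) \\[(\\w+)\\] (.*)$"); // timestamp [LEVEL] message

    // Count log lines per severity level; unparseable lines are skipped.
    static Map<String, Integer> countBySeverity(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.matches()) counts.merge(m.group(2), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> logs = Arrays.asList(
            "2013-01-03 10:02:11 [ERROR] replica stopped responding",
            "2013-01-03 10:02:12 [WARN] database call timed out",
            "2013-01-03 10:02:13 [ERROR] intermittent application failure");
        System.out.println(countBySeverity(logs)); // prints {ERROR=2, WARN=1}
    }
}
```

In practice every log source needs its own pattern, which is exactly why SMEs get flooded and why a joined-up view requires discovering the relationships between logs.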
35. Central Lab Platform – Before
The consequences of scattered infrastructures for hands-on classes are high costs and business transformation roadblocks.
36. Central Lab Platform – After
The scattered infrastructures were transformed into a centralized, consolidated hands-on cloud platform.
37. Central Lab Platform Cloud Architecture
[Architecture diagram: self-serve portal → service request → service provisioning → dynamic infrastructure.] Teachers, students and class managers reach the CLP cloud management front end, a web portal for planning, reporting, invoicing and reservation, over internet or VPN access. The CLP application engine drives a setup manager and TPM workflows and scripts to provision shared CLP resources, with daily replication between the TA and CLP databases.
39. Big Data Project Trends & Directions
Two major front-end objectives demonstrate the Big Data benefit:
Navigating enterprise information ("leverage Big Data business value"):
- 360° operational view: accelerate incident resolution
- 360° business view: provide metrics and insight
  – Cloud data center utilization: the data center's business view
  – Training labs: the data center customer's business view
Predictive incident alerting ("act proactively on incidents"):
- Create predictive models based on log history to raise alerts before an incident arrives
- Reduce the number of incident tickets
40. How the support team works today: many applications, dispersed information
41. Navigating Enterprise Information: 360° Operational View
[Dashboard screenshot: a single search on ticket 153494 returns, side by side, the open ticket (assignee Jean-Philippe Durney, priority 2, course AN14GB, class H65X, contact Martin Elliff) with its description of an off-line class NIM server (nim151, 10.6.151.35); faceted documentation results (lab setup guides, course exercises, production documentation, best practices, Citrix, provisioning, storage, TSM, how-tos); related Lotus Notes threads and earlier closed tickets for the same course; global service status for Citrix, network, storage, master servers, NIM and TPM; and open-ticket counts by severity over time.]
42. Navigating Enterprise Information: 360° Business View
[Screenshot: the same search front end, extended with business metrics.]
Show metrics:
- Number of running courses versus number of logged-in students (typically extracted through big data log analysis)
- Cumulative time usage per course/session
- Server and storage usage
- Electricity consumption
Create correlated views:
- Electricity consumption versus the number of courses running
- Consolidated view per global training partner
Analyze operations:
- Number of tickets per course brand, per course, per geo
- Average resolution time per incident type
- Top 10 incidents by frequency
- Top 10 incidents per geo
- Top 10 courses per geo
43. To learn more
IBM Tivoli product to monitor and analyse machine logs: IBM Log Analytics.
Download the presentation from the Pulse 2013 site.
Session 1844: Problem Determination and Resolution in Minutes Using Unstructured Data Analytics
Martin O'Brien, Product Manager
Geetha Adinarayan, Client Best Practices Lead