Big data ibm

Overview of the
IBM offering
22 March 2013

Carlo Patrini
Information Architect
carlo.patrini@it.ibm.com
+393357248561

© 2013 IBM Corporation

We need to gain more insight

The need to gain deeper knowledge
(insights) is ever more pressing and urgent

Answering questions that are always new, always urgent
and always... strategic

• How effective was campaign C123?
• How could we make the best use of historical data to anticipate our buyers' actions?
• I would like to discover new customer segments...
• Which products are selling best in Italy today?
• What are people saying about our new product?
• 91% of dissatisfied customers will turn to other suppliers
• What are people telling our call center?
• How can we improve our customer retention?

• Integrate the business with technology
• Use historical and summary data, structured and unstructured
• Get the most value from analyzing the information extracted from all available sources

The Data Warehouse and Business Analytics
are an excellent answer
...constantly pushed by a market that demands:

• Higher volumes
• Higher data quality
• More control over the process
• and above all greater SIMPLICITY, AUTONOMY and PERFORMANCE
• ...

[Diagram: data sources from operational systems; ETL, Data Integration, Data Quality, Data Delivery; Data Warehouse; Master Data Management; Cubes; Reporting, Analysis, Predictive Analytics]

Leaner, faster, more responsive DWHs...
the DWH appliance is the solution

The DWH is essential,
but at times it is slow and too
rigid, and it does not evolve
at the pace of the business...
Mumble... mumble...
The solution is
IBM Netezza

And the business wants information that
goes beyond the transaction

[Diagram: decision tree of the purchase process, from the start of the purchase process to its end; many branches are labelled "Fail" and one is labelled "Yes!"]

The DWH generally tracks only the final, concluding transaction.
To "read" the purchase process better, we also need to know the rest.

Big Data: the new ocean of data
The data below the surface is still
unexplored

Volume: 12+ terabytes of tweets per day
Velocity: 30 billion sensors, RFID and other devices that generate streaming data
Variety: hundreds of different data types
Veracity: only 1 in 3 business users believes they have reliable information

Knowledge also lives in unconventional
sources... why ignore them?

• The business needs to manage and make massive use of
  an ever-growing quantity of unconventional
  information, generally created outside
  the organization

• Most of this unconventional information
  is semi-structured or
  completely unstructured

• Organizations suffer when they cannot
  capture the knowledge contained in
  business information
  – Traditional systems analyze only structured data
  – The missing 80% consists of unstructured
    or semi-structured information (Gartner).

Big Data
• 25 TB of Facebook data per day
• 12 TB of Twitter data per day
• 200k tweets per minute
• 290 million tweets per year

When we talk about the "data explosion"

Users on Twitter: from 6,000,000 to 500,000,000 (83x)
Tweets pushed out per day: from 300,000 to 400,000,000 (1333x)
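The two multipliers on this slide follow directly from the quoted user and tweet counts. A minimal Java sketch of that arithmetic is shown here; the class and method names are illustrative and do not come from the deck.

```java
final class GrowthFactor {
    /** Ratio between a later and an earlier measurement (e.g. about 83.3 for an "83x" increase). */
    static double factor(long before, long after) {
        return (double) after / before;
    }

    public static void main(String[] args) {
        // Figures quoted on the slide: users on Twitter and tweets per day.
        double users = factor(6_000_000L, 500_000_000L);
        double tweets = factor(300_000L, 400_000_000L);
        // The slide rounds both ratios down to whole multiples.
        System.out.println("users grew " + (long) users + "x");
        System.out.println("tweets per day grew " + (long) tweets + "x");
    }
}
```

Both ratios are truncated rather than rounded, which matches the 83x and 1333x shown on the slide.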

Traditional Approach and Big Data Approach

BIG DATA
State of the art

What is the state of the art?

>1100 Business Managers >200 CIOs

IBM, the Saïd Business School (global scale) and SDA Bocconi University (local
scale) partnered to benchmark global big data activities
www.ibm.com/2012bigdatastudy

Big Data: the state of the art

1 Customer analytics are driving big data initiatives

2 Big data is dependent upon a scalable and extensible
  information foundation

3 Initial big data efforts are focused on gaining insights
  from existing and new sources of internal data

4 Big data requires strong analytics capabilities

5 The emerging pattern of big data adoption is
  focused upon delivering measurable business value

IBM, the Saïd Business School (University of Oxford, global research) and SDA Bocconi University
(Italy) collaborated on a benchmark of Big Data initiatives

Key Findings: Big Data Activities

                          Have Not Begun        Planning Big Data   Pilot & Implementation
                          Big Data Activities   Activities          of Big Data Activities
>1000 Business Managers   24%                   47%                 28%
>200 CIOs                 25%                   57%                 18%

IBM Big Data Platform & Ecosystem

[Diagram: IBM Big Data Platform with six numbered components]
1 INFOSPHERE BIGINSIGHTS (Hadoop System): analyse large structured and unstructured data sets
2 PURE DATA for Analytics (NETEZZA) (Data Warehouse): optimized very large data warehousing
3 INFOSPHERE STREAMS (Stream Computing): analyse large structured and unstructured data sets in streaming
4 INFOSPHERE DATA EXPLORER (VIVISIMO): search (and federate data) in a big data context
5 IBM CONTENT ANALYTICS: out-of-the-box text analytics, open environment with Enterprise Search
6 IBM SOCIAL MEDIA ANALYTICS: out-of-the-box social analytics environment

Platform layers: Analytic Applications (BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics); Visualization & Discovery, Application Development, Systems Management; Accelerators; Hadoop System, Stream Computing, Data Warehouse; Information Integration & Governance

The Data Warehouse and Business Analytics... integrate
well with the BIG DATA platform

[Diagram: the numbered components IBM InfoSphere BigInsights, Netezza, IBM InfoSphere Streams, IBM Vivisimo and Cognos placed around the Data Warehouse, Master Data Management, Cubes and ETL (Data Integration, Data Quality, Data Delivery). Inputs: external source systems, applications, spreadsheets, sensors, and structured, semi-structured and unstructured data]

The Big Data ecosystem: interoperability is the key

[Diagram]
Data Warehouse: Analytics on Structured Data (traditional / relational data sources, traditional warehouse)
InfoSphere Streams: Analytics on Data In-Motion (non-traditional / non-relational data sources, streaming data)
InfoSphere BigInsights: Analytics on Data at Rest (non-traditional / non-relational and traditional / relational data sources, Internet-scale data sets)

The IBM Big Data platform:
the new frontier of analysis

[Diagram: Data Ingest, Enrich, Real-Time Analysis, Adaptive Analytic Model]

Traditional analysis extended to Big Data

1 Pre-Processing Hub    2 Query-able Archive    3 Exploratory Analysis

[Diagram: three architectures built from Streams, BigInsights, Data Explorer, Information Server and Data Warehouse. Captions: "Combine structured with unstructured data", "Find and view"]

Big Data applications

• Analyzing what is being said on social
  media about a topic
• Analyzing call center messages
• Log analysis
• Fraud detection
• Searching data through a
  federated engine
• Analyzing data coming
  from sensors
• ...
A Big Data solution is used, for example, when:

- it is necessary to analyze ALL the potentially available data, and
  processing a sample of it would not be meaningful
  or able to deliver effective results.
- the goal is to EXPLORE the available data, possibly interactively, in cases
  where business measures and indicators are not predetermined.
- a large CONTINUOUS STREAM of data must be analyzed to make
  decisions in real time.

The Big Data phenomenon is not tied to any particular industry sector;
it builds on the growth in data volume and on further dimensions such as the
Velocity and Variety of the available data.

BigInsights to process petabytes of data very quickly

Vestas optimizes capital investments based on 2.5 petabytes of information.
• Model the weather to optimize placement of turbines, maximizing power
  generation and longevity.
• Reduce time required to identify placement of turbine
  from weeks to hours.
• Incorporate 2.5 PB of structured and semi-structured information flows.
  Data volume expected to grow to 6 PB.

InfoSphere Streams and BigInsights for managing building environments

Cisco turns to IBM big data for intelligent infrastructure management.
• Optimize building energy consumption with centralized
  monitoring and control of building monitoring system.
• Automates preventive and corrective maintenance of
  building systems.
• Uses Streams, InfoSphere BigInsights and Cognos
  – Log Analytics
  – Energy Bill Forecasting
  – Energy consumption optimization
  – Detection of anomalous usage
  – Presence-aware energy mgt.
  – Policy enforcement

InfoSphere Streams in the medical field

Big Data enabled doctors from University of Ontario to apply neonatal infant monitoring to
predict infection in ICU 24 hours in advance

Video: IBM Data Baby (youtube.com)

InfoSphere Streams for traffic optimization

Dublin City Centre Increases Bus Transportation Performance

Capabilities Utilized:
Stream Computing

• Public transportation awareness solution
improves on-time performance and
provides real-time bus arrival info to
riders
• Continuously analyzes bus location data
to infer traffic conditions and predict
arrivals
• Collects, processes, and visualizes
location data of all bus vehicles
• Automatically generates transportation
routes and stop locations

Results:
• Monitoring 600 buses across 150 routes
• Analyzing 50 bus locations per second
• Anticipated to Increase bus ridership

CUSTOMER Analytics – GRUPO BBVA
seamlessly monitors and improves its online reputation

- Enables BBVA to consistently respond to and gain insight
  into customer needs and feedback.
- Gives BBVA the ability to measure the success of its outputs
  and approaches to engaging stakeholders and customers.
- Shows whether positive or negative sentiments have
  increased or not, looks for the source and reason of
  comments and helps make decisions and plans.

"What is great about this solution is that it helps us to focus our actions
on the most important topics of online discussions and immediately plan the
correct and most suitable reaction." – Online Communication Department, BBVA

Behavioral
Data

CUSTOMER Analytics – MEDIASET

Social Analytics to collect customers' longitudinal points
of view from Web 2.0 and correlate them
with internal data

Better understand its marketing campaigns and consumer
preferences

Looking for ways to analyze and differentiate consumer
experiences

Helped the client to assess the company's corporate brands
with respect to one of its main pay-TV competitors

"Big Data is a great opportunity for TV innovation in the next
years. TV viewing is transforming into a multiplatform and
participative experience: the better we know and understand our viewers,
the better we can serve them." – Valerio Motti, Head of Marketing
Innovation, Mediaset S.p.A.

Transactional
Data

Behavioral
Data

VIVISIMO – references

CASE
History

SUCCESS STORIES: among the various sources... here are two

LINK

PDF File
Reminder: retrieve the link that contains this doc

USEFUL LINKS

BIG Data: some useful links

Big Data HUB & Success Stories
http://www.ibmbigdatahub.com/

Big Data University
http://bigdatauniversity.com/

BigInsights technical enablement wiki
https://www.ibm.com/developerworks/mydeveloperworks/wikis/home?lang=en_US#/wiki/BigInsights

FREE ebook – Harness the power of BigData
http://www.ibmbigdatahub.com/blog/research-director-reflects-new-big-data-book

I'll stop here...

thank you for your
patience

HADOOP
&
BIGINSIGHTS

BigInsights is based on Hadoop... why?

CPU instructions per second: significant improvements
– 1990 44 MIPS at 40 MHz
– 2000 3,562 MIPS at 1.2 GHz
– 2010 147,600 MIPS at 3.3 GHz

RAM memory: significant improvements
– 1990 640 KB
– 2000 64 MB
– 2010 8-32 GB

Disk capacity: significant improvements
– 1990 20 MB
– 2000 10 GB
– 2010 1 TB

Disk latency (the speed of reading from and writing to disk): little
improvement
Over the last 7-10 years there have been no major improvements;
the current speed is about 70-80 MB/sec

How long does it take to scan 1 TB?

• 1 TB (at 80 MB/sec)
  – 1 disk 3.4 hours
  – 10 disks 20 min
  – 100 disks 2 min
  – 1000 disks 12 sec

• To work around disk latency, the answer is... parallel processing

• Hadoop: a new way to store and process data

  – Written in Java
  – Designed to run on non-specialized (commodity) hardware
  – Runs on Linux
  – Scalable, flexible, robust
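The scan times quoted for 1 TB at 80 MB/sec can be checked with a few lines of arithmetic. The Java sketch below is only an illustration. It assumes a decimal terabyte (10^12 bytes), perfectly even distribution of the data across disks, and no coordination overhead. The class and method names are hypothetical.

```java
final class ScanTime {
    /** Seconds needed to read totalBytes when the data is spread evenly over `disks` drives read in parallel. */
    static double seconds(double totalBytes, double bytesPerSecondPerDisk, int disks) {
        return totalBytes / (bytesPerSecondPerDisk * disks);
    }

    public static void main(String[] args) {
        double oneTerabyte = 1e12; // assumption: decimal terabyte
        double rate = 80e6;        // 80 MB/sec per disk, as on the slide
        for (int disks : new int[] {1, 10, 100, 1000}) {
            double s = seconds(oneTerabyte, rate, disks);
            System.out.println(disks + " disk(s): " + s + " s (" + s / 60 + " min, " + s / 3600 + " h)");
        }
    }
}
```

Under these assumptions a single disk needs 12,500 seconds (a little under 3.5 hours) and 1000 disks need 12.5 seconds, in line with the rounded figures on the slide.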

What is Hadoop?

• Apache Hadoop = free, open source framework for data-intensive
  applications
  – Inspired by Google technologies (MapReduce, GFS)
  – Yahoo has been the largest contributor to the project (Doug Cutting)
  – Well-suited to batch-oriented, read-intensive applications
  – Originally built to address scalability problems of Nutch, an open source
    Web search technology

• Enables applications to work with thousands of nodes and
  petabytes of data in a highly parallel, cost effective manner
  – CPU + disks of commodity box = Hadoop "node"
  – Boxes can be combined into clusters
  – New nodes can be added as needed without changing
    • Data formats
    • How data is loaded
    • How jobs are written

Two Key Aspects of Hadoop

• MapReduce framework
  – MapReduce is a software framework introduced by
    Google to support distributed computing on large data
    sets on clusters of computers.
  – How Hadoop understands and assigns work to the nodes
    (machines)

• Hadoop Distributed File System = HDFS
  – Where Hadoop stores data
  – A file system that spans all the nodes in a Hadoop cluster
  – It links together the file systems on many local nodes to
    make them into one big file system
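To make the idea of "one big file system" built from many local ones more concrete, here is a toy Java sketch. It is not the HDFS API. The class `ToyDistributedFs`, the fixed block size and the round-robin placement are all assumptions made for illustration. Real HDFS additionally replicates blocks and keeps block metadata on a dedicated NameNode.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ToyDistributedFs {
    private final int blockSize;
    // One in-memory "local file system" per node: block name -> block bytes.
    private final List<Map<String, byte[]>> nodes = new ArrayList<>();
    // Global view: for each file, the node that holds each of its blocks, in order.
    private final Map<String, List<Integer>> blockLocations = new HashMap<>();

    ToyDistributedFs(int nodeCount, int blockSize) {
        this.blockSize = blockSize;
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    /** Splits the data into blocks and places them on the nodes round-robin (hypothetical policy). */
    void write(String path, byte[] data) {
        List<Integer> locations = new ArrayList<>();
        int block = 0;
        for (int offset = 0; offset < data.length; offset += blockSize) {
            int node = block % nodes.size();
            int end = Math.min(offset + blockSize, data.length);
            nodes.get(node).put(path + "#" + block, Arrays.copyOfRange(data, offset, end));
            locations.add(node);
            block++;
        }
        blockLocations.put(path, locations);
    }

    /** Reassembles the file by fetching every block from the node that stores it. */
    byte[] read(String path) {
        List<Integer> locations = blockLocations.get(path);
        if (locations == null) {
            throw new IllegalArgumentException("unknown file: " + path);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int block = 0; block < locations.size(); block++) {
            byte[] chunk = nodes.get(locations.get(block)).get(path + "#" + block);
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    /** Number of blocks stored on a given node. */
    int blocksOnNode(int node) {
        return nodes.get(node).size();
    }
}
```

A caller sees a single namespace (`write` and `read` by path) while the bytes actually live on separate nodes, which is the property the slide describes.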

Hadoop and the MapReduce paradigm
• Data is stored on a distributed system of servers
• Processing functions are sent to where the data lives
• Each server processes its own share of the data and shares the results
• The system can scale to thousands of nodes and PBs of data

MapReduce Application (runs on the Hadoop Data Nodes)

public static class TokenizerMapper
    extends Mapper<Object,Text,Text,IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  // Map: emit (word, 1) for every token of the input value
  public void map(Object key, Text val, Context context) {
    StringTokenizer itr = new StringTokenizer(val.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}

public static class IntSumReducer
    extends Reducer<Text,IntWritable,Text,IntWritable> {
  private IntWritable result = new IntWritable();

  // Reduce: add up the counts collected for one word
  public void reduce(Text key, Iterable<IntWritable> val, Context context) {
    int sum = 0;
    for (IntWritable v : val) {
      sum += v.get();
    }
    .. .. ..

Distribute map tasks to cluster
1. Map Phase (splits the job into small parts)
2. Shuffle (reorders the partial results for the final processing)
3. Reduce Phase (processes everything again to obtain a single result)

Result Set: return a single result set
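The three phases can also be followed without a cluster. The sketch below mimics the word-count job in plain Java, using only the standard library. It is an in-memory illustration, not Hadoop code: the class `LocalWordCount` and its method names are hypothetical, and each input string stands in for one data split.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

final class LocalWordCount {
    /** Map phase: emit one (word, 1) pair per token, like TokenizerMapper. */
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(split);
        while (itr.hasMoreTokens()) {
            pairs.add(new SimpleEntry<String, Integer>(itr.nextToken(), 1));
        }
        return pairs;
    }

    /** Shuffle: group the partial results by key, in sorted key order. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce phase: sum the values collected for one key, like IntSumReducer. */
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Runs map over every split, shuffles, then reduces into a single result set. */
    static Map<String, Integer> run(List<String> splits) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String split : splits) {
            pairs.addAll(map(split));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }
}
```

In a real job the map calls would run on the nodes that hold each split, and only the grouped pairs would travel over the network. Here everything runs in one process so the data flow is easy to inspect.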

BigInsights extends the capabilities of open source
Hadoop by adding new functionality...

[Diagram: InfoSphere BigInsights within the IBM Big Data Platform. BigInsights layers: Advanced Engines, Development Tools, Enterprise capabilities, Indexing, Connectors, Workload Optimization, Administration & Security, open source based components, IBM tested & supported open source components]

InfoSphere BigInsights: two editions

With BigInsights, organizations can tackle the processing of enormous quantities of
data never exploited before and derive new knowledge in an efficient, optimized and
scalable way.
This infrastructure leverages Hadoop's MapReduce framework to handle
the parallel processing of large data sets distributed across many nodes.

InfoSphere BigInsights: two editions

Basic Edition
• Free Download, Easy Installation
• 24x7 Web Support, 10TB Limit
• Paid Support Option

Enterprise Edition
• GPFS-SNC Native Support*
• Spreadsheet-style data exploration
• Job and Workflow Management
• Productivity and Efficiency Improvements
• Integration with InfoSphere Warehouse
• Integration with Netezza
• Integration with DB2
• Large Scale Indexing
• Text Analytics
• Machine Learning*
• Tiered Terabyte Pricing

* = coming soon

BigInsights on Cloud

IBM BigInsights on Cloud
Hadoop for everyone

InfoSphere Streams

InfoSphere Streams

InfoSphere Streams provides an agile, scalable software infrastructure for
the real-time analysis of massive flows of data in motion, of any kind and
coming from countless sources.

This type of processing increases the accuracy and speed of decision
making in many fields, such as healthcare, astronomy,
manufacturing, finance and many more.

Categories of Problems Solved by Streams

• Applications that require on-the-fly processing, filtering and analysis
  of streaming data
  – Sensors: environmental, industrial, surveillance video, GPS, …
  – "Data exhaust": network/system/web server/app server log files
  – High-rate transaction data: financial transactions, call detail records

• Criteria: two or more of the following
  – Messages are processed in isolation or in limited data windows
  – Sources include non-traditional data (spatial, imagery, text, …)
  – Sources vary in connection methods, data rates, and processing
    requirements, presenting integration challenges
  – Data rates/volumes require the resources of multiple processing nodes
  – Analysis and response are needed with sub-millisecond latency
  – Data rates and volumes are too great for store-and-mine approaches
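Processing messages "in limited data windows" means keeping only the most recent readings in memory rather than storing the whole stream. The Java sketch below shows one common form of this, a count-based sliding window average. It is an illustration in plain Java, not InfoSphere Streams code, and the class name and window size are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class SlidingWindowAverage {
    private final int capacity;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum;

    SlidingWindowAverage(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Accepts one reading and returns the average of the most recent `capacity` readings. */
    double accept(double reading) {
        window.addLast(reading);
        sum += reading;
        if (window.size() > capacity) {
            sum -= window.removeFirst(); // the oldest reading leaves the window
        }
        return sum / window.size();
    }

    public static void main(String[] args) {
        SlidingWindowAverage average = new SlidingWindowAverage(3);
        for (double reading : new double[] {10, 20, 30, 40}) {
            System.out.println("reading " + reading + " -> window average " + average.accept(reading));
        }
    }
}
```

Memory use stays bounded by the window size regardless of how long the stream runs, which is what makes this style suitable when data rates are too great for store-and-mine approaches.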

Real-time processing with InfoSphere Streams

→ continuous ingestion
→ continuous analysis

Infrastructure provides services for
• scheduling analytics across h/w nodes
• establishing streaming connectivity
• …

[Diagram: Filter, Transform, Annotate, Correlate, Classify]

Achieve scale
• by partitioning applications into components
• by distributing across stream-connected hardware nodes

InfoSphere Data
Explorer
(formerly VIVISIMO)

Vivisimo and its mission

It helps organizations discover,
organize, analyze and navigate
large quantities of heterogeneous,
dynamic data, both structured and
unstructured, regardless of
where it is managed or stored, to
increase efficiency and value
in business processes.

Vivisimo in the enterprise

• Provide access to numerous applications and data stores
• Discover and navigate across the whole enterprise
• Merge structured and unstructured information to steer
  the enterprise toward:
  – Better decisions
  – More efficient operations
  – Better understanding of customers
  – Innovation
• Social tools for collaboration and reuse

[Diagram: the Velocity Platform connects Application/Users with sources (Relational Data, File Systems, Content Management, Email, CRM, Supply Chain, ERP, RSS Feeds, Cloud, Custom Sources, External Sources) and Social Tools (Commenting, Tagging, Rating, Shared Folders)]

Vivisimo federated search

Vivisimo architecture

User Profiles Application SDK Federated Sources

Authentication/Authorization
Query transformation
Personalization
Display Subscriptions Feeds Web Results

Text Analytics Meta-Data
Search Engine
Thesauri Faceting
Clustering BI
Ontology Support Tagging
Semantic Processing Taxonomy
Entity Extraction Collaboration
Relevancy

Connector
Framework

CM, RM, DM RDBMS Feeds Web 2.0 Email Web CRM, ERP File Systems

CUSTOMER
Analytics
examples..

CUSTOMER Analytics - some examples..
Deeper Customer Analytics Examples and Best Practice and leverage Big Data:
Ready for Business

[Diagram: Behavioral Data, Interaction Data, Transaction Data and Enterprise Systems feed a single view of Business Data, Social Data and Interactive data. Captions: "Connect with clients & prospects, with brands"; "...analyse strong and weak signals in discussion"; "Delight customers with targeted ...social and transactional propositions"; "Real time interaction across channels"; "Interact!"; "You"]

CUSTOMER Analytics – MOBY Lines
Digital marketing optimization: lifetime individual
tracking, microsegmentation, channel attribution,
proposition automation

Digital & Multichannel Marketing / individual
digital analytics, real time monitoring, I/O ERP data,
dynamic segments, mkt. automation

[Diagram: Intuitive; Social collection; Single view (Business Data, Social Data, Interactive data); Enterprise Systems]

CUSTOMER Analytics – GARANTY bank – a video..
Garanty: real time interaction across channels

Single view
Business Data,
Social Data,
Interactive data
