3. Answering questions: always new, always urgent, and always… strategic
Questions the business keeps asking:
• "I would like to discover new customer segments…"
• "How effective was campaign C123?"
• "How could we best use historical data to anticipate our buyers' actions?"
• "Which products are selling best in Italy today?"
• "What are people saying about our new product?"
• "What are people telling our call center?"
• "How can we improve our customer retention?"
91% of dissatisfied customers will turn to other suppliers
• Integrate the business with technology
• Use historical and summary data, structured and unstructured
• Get the most value from analyzing the information extracted from all available sources
2012 IBM Corporation
4. The Data Warehouse and Business Analytics are an excellent answer
…one the market keeps pushing on, asking for:
• Higher volumes
• Higher data quality
• Greater control over the process
• and above all greater SIMPLICITY, AUTONOMY and PERFORMANCE
• …
Diagram labels: data sources from operational systems; ETL (Data Integration, Data Quality, Data Delivery); Data Warehouse; Master Data Management; cubes; Reporting; Analysis; Predictive Analytics
6. And the business wants information that goes beyond the transaction
Figure: decision tree of the purchase process, running from the start to the end of the purchase process, with many "Fail" branches and a single "Yes!" outcome.
The DWH generally tracks only the final, concluding transaction.
To "read" the purchase process better, we also need to know the rest.
8. Knowledge also lives in unconventional sources… why ignore them?
• The business needs to manage and make massive use of an ever-growing quantity of unconventional information, generally created outside the organization
• Most of this unconventional information is semi-structured or completely unstructured
• Organizations suffer when they cannot capture the knowledge contained in business information
– Traditional systems analyze only structured data
– The missing 80% consists of unstructured or semi-structured information (Gartner).
Big Data figures: 25 TB Facebook / day; 12 TB Twitter / day; 200k tweets per minute; 290 million tweets / year
9. When we talk about a "data explosion"
                    From          To              Growth
Users on Twitter    6,000,000     500,000,000     83x
Tweets per day      300,000       400,000,000     1333x
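As a quick check, the two growth factors on this slide follow directly from the user and tweet counts shown; the slide rounds both down to whole multiples:

```latex
\[
\frac{500{,}000{,}000\ \text{users}}{6{,}000{,}000\ \text{users}} \approx 83.3
\qquad
\frac{400{,}000{,}000\ \text{tweets/day}}{300{,}000\ \text{tweets/day}} \approx 1333.3
\]
```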
12. What is the state of the art?
>1100 Business Managers >200 CIOs
IBM partnered with the Saïd Business School (on a global scale) and SDA Bocconi University (on a local scale) to benchmark global big data activities
www.ibm.com/2012bigdatastudy
13. Big Data: the state of the art
1 Customer analytics are driving big data initiatives
2 Big data is dependent upon a scalable and extensible information foundation
3 Initial big data efforts are focused on gaining insights from existing and new sources of internal data
4 Big data requires strong analytics capabilities
5 The emerging pattern of big data adoption is focused upon delivering measurable business value
IBM, the Saïd Business School (University of Oxford, global research) and SDA Bocconi University (Italy) collaborated on a benchmark of Big Data initiatives
14. Key Findings: Big Data Activities
Big data activities          Have not begun    Planning    Pilot & implementation
>1000 Business Managers      24%               47%         28%
>200 CIOs                    25%               57%         18%
15. IBM Big Data Platform & Ecosystem
Diagram: the IBM Big Data Platform and the products around it (numbers are the callouts on the slide).
1 INFOSPHERE BIGINSIGHTS (Hadoop System): analyse large structured and unstructured data sets
2 PURE DATA for Analytics (NETEZZA): optimized very large data warehousing
3 INFOSPHERE STREAMS (Stream Computing): analyse large structured and unstructured data sets in streaming
4 INFOSPHERE DATA EXPLORER (VIVISIMO): search (and federate data) in a big data context
5 IBM CONTENT ANALYTICS: out-of-the-box text analytics, open environment with Enterprise Search
6 IBM SOCIAL MEDIA ANALYTICS: out-of-the-box social analytics environment
Platform layers:
– Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
– IBM Big Data Platform: Visualization & Discovery, Application Development, Systems Management
– Accelerators
– Hadoop System, Stream Computing, Data Warehouse
– Information Integration & Governance
16. The Data Warehouse and Business Analytics… integrate well with the BIG DATA platform
Diagram: the data warehouse architecture (ETL with Data Integration, Data Quality and Data Delivery; Data Warehouse; Master Data Management; cubes) extended with the Big Data products.
Sources: External Source Systems; Applications; Structured, Semi-Structured / Unstructured Data; Spreadsheets; Sensors
Callouts: 1 IBM InfoSphere BigInsights; 2 Netezza; 3 IBM InfoSphere Streams; 4 IBM Vivisimo; Cognos (shown next to callouts 5 and 6)
17. The Big Data ecosystem: the key is interoperability
Diagram (data sources; platform; type of analytics):
– Traditional / Relational Data Sources; Traditional Warehouse (Data Warehouse); Analytics on Structured Data
– Non-Traditional / Non-Relational Data Sources; Streaming Data (InfoSphere Streams); Analytics on Data In-Motion
– Non-Traditional / Non-Relational and Traditional / Relational Data Sources; Internet-Scale Data Sets (InfoSphere BigInsights); Analytics on Data at Rest
18. The IBM Big Data platform: the new frontier of analysis
Figure labels: Data Ingest; Enrich; Real-Time Analysis; Adaptive Analytic Model
20. Big Data applications
• Analyzing what is being said about a topic on social media
• Analyzing call center messages
• Log analysis
• Fraud detection
• Searching data through a federated engine
• Analyzing data coming from sensors
• …
A Big Data solution is used, for example, when:
– ALL potentially available data must be analyzed, because processing only a sample would not be meaningful or produce useful results.
– the goal is to EXPLORE the available data, even interactively, in cases where business measures and indicators are not defined in advance.
– a large, CONTINUOUS FLOW of data must be analyzed to make decisions in real time.
The Big Data phenomenon is not tied to any particular industry: it is driven by the growth in data Volume and by further dimensions such as the Velocity and Variety of the available data.
21. BigInsights for processing petabytes of data very quickly
Vestas optimizes capital investments based on 2.5 petabytes of information.
§ Model the weather to optimize placement of turbines, maximizing power generation and longevity.
§ Reduce time required to identify placement of turbine from weeks to hours.
§ Incorporate 2.5 PB of structured and semi-structured information flows. Data volume expected to grow to 6 PB.
22. InfoSphere Streams and BigInsights for building management
Cisco turns to IBM big data for intelligent infrastructure management.
§ Optimize building energy consumption with centralized monitoring and control of building monitoring system.
§ Automates preventive and corrective maintenance of building systems.
§ Uses Streams, InfoSphere BigInsights and Cognos
§ Log Analytics
§ Energy Bill Forecasting
§ Energy consumption optimization
§ Detection of anomalous usage
§ Presence-aware energy mgt.
§ Policy enforcement
23. InfoSphere Streams in the medical field
Big Data enabled doctors from the University of Ontario to apply neonatal infant monitoring to predict infection in the ICU 24 hours in advance
Video: IBM Data Baby (youtube.com)
24. InfoSphere Streams for traffic optimization
Dublin City Centre Increases Bus Transportation Performance
Capabilities Utilized: Stream Computing
• Public transportation awareness solution improves on-time performance and provides real-time bus arrival info to riders
• Continuously analyzes bus location data to infer traffic conditions and predict arrivals
• Collects, processes, and visualizes location data of all bus vehicles
• Automatically generates transportation routes and stop locations
Results:
• Monitoring 600 buses across 150 routes
• Analyzing 50 bus locations per second
• Anticipated to increase bus ridership
25. CUSTOMER Analytics – GRUPO BBVA
seamlessly monitors and improves its online reputation.
- Enables BBVA to consistently respond to and gain insight into customer needs and feedback.
- Gives BBVA the ability to measure the success of its outputs and approaches to engaging stakeholders and customers.
- Shows whether positive or negative sentiments have increased or not, looks for the source and reason of comments and helps make decisions and plans.
“What is great about this solution is that it helps us to focus our actions on the most important topics of online discussions and immediately plan the correct and most suitable reaction.” – Online Communication Department, BBVA
Behavioral Data
26. CUSTOMER Analytics – MEDIASET
Social Analytics to collect Customer longitudinal point of views from Web 2.0 and correlate them with internal data
Better understand its marketing campaigns and consumer preferences
Looking for ways to analyze and differentiate consumer experiences
Helped the client to assess the company’s corporate brands, with respect to one of its main pay-TV competitors
“Big Data is a great opportunity for TV innovation in the next years. TV viewing is transforming into a multiplatform and participative experience: the better we know and understand our viewers, the better we can serve them." – Valerio Motti, Head of Marketing Innovation, Mediaset S.p.A.
Trandational
Data
Behavioral
Data
31. Big Data: some useful links
Big Data HUB & Success Stories
http://www.ibmbigdatahub.com/
Big Data University
http://bigdatauniversity.com/
BigInsights technical enablement wiki
https://www.ibm.com/developerworks/mydeveloperworks/wikis/home?lang=en_US#/wiki/BigInsights
FREE ebook – Harness the power of BigData
http://www.ibmbigdatahub.com/blog/research-director-reflects-new-big-data-book
34. BigInsights is based on Hadoop… why?
CPU instructions per second: significant improvements
– 1990 44 MIPS at 40 MHz
– 2000 3,562 MIPS at 1.2 GHz
– 2010 147,600 MIPS at 3.3 GHz
RAM: significant improvements
– 1990 640 KB
– 2000 64 MB
– 2010 8-32 GB
Disk capacity: significant improvements
– 1990 20 MB
– 2000 10 GB
– 2010 1 TB
Disk latency (the speed of reading from and writing to disk): only minor improvements
There have been no major improvements over the last 7-10 years; the current speed is about 70-80 MB/s
35. How long does it take to scan 1 TB?
• 1 TB (at 80 MB/s)
– 1 disk 3.4 hours
– 10 disks 20 min
– 100 disks 2 min
– 1000 disks 12 sec
• To work around disk latency, the answer is parallel processing
• Hadoop: a new way to store and process data
– Written in Java
– Designed to run on commodity hardware
– Runs on Linux
– Scalable, flexible, robust
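The scan times on this slide can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the class and method names are hypothetical, and it assumes 1 TB = 1,000,000 MB, 80 MB/s per disk, and perfectly even parallelism across disks with no coordination overhead.

```java
final class ScanTime {
    /** Seconds needed to read the data once when it is spread evenly over the disks. */
    static double secondsToScan(double dataMegabytes, double mbPerSecondPerDisk, int disks) {
        return dataMegabytes / (mbPerSecondPerDisk * disks);
    }

    public static void main(String[] args) {
        double oneTerabyteInMb = 1_000_000.0; // assumption: decimal units
        for (int disks : new int[] {1, 10, 100, 1000}) {
            double seconds = secondsToScan(oneTerabyteInMb, 80.0, disks);
            System.out.printf("%4d disk(s): %.1f s (%.2f h)%n", disks, seconds, seconds / 3600.0);
        }
    }
}
```

Under these assumptions a single disk needs about 12,500 seconds, close to the slide's figure, and every tenfold increase in disks cuts the time by ten, which is the case for parallel processing that the slide makes.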
36. What is Hadoop?
§ Apache Hadoop = free, open source framework for data-intensive applications
– Inspired by Google technologies (MapReduce, GFS)
– Yahoo has been the largest contributor to the project (Doug Cutting)
– Well-suited to batch-oriented, read-intensive applications
– Originally built to address scalability problems of Nutch, an open source
Web search technology
§ Enables applications to work with thousands of nodes and
petabytes of data in a highly parallel, cost effective manner
– CPU + disks of commodity box = Hadoop “node”
– Boxes can be combined into clusters
– New nodes can be added as needed without changing
• Data formats
• How data is loaded
• How jobs are written
37. Two Key Aspects of Hadoop
§ MapReduce framework
– MapReduce is a software framework introduced by Google to support distributed computing on large data sets on clusters of computers.
– How Hadoop understands and assigns work to the nodes
(machines)
§ Hadoop Distributed File System = HDFS
– Where Hadoop stores data
– A file system that spans all the nodes in a Hadoop cluster
– It links together the file systems on many local nodes to
make them into one big file system
38. Hadoop and the MapReduce paradigm
§ Data is stored on a distributed system of servers
§ Processing functions are sent to where the data lives
§ Each server processes its own share of the data and shares the results
§ The system can scale to thousands of nodes and PB of data
MapReduce Application (word count):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Both classes are static nested classes of the job's driver class,
// which the slide does not show.

// Map: emit (word, 1) for every token of the input line.
public static class TokenizerMapper
    extends Mapper<Object, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(Object key, Text val, Context context)
      throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(val.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}

// Reduce: sum the counts emitted for each word.
public static class IntSumReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {
  private IntWritable result = new IntWritable();

  public void reduce(Text key, Iterable<IntWritable> val, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : val) {
      sum += v.get();
    }
    result.set(sum);
    context.write(key, result);
  }
}

Diagram: the MapReduce application is distributed as map tasks to the Hadoop data nodes of the cluster.
1. Map Phase (breaks the job into small parts)
2. Shuffle (reorders the partial results for the final processing)
3. Reduce Phase (processes everything again to obtain a single result)
Result Set: return a single result set
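To make the three phases concrete without a cluster, here is a minimal single-process sketch in plain Java. It does not use the Hadoop API: the class `LocalWordCount` and its methods are hypothetical names chosen for illustration, and the "shuffle" is just an in-memory grouping by key, whereas Hadoop would move data between nodes.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LocalWordCount {
    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle: group every emitted value under its key, with keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse all values of one key into a single count.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map, shuffle and reduce in sequence over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("big data big insights", "data in motion")));
    }
}
```

In a real cluster each call to `map` would run on the node that holds that part of the input, which is the "send the function to the data" idea from the bullets on this slide.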
39. BigInsights extends the capabilities of open source Hadoop with new functionality…
InfoSphere BigInsights components shown: Advanced Engines; Development Tools; Enterprise capabilities; Indexing; Connectors; Workload Optimization; Administration & Security
Legend: open source based components; IBM tested & supported open source components
IBM Big Data Platform:
– Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
– Visualization & Discovery, Application Development, Systems Management
– Accelerators
– Hadoop System, Stream Computing, Data Warehouse
– Information Integration & Governance
40. InfoSphere BigInsights: two editions
With BigInsights, companies can process huge volumes of previously untapped data and derive new knowledge in an efficient, optimized and scalable way.
The infrastructure uses Hadoop's MapReduce framework for the parallel processing of large data sets distributed across many nodes.
41. InfoSphere BigInsights: two editions
Basic Edition
– Free Download, Easy Installation
– 24x7 Web Support, 10TB Limit
– Paid Support Option
Enterprise Edition
– GPFS-SNC Native Support*
– Spreadsheet-style data exploration
– Job and Workflow Management
– Productivity and Efficiency Improvements
– Integration with InfoSphere Warehouse
– Integration with Netezza
– Integration with DB2
– Large Scale Indexing
– Text Analytics
– Machine Learning*
– Tiered Terabyte Pricing
* = coming soon
45. InfoSphere Streams
InfoSphere Streams provides an agile, scalable software infrastructure for the real-time analysis of massive flows of data in motion, of any kind and coming from countless sources.
This kind of processing improves the accuracy and speed of decision making in many fields, such as healthcare, astronomy, manufacturing, finance and many more.
46. Categories of Problems Solved by Streams
§ Applications that require on-the-fly processing, filtering and analysis
of streaming data
– Sensors: environmental, industrial, surveillance video, GPS, …
– “Data exhaust”: network/system/web server/app server log files
– High-rate transaction data: financial transactions, call detail records
§ Criteria: two or more of the following
– Messages are processed in isolation or in limited data windows
– Sources include non-traditional data (spatial, imagery, text, …)
– Sources vary in connection methods, data rates, and processing
requirements, presenting integration challenges
– Data rates/volumes require the resources of multiple processing nodes
– Analysis and response are needed with sub-millisecond latency
– Data rates and volumes are too great for store-and-mine approaches
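The "limited data windows" criterion above can be illustrated with a small sliding-window sketch. This is plain Java rather than anything from the Streams product; the class name `SlidingWindow` and the choice of a running mean are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class SlidingWindow {
    private final int capacity;
    private final Deque<Double> values = new ArrayDeque<>();
    private double sum = 0.0;

    SlidingWindow(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Adds one reading, evicts the oldest one if the window is full, and returns the window mean. */
    double add(double reading) {
        values.addLast(reading);
        sum += reading;
        if (values.size() > capacity) {
            sum -= values.removeFirst();
        }
        return sum / values.size();
    }

    public static void main(String[] args) {
        SlidingWindow window = new SlidingWindow(3);
        for (double reading : new double[] {10, 20, 30, 40}) {
            System.out.println(window.add(reading));
        }
    }
}
```

Only the last few readings are ever held in memory, so the stream itself never has to be stored, which is the point of the store-and-mine contrast in the final criterion.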
47. Real-time processing with InfoSphere Streams
→ continuous ingestion
→ continuous analysis
Operators shown: Filter, Transform, Annotate, Correlate, Classify
Infrastructure provides services for
– scheduling analytics across h/w nodes
– establishing streaming connectivity
– …
Achieve scale
– by partitioning applications into components
– by distributing across stream-connected hardware nodes
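As a rough illustration of partitioning an application into components, the sketch below chains three of the operators named on this slide (Filter, Transform, Classify) as separate Java methods. It is not written in the Streams product's own language and makes several assumptions: the class `OperatorPipeline`, the "sensor,value" input format and the threshold rule are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class OperatorPipeline {
    /** Hypothetical tuple type: one sensor reading. */
    static final class Reading {
        final String sensor;
        final double value;

        Reading(String sensor, double value) {
            this.sensor = sensor;
            this.value = value;
        }
    }

    // Filter: keep only well-formed "sensor,value" lines.
    static boolean isWellFormed(String line) {
        String[] parts = line.split(",");
        if (parts.length != 2 || parts[0].trim().isEmpty()) {
            return false;
        }
        try {
            Double.parseDouble(parts[1].trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Transform: parse a well-formed line into a typed reading.
    static Reading parse(String line) {
        String[] parts = line.split(",");
        return new Reading(parts[0].trim(), Double.parseDouble(parts[1].trim()));
    }

    // Classify: label each reading against an assumed threshold.
    static String classify(Reading reading, double threshold) {
        return reading.sensor + ":" + (reading.value > threshold ? "HIGH" : "NORMAL");
    }

    // Connects the three components into one pipeline.
    static List<String> run(List<String> lines, double threshold) {
        return lines.stream()
                .filter(OperatorPipeline::isWellFormed)
                .map(OperatorPipeline::parse)
                .map(reading -> classify(reading, threshold))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("t1,10.5", "garbage", "t2,99"), 50.0));
    }
}
```

Because each step is an independent component with a narrow input and output, the steps could in principle run on different nodes connected by streams, which is how the slide describes achieving scale.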
48. InfoSphere Data Explorer (formerly VIVISIMO)
49. Vivisimo and its mission
It helps organizations discover, organize, analyze and navigate large volumes of heterogeneous and dynamic data, both structured and unstructured, regardless of where the data is managed or stored, in order to increase efficiency and value in business processes.
50. Vivisimo in the enterprise
§ Provide access to numerous applications and data stores
§ Discover and navigate across the whole enterprise
§ Merge structured and unstructured information to guide the company toward:
– Better decisions
– More efficient operations
– Better understanding of customers
– Innovation
§ Social tools for collaboration and reuse
Diagram labels:
– Sources: Relational Data; File Systems; Content Management; Email; CRM; Supply Chain; ERP; RSS Feeds; Cloud; Custom Sources; External Sources
– Velocity Platform
– Application / Users
– Social Tools: Commenting; Tagging; Rating; Shared Folders
54. CUSTOMER Analytics: some examples
Deeper Customer Analytics: examples and best practices that leverage Big Data. Ready for Business.
Diagram labels:
– Behavioral Data: connect with clients & prospects, with brands; analyse strong and weak signals in discussion
– Interaction Data: delight customers with targeted social and transactional propositions; real time interaction across channels; Interact!
– Transaction Data
– You
– Single view: Business Data, Social Data, Interactive data
– Enterprise Systems
55. CUSTOMER Analytics – MOBY Lines
Digital marketing optimization: lifetime individual tracking, microsegmentation, channel attribution, proposition automation
Diagram labels:
– Intuitive Social collection
– Digital & Multichannel Marketing / individual digital analytics, real time monitoring, I/O ERP data, dynamic segments, mkt. automation
– Single view: Business Data, Social Data, Interactive data
– Enterprise Systems
56. CUSTOMER Analytics – GARANTY bank: a video
Garanty: real time interaction across channels
Single view: Business Data, Social Data, Interactive data