Big Data IBM

Transcript

  • 1. Overview of the IBM offering. 22 March 2013. Carlo Patrini, Information Architect, carlo.patrini@it.ibm.com, +393357248561. © 2013 IBM Corporation
  • 2. We need to gain more knowledge. The need to gain more knowledge (insights) is becoming ever more pressing and urgent.
  • 3. Answering questions that are always new, always urgent and always... strategic
      – How effective was campaign C123?
      – How could we make better use of historical data to anticipate our buyers' actions?
      – I would like to discover new customer segments...
      – Which products are selling best in Italy today?
      – What are people saying about our new product?
      – What are people telling our call center?
      – How can we improve our customer retention?
      – 91% of dissatisfied customers will turn to other suppliers.
      Goals: integrate the business with technology; use historical and summary data, both structured and unstructured; get the maximum benefit from analysing the information extracted from all available sources.
  • 4. The Data Warehouse and Business Analytics are an excellent answer, constantly pushed by a market that asks for:
      – Higher volumes
      – Higher data quality
      – Greater control over the process
      – And above all greater SIMPLICITY, AUTONOMY and PERFORMANCE
      [Diagram labels: data sources from operational systems; ETL (Data Integration, Data Quality, Data Delivery); Data Warehouse; Master Data Management; Cubes; Reporting; Analysis; Predictive Analytics]
  • 5. Leaner, faster and more responsive DWHs: the DWH appliance is the solution. The DWH is fundamental, but at times it is slow and too rigid, and it does not evolve at the pace of the business. Mumble, mumble... the solution is IBM Netezza.
  • 6. And the business is interested in acquiring information that goes beyond the transaction. The DWH generally tracks only the final, concluding transaction. To "read" the purchase process better, the rest needs to be known as well.
      [Diagram: decision tree of the purchase process, from the start to the end of the process, with many "Fail" branches and a single "Yes!" outcome]
  • 7. Big Data: the new ocean of data. The data beneath the surface is still unexplored.
      – Volume: 12+ terabytes of Tweets per day
      – Velocity: 30 billion sensors, RFID tags and other devices generating streaming data
      – Variety: 100's of different data types
      – Veracity: only 1 in 3 business users believes they have reliable information
  • 8. Knowledge is also held in unconventional sources... why ignore them?
      – The business needs to manage and use, at massive scale, an ever-growing quantity of unconventional information that is generally created outside the company.
      – Most of this unconventional information is semi-structured or completely unstructured.
      – Organizations suffer if they cannot acquire the knowledge contained in business information: traditional systems analyse only structured data, and the missing 80% consists of unstructured or semi-structured information (Gartner).
      Big Data figures shown on the slide: Facebook 25 TB per day; Twitter 200k per minute, 290 million per year, 12 TB per day.
  • 9. When people talk about a "data explosion": Twitter went from 6,000,000 users pushing out 300,000 tweets per day to 500,000,000 users pushing out 400,000,000 tweets per day. That is roughly 83x the users (500,000,000 / 6,000,000) and roughly 1333x the daily tweets (400,000,000 / 300,000).
  • 10. Traditional approach and Big Data approach
  • 11. BIG DATA: state of the art
  • 12. Which is the State-of-the-Art? More than 1100 business managers and more than 200 CIOs were surveyed. IBM, the Saïd Business School (on a global scale) and SDA Bocconi University (on a local scale) partnered to benchmark global big data activities. www.ibm.com/2012bigdatastudy
  • 13. Big Data: the state of the art
      1. Customer analytics are driving big data initiatives.
      2. Big data is dependent upon a scalable and extensible information foundation.
      3. Initial big data efforts are focused on gaining insights from existing and new sources of internal data.
      4. Big data requires strong analytics capabilities.
      5. The emerging pattern of big data adoption is focused upon delivering measurable business value.
      IBM, the Saïd Business School (University of Oxford, global research) and SDA Bocconi University (Italy) collaborated on a benchmark of Big Data initiatives.
  • 14. Key Findings: Big Data Activities
      Respondents               | Have not begun big data activities | Planning big data activities | Pilot & implementation of big data activities
      >1000 Business Managers   | 24%                                | 47%                          | 28%
      >200 CIOs                 | 25%                                | 57%                          | 18%
  • 15. IBM Big Data Platform & Ecosystem
      – InfoSphere BigInsights (Hadoop system): analyse large structured and unstructured data sets
      – InfoSphere Streams (stream computing): analyse large structured and unstructured data sets in streaming
      – PureData for Analytics (Netezza): optimized very large data warehousing
      – InfoSphere Data Explorer (Vivisimo): search (and federate data) in a big data context
      – IBM Content Analytics: out-of-the-box text analytics, open environment with enterprise search
      – IBM Social Media Analytics: out-of-the-box social analytics environment
      [Diagram layers: Analytic Applications (BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics); IBM Big Data Platform (Visualization & Discovery, Application Development, Systems Management; Accelerators; Hadoop System, Stream Computing, Data Warehouse); Information Integration & Governance]
  • 16. The Data Warehouse and Business Analytics integrate well with the BIG DATA platform
      [Diagram labels: IBM InfoSphere BigInsights; IBM InfoSphere Streams; IBM Vivisimo; Netezza; Cognos; Data Warehouse; Master Data Management; Cubes; ETL (Data Integration, Data Quality, Data Delivery); Applications; external source systems; structured, semi-structured and unstructured data; spreadsheets; sensors]
  • 17. The Big Data ecosystem: interoperability is the key
      – Traditional warehouse: analytics on structured data, from traditional / relational data sources
      – InfoSphere Streams: analytics on data in motion, from non-traditional / non-relational streaming data sources
      – InfoSphere BigInsights: analytics on data at rest, on internet-scale data sets from both non-traditional / non-relational and traditional / relational data sources
  • 18. The IBM Big Data platform: the new frontier of analysis
      [Diagram labels: Data Ingest; Enrich; Real-Time Analysis; Adaptive Analytic Model]
  • 19. Traditional analysis extended to Big Data: combining structured with unstructured data
      1. Pre-Processing Hub
      2. Query-able Archive
      3. Exploratory Analysis (find and view)
      [Diagram components: Data Explorer, BigInsights, Streams, Information Server, Data Warehouse]
  • 20. Big Data applications
      – Analysing what is being said about a topic on social media
      – Analysing call center messages
      – Log analysis
      – Fraud detection
      – Searching data through a federated engine
      – Analysing data coming from sensors
      – ...
      A Big Data solution is used, for example, when:
      – it is necessary to analyse ALL the potentially available data, and processing a sample would not be significant or give effective results;
      – the aim is to EXPLORE the available data, possibly interactively, in cases where business measures and indicators are not predetermined;
      – a large CONTINUOUS FLOW of data must be analysed in order to make decisions in real time.
      The Big Data phenomenon is not tied to a particular industry sector. It builds on the growth in data volume and on further dimensions such as the Velocity and the Variety of the available data.
  • 21. BigInsights for processing petabytes of data very quickly. Vestas optimizes capital investments based on 2.5 petabytes of information.
      – Model the weather to optimize placement of turbines, maximizing power generation and longevity.
      – Reduce the time required to identify the placement of a turbine from weeks to hours.
      – Incorporate 2.5 PB of structured and semi-structured information flows. Data volume is expected to grow to 6 PB.
  • 22. InfoSphere Streams and BigInsights for facilities management. Cisco turns to IBM big data for intelligent infrastructure management.
      – Optimizes building energy consumption with centralized monitoring and control of the building monitoring system.
      – Automates preventive and corrective maintenance of building systems.
      – Uses Streams, InfoSphere BigInsights and Cognos for: log analytics; energy bill forecasting; energy consumption optimization; detection of anomalous usage; presence-aware energy management; policy enforcement.
  • 23. InfoSphere Streams in the medical field. Big Data enabled doctors from University of Ontario to apply neonatal infant monitoring to predict infection in the ICU 24 hours in advance. Video: IBM Data Baby (youtube.com).
  • 24. InfoSphere Streams for traffic optimization. Dublin City Centre increases bus transportation performance.
      Capabilities utilized: Stream Computing
      – Public transportation awareness solution improves on-time performance and provides real-time bus arrival info to riders.
      – Continuously analyzes bus location data to infer traffic conditions and predict arrivals.
      – Collects, processes and visualizes location data of all bus vehicles.
      – Automatically generates transportation routes and stop locations.
      Results:
      – Monitoring 600 buses across 150 routes
      – Analyzing 50 bus locations per second
      – Anticipated to increase bus ridership
  • 25. CUSTOMER Analytics: GRUPO BBVA seamlessly monitors and improves its online reputation (behavioral data).
      – Enables BBVA to consistently respond to and gain insight into customer needs and feedback.
      – Gives BBVA the ability to measure the success of its outputs and approaches to engaging stakeholders and customers.
      – Shows whether positive or negative sentiments have increased or not, looks for the source and reason of comments, and helps make decisions and plans.
      "What is great about this solution is that it helps us to focus our actions on the most important topics of online discussions and immediately plan the correct and most suitable reaction." (Online Communication Department, BBVA)
  • 26. CUSTOMER Analytics: MEDIASET. Social analytics to collect customers' longitudinal points of view from Web 2.0 and correlate them with internal data (transactional data, behavioral data).
      – Better understand its marketing campaigns and consumer preferences.
      – Looking for ways to analyze and differentiate consumer experiences.
      – Helped the client to assess the company's corporate brands with respect to one of its main pay-TV competitors.
      "Big Data is a great opportunity for TV innovation in the next years. TV viewing is transforming into a multiplatform and participative experience: the better we know and understand our viewers, the better we can serve them." (Valerio Motti, Head of Marketing Innovation, Mediaset S.p.A.)
  • 27. VIVISIMO: references
  • 28. Case history
  • 29. SUCCESS STORIES: among the various sources... here are two. LINK PDF file. Reminder: retrieve the link that contains this document.
  • 30. USEFUL LINKS
  • 31. BIG Data: some useful links
      – Big Data HUB & Success Stories: http://www.ibmbigdatahub.com/
      – Big Data University: http://bigdatauniversity.com/
      – BigInsights technical enablement wiki: https://www.ibm.com/developerworks/mydeveloperworks/wikis/home?lang=en_US#/wiki/BigInsights
      – FREE ebook, Harness the Power of Big Data: http://www.ibmbigdatahub.com/blog/research-director-reflects-new-big-data-book
  • 32. I will stop here... thank you for your patience
  • 33. HADOOP & BIGINSIGHTS
  • 34. BigInsights is based on Hadoop... why?
      – CPU instructions per second, significant improvements: 1990, 44 MIPS at 40 MHz; 2000, 3,562 MIPS at 1.2 GHz; 2010, 147,600 MIPS at 3.3 GHz
      – RAM, significant improvements: 1990, 640 KB; 2000, 64 MB; 2010, 8-32 GB
      – Disk capacity, significant improvements: 1990, 20 MB; 2000, 10 GB; 2010, 1 TB
      – Disk latency (the speed of reading from and writing to disk), only minor improvements: there have been no major gains in the last 7-10 years, and the current speed is about 70-80 MB/s
  • 35. How long does it take to scan 1 TB?
      – 1 TB at 80 MB/s: 1 disk, 3.4 hours; 10 disks, 20 min; 100 disks, 2 min; 1000 disks, 12 sec
      – To get around disk latency, the answer is parallel processing.
      – Hadoop is a new way to store and process data: written in Java; designed to run on commodity hardware; runs on Linux; scalable, flexible, robust.
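The scan times on slide 35 follow from simple arithmetic, which the sketch below reproduces. It is only an illustration: it assumes a decimal terabyte (10^12 bytes), a flat 80 MB/s per disk, and perfectly parallel reads with no coordination overhead. The class and method names are hypothetical.

```java
final class ScanTimeSketch {
    /** Seconds needed to scan totalBytes when `disks` drives read in parallel (idealised). */
    static double scanSeconds(double totalBytes, double bytesPerSecondPerDisk, int disks) {
        if (disks <= 0 || bytesPerSecondPerDisk <= 0) {
            throw new IllegalArgumentException("disks and rate must be positive");
        }
        return totalBytes / (bytesPerSecondPerDisk * disks);
    }

    public static void main(String[] args) {
        double oneTerabyte = 1e12; // assumption: decimal terabyte
        double diskRate = 80e6;    // 80 MB/s per disk, the figure quoted on the slide
        for (int disks : new int[] {1, 10, 100, 1000}) {
            double seconds = scanSeconds(oneTerabyte, diskRate, disks);
            System.out.printf("%4d disk(s): %.1f s (%.2f h)%n", disks, seconds, seconds / 3600.0);
        }
    }
}
```

With one disk the result is 12,500 seconds, a little under 3.5 hours, and with 1000 disks it is 12.5 seconds, which matches the rounded figures on the slide.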
  • 36. What is Hadoop?
      – Apache Hadoop is a free, open source framework for data-intensive applications.
          – Inspired by Google technologies (MapReduce, GFS)
          – Yahoo has been the largest contributor to the project (Doug Cutting)
          – Well suited to batch-oriented, read-intensive applications
          – Originally built to address scalability problems of Nutch, an open source Web search technology
      – Enables applications to work with thousands of nodes and petabytes of data in a highly parallel, cost-effective manner.
          – CPU + disks of a commodity box = Hadoop "node"
          – Boxes can be combined into clusters
          – New nodes can be added as needed without changing data formats, how data is loaded, or how jobs are written
  • 37. Two Key Aspects of Hadoop
      – MapReduce framework
          – MapReduce is a software framework introduced by Google to support distributed computing on large data sets on clusters of computers.
          – It is how Hadoop understands and assigns work to the nodes (machines).
      – Hadoop Distributed File System (HDFS)
          – Where Hadoop stores data
          – A file system that spans all the nodes in a Hadoop cluster
          – It links together the file systems on many local nodes to make them into one big file system.
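To make the idea of one file system spanning many nodes more concrete, here is a toy sketch of how a file might be cut into fixed-size blocks, with each block copied to several nodes. This is not the HDFS API and not HDFS's real placement policy: the round-robin assignment, the class name and all parameters are assumptions chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class BlockPlacementSketch {
    /** For each block of the file, returns the indexes of the nodes that hold a copy. */
    static List<List<Integer>> place(long fileBytes, long blockBytes, int nodes, int replicas) {
        if (fileBytes < 0 || blockBytes <= 0 || nodes <= 0 || replicas <= 0 || replicas > nodes) {
            throw new IllegalArgumentException("invalid file size, block size, node or replica count");
        }
        long blocks = (fileBytes + blockBytes - 1) / blockBytes; // ceiling division
        List<List<Integer>> placement = new ArrayList<>();
        for (long b = 0; b < blocks; b++) {
            List<Integer> holders = new ArrayList<>();
            for (int r = 0; r < replicas; r++) {
                // Toy round-robin choice; a real cluster uses a smarter, topology-aware policy.
                holders.add((int) ((b + r) % nodes));
            }
            placement.add(holders);
        }
        return placement;
    }

    public static void main(String[] args) {
        // A 250-byte "file" in 100-byte blocks on 4 nodes, 2 copies of each block.
        System.out.println(place(250, 100, 4, 2));
    }
}
```

The point of the sketch is only that every block lives on more than one machine, so reads can be spread across nodes and a single node failure does not lose data.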
  • 38. Hadoop and the MapReduce paradigm
      – Data is stored on a distributed system of servers (the Hadoop data nodes).
      – The processing functions are sent to where the data is.
      – Each server processes its own share of the data and shares the results.
      – The system can scale to thousands of nodes and petabytes of data.
      Phases:
      1. Map phase: splits the job into small parts and distributes the map tasks to the cluster.
      2. Shuffle: reorders the partial results for the final processing.
      3. Reduce phase: reprocesses everything to obtain a single result and returns a single result set.
      MapReduce application (code excerpt shown on the slide, where it is cut off):
          public static class TokenizerMapper
                  extends Mapper<Object, Text, Text, IntWritable> {
              private final static IntWritable one = new IntWritable(1);
              private Text word = new Text();

              // Emits (word, 1) for every token of the input value.
              public void map(Object key, Text val, Context context)
                      throws IOException, InterruptedException {
                  StringTokenizer itr = new StringTokenizer(val.toString());
                  while (itr.hasMoreTokens()) {
                      word.set(itr.nextToken());
                      context.write(word, one);
                  }
              }
          }

          public static class IntSumReducer
                  extends Reducer<Text, IntWritable, Text, IntWritable> {
              private IntWritable result = new IntWritable();

              // Sums the counts collected for one word.
              public void reduce(Text key, Iterable<IntWritable> val, Context context) {
                  int sum = 0;
                  for (IntWritable v : val) {
                      sum += v.get();
                  }
                  ...
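The three phases on slide 38 can be imitated in plain Java, without a cluster or the Hadoop libraries, to show what each step contributes. The sketch below is an in-process illustration under that assumption: the class `WordCountSketch` and its methods are hypothetical names, and everything runs on one machine, so it shows the data flow only, not the distribution.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

final class WordCountSketch {
    /** Map phase: emit a (word, 1) pair for every token in one input line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    /** Shuffle: group the emitted values by key, with keys kept in sorted order. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce phase: sum the values collected for one key. */
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Runs map, shuffle and reduce in sequence over all input lines. */
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("big data");
        lines.add("big insights");
        System.out.println(wordCount(lines));
    }
}
```

In a real Hadoop job the map calls would run on the nodes that hold the input blocks and the shuffle would move data over the network; here the same logic is simply called in a loop.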
  • 39. BigInsights extends the capabilities of open source Hadoop by adding new functionality
      [Diagram: InfoSphere BigInsights within the IBM Big Data Platform. Enterprise capabilities: Advanced Engines, Development Tools, Indexing, Connectors, Workload Optimization, Administration & Security. Open source components: IBM tested & supported open source based components. Platform layers: Analytic Applications (BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics); Visualization & Discovery, Application Development, Systems Management; Accelerators; Hadoop System, Stream Computing, Data Warehouse; Information Integration & Governance]
  • 40. InfoSphere BigInsights: two editions. With BigInsights, companies can take on the processing of huge quantities of data that were never exploited before and derive new knowledge from them in an efficient, optimized and scalable way. The infrastructure uses Hadoop's MapReduce framework to handle the parallel processing of large data sets distributed across many nodes.
  • 41. InfoSphere BigInsights: two editions
      – Enterprise Edition: GPFS-SNC native support*; spreadsheet-style data exploration; job and workflow management; productivity and efficiency improvements; integration with InfoSphere Warehouse; integration with Netezza; integration with DB2; large scale indexing; text analytics; machine learning*; tiered terabyte pricing
      – Basic Edition: free download, easy installation; 24x7 web support; 10 TB limit; paid support option
      * = coming soon
  • 42. BigInsights on Cloud
  • 43. IBM BigInsights on Cloud: Hadoop for everyone
  • 44. InfoSphere Streams
  • 45. InfoSphere Streams provides an agile, scalable software infrastructure for the real-time analysis of huge flows of data in motion, of any kind and coming from countless sources. This type of processing increases the accuracy and speed of decision making in fields such as healthcare, astronomy, manufacturing, finance and many others.
  • 46. Categories of Problems Solved by Streams
      – Applications that require on-the-fly processing, filtering and analysis of streaming data
          – Sensors: environmental, industrial, surveillance video, GPS, ...
          – "Data exhaust": network / system / web server / app server log files
          – High-rate transaction data: financial transactions, call detail records
      – Criteria: two or more of the following
          – Messages are processed in isolation or in limited data windows
          – Sources include non-traditional data (spatial, imagery, text, ...)
          – Sources vary in connection methods, data rates and processing requirements, presenting integration challenges
          – Data rates / volumes require the resources of multiple processing nodes
          – Analysis and response are needed with sub-millisecond latency
          – Data rates and volumes are too great for store-and-mine approaches
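The criterion "messages are processed in isolation or in limited data windows" can be illustrated with a small count-based sliding window. The sketch below is plain Java rather than InfoSphere Streams code, and the class name, the window size and the choice of a running average are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class SlidingWindowSketch {
    private final int capacity;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum = 0.0;

    SlidingWindowSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Accepts one reading and returns the average of the most recent readings in the window. */
    double accept(double reading) {
        window.addLast(reading);
        sum += reading;
        if (window.size() > capacity) {
            sum -= window.removeFirst(); // evict the oldest reading, keeping memory bounded
        }
        return sum / window.size();
    }

    public static void main(String[] args) {
        SlidingWindowSketch avg = new SlidingWindowSketch(3);
        for (double reading : new double[] {1.0, 2.0, 3.0, 10.0}) {
            System.out.println(avg.accept(reading));
        }
    }
}
```

Only the last few readings are ever held in memory, which is the contrast with store-and-mine approaches that the slide draws.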
  • 47. Real-time processing with InfoSphere Streams
      – Continuous ingestion, followed by continuous analysis
      – Operators: Filter, Transform, Annotate, Correlate, Classify
      – The infrastructure provides services for scheduling analytics across hardware nodes and establishing streaming connectivity.
      – Scale is achieved by partitioning applications into components and by distributing them across stream-connected hardware nodes.
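The chain of operators named on slide 47 can be pictured as a series of small steps applied to each tuple as it arrives. The sketch below uses plain Java `Optional` chaining, not the Streams runtime or its own language; the sensor example, the 100 °F threshold and all names are hypothetical, and the Correlate step is left out because it would need state from more than one stream.

```java
import java.util.Optional;

final class OperatorChainSketch {
    /** Processes a single reading; returns empty if the Filter step drops it. */
    static Optional<String> process(String sensorId, double celsius) {
        return Optional.of(celsius)
                .filter(c -> !c.isNaN())                         // Filter: drop invalid readings
                .map(c -> c * 9.0 / 5.0 + 32.0)                  // Transform: Celsius to Fahrenheit
                .map(f -> (f > 100.0 ? "HOT" : "NORMAL"))        // Classify: assumed threshold
                .map(label -> sensorId + " " + label);           // Annotate: attach the source id
    }

    public static void main(String[] args) {
        System.out.println(process("s1", 40.0));
        System.out.println(process("s1", 20.0));
        System.out.println(process("s2", Double.NaN));
    }
}
```

Because each step is independent, steps like these could be placed on different machines and connected by streams, which is the partitioning idea the slide describes.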
  • 48. InfoSphere Data Explorer (formerly VIVISIMO)
  • 49. Vivisimo and its mission: it helps organizations discover, organize, analyse and navigate large quantities of heterogeneous and dynamic data, both structured and unstructured, regardless of where the data is managed or stored, in order to increase efficiency and value in business processes.
  • 50. Vivisimo in the enterprise
      – Provide access to numerous applications and data stores
      – Discover and navigate across the whole company
      – Merge structured and unstructured information to guide the company toward: better decisions; more efficient operations; a better understanding of customers; innovation
      – Social tools for collaboration and reuse
      [Diagram: sources (Relational Data, File Systems, Content Management, Email, CRM, Supply Chain, ERP, RSS Feeds, Cloud, Custom Sources, External Sources) feed the Velocity Platform, which serves applications and users; Social Tools: Commenting, Tagging, Rating, Shared Folders]
  • 51. Vivisimo: federated search
  • 52. Vivisimo: architecture
      [Diagram labels: User Profiles; Application SDK; Federated Sources; Authentication / Authorization; Query Transformation; Personalization; Display; Subscriptions; Feeds; Web Results; Text Analytics; Meta-Data; Search Engine; Thesauri; Faceting; Clustering; BI; Ontology Support; Tagging; Semantic Processing; Taxonomy; Entity Extraction; Collaboration; Relevancy; Connector Framework (CM, RM, DM; RDBMS; Feeds; Web 2.0; Email; Web; CRM, ERP; File Systems)]
  • 53. CUSTOMER Analytics: examples
  • 54. CUSTOMER Analytics: some examples. Deeper customer analytics, examples and best practice; leverage Big Data: Ready for Business.
      – Behavioral data: connect with clients and prospects, and with brands; analyse strong and weak signals in discussion
      – Interaction data: delight customers with targeted social and transactional propositions; real-time interaction across channels ("You Interact!")
      – Transaction data: a single view of business data, social data and interactive data from enterprise systems
  • 55. CUSTOMER Analytics: MOBY Lines. Digital marketing optimization: lifetime individual tracking, microsegmentation, channel attribution, proposition automation.
      [Diagram labels: intuitive social collection; digital & multichannel marketing; individual digital analytics; real-time monitoring; dynamic segments; marketing automation; I/O ERP data; single view of business data, social data and interactive data; enterprise systems]
  • 56. CUSTOMER Analytics: GARANTY bank, a video. Real-time interaction across channels; single view of business data, social data and interactive data.
