EDF2012 Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


Published in: Technology, Business

  1. Bringing Big Data to the Enterprise. Dipl.Ing. Wolfgang Nimfuehr, Information Agenda Executive Consultant, Big Data Tiger Team, IBM Software Group Europe. 7 June 2012. wolfgang.nimfuehr@at.ibm.com © 2012 IBM Corporation
  2. Legal Disclaimer © IBM Corporation 2012. All Rights Reserved. The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
  3. The Information Explosion in Data and Real World Events
     • 44x as much data and content over the coming decade: 800,000 petabytes in 2009, 35 zettabytes in 2020
     • 1 in 3 business leaders frequently make decisions based on information they don’t trust, or don’t have
     • 1 in 2 business leaders say they don’t have access to the information they need to do their jobs
     • 80% of the world’s data is unstructured
     • 83% of CIOs cited “Business intelligence and analytics” as part of their visionary plans to enhance competitiveness
     • 60% of CEOs need to do a better job capturing and understanding information rapidly in order to make swift business decisions
     • Organizations Need Deeper Insights
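The "44x" headline on this slide can be checked from the two volumes it quotes. The following is a minimal sketch of that arithmetic; the function name is illustrative, and it assumes decimal units (1 zettabyte = 1,000,000 petabytes).

```python
PETABYTES_PER_ZETTABYTE = 1_000_000  # assumption: decimal units, 10**21 / 10**15


def growth_factor(start_pb: float, end_zb: float) -> float:
    """Return how many times larger end_zb (zettabytes) is than start_pb (petabytes)."""
    return end_zb * PETABYTES_PER_ZETTABYTE / start_pb


# Figures quoted on the slide: 800,000 PB in 2009, 35 ZB in 2020.
factor = growth_factor(start_pb=800_000, end_zb=35)
print(f"{factor:.2f}x")  # 43.75x, which rounds to the 44x on the slide
```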
  4. Challenge: Study a Large Volume and Variety of Data to Find New Insights
     • Multi-channel customer sentiment and experience analysis
     • Support medical diagnostics; detect life-threatening conditions
     • Predict weather patterns to plan optimal wind turbine usage, and optimize capital expenditure on asset placement
     • Make risk decisions and detect fraud based on real-time transactional data
     • Identify criminals and threats from disparate video, audio, and data feeds
  5. Leveraging Big Data Analytics can improve Experience …
     • Consumers of insight: Client Mgr, Data Scientist, Dashboards, Call Center …
     • Information Management Capabilities: Natural Language, Big Data Analytics Hub
     • External Data and Internal Data feeding the hub: Web Logs; Twitter feeds; Facebook chats; YouTube Video; Blogs/Posting; Voice to Text Data; Company website logs; Relationship / risk data; Product profitability data; Appraisal data; Transactional data; Policy & Procedure data; Event triggers; Customer Profitability analysis; Complaint Data; Email correspondents; Credit bureau data
  6. On Feb 16 2011 the IBM Watson system won Jeopardy! Can we design a computing system that rivals a human’s ability to answer questions posed in natural language, interpreting meaning and context and retrieving, analyzing and understanding vast amounts of information in real-time?
  7. IBM Watson's project started in 2007
     • Project started in 2007, led by David Ferrucci
     • Initial goal: create a system able to process natural language & extract knowledge faster than any other computer or human
     • Jeopardy! was chosen because it’s a huge challenge for a computer to find the questions to such “human” answers under time pressure
     • “IBM is not in the entertainment business. But we are in the business of technology and pushing frontiers.” David Shepler, IBM Research Program Manager
     • Watson was NOT online!
     • Watson weighs the probability of his answer being right and doesn’t ring the buzzer if he’s not confident enough
     • Which questions Watson got wrong is almost as interesting as which he got right!
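The buzzer behaviour described on this slide (answer only when confident enough) can be sketched as a simple threshold on the top-ranked candidate. This is an illustration only: the threshold value, the function name and the confidence numbers are assumptions, not Watson's actual parameters or implementation.

```python
from typing import Dict, Optional

BUZZ_THRESHOLD = 0.5  # hypothetical cut-off; the slides do not state Watson's real value


def decide_to_buzz(candidates: Dict[str, float],
                   threshold: float = BUZZ_THRESHOLD) -> Optional[str]:
    """Return the most confident answer, or None when even the best one is below threshold."""
    if not candidates:
        return None
    best_answer, best_confidence = max(candidates.items(), key=lambda item: item[1])
    return best_answer if best_confidence >= threshold else None


# Illustrative confidences (made up for this sketch).
print(decide_to_buzz({"Vasco da Gama": 0.92, "Gary": 0.07}))  # buzzes with the top answer
print(decide_to_buzz({"Answer A": 0.30, "Answer B": 0.20}))   # stays silent (None)
```

Staying silent below the threshold trades a missed opportunity against the penalty for a wrong answer, which is why the cut-off matters as much as the ranking itself.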
  8. Different Types of Evidence: Keyword Evidence
     • Question: “In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.”
     • Passage: “In May, Gary arrived in India after he celebrated his anniversary in Portugal.”
     • Keyword Matching: celebrated ↔ celebrated; In May 1898 ↔ In May; 400th anniversary ↔ anniversary; Portugal ↔ in Portugal; arrival in ↔ arrived in; India ↔ India; explorer ↔ Gary
     • Evidence suggests “Gary” is the answer BUT the system must learn that keyword matching may be weak relative to other types of evidence
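To see why keyword evidence alone favours "Gary", here is a minimal sketch of a keyword-overlap score applied to the slide's question and to both passages from the evidence slides. The tokenizer, the stopword list and the scoring formula are simplifying assumptions for illustration, not DeepQA's actual scorers.

```python
import re
from typing import Set

STOPWORDS = {"the", "this", "after", "his"}  # tiny illustrative list


def keywords(text: str) -> Set[str]:
    """Lower-case the text and keep alphabetic tokens of 3+ letters that are not stopwords."""
    tokens = re.findall(r"[a-z]{3,}", text.lower())
    return {token for token in tokens if token not in STOPWORDS}


def keyword_score(question: str, passage: str) -> float:
    """Fraction of the question's keywords that also occur in the passage."""
    question_keywords = keywords(question)
    if not question_keywords:
        return 0.0
    return len(question_keywords & keywords(passage)) / len(question_keywords)


QUESTION = ("In May 1898 Portugal celebrated the 400th anniversary "
            "of this explorer's arrival in India.")
GARY = "In May, Gary arrived in India after he celebrated his anniversary in Portugal."
DA_GAMA = "On 27th May 1498, Vasco da Gama landed in Kappad Beach."

print(f"Gary passage:    {keyword_score(QUESTION, GARY):.2f}")
print(f"da Gama passage: {keyword_score(QUESTION, DA_GAMA):.2f}")
```

Under these assumptions the misleading passage shares five of the question's seven keywords while the correct one shares only "may", which is the weakness the slide points out.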
  9. Different Types of Evidence: Deeper Evidence
     • Question: “In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.”
     • Passage: “On 27th May 1498, Vasco da Gama landed in Kappad Beach”
     • Search Far and Wide; Explore many hypotheses; Find & Judge Evidence; Many inference algorithms
     • May 1898, 400th anniversary ↔ 27th May 1498 (Temporal Reasoning, Date Math)
     • arrival in ↔ landed in (Statistical Paraphrasing, Para-phrases)
     • India ↔ Kappad Beach (GeoSpatial Reasoning, Geo-KB)
     • explorer ↔ Vasco da Gama
     • Question terms with no counterpart in the passage: celebrated, Portugal
     • Stronger evidence can be much harder to find and score. The evidence is still not 100% certain.
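The "Date Math" step on this slide amounts to subtracting the anniversary from the celebration year (1898 - 400 = 1498) and checking whether a passage mentions the resulting year. The sketch below shows only that one check; the function names and the four-digit-year regex are assumptions made for illustration, not part of DeepQA.

```python
import re
from typing import List


def years_in(text: str) -> List[int]:
    """Extract standalone four-digit numbers, treated here as years."""
    return [int(year) for year in re.findall(r"\b(\d{4})\b", text)]


def temporal_evidence(celebration_year: int, anniversary: int, passage: str) -> bool:
    """True if the passage mentions the year implied by the anniversary."""
    event_year = celebration_year - anniversary  # 1898 - 400 = 1498
    return event_year in years_in(passage)


DA_GAMA = "On 27th May 1498, Vasco da Gama landed in Kappad Beach."
GARY = "In May, Gary arrived in India after he celebrated his anniversary in Portugal."

print(temporal_evidence(1898, 400, DA_GAMA))  # True: the passage mentions 1498
print(temporal_evidence(1898, 400, GARY))     # False: no year in the passage
```

This check supports the da Gama passage even though it shares almost no keywords with the question, and as the slide notes it is still only one piece of evidence, not proof.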
  10. DeepQA: Massively Parallel Probabilistic Evidence-Based Architecture
     • Pipeline: Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation → Hypothesis and Evidence Scoring → Synthesis → Final Confidence Merging & Ranking → Answer & Confidence
     • Hypothesis Generation and Hypothesis and Evidence Scoring run in parallel for each of the multiple interpretations of the question
     • Scale: 100s of sources; 100s of possible answers; 1000’s of pieces of evidence; 100,000’s of scores from many simultaneous text analysis algorithms
  11. Maximum Benefit Requires Combining Deep and Reactive Analytics
     • [Figure: workloads plotted by Data Scale (Kilo, Mega, Giga, Tera, Peta, Exa) against Decision Frequency (yr, mo, wk, day, hr, min, sec … ms, µs; Occasional, Frequent, Real-time), ranging from Traditional Data Warehouse and Business Intelligence to Fast, Reactive Analytics and Deep Analytics]
     • Loop labels: History, Hypotheses, Predictions, Actions, Reality, Observations, Feedback, Integration, Round-trip automation
     • Real time Optimization: 100,000 updates/sec, 5 ms/decision, 10 PB for Deep Analytics
     • Predictive Analytics: 100,000 records/sec, 6B/day, 10 ms/decision, 6 PB for Deep Analytics
     • Smart Traffic: 250K GPS probes/sec, 630K segments/sec, 2 ms/decision, 4K vehicles
     • DeepQA: 100s GB for Deep Analytics, 3 sec/decision, 1 PB training corpus
  12. Big Data use cases across all industries
     • Financial Services: Fraud detection; Risk management; 360° View of the Customer
     • Utilities: Weather impact analysis on power generation; Transmission monitoring; Smart grid management
     • Transportation: Weather and traffic impact on logistics and fuel consumption
     • IT: Transition log analysis for multiple transactional systems; Cybersecurity
     • Health & Life Sciences: Epidemic early warning system; ICU monitoring; Remote healthcare monitoring
     • Retail: 360° View of the Customer; Click-stream analysis; Real-time promotions
     • Telecommunications: CDR processing; Churn prediction; Geomapping / marketing; Network monitoring
     • Law Enforcement: Real-time multimodal surveillance; Situational awareness; Cyber security detection
  13. Monetizing Relationships - not just Transactions
     • Calling Network (Telco company), Social Network, Retail and Public Database are combined into a Merged Network
     • Amy Bearn: 32, Married, mother of 3, Accountant. Telco Score: 91, CPG Score: 76, Fashion Score: 88
     • Telco: How valuable is Amy to my mobile phone network? How likely is she to switch carriers? How many other customers will follow?
     • Retail: How valuable is Amy to my retail sales? Who does she influence? What do they spend?
  14. Sample: Big Data 360° Lead Generation. Social Media based 360-degree Consumer Profiles
     • Personal Attributes. Identifiers: name, address, age, gender, occupation… Interests: sports, pets, cuisine… Life Cycle Status: marital, parental
     • Timely Insights: Intent to buy various products; Current Location; Sentiment on products, services, campaigns; Incidents damaging reputation; Customer satisfaction/attrition
     • Life Events. Life-changing events: relocation, having a baby, getting married, getting divorced, buying a house…
     • Products Interests: Personal preferences of products; Product Purchase history; Suggestions on products & services
     • Relationships. Personal relationships: family, friends and roommates… Business relationships: co-workers and work/interest network…
     • Monetizable intent to buy products: “I need a new digital camera for my food pictures, any recommendations around 300?” “What should I buy?? A mini laptop with Windows 7 OR a Apple MacBook!??!”
     • Life Events: “College: Off to Stanford for my MBA! Bbye chicago!” “Looks like well be moving to New Orleans sooner than I thought.”
     • Intent to buy a house: “Im thinking about buying a home in Buckingham Estates per a recommendation. Anyone have advice on that area? #atx #austinrealestate #austin”
     • Location announcements: “Im at Starbucks Parque Tezontle http://4sq.com/fYReSj”
  15. Sample: Big Data 360° Lead Generation
     • Real-time product intents enriched with consumer attributes
     • Entries contain promotional messages, wishful thinking, questions, etc.
     • Integration across Social Media sites
     • Micro-segmentation of product intents by occupation
     • Real-time tracking by micro-segmentation
     • For many of the attributes we need to extract, cleanse, normalize and categorize
     • Micro-segmentation of consumers by hobbies
  16. Sample: Institutional Risk Application. Comprehensive view of publicly traded companies and related people based on regulatory filings. Steps shown: Extract, Integrate.
  17. Requirements for a Big Data Solution Platform
     • Analyze a Variety of Information: Novel analytics on a broad set of mixed information that could not be analyzed before; Multiple relational & non-relational data types and schemas
     • Analyze Information in Motion: Streaming data analysis; Large volume data bursts & ad-hoc analysis
     • Analyze Extreme Volumes of Information: Cost-efficiently process and analyze petabytes of information; Manage & analyze high volumes of structured, relational data
     • Discover & Experiment: Ad-hoc analytics, data discovery & experimentation
     • Manage & Plan: Enforce data structure, integrity and control to ensure consistency for repeatable queries
  18. IBM Big Data Platform for Ingest, Data and Analytics
     • Analytic Applications: BI / Reporting; Exploration / Visualization; Functional App; Industry App; Predictive Analytics; Content Analytics
     • IBM Big Data Platform components: Visualization & Discovery; Application Development; Systems Management; Accelerators; Hadoop System; Stream Computing; Data Warehouse; Information Integration & Governance
     • New analytic applications drive the requirements for a big data platform:
     • Integrate and manage the full variety, velocity and volume of data
     • Apply advanced analytics to information in its native form
     • Visualize all available data for ad-hoc analysis
     • Development environment for building new analytic applications
     • Workload optimization and scheduling
     • Security and Governance
  19. Big Data Capabilities: Big Data Challenges and IBM Big Data Solutions
     • SQL Data. Challenges: High volume of structured data; Valuable Information; Compute intensive analytics; Low latency response on queries; Business Intelligence and Analytics; Understanding the customer through segmentation and analysis. Solution: IBM Netezza, an analytic appliance for high speed, advanced analytics on large structured data sets
     • NoSQL Data. Challenges: Very high volumes (TBs to PBs) of unstructured data; Exploration and discovery; Text, Entity and Social Media Analytics. Solution: IBM BigInsights, Hadoop-based processing for analytics on variety and volumes of data
     • Streaming. Challenges: Real time processing; Detect failure patterns; High volume, low latency processing; Scoring and decision analytics. Solution: IBM Streams, low latency analytics for streaming data
  20. InfoSphere BigInsights: Analytical platform for Big Data at-rest
     • Based on open source & IBM technologies
     • Distinguishing characteristics: Built-in analytics enhances business knowledge; Enterprise software integration complements and extends existing capabilities; Production-ready platform with tooling for analysts, developers, and administrators speeds time-to-value and simplifies development/maintenance
     • IBM advantage: Combination of software, hardware, services and advanced research
     • [Diagram: Analytic Applications on top of the IBM Big Data Platform: Visualization & Discovery, Application Development, Systems Management, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Information Integration & Governance]
  21. InfoSphere BigInsights: Embrace and Extend Hadoop
     • Analytics: BigSheets; Text Analytics; ML Analytics *)
     • Interface: Management Console (browser based); Development Tooling (Eclipse Plug-Ins); Rest API (for Applications)
     • Application: Pig; Hive; Jaql; Avro; Zookeeper; Oozie; Lucene; MapReduce; AdaptiveMR; FLEX; BigIndex; IBM LZO Compression
     • Storage: HBase; HDFS; GPFS-SNC *)
     • Data Sources/Connectors: Streams; Netezza; BoardReader; R; Data Stage; DB2; CSV/XML/JSON; SPSS; Flume; JDBC; Web Crawler
     • Legend: IBM components and Open Source components; *) future release
  22. BigSheets: A visual tool for data manipulation and prototyping
     • Ad-hoc analytics for LOB user
     • Analyze a variety of data - unstructured and structured
     • Spreadsheet metaphor for exploring/visualizing data
     • Browser-based
  23. Text Analytics: Turns disparate words into measurable insights
     • Physically assemble data, standardize formats, address auto-identify language, process punctuation and non-grammatical characters, standardize spelling.
     • Part-of-speech identification, standard and customized extraction dictionaries, proper noun identification, concept categorization, synonyms, exclusions, multi-terms, regular expressions, fuzzy-matching
     • Identify positive or negative sentiment, NLP-based analytics, define variables, macros and rules.
     • Iterative classification using automated and manual techniques. Concept derivation & inclusion, semantic networks and co-occurrence rules
     • Reporting/Monitoring social commentary, combination w/structured data, clustering, associated concepts, correlated concepts, auto-classification of documents, sites, posts.
     • Pre-configured text annotators ready for distributed processing on Big Data
     • Support for native languages including double-byte
  24. Perspective: The Vestas Wind library
     • Public wind data is available on 284km x 284km grids (2.5° LAT/LONG)
     • More data means more accurate and richer models (adding hundreds of variables)
     • Vestas wind library at 2.5 PB: to grow to over 6 PB in the near-term
     • Granularity 27km x 27km grids: driving to 9x9, 3x3 to 10m x 10m simulations
     • Reduced turbine placement identification from weeks to hours
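The grid sizes on this slide hint at why the library grows so quickly: each refinement multiplies the number of cells covering the same area. The following is a minimal sketch of that scaling, assuming square cells and counting cells only (it ignores the extra variables and time steps, which the slide does not quantify).

```python
def refinement_factor(coarse_km: float, fine_km: float) -> float:
    """Number of fine square cells needed to cover one coarse square cell."""
    return (coarse_km / fine_km) ** 2


PUBLIC_GRID_KM = 284.0  # public wind data grid size quoted on the slide

# Grid sizes named on the slide; 10 m is written as 0.01 km.
for fine_km in (27.0, 9.0, 3.0, 0.01):
    cells = refinement_factor(PUBLIC_GRID_KM, fine_km)
    print(f"{fine_km:>6} km cells per 284 km cell: {cells:,.0f}")
```

Going from the public 284 km grid to 27 km cells alone gives roughly 110 times as many cells, and each further step from 27 km to 9 km to 3 km multiplies the count by 9.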
  25. InfoSphere Streams: Analytical platform for Big Data in-motion
     • Built to analyze data in motion: Multiple concurrent input streams; Massive scalability
     • Process and analyze a variety of data: Structured, unstructured content, video, audio; Advanced analytic operators
     • [Diagram: Analytic Applications on top of the IBM Big Data Platform: Visualization & Discovery, Application Development, Systems Management, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Information Integration & Governance]
  26. InfoSphere Streams: Massively Scalable Stream Analytics
     • Linear Scalability: Clustered deployments, unlimited scalability
     • Automated Deployment: Automatically optimize operator deployment across clusters
     • Performance Optimization: JVM Sharing to minimize memory use; Fuse operators on same cluster; Telco client: 25 Million messages per second
     • Analytics on Streaming Data: Analytic accelerators for a variety of data types; Optimized for real-time performance
     • [Diagram: Streaming Data Sources → Source Adapters → Analytic Operators → Sink Adapters → Visualization, with Streams Studio IDE, Automated and Optimized Deployment, and Streams Runtime]
  27. University of Ontario Institute of Technology
     • Use case: Neonatal infant monitoring; Predict infection in ICU 24 hours in advance
     • Solutions: 120 children monitored: 120K msg/sec, billion msg/day; Trials expanding to include hospitals in US and China
     • [Diagram: Sensor Network → Event Pre-processor → Stream-based Distributed Interoperable Health care Infrastructure → Analysis Framework → Solutions (Applications)]
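The monitoring figures on this slide can be put in per-patient and per-day terms with a little arithmetic. The sketch below assumes the 120K msg/sec rate is sustained around the clock and spread evenly over the 120 children; the slide itself gives only "billion msg/day" without an exact number, so the daily total here is an estimate under that assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_patient_rate(total_msg_per_sec: int, patients: int) -> float:
    """Average messages per second for each monitored patient."""
    return total_msg_per_sec / patients


def messages_per_day(msg_per_sec: int) -> int:
    """Messages in 24 hours, assuming the rate is sustained all day."""
    return msg_per_sec * SECONDS_PER_DAY


print(per_patient_rate(120_000, 120))  # 1000.0 messages per second per child
print(messages_per_day(120_000))       # 10368000000, i.e. about 10 billion per day
```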
  28. Without a Big Data Platform You Code…
     • Event Handling; Custom SQL and Scripts; Multithreading; Check Pointing; Application Management; HA; Performance Optimization; Debug; Connectors; Security
     • Streams provides development, deployment, runtime, and infrastructure services
     • Accelerators and Tool kits: Over 100 sample applications and toolkits with industry focused toolkits with 300+ functions and operators
     • “TerraEchos developers can deliver applications 45% faster due to the agility of Streams Processing Language…” Alex Philip, CEO and President, TerraEchos
  29. IBM is Committed to Innovation
     • IBM Research: Almaden, Austin, Melbourne, Sao Paulo, Beijing, Haifa, Delhi, Ireland, Yamato, Watson, Zurich
     • Selected SW Acquisitions, 2005 to 2012 (* TeaLeaf, Varicent, Vivisimo pending acquisition close)
     • $16B+ in acquisitions since 2005
     • 10,000+ technical professionals
     • ~8000 dedicated consultants
     • 27,000+ business partner certifications
     • 8 Analytics Solutions Centers
     • 100 analytics-based research assets; almost 300 researchers
     • “Watson is going to revolutionize many, many industries and it will fundamentally change the way we interact with computers & machines.” John Kelly, SVP & Head of IBM Research
  30. Making Learning Easy and Fun
     • bigdatauniversity.com/
     • ibm.com/software/data/bigdata/
     • ibm.com/software/data/infosphere/biginsights/
     • youtube.com/user/ibmbigdata
  31. Questions & Answers. Dipl.Ing. Wolfgang Nimführ, Information Agenda Executive Consultant, Big Data Tiger Team, IBM Software Group Europe. IBM Austria, Obere Donaustrasse 95, A1020 Vienna. Tel +43-664-618-5389. wolfgang.nimfuehr@at.ibm.com