VINT Symposium 2012: Recorded Future | Dirk de Roos (IBM)


Big Data Stories

Published in: Technology, Business
  • We have been working with an Indian telco client for some time to help reduce its billing costs and improve customer satisfaction.
    Challenge: Call Detail Record (CDR) processing within the client's data warehouse was sub-optimal. The client could not achieve real-time billing, which required handling billions of CDRs per day and de-duplicating them against 15 days' worth of CDR data. The existing setup could not support future IT and business needs for real-time analytics.
    Solution: A single platform for mediation and real-time analytics reduces IT complexity. CDR processing was offloaded to InfoSphere Streams, resulting in better data warehouse performance and improved TCO. The PMML standard is used to import data mining models from InfoSphere Warehouse, and each incoming CDR is analyzed using these models, allowing immediate detection of events (e.g., dropped calls) that might create customer satisfaction issues.
    Business benefit: Data is now processed at the speed of business, with processing time cut from 12 hours to 1 second. Hardware costs were reduced to 1/8th. The platform supports future growth (more data, more analysis) without the need to re-architect, and it is in place for real-time analytics to drive revenue.
  • IBM has been working with a leading non-profit research institute that is heading a regional project to prove the viability and benefits of smart grid technology and to test the concept of demand-based electrical power pricing.
    Background: The project is the largest initiative of its kind in the US and is designed to test and quantify smart grid costs and benefits with over 60,000 consumers in five states: Washington, Oregon, Idaho, Montana and Wyoming. The smart grid technique uses an incentive signal and a feedback signal to help coordinate smart grid resources. The two-way communication of this information, from power source to destination, allows intelligent devices and consumers to make smart decisions about using this energy. The project requires a robust infrastructure that facilitates two-way data flow, with computing power capable of continuously processing petabytes of data.
    Solution: IBM is building the infrastructure to disseminate the project's transactive incentive signal and interlace it with the participants' responsive assets. The solution consists of:
      - IBM stream computing software running on IBM x86 servers to allow for the effective streaming of data
      - an IBM data warehouse appliance to analyze and understand the project data (up to 10 petabytes) in minutes
    Benefits:
      - Enabled a town to avoid a power outage by using a two-way advanced meter system to shut off home water heaters during peak periods, reducing strain on an unreliable underwater cable
      - Empowers consumers to make educated choices about how and when to use electricity, and at what price
      - Increases grid efficiency and reliability through system self-monitoring and feedback
  • Most of you know of Watson, our computing system designed to compete on the Jeopardy! game show. Watson represents a breakthrough in the volume of information stored and the ability to access it quickly (answering natural language questions). I think Watson is impressive because there are many commercial uses for this technology, and the technology exists today! Jeopardy! provides the ultimate challenge for Watson because the game's clues involve analyzing subtle meanings, irony, riddles, and other complexities in which humans excel and computers traditionally do not.
    If you think about Deep Blue, the 1997 IBM machine that defeated the reigning world chess champion, Watson is yet another major leap in the capability of IT systems to identify patterns, gain critical insight and enhance decision-making despite daunting complexities. While Deep Blue was amazing, it was an achievement in applying compute power to a computationally well-defined and well-bounded game: chess. Watson, on the other hand, faces a challenge that is open-ended and defies the well-bounded mathematical formulation of a game like chess. Watson has to operate in the near-limitless, ambiguous, and highly contextual domain of human language and knowledge.
    Watson answers a Grand Challenge: can IBM design a computing system that rivals a human's ability to answer questions posed in natural language by interpreting meaning and context and then retrieving, analyzing and understanding vast amounts of information in real time? IBM Watson is a breakthrough in analytic innovation, proving that it is possible to harness vast amounts of information and rival a human's ability to answer questions posed in natural language in real time. But it doesn't matter how good the machine is if we don't have good information to feed it.
    We live in a time when a computer can compete against humans at answering questions in plain English, based on storing, retrieving, analyzing and understanding vast amounts of information at real-time speeds. These same capabilities can enable you to improve and optimize your business, too. IBM just showed the value of putting that information to work by creating a computing system capable of competing on Jeopardy!.
    There's a lot of technology that went into Watson, and a lot of Big Data technology in there as well. Take a moment and think about how this iconic game show is played: you have to answer a question within three seconds. The technology used to analyze and return answers in Watson was a precursor to the Streams technology. In fact, Streams was invented because the technology used in Watson wasn't fast enough for some of the in-motion requirements companies have today. Jeopardy! questions are not straightforward; they use puns and tricks to make them harder. So some of our text analytics technology with natural language processing, which is part of the IBM Big Data platform, is in there too (that's yet another MAJOR DIFFERENTIATOR for IBM in Big Data: our Text Analytics Toolkit, which you will hear more about later in this presentation). It wasn't always smooth sailing for Watson. The big breakthrough came when the team started to use machine learning (ML), and the IBM Big Data platform will further differentiate itself from the field in 2012 when a corresponding toolkit comes to market, just as the text analytics toolkit did. Finally, Watson had to have access to a heck of a lot of data, and Big Data technologies were used to load and index over 200 million pages of data. Watson had everything from encyclopedias, to the Bible, to the world-famous music and movie databases. All of these technologies had to work together as well.
    So IBM clearly has an inflection-point understanding of these technologies and how to get them working together. In the case of text analytics and machine learning, we have to make them easier to consume, because you don't have the world's largest commercial research organization for math at your fingertips. So we need to build tooling, optimization, and accelerators around them and put these technologies inside consumable toolkits, which we are doing now.
  • To know we are making progress on scientific problems like open-domain QA, well-defined challenges help demonstrate that we can solve concrete and difficult tasks. As you might know, Jeopardy! is a long-standing, well-regarded and highly challenging television quiz show in the US that demands that human contestants quickly understand and answer richly expressed natural language questions over a staggering array of topics. The Jeopardy! Challenge uniquely provides a palpable, compelling and notable way to drive the technology of Question Answering along key dimensions.
    If you are familiar with the quiz show, you know it asks an incredibly broad range of questions over a huge variety of topics. In a single round there is a grid of 6 categories, and for each category 5 rows with increasing $ values. Once a cell is chosen by one of the three players, a question, or what is often called a clue, is revealed. Here you see some example questions. <read some of the questions>
    Jeopardy! uses complex and often subtle language to describe what is being asked.
    To win you have to be extraordinarily precise. You must deliver the exact answer, no more and no less. It is not good enough for it to be somewhere in the top 2, 10 or 20 documents: you must know it exactly and get it in first place. Otherwise you get no credit; in fact, you lose points.
    You must demonstrate accurate confidences. That is, you must know what you know: if you buzz in and then get it wrong, you lose the $ value of the question.
    And you have to do all of this very quickly: deeply analyze huge volumes of content, consider many possible answers, compute your confidence and buzz in, all in just seconds. As we shall see, competing with human champions at this game represents a Grand Challenge in Automatic Open-Domain Question Answering. <STOP> <NEXT SLIDE>
  • Main point: At the core of what makes Watson different are three powerful technologies: natural language, hypothesis generation, and evidence-based learning. But Watson is more than the sum of its individual parts. Watson is about bringing these capabilities together in a way that's never been done before, resulting in a fundamental change in the way businesses look at quickly solving problems.
    Further speaking points: Looking at these one by one, understanding natural language and the way we speak breaks down the communication barrier that has stood between people and their machines for so long. Hypothesis generation bypasses the historically deterministic way that computers function and recognizes that there are various probabilities of various outcomes rather than a single definitive 'right' response. And adaptation and learning help Watson continuously improve in the same way that humans learn: it keeps track of which of its candidate responses were selected by users and which responses got positive feedback, thus improving future response generation.
    Additional information: The result is a machine that functions alongside us as an assistant rather than something we wrestle with to get an adequate outcome.
  • Challenge: Reduce the occurrence of high-cost Congestive Heart Failure (CHF) readmissions by proactively identifying patients likely to be readmitted on an emergent basis.
    Solution: Seton Healthcare is a not-for-profit organization; the Seton Family is the leading provider of healthcare services in Central Texas, serving an 11-county population of 1.9 million. Target and understand high-risk CHF patients for care management programs using natural language processing. Used predictive models that have demonstrated high positive predictive value against extracted structured and unstructured data.
    Results: Proactively targeted care management, which will reduce re-admission of CHF patients. Identified patients likely to be re-admitted and introduced early interventions, which will reduce cost and mortality rates and improve patient quality of life.
    Background: Seton Healthcare is a not-for-profit organization; the Seton Family is the leading provider of healthcare services in Central Texas, serving an 11-county population of 1.8 million. Seton Healthcare identified an opportunity to significantly reduce the occurrence of high-cost CHF readmissions by proactively identifying patients likely to be readmitted on an emergent basis.
    Objectives: Seton will partner with IBM to implement content and predictive analytics to identify patients who should receive proactive medical case management and intervention. The expectation is that Seton can reduce the occurrence of costly readmissions, reduce mortality rates and improve the quality of life for these patients.
    Project description: CHF prevention and reduced re-admission are a main focus of Seton's Clinical Design Center. The key clinical, financial, and contextual data for CHF patients span many applications and are stored in both structured and unstructured content. To achieve the Design Center objectives, the following capabilities are needed:
      - Integrate these data into longitudinal patient records
      - Identify important information in the unstructured data
      - Develop predictive models that show the likelihood of readmission, the likelihood of ambulatory-sensitive ED visits and admissions, and forecasted next-year costs
      - Display predictive model results along with aggregated patient record data in a visual, easily navigable system
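The telco notes above mention de-duplicating incoming CDRs against 15 days' worth of CDR data. The notes do not say how this is done, so the following is only a minimal sketch of one common approach, a time-windowed "seen" set. The class name, the shape of the record key, and the assumption that records arrive in roughly non-decreasing time order are all hypothetical and are not taken from InfoSphere Streams.

```python
from collections import OrderedDict
from datetime import datetime, timedelta


class WindowedDeduplicator:
    """Reject records whose key was already seen within a retention window.

    Hypothetical illustration only; assumes event times are non-decreasing.
    """

    def __init__(self, retention):
        self.retention = retention
        # key -> time first seen, kept in arrival order so the oldest
        # entries can be evicted from the front cheaply.
        self._seen = OrderedDict()

    def _evict_expired(self, now):
        # Drop entries that have aged out of the retention window.
        while self._seen:
            _, seen_at = next(iter(self._seen.items()))
            if now - seen_at <= self.retention:
                break
            self._seen.popitem(last=False)

    def is_new(self, key, event_time):
        """Return True for a first sighting, False for a duplicate."""
        self._evict_expired(event_time)
        if key in self._seen:
            return False
        self._seen[key] = event_time
        return True


if __name__ == "__main__":
    dedup = WindowedDeduplicator(retention=timedelta(days=15))
    t0 = datetime(2012, 1, 1)
    cdr_key = ("caller-A", "callee-B", t0)  # hypothetical CDR identity
    print(dedup.is_new(cdr_key, t0))                       # first sighting
    print(dedup.is_new(cdr_key, t0 + timedelta(days=1)))   # duplicate in window
    print(dedup.is_new(cdr_key, t0 + timedelta(days=20)))  # window has expired
```

Keeping only keys and first-seen times bounds memory by the number of distinct records in the window. At billions of CDRs per day a real system would need a partitioned or approximate structure, which this sketch does not attempt.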

    1. 1. Big Data Stories. Dirk deRoos, IBM World Wide Technical Sales, IBM Big Data. @Dirk_deRoos. © 2012 IBM Corporation
    2. 2. Harnessing the Largest Predictive Focus Group in the World
        Purpose
        • Understand public sentiment towards an event: movie trailers
        • Deeply understand the potential customer profile: gender, occupation, intent to watch
        • Alter marketing launch plans based on insight
        Background
        • 1.1 billion tweets analyzed
        • 5.7 million blog/forum posts
        • 3.5 million messages
        • Also: Facebook, Google+, Tumblr, Flickr
    3. 3. Media & Entertainment Social Media Analytics
    4. 4. Asian telco reduces billing costs and improves customer satisfaction
        • Real-time mediation and analysis of 6B CDRs per day
        • Data processing time reduced from 12 hrs to 1 sec
        • Hardware cost reduced to 1/8th
        • Proactively address issues (e.g. dropped calls) impacting customer satisfaction
    5. 5. Pacific Northwest Smart Grid Demonstration Project
        Capabilities:
        • Stream Computing: real-time control system
        • Deep Analytics Appliance: analyze massive data sets
        Demonstrates scalability from 100 to 500K homes while retaining 10 years' historical data.
        Accommodates ad hoc analysis of price fluctuation, energy consumption profiles, risk, fraud detection, grid health, etc.
    6. 6. Watson’s advanced analytic capabilities can sort through the equivalent of 200 MILLION pages of data to uncover an answer in 3 SECONDS.
    7. 7. The Jeopardy! Challenge – Question Answering Solution
        Key dimensions: Broad/Open Domain, Complex Language, High Precision, Accurate Confidence, High Speed
        Example clues:
        • $200: If you're standing, it's the direction you should look to check out the wainscoting.
        • $1000: The first person mentioned by name in ‘The Man in the Iron Mask’ is this hero of a previous book by the same author.
        • $2000: Of the 4 countries in the world that the U.S. does not have diplomatic relations with, the one that’s farthest north
        • $600: The colorful nickname for the country whose elite footballers let their egos prevent them from winning
    8. 8. Brief History of IBM Watson
        • IBM Research Project (2006 – )
        • Jeopardy! Grand Challenge (Feb 2011)
        • Watson for Healthcare (Aug 2011 – )
        • Watson for Financial Services (Mar 2012 – )
        • Watson Industry Solutions (2012 – )
        Phases: R&D, Demonstration, Commercialization, Cross-industry Applications, Expansion
        From inspiration and invention, through innovation and industrialization, ending with industry transformation.
    9. 9. IBM Watson brings together a set of transformational technologies to drive optimized outcomes
        1. Understands natural language and human speech
        2. Generates and evaluates hypotheses for better outcomes
        3. Adapts and learns from user selections and responses
        …built on a massively parallel probabilistic evidence-based architecture optimized for POWER7
        [Figure labels: 99%, 60%, 10%]
    10. 10. Healthcare industry is beset with some of the most complex information challenges we collectively face
        • Medical information is doubling every 5 years, much of which is unstructured
        • 81% of physicians report spending 5 hours or less per month reading medical journals
        • “Medicine has become too complex (and only) about 20% of the knowledge clinicians use today is evidence-based.” Steven Shapiro, Chief Medical & Scientific Officer, UPMC
        Source: International Journal of Circumpolar Health, Institute for Medicine
    11. 11. IBM and Seton Health put Ready for Watson to work
        • Proactively target care management that reduces re-admission of congestive heart failure patients
        • Improve patient quality of life, reduce cost and mortality rates
        • Analyze unstructured data (e.g., physician notes) and provide an integrated view of clinical and operational information
        • Analysis revealed:
          – 18 top indicators determined
          – 2 key re-admission factors
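The Jeopardy! notes and slide 7 stress "accurate confidence": a wrong answer after buzzing in costs the clue's dollar value, so a contestant should only buzz when it is likely enough to be right. The deck does not describe Watson's actual buzz-in strategy, so the sketch below is a toy illustration of the arithmetic only. The `Candidate` type, the function names, and the 0.5 default threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    answer: str
    confidence: float  # estimated probability of being correct, in [0, 1]


def expected_gain(confidence: float, clue_value: int) -> float:
    """Expected score change from buzzing in.

    A correct answer wins the clue value and a wrong one loses it, so the
    expectation is p*v - (1 - p)*v, which is positive only when p > 0.5.
    """
    return confidence * clue_value - (1.0 - confidence) * clue_value


def decide(candidates: List[Candidate], threshold: float = 0.5) -> Optional[Candidate]:
    """Return the top-ranked candidate if it clears the threshold, else None."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.confidence)
    return best if best.confidence > threshold else None


if __name__ == "__main__":
    hypotheses = [Candidate("answer A", 0.75), Candidate("answer B", 0.25)]
    choice = decide(hypotheses)
    print(choice)
    print(expected_gain(0.75, 2000))  # 0.75*2000 - 0.25*2000
```

The threshold follows directly from the scoring rule stated in the notes. A real player would also weigh game state, such as the current scores and remaining clues, which this sketch ignores.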