Unlock Hidden Potential through Big Data and Analytics



A presentation by Intel CIO Kim Stevenson (@kimsstevenson), "Unlock Hidden Potential through Big Data and Analytics." It covers the drivers behind big data and SMAC (social, mobile, analytics, and cloud), the business value being created at Intel through advanced analytics, and how to use BI as a competitive advantage.
Venue: NOAA Feb. 24, 2014

Published in: Technology
  • Advanced analytics on top of big data is a hot topic; some might even say an overused and abused one. But the reality we all deal with is that our organizations have experienced tremendous growth in data, whether it is your organization's data, your customers' data, citizen data, or event data. The question you have to ask is: do we really get the most value out of this data? The answer is: we're starting to. At NCAR, they are just finishing an experiment in which they collected atmospheric data seven miles above the earth to help determine whether it will improve the forecasting of severe weather.
  • In fact, we have already seen great improvement in predictive models because of greater data and processing ability. The EF-5 tornado that hit Moore, Oklahoma in May had winds exceeding 200 MPH and was on the ground for 17 miles. Thanks to the predictive model, the town had 36 minutes of warning before the tornado hit, where the prior average was 12 minutes. Those extra 24 minutes are the difference between life and death. Although this storm did horrific damage and many lost their lives, it would have been worse without the predictive capabilities, due in large part to data analysis. The NCAR experiment (MPEX) strives to make this warning period even greater by understanding the atmospheric data miles above the earth. Use cases for advanced analytics on top of big data are abundant across commercial enterprises and government organizations. I'll discuss how to unlock the potential of big data and analytics through industry examples and specifics of what we are doing at Intel.
  • We talked about tornadoes, but let's get closer to home. While preparing for this keynote we pulled the 10-day forecast for New Orleans on 2/20/14. How accurate is the forecast? How often is a 5- or 10-day weather forecast still inaccurate? What methods are being used to predict that weather? The business of weather prediction is changing, and opportunities abound for companies to use weather predictions to help them understand business projections. "A third of U.S. commerce is sensitive to the weather," according to Bill Pardue, head of Weather Analytics, a new firm providing predictive weather modeling. "With modeling based on truly global data, companies are in a position to make better decisions for business." These are not the types of informational models that would have been possible until recently.
  • UPS: optimized and automated their entire package flow system, including the "last mile" in its delivery network. UPS developed a suite of package flow technologies and business processes that use smart labels to capture information about a package before it gets to the center. Using historical, forecast, and exception information, package flow technologies create a dispatch plan for every driver working out of the package distribution center. The system helps package center management ensure that drivers are not over-dispatched and that last-minute load changes to a driver's package car are minimized. They use this system to avoid things like left-hand turns, red-light wait times, and traffic issues. Automated routing saves 20-30 million miles and ~3 million gallons of gas, and reduces CO2 by 30 metric tons. Obama: data and micro-targeting for the 2012 election; many claim this was his differentiator. His team started on day one of his first election to create personalized voter profiles. These included voter history, demographic data, and preferences to identify "persuadables" to target. They linked voter files to zip codes and individuals within households to target people for early-voting engagements. They also used data to raise campaign funds and to determine where to spend campaign dollars (based on their target audiences). GE: a vision to fix or replace products before they break. They predict part failure rates and send preventive maintenance crews out to anticipate issues (http://blogs.wsj.com/cio/2012/11/29/ge-ceo-jeff-immelt-says-analytics-next-holy-grail/). The products they are creating are targeted at airlines, railroads, hospitals, manufacturing, and energy companies, helping them operate more efficiently by analyzing data collected by networks of sensors. The company is investing $1 billion in these services and products. Analytics is not a new concept.
Analytics based on your company's structured data (supply/demand, customer order history, workforce statistics, etc.) is the foundation for running an effective business today. We've entered a new era where data has exploded and is accessible, creating new industries and competitive advantage for the early adopters. This era has a fundamentally different computing model: the SMAC stack.
  • As we look around, many examples are emanating from different walks of life, across public, private, and research opportunities, where the new paradigm of not just storing data but processing and analyzing it is solving big problems and thus providing big opportunities. As Tim O'Reilly famously argued, data is the source of competitive advantage, whether in search or retail, broadly applied to many applications and opportunities. It represents another platform opportunity for Intel. Transition: our strategy. NOTES: Tim's statement is that every significant internet application to date has been backed by a specialized database: Amazon's database of products, Google's web crawl, MapQuest's map database, Napster's song database. Quotes: "In the internet era, one can already see a number of cases where control over the database has led to market control and outsized financial returns. The monopoly on domain name registry initially granted by government fiat to Network Solutions (later purchased by Verisign) was one of the first great moneymakers of the internet. While we've argued that business advantage via controlling software APIs is much more difficult in the age of the internet, control of key data sources is not, especially if those data sources are expensive to create or amenable to increasing returns via network effects. The race is on to own certain classes of core data: location, identity, calendaring of public events, product identifiers and namespaces. In many cases, where there is significant cost to create the data, there may be an opportunity for an Intel Inside style play, with a single source for the data."
In others, the winner will be the company that first reaches critical mass via user aggregation, and turns that aggregated data into a system service. FULL TEXT: "Data is the Next Intel Inside," Tim O'Reilly, 2009 (2005?). Every significant internet application to date has been backed by a specialized database: Google's web crawl, Yahoo!'s directory (and web crawl), Amazon's database of products, eBay's database of products and sellers, MapQuest's map databases, Napster's distributed song database. As Hal Varian remarked in a personal conversation last year, "SQL is the new HTML." Database management is a core competency of Web 2.0 companies, so much so that we have sometimes referred to these applications as "infoware" rather than merely software. This fact leads to a key question: Who owns the data? In the internet era, one can already see a number of cases where control over the database has led to market control and outsized financial returns. The monopoly on domain name registry initially granted by government fiat to Network Solutions (later purchased by Verisign) was one of the first great moneymakers of the internet. While we've argued that business advantage via controlling software APIs is much more difficult in the age of the internet, control of key data sources is not, especially if those data sources are expensive to create or amenable to increasing returns via network effects. Look at the copyright notices at the base of every map served by MapQuest, maps.yahoo.com, maps.msn.com, or maps.google.com, and you'll see the line "Maps copyright NavTeq, TeleAtlas," or with the new satellite imagery services, "Images copyright Digital Globe." These companies made substantial investments in their databases. (NavTeq alone reportedly invested $750 million to build their database of street addresses and directions. Digital Globe spent $500 million to launch their own satellite to improve on government-supplied imagery.)
NavTeq has gone so far as to imitate Intel's familiar Intel Inside logo: cars with navigation systems bear the imprint "NavTeq Onboard." Data is indeed the Intel Inside of these applications, a sole source component in systems whose software infrastructure is largely open source or otherwise commodified. The now hotly contested web mapping arena demonstrates how a failure to understand the importance of owning an application's core data will eventually undercut its competitive position. MapQuest pioneered the web mapping category in 1995, yet when Yahoo!, and then Microsoft, and most recently Google, decided to enter the market, they were easily able to offer a competing application simply by licensing the same data. Contrast, however, the position of Amazon.com. Like competitors such as Barnesandnoble.com, its original database came from ISBN registry provider R.R. Bowker. But unlike MapQuest, Amazon relentlessly enhanced the data, adding publisher-supplied data such as cover images, table of contents, index, and sample material. Even more importantly, they harnessed their users to annotate the data, such that after ten years, Amazon, not Bowker, is the primary source for bibliographic data on books, a reference source for scholars and librarians as well as consumers. Amazon also introduced their own proprietary identifier, the ASIN, which corresponds to the ISBN where one is present, and creates an equivalent namespace for products without one. Effectively, Amazon "embraced and extended" their data suppliers. Imagine if MapQuest had done the same thing, harnessing their users to annotate maps and directions, adding layers of value. It would have been much more difficult for competitors to enter the market just by licensing the base data. The recent introduction of Google Maps provides a living laboratory for the competition between application vendors and their data suppliers.
Google's lightweight programming model has led to the creation of numerous value-added services in the form of mashups that link Google Maps with other internet-accessible data sources. Paul Rademacher's housingmaps.com, which combines Google Maps with Craigslist apartment rental and home purchase data to create an interactive housing search tool, is the pre-eminent example of such a mashup. At present, these mashups are mostly innovative experiments, done by hackers. But entrepreneurial activity follows close behind. And already, one can see that for at least one class of developer, Google has taken the role of data source away from Navteq and inserted themselves as a favored intermediary. We expect to see battles between data suppliers and application vendors in the next few years, as both realize just how important certain classes of data will become as building blocks for Web 2.0 applications. The race is on to own certain classes of core data: location, identity, calendaring of public events, product identifiers and namespaces. In many cases, where there is significant cost to create the data, there may be an opportunity for an Intel Inside style play, with a single source for the data. In others, the winner will be the company that first reaches critical mass via user aggregation, and turns that aggregated data into a system service. For example, in the area of identity, PayPal, Amazon's 1-click, and the millions of users of communications systems, may all be legitimate contenders to build a network-wide identity database. (In this regard, Google's recent attempt to use cell phone numbers as an identifier for Gmail accounts may be a step towards embracing and extending the phone system.) Meanwhile, startups like Sxip are exploring the potential of federated identity, in quest of a kind of "distributed 1-click" that will provide a seamless Web 2.0 identity subsystem.
In the area of calendaring, EVDB is an attempt to build the world's largest shared calendar via a wiki-style architecture of participation. While the jury's still out on the success of any particular startup or approach, it's clear that standards and solutions in these areas, effectively turning certain classes of data into reliable subsystems of the "internet operating system," will enable the next generation of applications. A further point must be noted with regard to data, and that is user concerns about privacy and their rights to their own data. In many of the early web applications, copyright is only loosely enforced. For example, Amazon lays claim to any reviews submitted to the site, but in the absence of enforcement, people may repost the same review elsewhere. However, as companies begin to realize that control over data may be their chief source of competitive advantage, we may see heightened attempts at control. Much as the rise of proprietary software led to the Free Software movement, we expect the rise of proprietary databases to result in a Free Data movement within the next decade. One can see early signs of this countervailing trend in open data projects such as Wikipedia, the Creative Commons, and in software projects like Greasemonkey, which allow users to take control of how data is displayed on their computer.
  • Key message: We've always had data and have been using tools to gain insights from that data. So why so much focus on this now? What is driving the big data phenomenon? Primarily, three major elements are coming together to form this inflection point. On one hand, the volume of data has been growing at astronomical speed, while the nature of the data is also changing, from mostly structured data to a multitude of unstructured data formats. Per IDC, the amount of digital data created in 2010 was 1,200 exabytes; it is expected to grow to 40,000 exabytes by 2020 (IDC Digital Universe Study, Dec 2012). At the same time, the cost of the technology needed to store and process this ever-growing data has come down so significantly that it is now economically feasible to apply the technology to generate significant value from data. As examples: average server system pricing declined from $11.5K in 2002 to ~$6.5K in 2012 (IDC WW Server Tracker), and the cost of storage declined from $24 per GB to $1.5 per GB from 2002 to 2012 (IDC Storage Tracker). Adding to this is a third vector: significant new investments from several ecosystem players to build the required tools and services.
  • Let us talk data. On a personal scale, not too long ago I was perfectly happy with a few MB of storage for all my important files. Now I have 4 TB of personal data, and it is still not enough. The same thing applies to enterprises. Not only the volume but also the variety of data that we exchange and store has changed, and that is driving the rapid growth. Most corporations used to measure data in KB in the mainframe era; the advent of the Internet brought this to GB. The new denominator in the wave of the mobile, social, big data, and cloud era is zettabytes, and no one is able to accurately predict the rate of growth, which is constantly being revised upwards. Most of this was not possible before; the advent of new computing, storage, and networking technologies, coupled with others, is making it possible. The thirst never seems to subside, though!
  • It's no surprise that the amount of data being generated is growing at a phenomenal rate. In the next decade the rate of information generated will grow nearly 15x, powered by a variety of inputs, from the cloud to the growth of clients to the increase in machine-to-machine communication. Big data has always been with us, and it likely exists in many places in your environment. If you have a website that accepts comments, you have the potential to mine those comments to make your business better. If you have industrial controls in your manufacturing environment, the data from those could increase your profits. Your employees are also one of your best indicators of how the business runs, and the ability to use information to make their jobs easier could greatly benefit your company. All of this data provides a gold mine of information and decision support, if you have the capability to mine that data to get those insights. As the volume, variety, and velocity of data continue to grow, the opportunities to draw on the data will also increase. And that means insight and value for your business.
  • SMAC stands for social, mobile, analytics, and cloud. Cloud is a new service delivery model that creates economies of scale and time-to-market improvement; thinking about cloud in the context of analytics, it enables enterprises to ingest and manage structured and unstructured data from numerous sources as inputs into analytics engines. Mobile is not just how you connect with your customers, suppliers, and employees but also how the "Internet of Things" connects to your enterprise; analytics enhance the value proposition of mobility. Social is the new feedback mechanism, and boils down to finding the right people and the right information, then driving engagement to a desired outcome. Analytics: when we look at the data challenge it breaks into three areas: volume, variety, and velocity. We now have enormous amounts of data, some of our own, some from external sources. Some of this data is structured and stored in our traditional relational databases, but much of it is unstructured; unstructured could be an email, a text message, a video file, a voicemail. The first step is to bring some structure to these unstructured data sources. Once that is done, the science of data can be applied to create observations, build predictive models, and answer previously unanswerable questions. Figuring out which business problems need to be solved using advanced analytics is the key to unlocking the potential value. Individually, each component of the SMAC stack creates value; placed in combination, exponential value is unlocked. And when it comes to the transformative opportunities with big data and analytics, SMAC in combination can't be underestimated.
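The "bring structure to unstructured data" step described above can be sketched in a few lines. This is a minimal illustration, not anything from the talk: the comment text is hypothetical, and the approach (tokenize free text, then aggregate term frequencies into a structured view an analytics engine could consume) is one simple choice among many.

```python
import re
from collections import Counter

# Hypothetical raw, unstructured customer comments (invented for illustration).
comments = [
    "Shipping was slow but support resolved my issue quickly.",
    "Great product, fast shipping!",
    "Support never answered; shipping took two weeks.",
]

# Step 1: impose minimal structure -- lowercase the text and split it into words.
def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

# Step 2: aggregate into a structured view analytics can consume:
# term frequencies across all comments.
term_counts = Counter()
for comment in comments:
    term_counts.update(tokenize(comment))

# "shipping" appears once in each of the three comments, flagging it as a theme.
print(term_counts["shipping"])  # 3
```

From here, the structured counts (or richer features) can feed the observation and modeling steps the note describes.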
  • Collective intelligence and human creativity are our only limitation. True, but we need to put a little structure around our approach, so we've chosen four simple questions and are progressing through each phase systematically. The four key questions are: What happened? Why did it happen? What will happen? And how do we make it happen? Simple to say, but just a bit more complex when you apply them to a specific business question.
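As a toy illustration of moving from "what happened" (descriptive) to "what will happen" (predictive), the sketch below summarizes a sales series and extrapolates a hand-rolled least-squares trend one period ahead. The numbers are invented for illustration and are not Intel data.

```python
# Hypothetical monthly unit sales (illustrative numbers only).
sales = [100, 110, 125, 138, 150, 163]

# What happened? (descriptive): summarize the past.
avg = sum(sales) / len(sales)

# What will happen? (predictive): fit a least-squares line by hand,
# then extrapolate one month ahead.
n = len(sales)
xs = range(n)
x_mean = sum(xs) / n
y_mean = avg
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, sales)) / \
        sum((x - x_mean) ** 2 for x in xs)
intercept = y_mean - slope * x_mean
forecast = intercept + slope * n  # estimate for month 7

print(round(avg, 1), round(forecast, 1))  # 131.0 175.8
```

The remaining two questions are harder: "why did it happen?" needs diagnostic analysis of drivers, and "how do we make it happen?" needs prescriptive optimization on top of a model like this one.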
  • I'll shift gears to what we're doing with big data at Intel. These use cases span internal operations, consumer behavior, and security and risk management. Through the use of analytics solutions, we expect to achieve cost savings and increased bottom-line revenue of nearly half a billion dollars by 2016.
  • Key message: Intel's big data value is massive compute capability and industry-enabling skill. Reinforce that use of our software will aid Intel in getting value from our hardware innovations and expanding solution sales opportunities. We know there is pent-up demand for data analytics solutions. Corporations know they can grow revenue, reduce costs, and reduce risk by leveraging available data sources. We've all seen the stats: the big data market is sized at $35B in silicon, system hardware, software, and professional services by 2020, growing at a 30% CAGR. On one hand, we are making Intel Xeon E7 v2 the platform of choice for in-memory workloads by ramping established solutions such as SAP HANA, Oracle 12c, SAS, etc. But today's deployments of big data solutions are constrained by cost and complexity; there aren't enough data scientists in the world to meet the needs of the enterprise. Intel's Distribution of Hadoop is both optimized for Intel architecture and widely adopted. Our recent launch of the Hadoop version 3 framework and Graph Builder visual analytics software are direct steps toward that goal. Further, Intel SSD and network-optimized products are targeted at big data workloads. As your business grows and your data grows, Xeon E7 is built to handle the growth of your largest workloads. Not all workloads scale out, for example DB workloads; Xeon E7 v2, with its ability to scale up, can handle those workloads.
  • Xeon E7 v2's unique combination of high performance, improved reliability, and large memory footprint is ideal for today's data-heavy IT environments. By harnessing these capabilities, IT leaders can more quickly gain insight, make business decisions, and gain competitive advantage: real-time business intelligence that drives faster decision making, for greater competitiveness and profitability potential. It puts organizations in a position to solve today's business problems and even handle tomorrow's data-related issues.
  • Basic BI is the cost of doing business today. Advanced BI and predictive models help keep Intel ahead of the competition with faster information analysis and decision making. Identification of high-potential resellers: advanced BI allows us to focus resources where they will generate the highest return. For example, we developed a solution to help Intel sales teams strategically focus on large-volume resellers to deliver greater revenue. This engine mines large sets of internal and external sales data to identify the most promising reseller partners in specific geographies. In the last two years, this solution identified 3x as many high-potential resellers as manual methods. We estimated up to USD 50 million in potential new and incremental sales opportunities from our deployments worldwide in 2012 and 2013. Rapid detection of information security threats: early warning of malware and other cyber threats increases enterprise information security. IDH key differentiators: Open data platform: we believe Hadoop has the potential to evolve into the open data platform where a ton of innovation will happen. We want the momentum to continue, and almost everything we are doing in this space we are contributing back to open source. "Operationalizing Hadoop" is key to making it enterprise grade; there is tremendous work happening in the community, and we leverage that. However, we have decided to invest where the community has not had much time to invest, specifically around security, deployment, and performance, to make it ready for mass enterprise deployments. Security: authentication, authorization, and auditing built into Apache Hadoop; transparent encryption in Hive, Pig, MapReduce, HBase, and HDFS; up to 20x faster en/decryption with Intel AES-NI. Performance: up to 30x faster on Intel architecture than other hardware; up to 2.6x faster than other open source distributions. Manageability: enterprise-grade cluster management console and APIs; automated configuration with Intel® Auto Tuner; REST-based, fully configurable manager.
  • All organizations have something to gain from big data analytics. The key is to get started. The first step is to identify a few small quick wins: think about what your logs can tell you (security logs, call center logs, transaction logs, etc.) and think about uncovering the patterns in any form of claims that you have (insurance, warranty, rebate); maybe it's fraud or duplicate payments, like we had. Then you have to begin to shift the questions from what happened to what WILL happen, and dedicate some percentage of time to the predictions. On to the more difficult issue of skills: these skills are scarce, but you can acquire skills, build skills, or even "rent" skills to get started. My view is that you'll need to do all of these things initially. Focus on causation, not correlation: test tons of hypotheses using the scientific method. Correlation does not equal causation.
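The "correlation is not causation" point can be made concrete with a small simulation. In this hypothetical example (not from the talk), ice-cream sales and sunburn cases both depend on a hidden confounder, temperature, so they correlate strongly even though neither causes the other.

```python
import random

random.seed(42)

# Hidden confounder: daily temperature drives both series.
temps = [random.uniform(10, 35) for _ in range(200)]
ice_cream = [2.0 * t + random.gauss(0, 3) for t in temps]  # sales per day
sunburns = [0.5 * t + random.gauss(0, 2) for t in temps]   # cases per day

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

r = pearson(ice_cream, sunburns)
print(r > 0.7)  # True: strong correlation, yet neither variable causes the other
```

A hypothesis test that controlled for temperature (the scientific method the note calls for) would reveal that the apparent relationship disappears once the confounder is accounted for.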
  • IT is in the best position to be the organizational catalyst that captures the value of big data. At Intel IT, we are driving not only our own teams but all functions across the company to think about the opportunity. Are you up for the challenge in your organization? Thank you.
  • I would like to draw your attention to the IT@Intel program, through which Intel shares our best practices with you all; we in turn bring the learnings back to Intel to influence our products, which could benefit everyone. You can download our mobile app onto your device, or visit the URL listed here to access many resources, including white papers, case studies, how-to guides, radio shows, and webinars, among others.

    1. Unlocking Hidden Potential Through Big Data & Analytics. Kim Stevenson, Intel Chief Information Officer. February 24, 2014. @kimsstevenson
    2. 36 Minutes
    3. Today's Discussion: What's driving Big Data? What is Intel doing to power this? What is Intel IT doing about it? Next steps: what should you be doing?
    4. What's Driving Big Data? Volume & type of data: 10x data growth by 2016, 90% unstructured (1). Lower cost of compute & storage, 2002-2012: average server cost down 40%, storage cost per GB down 90% (2). New investments in tools & services: $3B in 2010, $7B in 2012, $17B by 2017 (3). 1: IDC Digital Universe Study December 2012; 2: Intel Forecast; 3: IDC WW Big Data Forecast
    5. Every 60 seconds: 100,000+ tweets, 695,000 status updates, 698,445 Google searches, 11 million instant messages, 168 million+ emails sent, 1,820 TB of data created, 217 new mobile web users. Mainframe: kilobytes. Cloud/Server: megabytes. The Internet: gigabytes. Social, mobile, cloud, analytics: zettabytes. Yottabytes?
    6. Virtuous Cycle of Data-Driven Innovation: clients and intelligent systems provide richer data from devices and richer data to analyze; the cloud delivers richer user experiences. 2.8 zettabytes of data generated worldwide in 2012 (1); 40 zettabytes will be generated worldwide in 2020 (1). (1) IDC Digital Universe 2020
    7. Corporations are deriving value from data. UPS Package Flow: allows UPS to increase customer satisfaction, obviates the need for humans in route planning, increases accuracy and efficiency, and saves millions of miles and millions of gallons. Data and micro-targeting assists campaigns: built a voter file that included voter history, demographic data, and preferences to identify "persuadables" to target. Predictive analytics to reduce maintenance costs: predicted part failure rates and sent preventive maintenance crews to anticipate issues. *Other brands and names are the property of their respective owners
    8. Big Data: Big Opportunities. Traffic optimization, smart energy grid, personalized preventive care, location-aware ad placement, claim fraud reduction, buyer protection program
    9. What fuels this innovation?
    10. Data Platform: Intel Value. Enable massive compute, build an open ecosystem, reduce complexity. Intel Data Platform: Analytics Toolkit
    11. Make Business Intelligence Your Competitive Advantage. The Intel® Xeon® Processor E7-8800/4800/2800 v2 product families deliver: memory (up to 3x capacity), performance (up to 2x throughput improvement), reliability (designed for >99.999% reliability), and in-memory reliability, uptime, and scalability. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. (E7 v2 memory capacity increase) On a 4-socket natively connected platform: the Intel® Xeon® processor E7 family supports 64 DIMMs with max memory per DIMM of 32 GB RDIMM; the Intel® Xeon® processor E7 v2 family supports 96 DIMMs with max memory per DIMM of 64 GB RDIMM. This enables a 3x increase in memory.
    12. Intel Distribution of Hadoop: security, performance, manageability
    13. Value Creation Through Advanced Analytics: from hindsight (What happened? Descriptive. Why did it happen? Diagnostic) through insight to foresight (What will happen? Predictive. How can we make it happen? Prescriptive)
    14. Business Value Through Analytics at Intel: operational efficiency, revenue growth, cost reduction; post-silicon validation, channel reseller selection, fraud prevention
    15. The Path to Insights: start the journey with quick wins; move from rear view to future projections; acquire and build skills; seek causation, not correlation; the Intel Data Platform can help accelerate your value
    16. Be the Catalyst to Unlock the Potential
    17. Sharing Intel IT Best Practices with the World. Learn more about Intel IT's initiatives at www.intel.com/IT. Download the Intel IT Business Review mobile app from your smartphone or tablet device: m.intel.com/IIBR
    18. @kimsstevenson
    19. How Accurate is the Forecast?