BIG DATA, BIG CONTENT
SNW SPRING
FRED OH
APRIL 2, 2012



© Hitachi Data Systems Corporation 2011. All Rights Reserved.
1
BIG DATA IS NOT JUST ABOUT SIZE


[Diagram: three overlapping circles of data types, annotated with the Four Vs]
• Structured data: OLTP
• Unstructured data: email, documents, social, video, audio
• Semi-structured data: satellite images, sensors, bio-informatics, M2M and web logs

The Four Vs: Volume, Velocity, Variability, Value

Data-intensive Processing Increases
2
BIG OPPORTUNITY – ACROSS INDUSTRIES

BIG DATA IMPACT
• Telco: $100B opportunity
• Healthcare: $300B opportunity
• Retail: +60% margin (US)
• Manufacturing: +50% production $
• Public administration: €100B opportunity (EU)

BIG DATA EXAMPLES
• Science and data science: decoding a genome with 3 billion base pairs can now be done in < 1 hour
• Media and entertainment: video surveillance at airports, with facial recognition analysis and real-time reporting to security
• Oil and gas: projects usually require coordinating hundreds of firms, with up to 10PB of data to analyze oil locations
3
BIG DATA MARKET MATURITY
    IS JUST BEGINNING


     85% of Fortune 500 companies are unable to exploit big
      data for competitive advantage
     90% of business leaders say information is a strategic
      asset but <10% can quantify its economic value
     Preparing now with data
      quality, event-driven
      architectures, and laying
      foundational infrastructure for Big
      Data later.




4
ONE PLATFORM FOR THE JOURNEY
    OF BIG DATA AND BIG CONTENT



[Diagram: the big data journey on one platform]
• Ingest, store
• Search across (new HDDS)
• Search, analyze
• Bring analytics to the data (with partners)
• Repurpose, recombine
5
HITACHI NAS PLATFORM, POWERED BY
     BLUEARC® (HNAS) – BIG DATA IN OIL AND GAS
Oil and gas workflow: data acquisition → data management → seismic processing → visual interpretation → modeling automation → petrophysical analysis → property modeling → simulation. Data workflows and management are increasingly complex.


• HNAS provides high-performance scale
  ‒ Tremendous need for high-performance storage
  ‒ High data volumes, with storage requirements from 200TB to tens of PB
  ‒ High-frequency data streams, e.g., 10MB/s times the number of boats (see the sizing sketch below)
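
To make those stream and capacity figures concrete, here is a minimal back-of-the-envelope sizing sketch in Python. Only the 10MB/s per-boat rate and the 200TB floor come from the slide; the fleet size and survey duration are hypothetical inputs.

```python
# Back-of-the-envelope sizing for a seismic acquisition survey.
# The 10 MB/s per-boat stream rate comes from the slide; the fleet
# size and survey duration below are hypothetical examples.
MB = 10**6
TB = 10**12

def aggregate_ingest_rate(boats, mb_per_sec_per_boat=10):
    """Total ingest in bytes/sec: per-boat stream rate times fleet size."""
    return boats * mb_per_sec_per_boat * MB

def survey_volume_tb(boats, days, mb_per_sec_per_boat=10):
    """Raw data volume in TB for a survey of the given length."""
    seconds = days * 24 * 3600
    return aggregate_ingest_rate(boats, mb_per_sec_per_boat) * seconds / TB

if __name__ == "__main__":
    boats, days = 8, 90  # hypothetical fleet and survey duration
    print(f"Aggregate ingest: {aggregate_ingest_rate(boats) / MB:.0f} MB/s")
    print(f"Survey volume:    {survey_volume_tb(boats, days):.0f} TB")
    # 8 boats * 10 MB/s for 90 days ~= 622 TB, already well past the
    # 200 TB low end of the storage range quoted on the slide.
```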




6
BIG DATA, BIG CONTENT


• Provide bottomless storage
  ‒ 80 nodes and 32 billion objects
  ‒ 1,000 tenants per system
  ‒ 70K namespaces in many-to-one systems
• Reduce tape backup (replicate instead)
• Distribute content
  ‒ Write once, read everywhere

7
SEARCH FOR BIG DATA (COMING IN 2012)

    NEW HITACHI DATA DISCOVERY SUITE

• Super-scale search
• Big data search index architecture with Solr + Hadoop + Hitachi
• Geospatial and wide-area search for the FCS portfolio (see the sketch below)

[Diagram: search across parallel indexing jobs in Region 1, Region 2, and Region 3]
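
The slide's architecture, per-region indexes built in parallel and queried through a scatter-gather "search across" layer, can be illustrated with a small self-contained Python sketch. This shows the pattern only, not the HDDS or Solr+Hadoop implementation; the index structure and function names are invented for illustration.

```python
# Scatter-gather search sketch: each region builds its own inverted
# index in parallel, and a query fans out to every region's index.
# Pattern illustration only -- not the HDDS/Solr+Hadoop code itself.
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor

def build_region_index(docs):
    """Map each term to the doc ids containing it (one region's index)."""
    index = defaultdict(set)
    for doc_id, text in docs:
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search_region(index, term):
    """Hits for a single term within one region's index."""
    return index.get(term.lower(), set())

if __name__ == "__main__":
    regions = {
        "region1": [(1, "seismic survey data"), (2, "well log archive")],
        "region2": [(3, "satellite imagery"), (4, "seismic processing run")],
        "region3": [(5, "airport surveillance video")],
    }
    # Parallel indexing: one index-build job per region.
    with ProcessPoolExecutor() as pool:
        indexes = list(pool.map(build_region_index, regions.values()))
    # Search across: fan the query out and merge the per-region hits.
    hits = set().union(*(search_region(ix, "seismic") for ix in indexes))
    print(sorted(hits))  # -> [1, 4]
```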




8
BIG DATA ANALYTICS – REAL TIME

     HITACHI CONVERGED PLATFORM FOR SAP HANA™

• In-memory computing for real-time analytics
  ‒ Calculate first, then move results (see the sketch below)
• Processes massive quantities of real-time data to provide immediate results
• The Converged Platform provides
  ‒ On-demand, nondisruptive scalability
  ‒ The highest-performing appliance for SAP HANA

[Diagram: high-performance apps delegate data-intensive operations to the data layer]

MASSIVE SCALE-OUT COMING IN 2012!
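
The "calculate first, then move results" principle can be shown with a tiny database-agnostic Python sketch. The standard-library sqlite3 module stands in for an in-memory analytic store purely to illustrate the delegation pattern; it is not the HANA interface.

```python
# "Calculate first, then move results": push aggregation into the data
# layer and ship back a handful of rows, instead of pulling every raw
# row into the application and aggregating there. sqlite3 is a stand-in
# for an in-memory analytic store, used only to illustrate the pattern.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("west", 120.0), ("west", 80.0), ("east", 200.0)])

# Anti-pattern: move all the data, then calculate in the app layer.
totals = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    totals[region] = totals.get(region, 0.0) + amount  # every row moves

# Delegation: calculate in the data layer, move only the results.
results = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()

assert dict(results) == totals  # same answer, far less data moved
print(sorted(results))          # [('east', 200.0), ('west', 200.0)]
```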
9
WINNING STRATEGY – TCO VS. RACK-BY-RACK

     COMPETITOR DEPLOYMENT IS RACK-BY-RACK
     AT LOWEST POSSIBLE PRICING




10 NAS nodes with 720TB per rack!




10
WINNING STRATEGY – TCO VS. RACK-BY-RACK

     HITACHI DATA SYSTEMS DEPLOYMENT IS TCO-BASED




2 HNAS 3090 nodes + 672TB for the 1st rack (a hedged TCO comparison sketch follows below)
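
Since the two deployment models trade node count against capacity, a simple TCO model clarifies the comparison. In this sketch only the node counts and raw capacities come from the slides; every cost figure (prices, power draw, admin cost, electricity rate) is a hypothetical placeholder showing the shape of the calculation, not real pricing.

```python
# Rack-level TCO comparison sketch. Node counts and capacities are from
# the slides; ALL cost inputs below are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class RackConfig:
    name: str
    nodes: int
    capacity_tb: float
    price_per_node: float     # hypothetical acquisition cost per node
    power_kw_per_node: float  # hypothetical power draw per node

    def tco(self, years=3, kwh_cost=0.12, admin_per_node_yr=2_000.0):
        """Capex plus power and admin opex over the given horizon."""
        capex = self.nodes * self.price_per_node
        power = self.nodes * self.power_kw_per_node * 24 * 365 * years * kwh_cost
        admin = self.nodes * admin_per_node_yr * years
        return capex + power + admin

    def tco_per_tb(self):
        return self.tco() / self.capacity_tb

competitor = RackConfig("10-node NAS rack", 10, 720.0, 50_000.0, 1.2)
hnas = RackConfig("2x HNAS 3090 rack", 2, 672.0, 150_000.0, 1.5)

for cfg in (competitor, hnas):
    print(f"{cfg.name}: ${cfg.tco():,.0f} TCO, ${cfg.tco_per_tb():,.0f}/TB")
```

With these placeholder inputs, the fewer-node configuration carries a lower per-terabyte operating cost even at a higher per-node price, which is the TCO argument the slide is making.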




11
GOING FORWARD – $$$




Information lifecycle management + managed storage solution + genomics information cloud:
• National Genomics Database (based on HCP and HDDS), growing 4PB per year (see the rate sketch below)

[Diagram: fluid content, dynamic infrastructure, sophisticated insight]
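
For scale, converting the slide's growth figure into a sustained ingest rate is straightforward arithmetic (the 4PB/year figure is from the slide; the conversion is illustrative):

```python
# What "4 PB per year" means as a sustained ingest rate.
PB = 10**15
SECONDS_PER_YEAR = 365 * 24 * 3600

rate_mb_s = 4 * PB / SECONDS_PER_YEAR / 10**6
print(f"4 PB/year ~= {rate_mb_s:.0f} MB/s sustained")  # ~127 MB/s
```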




12
QUESTIONS AND DISCUSSION



13
THANK YOU



14

Editor's Notes

  • #3 Answer why it's called "big data"; explain the misnomer. Emphasize the information extraction and why analytics is so important. Possible analogy: data warehouse → NAS → distributed dataset.

    Old notes: The analysts are all hard at work defining big data in their own unique ways, but they all pretty much agree on three key characteristics. Along with big volumes of data, we have velocity, which refers to the speed at which the data is streaming in as well as the time sensitivity of delivering the analysis and reacting, and variability, which refers to the data format, typically separated into structured (fits the relational database model), unstructured, and semi-structured (has structure but doesn't fit the relational model). Most would argue that it is a combination of these factors that defines big data, or that big data analytics refers to problems we can't solve with traditional data warehouse and analytics technologies.

    The chart illustrates the evolution of data available for analytics as three waves: OLTP (traditional data warehousing), human-generated unstructured data (the wave driven by social media), and machine-generated data, which will really take hold with the Internet of Things.

    Though traditional data warehousing has been around for about 30 years, it really took off in the 1990s. Companies needed a way to gain cross-business insight from all the disparate database applications they had rolled out (e.g., ERP, supply chain management, order entry). They did that by loading data from the operational systems into relational data warehouses. In the early days the cost was very high, millions of dollars for mere terabytes, so the earliest adoption was by big, transaction-heavy businesses with deep pockets, like banks and retailers. The combination of lower technology costs and increased storage and compute capacity spawned usage by companies of all sizes, and data volumes were driven higher by Internet applications, e-commerce, and the focus on CRM in the 2000s. Today the largest data warehouses are in the low petabytes, but the average size is still closer to the tens to hundreds of terabytes for most businesses: sizeable, but not when compared to the next waves. The data is all captured from and stored in relational databases, so it is highly structured, and though there are real-time applications, data is predominantly loaded in nightly and weekly batch jobs.

    The second wave, human-generated unstructured data, started around three years ago but ramped this past year. Social media content, including blogs and Twitter feeds, is a big component here, along with the web logs that track the trail of human activity on the Internet. Many of these web log files used to be thrown away, but with the reduced cost of storage and compute power, companies are now starting to glean valuable insight from them; we'll look at examples in a few slides. Clearly the volumes are huge here (remember, Google generates 20PB daily), the data streams in at a fast rate, and the data does not have the nice predictable structure that we had with the OLTP data.

    The final wave, machine data, will be the biggest wave of all. Some estimate that though we are just dipping a toe into analyzing this kind of data, it will overtake social media data in volume within 5 years and quickly surpass it 10- to 20-fold. As we saw from the Boeing example, the data streams will be constant, and the ability to analyze, and not just gather insight but react in real time, will be critical for many applications.
  • #4 Source: McKinsey Global Institute, 2011; global projections for healthcare, telco, retail, manufacturing, and public administration. From the same source: "By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions."

    Notes: Science and data science: a 190,000-person shortage of data scientists in the U.S. by 2018. Media: 400B videos viewed online in 2010 (U.S.). Oil and gas: in 2011, $5 billion in IT spend and $1 billion on storage. Oil & gas figures from Bjorn; videos watched: http://bigdata-io.org/digital-entertainment-52b-in-2014

    Notes (WIP): Should note that search is important to all of these. Video surveillance at airports in support of national defense: video cameras at all airports (Hitachi Kokusai Ltd.), facial recognition software to identify "people of interest" (Hitachi Ltd.), and real-time reporting to security forces before they leave the airport. 9,420 tweets per second: analyze content for "favorable" characteristics and send a "buy now" app to the smart phone: a 15%-off coupon, free shipping, have it before the next game.
  • #5 Gartner: Most organizations will be unable to exploit new analytic capabilities due to poor data quality and latency.
    ■ Data quality assurance is becoming a high priority, but traditional approaches fail due to increased information volume, velocity, variety, and complexity.
    ■ The desire to increase reliability, consistency, control, and agility in information infrastructure is driving organizations to rationalize overlapping tools and technologies, replace custom code, remove data silos, and add richer metadata and modeling.
    ■ Few organizations evaluate the economic potential of information assets with the discipline they demonstrate in managing, deploying, and accounting for traditional physical and financial assets.
    ■ Event data, proliferating rapidly, can be used to improve situation awareness and enable sense-and-respond "smart" systems with rigorous information governance.

    Recommendations:
    ■ Adapt data quality measurement methods to samples, as it will not be possible to measure all. Map expectations to specific uses and expose "confidence factors" to provide business context.
    ■ Select straightforward approaches to estimate the relative value of information sources using, for example, quality, completeness, consistency, integrity, scarcity, timeliness, and business-problem relevance.
    ■ Determine a framework and methods (cost-, income-, or market-based) with your CFO to quantify the financial value of information assets. Consider a supplemental balance sheet to communicate it.
    ■ Use Gartner's Information Capabilities Framework to identify technology in place that addresses common capabilities, and gaps where tools are lacking. Plan to fill critical gaps and rationalize tools.
    ■ Make event-driven architecture and complex event processing first-class citizens in data modeling work and metadata repositories.
  • #6 Customer questions: "Do you have a scalable platform for big data?" "How do I find across...?" "How do I perform...?" Addressed through partnerships: industry-vertical application providers, SAP HANA, and Hitachi Consulting. This is where EMC will position Isilon.
  • #7 Historically, IT has focused on delivering infrastructure for each application. Our infrastructure cloud approach unifies your server, storage, and network silos to improve utilization, simplify management, and lower costs. Separating applications from the underlying storage allows data to be moved freely according to usage, cost, and application requirements, with minimal impact to applications.

    As unstructured data overtakes structured data, our content cloud approach creates a warehouse to store billions of data objects. Intelligence makes it all indexable, searchable, and discoverable across applications and devices, anytime and anywhere. This allows you to cut the costs associated with managing, storing, and accessing data, and to automate the information lifecycle.

    Infrastructure and content form the foundation for the information cloud, which will help you repurpose and extract more value from your data and content. It integrates data across application silos and serves it up to analytics applications that connect data sets, reveal patterns across them, and surface actionable insights to business users. Underneath it all, our single virtualization platform ensures your organization gets seamless access to all resources, data, content, and information.
  • #9 Super-scale search with the new Hitachi Data Discovery Suite (HDDS): exponentially more scalable and faster; billions of objects across geographies; Hadoop architecture for scale-out indexing; leverages distributed platforms for big data. Key big data use case: support for geospatial (latitude/longitude) search.
  • #10 Today's applications execute many data-intensive operations in the application layer, but high-performance apps delegate data-intensive operations to in-memory computing. HDS unique: on-demand, nondisruptive scalability; scale seamlessly from HANA "S" to "M" to "L" configurations with Hitachi blades and storage. Highest-performing appliance for SAP HANA: the Hitachi solution uses 4-way x86 blade servers with Intel 10-core CPUs. Best investment protection and lower OPEX: supports production and test/dev/QA within a single blade chassis.