Big Data, Big Content, and Aligning Your Storage Strategy


Fred Oh's presentation for SNW Spring, Monday 4/2/12, 1:00–1:45PM

Unstructured data is growing explosively, with no signs of slowing down. Costs continue to rise, along with new regulations mandating longer data retention. Moreover, disparate silos, multivendor storage assets and less-than-optimal use of existing assets have all contributed to ‘accidental architectures.’ And while these can be key drivers for organizations to explore incremental, innovative solutions to their data challenges, they may provide only short-term gain. Join us for this session as we outline the business benefits of a truly unified, integrated platform for managing all block, file and object data that allows enterprises to make the most of their storage resources. We explore the benefits of an integrated approach to multiprotocol file sharing, intelligent file tiering, federated search and active archiving; how to simplify and reduce the need for backup without the risk of losing availability; and the economic benefits of an integrated architecture approach that lowers total cost of storage ownership (TCSO) by 35% or more.

Slide notes
  • Answer why it’s called big data; explain the misnomer. Emphasize information extraction and why analytics is so important. Possible analogy: data warehouse – NAS – distributed dataset.

OLD NOTES: The analysts are all hard at work defining big data in their own unique ways, but they pretty much agree on three key characteristics. Along with big volumes of data, we have velocity, which refers to the speed at which the data streams in as well as the time sensitivity of delivering the analysis and reacting, and variability, which refers to the data format – typically separated into structured (fits the relational database model), unstructured, and semi-structured (has structure but doesn’t fit the relational model). Most would argue that it is the combination of these factors that defines big data, or that big data analytics refers to problems we can’t solve with traditional DW/analytics technologies.

The chart illustrates the evolution of data available for analytics as three waves: OLTP/traditional DW, human-generated unstructured data (the wave driven by social media), and machine-generated data, which will really take hold with the Internet of Things.

Though traditional DW has been around for about 30 years, it really took off in the 1990s. Companies needed a way to gain cross-business insight from all the disparate database applications they had rolled out (e.g., ERP, supply chain management, order entry). They did that by loading data from the operational systems into relational data warehouses. In the early days the cost of DW was very high – millions of dollars for mere terabytes – so the earliest adopters were big, transaction-heavy businesses with deep pockets, like banks and retailers. The combination of lower technology costs and increased storage and compute capacity spawned usage by companies of all sizes. Data volumes were driven higher by Internet applications, eCommerce and the focus on CRM in the 2000s.

Today the largest DWs are in the low petabytes, but average size is still closer to the tens to hundreds of terabytes for most businesses – sizeable, but not when compared to the next waves. The data is all captured from and stored in relational databases, so it is highly structured, and though there are real-time applications, data is predominantly loaded as nightly and weekly batch jobs.

The second wave, human-generated unstructured data, started around three years ago but ramped this past year. Social media content, including blogs and Twitter feeds, is a big component here, along with web logs that track the trail of human activity on the Internet. Many of these web log files used to be thrown away, but with the reduced cost of storage and compute power, companies are now starting to glean valuable insight – we’ll look at examples in a few slides. Clearly the volumes are huge here (remember, Google generates 20PB daily), the data streams in at a fast rate, and the data does not have the nice predictable structure that we had with the OLTP data.

The final wave is machine data, and this will be the biggest wave of all. Some estimate that though we are just dipping a toe into analyzing this kind of data, it will overtake social media data in volume within 5 years and quickly surpass it 10- to 20-fold. As we saw from the Boeing example, the data streams will be constant, and the ability not just to gather insight but to react in real time will be critical for many applications.
  • *Source: McKinsey Global Institute, 2011 – global projections for healthcare, telco, retail, manufacturing and public administration. Per the above source: “By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.”

Notes:
Science and data science: 190,000 – shortage of data scientists in the U.S. by 2018
Media: 400B videos viewed online in 2010 (U.S.)
Oil and gas: 2011 – $5 billion in IT spend and $1 billion on storage
Oil & gas – from Bjorn; videos watched – WIP
Should note that search is important to all of these.
Video surveillance at airports in support of national defense: video cameras at all airports (Hitachi Kokusai Ltd.); facial recognition software to identify ‘people of interest’ (Hitachi Ltd.); real-time reporting to security forces before they leave the airport.
9,420 tweets per second: analyze content for ‘favorable’ characteristics and send a ‘buy now’ app to the smartphone – 15% off coupon, free shipping, have it before the next game.
  • Gartner:
■ Most organizations will be unable to exploit new analytic capabilities due to poor data quality and latency.
■ Data quality assurance is becoming a high priority, but traditional approaches fail due to increased information volume, velocity, variety and complexity.
■ The desire to increase reliability, consistency, control and agility in information infrastructure is driving organizations to rationalize overlapping tools and technologies, replace custom code, remove data silos and add richer metadata and modeling.
■ Few organizations evaluate the economic potential of information assets with the discipline they demonstrate in managing, deploying and accounting for traditional physical and financial assets.
■ Event data, proliferating rapidly, can be used to improve situation awareness and enable sense-and-respond "smart" systems with rigorous information governance.
Recommendations:
■ Adapt data quality measurement methods to samples, as it will not be possible to measure all. Map expectations to specific uses and expose "confidence factors" to provide business context.
■ Select straightforward approaches to estimate the relative value of information sources using, for example, quality, completeness, consistency, integrity, scarcity, timeliness and business problem relevance.
■ Determine a framework and methods (cost, income or market-based) with your CFO to quantify information asset financial value. Consider a supplemental balance sheet to communicate it.
■ Use Gartner's Information Capabilities Framework to identify technology in place that addresses common capabilities and gaps where tools are lacking. Plan to fill critical gaps and rationalize tools.
■ Make event-driven architecture and complex event processing first-class citizens in data modeling work and metadata repositories.
  • Customer questions: “Do you have a scalable platform for big data?” “How do I find across …” “How do I perform …” – through partnership with …; industry verticals, application providers, HANA; Hitachi Consulting. This is where EMC will position Isilon.
  • Historically, IT has focused on delivering infrastructure for each application. Our infrastructure cloud approach unifies your server, storage and network silos to improve utilization, simplify management and lower costs. Separating applications from underlying storage allows data to be moved freely according to usage, cost and application requirements with minimal impact to applications.

As unstructured data overtakes structured data, our content cloud approach creates a warehouse to store billions of data objects. Intelligence makes it all indexable, searchable and discoverable across applications and devices, anytime and anywhere. This allows you to cut costs associated with managing, storing and accessing data and to automate the information lifecycle.

Infrastructure and content form the foundation for the information cloud, which will help you repurpose and extract more value from your data and content. It integrates data across application silos and serves it up to analytics applications that connect data sets, reveal patterns across them, and surface actionable insights to business users. Underneath it all, our single virtualization platform ensures your organization gets seamless access to all resources, data, content and information.
  • Super-scale search with the new Hitachi Data Discovery Suite (HDDS): exponentially more scalable and faster; billions of objects across geographies; Hadoop architecture for scale-out indexing; leverages distributed platforms for big data; key big data use case – support for geospatial (latitude/longitude) search.
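The scale-out indexing pattern behind a suite like HDDS can be sketched in miniature: each region builds a partial inverted index in parallel (the map step a Hadoop job would distribute), and the partials are merged into one searchable index (the reduce step). This is an illustrative sketch only – the corpus, document IDs and the `index_partition`/`merge_indexes` helpers are hypothetical, not HDDS or Solr APIs.

```python
from collections import defaultdict

def index_partition(docs):
    """Map step: build a local inverted index for one region's documents."""
    local = defaultdict(set)
    for doc_id, text in docs:
        for term in text.lower().split():
            local[term].add(doc_id)
    return local

def merge_indexes(partials):
    """Reduce step: merge per-region indexes into one searchable index."""
    merged = defaultdict(set)
    for partial in partials:
        for term, ids in partial.items():
            merged[term] |= ids
    return merged

# Hypothetical corpus split across three "regions"
regions = [
    [("r1-d1", "seismic survey data"), ("r1-d2", "survey logs")],
    [("r2-d1", "well logs and seismic traces")],
    [("r3-d1", "survey metadata")],
]

# In practice each index_partition call would run as a Hadoop task near the data
partials = map(index_partition, regions)
index = merge_indexes(partials)
print(sorted(index["survey"]))  # → ['r1-d1', 'r1-d2', 'r3-d1']
```

In a real deployment only the much smaller merged index (or its shards) is queried across geographies, which is what makes billions of objects searchable.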
  • Today’s applications execute many data-intensive operations in the application layer, but high-performance apps delegate data-intensive operations to in-memory computing.
HDS unique:
On-demand, non-disruptive scalability – scale seamlessly from HANA “S” to “M” to “L” configurations with Hitachi blades and storage
Highest-performing appliance for SAP HANA – Hitachi solution uses 4-way x86 blade servers with Intel 10-core CPUs
Best investment protection and lower OPEX – support production and test/dev/QA within a single blade chassis
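The delegation idea – calculate first in the data layer, then move only the results – can be shown with any database. The sketch below uses Python's stdlib SQLite in-memory database purely as a stand-in for an in-memory platform like HANA; the `sales` table and its rows are invented for illustration.

```python
import sqlite3

# Hypothetical sales table held in an in-memory database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?)",
               [("east", 100.0), ("east", 250.0), ("west", 75.0)])

# Application-layer approach: move every raw row to the app, then calculate.
rows = db.execute("SELECT region, amount FROM sales").fetchall()
totals_app = {}
for region, amount in rows:
    totals_app[region] = totals_app.get(region, 0.0) + amount

# Delegated approach: calculate first in the data layer, move only results.
totals_db = dict(db.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))

assert totals_app == totals_db  # same answer; far less data moved
```

With millions of rows, the second query moves one small result set instead of the whole table, which is why delegating data-intensive operations pays off.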
Transcript

    1. BIG DATA, BIG CONTENT – SNW SPRING – FRED OH – APRIL 2, 2012. © Hitachi Data Systems Corporation 2011. All Rights Reserved.
    2. BIG DATA IS NOT JUST ABOUT SIZE – Four Vs: Volume, Velocity, Variability, Value – spanning structured, semi-structured and unstructured data (OLTP, email, documents, satellite images, sensors, bio-informatics, M2M and web logs, social, video, audio). Data-intensive processing increases.
    3. BIG OPPORTUNITY – ACROSS INDUSTRIES. Big data impact: Telco – $100B opportunity; Healthcare – $300B opportunity; Retail – +60% margin (US); Manufacturing – +50% production $; Public Administration – €100B opportunity (EU). Big data examples: science and data science – decoding a genome with 3 billion data pairs can now be done in under 1 hour; media and entertainment – video surveillance at airports with facial recognition analysis and real-time reporting to security; oil and gas – projects usually require coordinating hundreds of firms with up to 10PB of data to analyze oil locations.
    4. BIG DATA MARKET MATURITY IS JUST BEGINNING. 85% of Fortune 500 companies are unable to exploit big data for competitive advantage. 90% of business leaders say information is a strategic asset, but <10% can quantify its economic value. Prepare now with data quality and event-driven architectures, laying the foundational infrastructure for big data later.
    6. HITACHI NAS PLATFORM, POWERED BY BLUEARC® (HNAS) – BIG DATA IN OIL AND GAS. Data workflows and management are increasingly complex, spanning data acquisition, data management, seismic processing, visual interpretation, petrophysical analysis, property modeling, simulation and modeling automation. HNAS provides high-performance scale: tremendous need for high-performance storage; high data volumes, with storage requirements from 200TB to tens of PB; high-frequency data streams, e.g., 10MB/sec times the number of boats.
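The slide's stream figure supports a quick sizing sketch: at 10MB/sec per boat, daily and per-survey volumes follow directly. The fleet size and survey duration below are assumptions for illustration, not figures from the presentation.

```python
# Back-of-the-envelope sizing for seismic acquisition streams, using the
# slide's figure of 10 MB/s per boat. Fleet size and survey length are
# assumed for illustration.
MB_PER_SEC_PER_BOAT = 10
boats = 8                      # hypothetical fleet size
survey_days = 30               # hypothetical survey duration
seconds_per_day = 24 * 60 * 60

daily_tb = MB_PER_SEC_PER_BOAT * boats * seconds_per_day / 1_000_000
total_tb = daily_tb * survey_days

print(f"{daily_tb:.1f} TB/day, {total_tb:.0f} TB per survey")
# → 6.9 TB/day, 207 TB per survey
```

Even this modest hypothetical fleet lands in the slide's 200TB range after a single month, which is why requirements climb to tens of PB across projects.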
    7. BIG DATA, BIG CONTENT. Provide bottomless storage: 80 nodes and 32 billion objects; 1,000 tenants per system; 70K namespaces in many-to-one systems. Replicate; reduce tape backup; distribute content; write once, read everywhere.
    8. SEARCH FOR BIG DATA (COMING IN 2012) – NEW HITACHI DATA DISCOVERY SUITE. Super-scale search across regions, with parallel indexing in each region; big data search index architecture with Solr + Hadoop + Hitachi; geospatial and wide-area search for the FCS portfolio.
    9. BIG DATA ANALYTICS – REAL TIME: HITACHI CONVERGED PLATFORM FOR SAP HANA™. In-memory computing for real-time analytics: calculate first, then move results. High-performance apps delegate data-intensive operations to a massive scale-out data layer, processing massive quantities of real-time data to provide immediate results. The Converged Platform provides on-demand, nondisruptive scalability and the highest-performing appliance for SAP HANA. Coming in 2012!
    12. GOING FORWARD. Managed storage plus a genomics information lifecycle management solution: a national genomics database (based on HCP and HDDS) growing at 4PB per year. The information cloud delivers fluid content, dynamic infrastructure and sophisticated insight.
    14. THANK YOU