Exploring the Wider World of Big Data - Vasalis Kapsalis

Every second of every day you hear about electronic systems creating ever-increasing quantities of data. Systems in markets such as finance, media, healthcare, government and scientific research feature strongly in the Big Data processing conversation, and extracting business value from Big Data is forecast to bring customer benefits and competitive advantage. In this session, Vas Kapsalis, NetApp Big Data Business Development Manager, discusses his views and experience on the wider world of Big Data.

Transcript

  • 1. The Big Data Landscape
  • 2. Entering a New Era of Scale
  • 3. Convergence of Technology Disrupters Creates Opportunity: Cloud, Mobile, Big Data, Social, Internet of Things (NetApp Confidential - Internal Use Only)
  • 4. Unstructured Data Growth Dominates
      [Figure: revenue share by segment: traditional structured, traditional unstructured, traditional replicated, content depots / public cloud]
      – The traditional structured and replicated data mix shift is driven by:
          – Efficiency (deduplication, compression, thin provisioning, SATA)
          – Growth in a new category of storage consumers using cloud / content depots
      – Unstructured data (files and objects in traditional storage plus content depots / cloud) will be the largest storage category by 2014
          – Content depots / cloud expected to be 95% unstructured data
  • 5. Not Even to the “Peak”
      [Figure: visibility vs. time curve with phases Technology Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, Plateau of Productivity]
      – 40 zettabytes: estimated size of the digital universe in 2020
      – 5 billion smartphones
      – 30 billion pieces of new content to Facebook per month
      – 80% unstructured data
  • 6. Big Data Is All Data, From Everywhere: It Fundamentally Changes Your Business
      – Transactional data
      – Machine data
      – Social data
      – Enterprise content
      – Example settings pictured: the jetway, the call center
  • 7. Big Data Vendor Landscape: A Lot of Hype and Buzz – Everyone Is Jumping In
      [Figure: funding for Hadoop and NoSQL vendors, Jan 2008 to Nov 2011 (451 Research), marking rounds for Cloudera (series B, C, D), MapR (series A, B), 10gen (series D), DataStax (series B), Neo Technology (series A), Opera Solutions (series A), Platfora (series A) and Couchbase (series C)]
      – Market is expected to grow from $3.2 billion in 2010 to $16.9 billion in 2015
      – NoSQL: $2 billion per annum by 2015
      – Most firms are taking a pragmatic approach
      – Big Data is in the very early stages of maturity
      – Best practices are not mature
      – "The Big Data market is expanding rapidly … For technology buyers, opportunities exist to use Big Data technology to improve operational efficiency and to drive innovation. Use cases are already present across industries and geographic regions." (Dan Vesset, Vice President, IDC)
      – Source: IDC Big Data Survey
  • 8. Data Growth Impact on Business Complexity
      – “Big Data” refers to datasets whose size is beyond the ability of typical tools to capture, store, manage and analyze
      [Figure: speed and volume from 2010 to 2020; labels: “Data Becomes a Burden to IT Infrastructure”, “Inflection Point”, “Information Becomes a Propellant to Business”, “Business Velocity”]
  • 9. Why Should You Care? It’s the Value of Your Data
      – Top-line revenue: leverage data assets into business advantage
      – Bottom-line savings: lower the cost of compliance; manage ever-growing data efficiently
      – Examples cited: 5 billion records, anywhere, anytime; faster time to market; 50% increase in revenue; over 1 PB of data with 175% year-over-year growth; 90 days of data within 24 hours of a failure
  • 10. NetApp Big Data
  • 11. Why NetApp? Practical Solutions That Solve Today’s Problems
      – Get control: NetApp helps you turn your exploding data from threat to opportunity. Manage your data effectively and affordably.
      – Break through: break through the limits. With NetApp, you can take on even the most massive and complex data projects.
      – Gain insight: turn insight into action. NetApp helps you get to clarity and insight faster and more reliably.
  • 12. Experience Managing Data at Scale
      [Figure labels: NetApp’s Largest Customer, 100 PB, 4 Customers, 50 PB, 10 Customers, 20 PB, 50 Customers, 10 PB, 100 Customers]
  • 13. NetApp Big Data Strategy: Open, Best-of-Breed, Choice
      – Best-of-breed storage for Big Data applications
      – Create deep integration and added value
      – Build on open standards with best-in-class partnerships
      – Validate with ecosystem leaders: complete server, network and storage “racks”, delivered via trusted high-value partners
  • 14. Industry-Leading Storage Innovation
      [Figure axis: corporate data centers to cloud data centers]
      – Flash arrays for ultra-high performance
      – Clustered Data ONTAP for shared infrastructure
      – E-Series for price-performance at scale
      – StorageGRID for web-scale object storage
  • 15. Big Data Building Blocks
      [Figure: applications on top of three building blocks, with flows labeled retain/distribute, extract, store and retrieve, and a private/public cloud tier]
      – Big Bandwidth: ingest, process, stream
      – Big Analytics: reduce, analyze, report
      – Big Content: retain forever, multi-site distribution
  • 16.
  • 17. Analytics-Oriented Business Processing
      [Figure components: business applications; query-based retrieval and commit; transaction processing with memory ingest (transaction-granular data resilience, recoverability and protection at line speeds); disk/flash tier (performance-optimized query service, real-time analytics); federated database store (build/buy/partner) with persisted commit; data organization optimized by query interface]
      – RDBMS (general-purpose DB): data organized to align with schemas; fixed consistency model; complex queries supported; volume-based data management
      – Columnar DB (analytics oriented): data organized in column files; tabular interface without rigid schemas; fast column scans; multiple consistency models; transaction-granular data management
      – Document store (transaction oriented): data organized in in-memory data structures; schemaless transaction store for structured data; high transactional performance
      – Key-value store (metadata-service oriented): data organized in key-value pairs; suitable for metadata services with content management systems (CMSs); associated with object services
  • 18. Analytics Technologies to Look Out For!
      [Figure: technologies placed from “old world” to “new world” and from datacenter to multi-datacenter: row-oriented RDBMSs, columnar DBs (analytics oriented), document stores (transaction oriented), key-value stores (content/object service), graph DBs (niche)]
      – Relational DBs: ACID constrained; complete query set; limited availability
      – Toward the middle of the range: high consistency; rich query set; good availability
      – Toward the multi-datacenter end: tuneable consistency; limited query set; highest (WAN) availability
  • 19. Analytics & Enterprise Apps Environment
      [Figure layers: reporting/dashboard/visualization; applications (OLAP, analytics, OLTP); ETL and data management; storage (file systems, content repositories, shared storage infrastructure, storage data management over NFS/sNFS/pNFS, and all other storage, i.e. internal DAS); data sources (mobile devices, location/GPS, logs, sensors, applications, other data sources)]
  • 20. Some Problems Require an Enterprise-Class Hadoop Solution
      [Figure axes: compute power and storage capacity]
      – Commodity, off-the-shelf Hadoop (values associated with early adopters of Hadoop): social media space; contributors to Apache; strong bias to JBOD; skeptical of ALL vendors
      – Enterprise-class Hadoop, packaged ready-to-deploy modular Hadoop cluster: the data has intrinsic value ($$$); capacity and compute requirements expanding very fast; higher storage performance; real human consequences if the system fails (threats, treatments, financial losses); system has to allow for asymmetric growth
      – Enterprise-class Hadoop, packaged ready-to-deploy modular compute-intensive Hadoop cluster: compute-intensive applications; video and imaging analysis; extremely tight service-level expectations; severe financial consequences if the data analytic application or service runs late
      – Enterprise-class Hadoop, packaged ready-to-deploy modular storage-intensive Hadoop cluster: storage-intensive applications; additional CPUs do not help run time; financial ticker data analysis; extremely tight service-level expectations; need deeper storage per DataNode
  • 21. NetApp Open Solution for Hadoop  Easy to Deploy, Manage and Scale  Uses High Performance storage HDFS NameNode FAS2040 Secondary NameNode – Resilient and Compact – RAID Protection of Data – Less Network Congestion  Raw Capacity and density Map Reduce JobTracker DataNodes / TaskTracker : – 120TB or 180TB in 4U – Fully serviceable storage system 4 separate shared nothing partitions E2660 DataNodes / TaskTracker  Reliability – Hardware RAID & hot swap prevent job restart due to node go off-line in case of media failure – Reliable metadata (Name Node) Enterprise Class Hadoop NetApp Confidential – Limited Use 21
  • 22. NetApp Open Solution for Hadoop: Validated Benefits for the Enterprise
      – Improved cluster performance by 62%
      – Completed jobs 200% faster under drive failure
      – Delivered linear performance scalability as nodes and data grew
      – Per-server capacity increase of 1.5x
      – "The NetApp Open Solution for Hadoop improves capacity and performance efficiency and recoverability compared to a server-based DAS deployment." (ESG, 2012)
  • 23. Optimizing Performance and Staying Healthy
      – Factors shown: availability and resiliency; burst handling and queuing; oversubscription ratio; network overhead; DataNode network speed; network latency; useful work
      – Sources: Cisco, http://bit.ly/yL54Ts; Garrett, Brian and Lockner, Julie, “NetApp Open Solution for Hadoop”, ESG Report, May 2012, http://bit.ly/LyYG0t
  • 24. DAS vs. NetApp Footprint
      – DAS option
          – Server: 2RU; CPU: 2x8 cores; RAM: 48 GB; disk: 24 TB
          – 1 rack (42RU): 20 servers (320 cores, 960 GB RAM, 480 TB)
          – 6 racks: 120 servers, 1,920 cores, 5.7 TB RAM, 2.8 PB storage
      – NetApp option
          – Server: 1RU; CPU: 2x8 cores; RAM: 48 GB; disk: 2 TB (8 TB max; optional PIXI boot, diskless)
          – 1 rack (42RU): CPU and memory: 24 servers (6:1), 384 cores, 1.152 TB RAM; storage: 4 E2660, 720 TB
          – 4 racks: 96 servers, 1,536 cores, 4.6 TB RAM, 2.8 PB storage
  • 25. Case Study: ASUP (AutoSupport) NetApp Analytics
      [Figure: gateways feed ETL (extract, transform, load) into a data warehouse and data marts, which serve reporting]
      – 800K ASUPs every week, 40% of them arriving over the weekend
      – Data needs to be parsed and loaded in 15 minutes
      – Only 5% of the data goes into the data warehouse; the rest is unstructured, yet it is growing 7-10 TB per month
      – No easy way to access this unstructured content
      – Numerous mining requests are not currently satisfied
      – Huge untapped potential for valuable insight
      – Finally, the incoming load doubles every 16 months!
  • 26. Case Study: NetApp Large-Scale Analytics
      – Challenge: 4 weeks to run a query on 24 billion unstructured records; impossible to run a query on 240 billion unstructured records
      – NetApp solution: 10-node Hadoop cluster
      – Benefits: query time reduced from 4 weeks to 10.5 hours; the previously impossible 240-billion-record query is now achievable in just 18 hours
  • 27. Integrated Big Data Solutions and Expertise
      – Planning and implementation expertise for Big Data
      – Turnkey solution stacks and Big Data services
      – Big Data system integrators: solutions built on NetApp®
  • 28. Next Steps: Team with the Experts
      – Strategic assessment: business goals; data growth needs; use case discovery (partner delivery)
      – Consult: solution architecture and design (NetApp delivery)
      – Deploy: installation and implementation (NetApp delivery); solution implementation (partner delivery)
      – Support options: global support available from NetApp and partners
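Slide 4 credits deduplication and compression with part of the shift in the storage mix. The mechanics can be sketched in a few lines of Python: the code below splits data into fixed 4 KiB blocks, keeps one copy of each distinct block keyed by its SHA-256 digest, and zlib-compresses the blocks that remain. The block size, the hash and the sample workload are assumptions chosen for illustration; this is not a description of how any NetApp product implements deduplication.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size (4 KiB), chosen for illustration


def dedup_and_compress(data: bytes, block_size: int = BLOCK_SIZE) -> tuple[int, int, int]:
    """Return (logical_bytes, stored_bytes, unique_blocks) for `data`.

    Deduplication: split into fixed-size blocks and keep one copy of each
    distinct block, keyed by its SHA-256 digest.
    Compression: zlib-compress each unique block before counting its size.
    """
    unique: dict[bytes, bytes] = {}
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        unique.setdefault(hashlib.sha256(block).digest(), block)
    stored = sum(len(zlib.compress(block)) for block in unique.values())
    return len(data), stored, len(unique)


# Hypothetical workload: 100 identical 4 KiB blocks plus one different 4 KiB block.
sample = b"x" * BLOCK_SIZE * 100 + bytes(range(256)) * 16
logical, stored, unique_blocks = dedup_and_compress(sample)
print(f"logical={logical} bytes, unique blocks={unique_blocks}, stored={stored} bytes")
```

Here 101 logical blocks collapse to 2 unique ones before compression is even applied, which is why highly repetitive data benefits most from deduplication.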
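Slide 7 quotes a market forecast of $3.2 billion in 2010 growing to $16.9 billion in 2015. The implied compound annual growth rate (CAGR) is (end / start)^(1/years) - 1; a minimal Python check of that arithmetic, with a hypothetical `cagr` helper:

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from slide 7: $3.2 billion (2010) to $16.9 billion (2015).
rate = cagr(3.2, 16.9, 2015 - 2010)
print(f"Implied CAGR: {rate:.1%}")
```

With these figures the implied growth rate is roughly 39.5% per year.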
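Slide 17 contrasts row-oriented RDBMSs with columnar databases, whose data is organized in column files to allow fast column scans. The toy Python sketch below models the two layouts with a hypothetical three-row sales table (the table and its field names are invented for illustration). It shows only the difference in layout; it does not reproduce the I/O savings a real columnar engine gets from reading a single column file off disk.

```python
from array import array

# Hypothetical table of sales records (illustrative data, not from the slides).
rows = [
    {"region": "EMEA", "units": 120, "price": 9.5},
    {"region": "APAC", "units": 80, "price": 11.0},
    {"region": "AMER", "units": 200, "price": 8.25},
]


# Row-oriented layout: each record's fields are kept together, so on disk a
# single-column aggregate has to read whole records.
def total_units_row_store(records):
    return sum(record["units"] for record in records)


# Column-oriented layout: each column is kept contiguously (here as a typed
# array), so a single-column aggregate only needs that one column.
columns = {
    "region": [r["region"] for r in rows],
    "units": array("q", (r["units"] for r in rows)),
    "price": array("d", (r["price"] for r in rows)),
}


def total_units_column_store(cols):
    return sum(cols["units"])


print(total_units_row_store(rows), total_units_column_store(columns))
```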
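Slide 18 lists "tuneable consistency" as a property of the newer, multi-datacenter stores. The slide names no mechanism; one common way such systems expose it, assumed here for illustration, is Dynamo-style quorum replication: each item has N replicas, a read waits for R of them and a write for W. The hypothetical `QuorumConfig` class below captures the standard R + W > N overlap rule and how many unreachable replicas each setting tolerates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumConfig:
    """Dynamo-style replication settings (an assumed model, not from the slides)."""
    n: int  # replicas holding each item
    r: int  # replicas that must answer a read
    w: int  # replicas that must acknowledge a write

    def reads_see_latest_write(self) -> bool:
        # When R + W > N, every read quorum shares at least one replica
        # with the most recent write quorum.
        return self.r + self.w > self.n

    def failures_tolerated(self) -> tuple[int, int]:
        # Replicas that may be unreachable while reads and writes still succeed.
        return self.n - self.r, self.n - self.w


strict = QuorumConfig(n=3, r=2, w=2)   # leans toward consistency
relaxed = QuorumConfig(n=3, r=1, w=1)  # leans toward availability and latency
for name, cfg in (("strict", strict), ("relaxed", relaxed)):
    print(name, cfg.reads_see_latest_write(), cfg.failures_tolerated())
```

Tuning R and W per workload is what trades the "high consistency" end of the slide's spectrum against the "highest availability" end.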
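Slide 21 credits hardware RAID with preventing job restarts when a drive fails. The slide does not say which RAID level is used; purely as an illustration, the sketch below assumes a RAID 5-style single-parity stripe, where the parity block is the byte-wise XOR of the data blocks and any one lost block can be rebuilt by XOR-ing the surviving blocks with the parity. The stripe contents are hypothetical.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


# Hypothetical stripe: three 8-byte data blocks on three drives.
data = [b"HADOOP01", b"DATANODE", b"BLOCK-42"]
parity = xor_blocks(data)  # stored on a fourth drive

# Simulate losing drive 1, then rebuild it from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

Because XOR is its own inverse, the rebuild needs no special decoding step, which is what lets the storage system recover a failed drive without the Hadoop layer re-running work.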
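Slide 23 names the oversubscription ratio among the factors that affect Hadoop performance. It is commonly computed as a switch's total server-facing bandwidth divided by its total uplink bandwidth. The rack in the sketch below (24 servers at 10 GbE behind two 40 GbE uplinks) is hypothetical and is not taken from the slide or from the cited Cisco and ESG sources.

```python
def oversubscription_ratio(servers: int, server_gbps: float,
                           uplinks: int, uplink_gbps: float) -> float:
    """Server-facing bandwidth divided by uplink bandwidth on one access switch."""
    return (servers * server_gbps) / (uplinks * uplink_gbps)


# Hypothetical rack: 24 servers at 10 GbE, 2 x 40 GbE uplinks.
ratio = oversubscription_ratio(24, 10, 2, 40)
print(f"{ratio:.1f}:1")
```

A ratio above 1:1 means that if every server sends off-rack at once, such as during a MapReduce shuffle, traffic has to queue at the uplinks.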
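The totals on slide 24 can be re-derived from its per-server and per-rack figures. The Python sketch below uses decimal units (1 TB = 1,000 GB), which matches the slide's 1.152 TB for 24 × 48 GB. The `Rack` class is a hypothetical helper, and rack units are counted from the slide's 2RU and 1RU servers plus the 4U E2660 shelves described on slide 21.

```python
from dataclasses import dataclass


@dataclass
class Rack:
    servers: int
    cores_per_server: int
    ram_gb_per_server: int
    storage_tb: float  # total raw storage in one rack
    rack_units: int    # RU occupied in a 42RU rack

    def totals(self, racks: int) -> dict:
        """Aggregate compute, memory (decimal TB) and storage (decimal PB)."""
        return {
            "servers": self.servers * racks,
            "cores": self.servers * self.cores_per_server * racks,
            "ram_tb": self.servers * self.ram_gb_per_server * racks / 1000,
            "storage_pb": self.storage_tb * racks / 1000,
        }


# DAS: 20 x 2RU servers, each 2x8 cores, 48 GB RAM and 24 TB of internal disk.
das = Rack(servers=20, cores_per_server=16, ram_gb_per_server=48,
           storage_tb=20 * 24, rack_units=20 * 2)
# NetApp: 24 x 1RU servers (2x8 cores, 48 GB) plus 4 x 4U E2660 shelves, 720 TB total.
netapp = Rack(servers=24, cores_per_server=16, ram_gb_per_server=48,
              storage_tb=720, rack_units=24 * 1 + 4 * 4)

das_totals = das.totals(6)
netapp_totals = netapp.totals(4)
print("DAS, 6 racks:", das_totals)
print("NetApp, 4 racks:", netapp_totals)
```

Both options fill 40 of the 42 rack units and reach the same 2.88 PB of raw storage (shown as 2.8 PB on the slide), while the NetApp layout uses 4 racks and 96 servers instead of 6 racks and 120 servers. Four E2660 shelves totalling 720 TB implies 180 TB per shelf, the higher of the two capacities on slide 21, and 24 servers for 4 shelves is a 6:1 ratio, which appears to be what the slide's "(6:1)" refers to.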
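The load figures on slide 25 can be turned into rough hourly rates and a growth projection. The sketch assumes a 48-hour weekend, arrivals spread evenly within the weekday and weekend periods, and pure exponential growth from the stated 16-month doubling time, L(t) = L0 × 2^(t/16) with t in months. None of these assumptions comes from the slide, and the constant names are hypothetical.

```python
WEEKLY_ASUPS = 800_000   # slide 25
WEEKEND_SHARE = 0.40     # slide 25: 40% arrive over the weekend
DOUBLING_MONTHS = 16     # slide 25: incoming load doubles every 16 months

# Assumed: 48-hour weekend, 120 weekday hours, uniform arrivals within each.
weekend_per_hour = WEEKLY_ASUPS * WEEKEND_SHARE / 48
weekday_per_hour = WEEKLY_ASUPS * (1 - WEEKEND_SHARE) / 120


def projected_weekly_load(months_ahead: float) -> float:
    """Weekly ASUP volume after `months_ahead`, assuming steady exponential growth."""
    return WEEKLY_ASUPS * 2 ** (months_ahead / DOUBLING_MONTHS)


print(f"weekend: {weekend_per_hour:,.0f}/h, weekday: {weekday_per_hour:,.0f}/h")
print(f"weekly load in 4 years: {projected_weekly_load(48):,.0f}")
```

Under these assumptions the weekend rate (about 6,700 ASUPs per hour) is roughly two-thirds higher than the weekday rate (4,000 per hour), and three doublings over 48 months multiply the weekly load by 8, to 6.4 million.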
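The before-and-after times on slide 26 translate into a speedup factor. Treating "4 weeks" as 4 × 7 × 24 = 672 hours of wall-clock time (an assumption; the slide does not say how the time was measured), the 24-billion-record query ran 64 times faster on the 10-node Hadoop cluster. The sketch also computes records per hour for both queries cited on the slide.

```python
HOURS_PER_WEEK = 7 * 24

before_hours = 4 * HOURS_PER_WEEK  # assumed: 4 weeks of wall-clock time
after_hours = 10.5                 # same query on the 10-node Hadoop cluster
speedup = before_hours / after_hours

# Records processed per hour on the Hadoop cluster.
rate_24b = 24e9 / after_hours      # 24 billion records in 10.5 hours
rate_240b = 240e9 / 18             # 240 billion records in 18 hours

print(f"speedup: {speedup:.0f}x")
print(f"throughput: {rate_24b:.3g} vs {rate_240b:.3g} records/hour")
```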