The Time Has Come for Big-Data-as-a-Service

Enterprises have been using both Big Data and Cloud Computing technologies for years, but until recently the two had not been combined. Now the agility and efficiency benefits of self-service elastic infrastructure are being extended to Big Data initiatives, whether on-premises or in the public cloud.

This session at Hadoop Summit in San Jose, California (June 2016) discussed the emerging category of Big-Data-as-a-Service (BDaaS), which represents the intersection of Big Data and Cloud Computing.

In this session, Kris Applegate (Cloud and Big Data Solution Architect at Dell) and Thomas Phelan (Co-Founder and Chief Architect at BlueData) outlined the following:

- Innovations that paved the way for Big-Data-as-a-Service
- Definition and categories of Big-Data-as-a-Service
- Key considerations for Big-Data-as-a-Service in the enterprise, including public cloud or on-premises deployment options

A video replay can also be found here: https://youtu.be/_ucPoTKuj8Q

Published in: Software

The Time Has Come for Big-Data-as-a-Service

  1. 1. #HadoopSummit The Time Has Come for Big-Data-as-a-Service Kris Applegate – Cloud and Big Data Solution Architect, Dell Tom Phelan – Co-Founder and Chief Architect, BlueData
  2. 2. #HadoopSummit Agenda • A Brief History of Hadoop • Data Storage and Networking Evolution • The Virtualization Revolution • Rise of Big-Data-as-a-Service • Big-Data-as-a-Service (BDaaS) Defined • BDaaS – Public Cloud or On-Premises? • Q & A
  3. 3. #HadoopSummit A Brief History of Hadoop
  4. 4. #HadoopSummit In the Beginning (circa 2003) … • Networks were slow (1 Gigabit per second maximum) • Siloed storage was expensive (proprietary and often required special hardware) • Local HDDs were cheap and fast enough for big data needs Source: http://static.googleusercontent.com/media/researc
  5. 5. #HadoopSummit Bringing the Compute to the Data Compute Storage Co-Locate Compute & Storage Hadoop and HDFS are Born
  6. 6. #HadoopSummit Network Improvements
  7. 7. #HadoopSummit Data Compression Options in HDFS Source: www.slideshare.net/Hadoop_Summit/singh-kamat-june27425pmroom
  8. 8. #HadoopSummit Result: Is Disk-Locality Irrelevant? "Less relevant" may be more accurate • Faster data center networks • Distributed/non-distributed caching platforms (example: Alluxio, formerly Tachyon) • Compute and storage separation Source: https://amplab.cs.berkeley.edu/wp-content/uploads/2011/06/disk-irrelevant_hotos2011.pdf
  9. 9. #HadoopSummit BDaaS and Cloud • Virtualization / “cloud” technology is not absolutely required • But realistically, the flexibility and elasticity of BDaaS cannot be economically provided without these underlying technologies
  10. 10. #HadoopSummit The Virtualization Revolution VMware KVM Docker HyperV LXC
  11. 11. #HadoopSummit The Virtualization Revolution Virtualization enabled several key benefits, including: • Automation, flexibility, elasticity • Cost reduction and consolidation • Higher utilization, less hardware overprovisioning • Multi-tenancy • Security • VxLAN • Fault isolation
  12. 12. #HadoopSummit The Virtualization Revolution But the overhead involved in virtualizing storage and networking within a hypervisor makes it difficult to meet the performance needs of Big Data workloads (SLAs, QoS)
  13. 13. #HadoopSummit The Virtualization Revolution • Linux Containers • OS virtualization reduces CPU, memory, network, and storage virtualization overhead • Docker file format makes containers easy to use and share
  14. 14. #HadoopSummit Rise of Big-Data-as-a-Service
  15. 15. #HadoopSummit Big Data New Realities Traditional Assumptions: • Bare-metal • Disk-locality • HDFS on local disks New Realities: • Containers • Compute and storage separation • In-place access on remote data stores New Benefits and Value: • Big-Data-as-a-Service • Agility and cost savings • Faster time-to-insights
  16. 16. #HadoopSummit Journey to BDaaS (2002 to 2016) • 2002: Initial release of VMware ESX • 2003: Google paper • 2004: Big Data era begins • 2007: Hadoop release 0.14.1 • 2008: Initial release of Linux containers • 2009: Dell DCS delivers first Big Data server • 2009: Amazon launches EMR • 2011: Dell first to launch optimized Apache Hadoop solution • 2012: Hadoop 1.0.2 Snappy compression • 2012: 10 Gbit networking in data center • 2013: Dell Hadoop Performance Analysis • 2013: Initial release of Docker • 2014: VxLANs available • 2014: BlueData wins Strata + Hadoop World Showcase • 2015: BlueData EPIC 2.0 with Docker • 2015: 40 Gb networking in data center • 2016: BDaaS available on-prem or cloud
  17. 17. #HadoopSummit BDaaS – The Time Has Come All the pieces are now available: • Fast network hardware and good data compression - Compute and storage separation - Low overhead virtualization (containers) - Ability to run network and storage-intensive workloads • No sacrifice in performance • Demand from end users for agility, flexibility, & speed
  18. 18. #HadoopSummit Big-Data-as-a-Service Defined “A mechanism for the delivery of statistical analysis tools and information that helps organizations understand and use insights gained from large information sets in order to gain a competitive advantage.” On-Demand, Self-Service, Elastic Big Data Infrastructure, Applications, Analytics Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification
  19. 19. #HadoopSummit • Core BDaaS • Performance BDaaS • Feature BDaaS • Integrated BDaaS Four Types of BDaaS Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification
  20. 20. #HadoopSummit Core BDaaS • Minimal platform, such as Hadoop with YARN Performance BDaaS • “Downwards” vertical integration • Includes optimized infrastructure • Tight integration with Core BDaaS Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification Four Types of BDaaS
  21. 21. #HadoopSummit Four Types of BDaaS Feature BDaaS • “Upwards” vertical integration • Include features beyond Hadoop • Support for multiple Core BDaaS providers Integrated BDaaS • Full vertical integration and optimization • Includes both Performance BDaaS & Feature BDaaS Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification
  22. 22. #HadoopSummit BDaaS – Public Cloud or On-Prem?
  23. 23. #HadoopSummit BDaaS – Public Cloud or On-Prem Public Cloud: • Low Capex, high Opex • “Infinite” expandability • Less secure? • Less control: software, SLAs, configs, etc. On-Premises (Private Cloud): • High Capex, low Opex • Eventually reach resource limit • More secure? • More control: software, SLAs, configs, etc.
  24. 24. #HadoopSummit Challenge: Public cloud services can be proprietary Goal: Deliver API-compatible on-prem + public cloud • BDaaS layer (e.g. BlueData) • PaaS layer (e.g. Cloudforms, Cloud Foundry) • API-compatible private cloud (e.g. Microsoft Azure Pack/Stack, OpenStack, VMware) BDaaS – Workload Portability
  25. 25. #HadoopSummit BDaaS – Public Cloud Example Public Cloud Use Cases • Workloads with a shorter life than 16 months* (e.g. Dev/Test) • When data is in the cloud too • Public-facing services * www.dell.com/learn/us/en/555/business~solutions~whitepapers~en/documents~microsoft-private-cloud-tco-0914.pdf
  26. 26. #HadoopSummit Example On-Prem Use Cases • High performance clusters • Data security • Data compliance • Persistent clusters with > 16 month lifespan* • High capacity clusters • When SLAs are needed * The BlueData EPIC software platform addresses this potential limitation BDaaS – On-Premises / Private Cloud
  27. 27. #HadoopSummit BlueData EPIC – Integrated BDaaS • BDaaS software platform, using Docker containers • Self-service, on-demand Hadoop / Spark clusters • Bring your own application / distribution / version • Compute and storage separation - Scale resources independently - Clusters with < 16 month lifespan well supported (e.g. transient) - No HDFS data ingestion penalty • Secure multi-tenancy, Quality of Service (QoS)
  28. 28. #HadoopSummit Big Data On-Premises Traditional Big Data On-Prem (IT serving Manufacturing, Sales, R&D, Services): • < 30% utilization • Duplication of data • Management complexity • Weeks to build each cluster • Complex, painful upgrades BDaaS On-Prem with BlueData (BlueData EPIC Software Platform serving Manufacturing, Sales, R&D, Services, and BI/Analytics Tools): • > 90% utilization • No duplication of data • Simplified management • Multi-tenant • Simple, instant upgrades • Self-service, on-demand clusters with BlueData
  29. 29. #HadoopSummit NEW – BDaaS On-Prem and Cloud • BlueData announced AWS and multi-cloud strategy - Extending the user experience and value of BlueData to public cloud - Single pane of glass for on-prem and off-prem Big Data workloads - Initial AWS support; then MS Azure, Google Cloud Platform, others • Support for data on-prem and compute in the cloud - Leverage cloud compute elasticity while keeping data on-premises - Eliminate challenge of data movement from on-prem to cloud
  30. 30. #HadoopSummit BlueData and Dell Partnership • Joint solution for Big-Data-as-a-Service • BlueData = Certified Dell Technology Partner • Installed, tested, validated on Dell hardware • Featured in Dell’s Global Customer Solution Centers
  31. 31. #HadoopSummit Kris Applegate kris_applegate@dell.com www.dell.com/bigdata Tom Phelan tap@bluedata.com www.bluedata.com Q & A
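
Slides 23 to 26 contrast high-Capex, low-Opex on-premises clusters with low-Capex, high-Opex public cloud, and cite a 16-month lifespan as the rough dividing line. The sketch below shows one way such a break-even point could be computed. The function names and all dollar figures are hypothetical and chosen only so the example lands on 16 months; they are not taken from the slides or from the Dell whitepaper referenced on slide 25.

```python
import math


def break_even_month(capex, onprem_monthly, cloud_monthly):
    """Return the first whole month at which cumulative on-prem cost
    is no higher than cumulative public cloud cost, or None if the
    cloud is never more expensive per month."""
    monthly_saving = cloud_monthly - onprem_monthly
    if monthly_saving <= 0:
        return None  # on-prem never recovers its upfront cost
    return math.ceil(capex / monthly_saving)


def cheaper_option(lifespan_months, capex, onprem_monthly, cloud_monthly):
    """Compare total cost over the cluster's expected lifespan."""
    onprem_total = capex + onprem_monthly * lifespan_months
    cloud_total = cloud_monthly * lifespan_months
    # Ties go to public cloud, which avoids the upfront spend.
    return "on-prem" if onprem_total < cloud_total else "public cloud"


# Hypothetical figures (assumptions, not data from the presentation).
CAPEX = 160_000          # upfront hardware purchase
ONPREM_MONTHLY = 5_000   # power, space, administration
CLOUD_MONTHLY = 15_000   # rented capacity of comparable size

months = break_even_month(CAPEX, ONPREM_MONTHLY, CLOUD_MONTHLY)
short_lived = cheaper_option(6, CAPEX, ONPREM_MONTHLY, CLOUD_MONTHLY)
persistent = cheaper_option(24, CAPEX, ONPREM_MONTHLY, CLOUD_MONTHLY)
print(months, short_lived, persistent)
```

Under these assumed figures, a six-month Dev/Test cluster favors public cloud and a two-year persistent cluster favors on-premises, which matches the use-case split on slides 25 and 26. Real comparisons would also need to account for utilization, data transfer, and staffing costs.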
