BDaaS Meetup
October 6, 2016
Tom Phelan, Co-founder and Chief Architect
tap@bluedata.com
DEPLOYING BIG-DATA-AS-A-SERVICE
IN THE ENTERPRISE
Outline
• Big Data: Conflicting Enterprise Needs
• Big-Data-as-a-Service (BDaaS)
• BDaaS Enterprise Requirements
• Design Decisions
• Implementation
• Demo
Conflicting Enterprise Needs
Data scientists want flexibility:
• Different versions (and new releases) of Hadoop, Spark, et al.
• Different sets of BI / analytics tools
IT wants control:
• Multi-tenancy
• QoS, Data Access
• Security
• Network, Authorization/Authentication
Big Data New Realities
Big Data Traditional Assumptions
• Bare-metal
• Data locality
• HDFS on local disks
Big Data New Realities
• Containers and VMs
• Compute and storage separation
• In-place access on remote data stores
New Benefits and Value
• Big-Data-as-a-Service
• Agility and cost savings
• Faster time-to-insights
Big-Data-as-a-Service Defined
“BDaaS basically provides a cloud based framework that offers
end-to-end big data solutions to business organizations.”
On-Demand, Self-Service, Elastic
Big Data Infrastructure, Applications, Analytics
Source: http://www.marketsandmarkets.com/Market-Reports/big-data-as-a-service-market-4129107.html
• Core BDaaS
• Performance BDaaS
• Feature BDaaS
• Integrated BDaaS
Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification
Four Types of BDaaS
Core BDaaS
• Minimal platform, such as Hadoop with YARN
Performance BDaaS
• “Downwards” vertical integration
• Includes optimized infrastructure
• Tight integration with Core BDaaS
Feature BDaaS
• “Upwards” vertical integration
• Includes features beyond Hadoop
• Support for multiple Core BDaaS providers & BI tools
Integrated BDaaS
• Full vertical integration and optimization
• Includes both Performance BDaaS & Feature BDaaS
BDaaS – Public Cloud or On-Prem?
BDaaS On-Prem – Architectures
• Deployment Mechanisms
• Bare Metal
• Virtualization
• Virtual Machines
• Containers
[Figure: virtual machines vs. containers]
Source: www.docker.com/what-docker
Virtualization Tradeoffs
• Tradeoffs depend on virtualization technology
• Hypervisor (Virtual Machines)
• Performance: CPU tax
• Security: Strong isolation and fault containment
• Examples: VMware BDE, OpenStack Sahara
• Linux Containers
• Performance: No CPU tax
• Security: Isolation and fault containment still developing
• Examples: BlueData EPIC, Mesos + Myriad
Containers = the Future of Big Data
BDaaS ENTERPRISE REQUIREMENTS
BDaaS – Enterprise Requirements
• Multi-tenancy
• Resource Allocation/Isolation
• No Noisy Neighbor
• Security
• Network
• Storage
• User authorization/authentication
• Support for your application
• Quickly add support for new apps, frameworks, & versions
• Cluster expansion and contraction
• Support for HA configurations
BDaaS – Enterprise Requirements
• Infrastructure & operational requirements
• Support for capacity expansion
• Support for software upgrade
• Integration with existing container orchestration
• Kubernetes, Mesos, Docker Swarm
• Integration with existing network configuration and policies
• IP allocation and use, routing, security, SDN (e.g. Cisco ACI, VMware NSX)
• Integration with user authentication systems
• LDAP/AD
BDaaS – Enterprise Requirements
• Infrastructure & operational requirements (cont’d)
• Integration with existing policies
• Supported versions of OS, containers, KVM, VMware, etc.
• Monitoring
• Limitations on root access
• High Availability
• Geographic replication
DESIGN DECISIONS
BDaaS: Design Decisions I
• Run Hadoop/Spark distros and applications unmodified
– Deploy all services that run on a single bare-metal (BM) host in a single container
• Multi-tenancy support is key
– Network and storage security
• Clusters of containers span physical hosts
BDaaS: Design Decisions II
• Images built to “auto-configure” themselves at time of instantiation
– Not all instances of a single image run the same set of services when instantiated
• Master vs. worker cluster nodes
– Support “reboot” of cluster
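The auto-configuration idea on this slide can be sketched roughly as follows. This is an illustration under stated assumptions, not BlueData's implementation: the role names, the service lists, and the `NODE_ROLE` environment variable are all hypothetical.

```python
import os

# Hypothetical mapping from node role to the services that role starts.
# Every service is baked into the same image; the role chosen at
# instantiation decides which of them actually run.
SERVICES_BY_ROLE = {
    "master": ["namenode", "resourcemanager", "spark-master"],
    "worker": ["datanode", "nodemanager", "spark-worker"],
}


def services_for(role: str) -> list[str]:
    """Return the services a container with the given role should start."""
    try:
        return list(SERVICES_BY_ROLE[role])
    except KeyError:
        raise ValueError(f"unknown node role: {role!r}") from None


def main() -> None:
    # NODE_ROLE is an assumed variable injected when the container is
    # created; this sketch falls back to "worker" when it is absent.
    role = os.environ.get("NODE_ROLE", "worker")
    for service in services_for(role):
        print(f"would start {service}")


if __name__ == "__main__":
    main()
```

Because the same selection logic runs on every container start, a cluster "reboot" would bring each node back with the same set of services.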
BDaaS: Design Decisions III
• Maintain the promise of containers
– Keep them as stateless as possible
– Container storage is always ephemeral
– Persistent storage is external to the container
IMPLEMENTATION
Multi-Tenant Deployment
[Diagram: three teams sharing one BlueData EPIC deployment, each with its own compute isolation]
• Team 1: ETL using Hadoop
• Team 2: ETL using Spark
• Team 3: Machine Learning
• Multiple teams or business groups evaluate different Big Data analytics use cases (e.g. ETL, M/L)
• Use different services & tools (e.g. Hive, Notebooks, SparkR)
• Use different distributions of Hadoop and/or Spark (version labels in the diagram: 5.5, 5.4, 1.5, 2.4, 1.6)
• BlueData EPIC software platform on shared, centrally managed server infrastructure
• Shared data sets (HDFS)
Multiple distributions, services, tools on shared, cost-effective infrastructure
How We Did It: Implementation I
Resource Utilization
• CPU cores vs. CPU shares
• Over-provisioning of CPU recommended
• No over-provisioning of memory
– Swap
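A placement check that follows the rule above (over-provision CPU, never memory) might look like the sketch below. The `Host` type, the `can_place` helper, and the 2.0 overcommit ratio are assumptions made for illustration; the slides do not state a ratio.

```python
from dataclasses import dataclass


@dataclass
class Host:
    cpu_cores: int
    memory_gb: int
    allocated_vcpus: int = 0
    allocated_memory_gb: int = 0


def can_place(host: Host, vcpus: int, memory_gb: int,
              cpu_overcommit: float = 2.0) -> bool:
    """Allow CPU to be over-provisioned up to cpu_overcommit, but not memory."""
    # CPU: the sum of virtual CPUs may exceed the physical core count.
    cpu_limit = host.cpu_cores * cpu_overcommit
    cpu_ok = host.allocated_vcpus + vcpus <= cpu_limit
    # Memory: the sum of container memory must fit in physical RAM, so
    # that containers are not pushed into swap.
    mem_ok = host.allocated_memory_gb + memory_gb <= host.memory_gb
    return cpu_ok and mem_ok
```

With a 36-core, 112 GB host, a request for 72 vCPUs and 112 GB fits under these assumptions, while a request for 113 GB is rejected regardless of how little CPU it asks for.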
Network
• Connect containers across hosts
• Persistence of IP address across container restart
• DHCP/DNS service required for IP allocation and hostname resolution
• Deploy VLANs and VxLAN tunnels for tenant-level traffic isolation
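The requirement that a container keep its IP address across restarts can be illustrated with a small allocator keyed by container name. This is a sketch only: the `IpAllocator` class and the container names are hypothetical, and a real service would persist the table rather than hold it in memory.

```python
import ipaddress


class IpAllocator:
    """Hands out addresses from a pool and remembers them per container name."""

    def __init__(self, cidr: str) -> None:
        self._free = list(ipaddress.IPv4Network(cidr).hosts())
        self._assigned: dict[str, ipaddress.IPv4Address] = {}

    def address_for(self, container: str) -> ipaddress.IPv4Address:
        # A restarted container asks again under the same name and gets
        # back the address it held before.
        if container not in self._assigned:
            if not self._free:
                raise RuntimeError("address pool exhausted")
            self._assigned[container] = self._free.pop(0)
        return self._assigned[container]

    def release(self, container: str) -> None:
        # Only an explicit delete returns the address to the pool.
        address = self._assigned.pop(container, None)
        if address is not None:
            self._free.append(address)
```

Keying on a stable name rather than a container ID is the design choice that makes the address survive a restart, since a recreated container typically gets a new ID.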
Noisy neighbors
Network Architecture
[Diagram: one controller host and several worker hosts running BlueData EPIC. External switches/gateways connect host addresses (IP1 to IP4) to the external network; internal gateways connect containers with BlueData-assigned addresses (BD IP1 to BD IP11), grouped into Tenant1, Tenant2, and Tenant3.]
• Cluster provisioning and automation (embedded containers for Hadoop/Spark/BI tool nodes)
• Internal networking (BlueData-assigned IPs from floating IP range)
• Policy engine (resource / placement)
How We Did It: Implementation II
Storage
• Expandable, unified / and /data storage
– By default, Docker provides 10 GB (fixed) plus optional /data
• DataTap (version-independent, HDFS-compliant)
– Connectivity to external storage
Image Management
• Utilize Docker’s image repository
• Author new Docker images using Dockerfiles
– Inject parameters at runtime
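One common way to inject parameters at runtime is to pass environment variables with `docker run -e`. The helper below only builds the command line; the image name, container name, and variable names in the usage note are hypothetical, and the slides do not say which mechanism BlueData uses.

```python
def docker_run_command(image: str, name: str, params: dict[str, str]) -> list[str]:
    """Build a `docker run` argument list that passes params as environment variables."""
    command = ["docker", "run", "-d", "--name", name]
    # Sort the keys so the generated command is deterministic.
    for key, value in sorted(params.items()):
        command += ["-e", f"{key}={value}"]
    command.append(image)
    return command
```

For example, `docker_run_command("spark:1.5.2", "tenant1-node0", {"NODE_ROLE": "master"})` yields a list that can be handed to `subprocess.run` on a host with Docker installed.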
TIP: Mounting block devices into a container does not support symbolic links (IOW: /dev/sdb will not work, /dm/… PCI device can change across host reboot).
TIP: Docker images can get large. Use “docker squash” to save on size.
How We Did It: Security Considerations
• Security is essential since containers and host share one kernel
– Non-privileged containers
• Achieved through layered set of capabilities
• Different capabilities provide different levels of isolation and protection
• Add “capabilities” to a container based on what operations are permitted
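With Docker, the layered approach described above is typically expressed by dropping all Linux capabilities and adding back only the ones a workload needs, using the `--cap-drop` and `--cap-add` options of `docker run`. The sketch below builds those flags; which capabilities a given Hadoop or Spark container needs is not stated in the slides, so the names in the usage note are purely illustrative.

```python
def capability_flags(allowed: list[str]) -> list[str]:
    """Drop every capability, then add back only the ones listed."""
    flags = ["--cap-drop=ALL"]
    # De-duplicate and sort so the output is stable.
    for capability in sorted(set(allowed)):
        flags.append(f"--cap-add={capability}")
    return flags
```

Calling `capability_flags(["NET_BIND_SERVICE", "CHOWN"])` produces flags for a non-privileged container that can still bind low ports and change file ownership, and nothing more.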
How We Did It: Sample Dockerfile
# Spark-1.5.2 docker image for RHEL/CentOS 6.x
FROM centos:centos6
# Download and extract Spark
RUN mkdir -p /usr/lib/spark && curl -s http://d3kbcqa49mib13.cloudfront.net/spark-1.5.2-bin-hadoop2.4.tgz | tar -xz -C /usr/lib/spark/
# Download and extract Scala
RUN mkdir -p /usr/lib/scala && curl -s http://www.scala-lang.org/files/archive/scala-2.10.3.tgz | tar -xz -C /usr/lib/scala/
# Install Zeppelin (fetched from a private-network address in the original slide)
RUN mkdir -p /usr/lib/zeppelin && curl -s http://10.10.10.10:8080/build/thirdparty/zeppelin/zeppelin-0.6.0-incubating-SNAPSHOT-v2.tar.gz | tar -xz -C /usr/lib/zeppelin
# Remove package caches and temporary files to keep the image small
RUN yum clean all && rm -rf /tmp/* /var/tmp/* /var/cache/yum/*
# Add the service configuration script, make it executable, and run it
ADD configure_spark_services.sh /root/configure_spark_services.sh
RUN chmod +x /root/configure_spark_services.sh && /root/configure_spark_services.sh
A Word About Performance …
Performance Testing: Spark
• Spark 1.x on YARN
• HiBench - Terasort
– Data sizes: 100 GB, 500 GB, 1 TB
• 10-node physical/virtual cluster
• 36 cores and 112 GB memory per node
• 2 TB HDFS storage per node (SSDs)
• 800 GB ephemeral storage
Spark on Docker: Performance
[Chart: Spark on Docker throughput, in MB/s]
DEMO
NEW – BDaaS On-Prem and Cloud
BlueData on AWS public cloud
• Extending the user experience and value of BlueData to public cloud
• Single pane of glass for on-prem and off-prem Big Data workloads
• Initial AWS support; then MS Azure, Google Cloud Platform, others
• Ask us about our directed availability program for AWS
Q&A
www.bluedata.com
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise

More Related Content

What's hot

Azure and cloud design patterns
Azure and cloud design patternsAzure and cloud design patterns
Azure and cloud design patterns
Venkatesh Narayanan
 
MOUG17: DB Security; Secure your Data
MOUG17: DB Security; Secure your DataMOUG17: DB Security; Secure your Data
MOUG17: DB Security; Secure your Data
Monica Li
 
Migration DB2 to EDB - Project Experience
 Migration DB2 to EDB - Project Experience Migration DB2 to EDB - Project Experience
Migration DB2 to EDB - Project Experience
EDB
 
Oracle big data appliance and solutions
Oracle big data appliance and solutionsOracle big data appliance and solutions
Oracle big data appliance and solutions
solarisyougood
 
Coherence Overview - OFM Canberra July 2014
Coherence Overview - OFM Canberra July 2014Coherence Overview - OFM Canberra July 2014
Coherence Overview - OFM Canberra July 2014
Joelith
 
Data Grids with Oracle Coherence
Data Grids with Oracle CoherenceData Grids with Oracle Coherence
Data Grids with Oracle Coherence
Ben Stopford
 
Remote DBA Service: Powering your DBA needs
Remote DBA Service: Powering your DBA needsRemote DBA Service: Powering your DBA needs
Remote DBA Service: Powering your DBA needs
EDB
 
What the Enterprise Requires - Business Continuity and Visibility
What the Enterprise Requires - Business Continuity and VisibilityWhat the Enterprise Requires - Business Continuity and Visibility
What the Enterprise Requires - Business Continuity and Visibility
Cloudera, Inc.
 
Apache Hadoop 3
Apache Hadoop 3Apache Hadoop 3
Apache Hadoop 3
Cloudera, Inc.
 
Liberate Your Files with a Private Cloud Storage Solution powered by Open Source
Liberate Your Files with a Private Cloud Storage Solution powered by Open SourceLiberate Your Files with a Private Cloud Storage Solution powered by Open Source
Liberate Your Files with a Private Cloud Storage Solution powered by Open Source
Isaac Christoffersen
 
An Engineer's Intro to Oracle Coherence
An Engineer's Intro to Oracle CoherenceAn Engineer's Intro to Oracle Coherence
An Engineer's Intro to Oracle Coherence
Oracle
 
Data Analytics Using Container Persistence Through SMACK - Manny Rodriguez-Pe...
Data Analytics Using Container Persistence Through SMACK - Manny Rodriguez-Pe...Data Analytics Using Container Persistence Through SMACK - Manny Rodriguez-Pe...
Data Analytics Using Container Persistence Through SMACK - Manny Rodriguez-Pe...
{code} by Dell EMC
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld
 
Service Primitives for Internet Scale Applications
Service Primitives for Internet Scale ApplicationsService Primitives for Internet Scale Applications
Service Primitives for Internet Scale Applications
Amr Awadallah
 
Auditing and Monitoring PostgreSQL/EPAS
Auditing and Monitoring PostgreSQL/EPASAuditing and Monitoring PostgreSQL/EPAS
Auditing and Monitoring PostgreSQL/EPAS
EDB
 
DATA LAKE AND THE RISE OF THE MICROSERVICES - ALEX BORDEI
DATA LAKE AND THE RISE OF THE MICROSERVICES - ALEX BORDEIDATA LAKE AND THE RISE OF THE MICROSERVICES - ALEX BORDEI
DATA LAKE AND THE RISE OF THE MICROSERVICES - ALEX BORDEI
Big Data Week
 
Data as a Service
Data as a Service Data as a Service
Data as a Service
Kyle Hailey
 
Microsoft ignite 2018 SQL Server 2019 big data clusters - intro session
Microsoft ignite 2018  SQL Server 2019 big data clusters - intro sessionMicrosoft ignite 2018  SQL Server 2019 big data clusters - intro session
Microsoft ignite 2018 SQL Server 2019 big data clusters - intro session
Travis Wright
 
The hadoop ecosystem table
The hadoop ecosystem tableThe hadoop ecosystem table
The hadoop ecosystem table
Mohamed Magdy
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
DataWorks Summit/Hadoop Summit
 

What's hot (20)

Azure and cloud design patterns
Azure and cloud design patternsAzure and cloud design patterns
Azure and cloud design patterns
 
MOUG17: DB Security; Secure your Data
MOUG17: DB Security; Secure your DataMOUG17: DB Security; Secure your Data
MOUG17: DB Security; Secure your Data
 
Migration DB2 to EDB - Project Experience
 Migration DB2 to EDB - Project Experience Migration DB2 to EDB - Project Experience
Migration DB2 to EDB - Project Experience
 
Oracle big data appliance and solutions
Oracle big data appliance and solutionsOracle big data appliance and solutions
Oracle big data appliance and solutions
 
Coherence Overview - OFM Canberra July 2014
Coherence Overview - OFM Canberra July 2014Coherence Overview - OFM Canberra July 2014
Coherence Overview - OFM Canberra July 2014
 
Data Grids with Oracle Coherence
Data Grids with Oracle CoherenceData Grids with Oracle Coherence
Data Grids with Oracle Coherence
 
Remote DBA Service: Powering your DBA needs
Remote DBA Service: Powering your DBA needsRemote DBA Service: Powering your DBA needs
Remote DBA Service: Powering your DBA needs
 
What the Enterprise Requires - Business Continuity and Visibility
What the Enterprise Requires - Business Continuity and VisibilityWhat the Enterprise Requires - Business Continuity and Visibility
What the Enterprise Requires - Business Continuity and Visibility
 
Apache Hadoop 3
Apache Hadoop 3Apache Hadoop 3
Apache Hadoop 3
 
Liberate Your Files with a Private Cloud Storage Solution powered by Open Source
Liberate Your Files with a Private Cloud Storage Solution powered by Open SourceLiberate Your Files with a Private Cloud Storage Solution powered by Open Source
Liberate Your Files with a Private Cloud Storage Solution powered by Open Source
 
An Engineer's Intro to Oracle Coherence
An Engineer's Intro to Oracle CoherenceAn Engineer's Intro to Oracle Coherence
An Engineer's Intro to Oracle Coherence
 
Data Analytics Using Container Persistence Through SMACK - Manny Rodriguez-Pe...
Data Analytics Using Container Persistence Through SMACK - Manny Rodriguez-Pe...Data Analytics Using Container Persistence Through SMACK - Manny Rodriguez-Pe...
Data Analytics Using Container Persistence Through SMACK - Manny Rodriguez-Pe...
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
 
Service Primitives for Internet Scale Applications
Service Primitives for Internet Scale ApplicationsService Primitives for Internet Scale Applications
Service Primitives for Internet Scale Applications
 
Auditing and Monitoring PostgreSQL/EPAS
Auditing and Monitoring PostgreSQL/EPASAuditing and Monitoring PostgreSQL/EPAS
Auditing and Monitoring PostgreSQL/EPAS
 
DATA LAKE AND THE RISE OF THE MICROSERVICES - ALEX BORDEI
DATA LAKE AND THE RISE OF THE MICROSERVICES - ALEX BORDEIDATA LAKE AND THE RISE OF THE MICROSERVICES - ALEX BORDEI
DATA LAKE AND THE RISE OF THE MICROSERVICES - ALEX BORDEI
 
Data as a Service
Data as a Service Data as a Service
Data as a Service
 
Microsoft ignite 2018 SQL Server 2019 big data clusters - intro session
Microsoft ignite 2018  SQL Server 2019 big data clusters - intro sessionMicrosoft ignite 2018  SQL Server 2019 big data clusters - intro session
Microsoft ignite 2018 SQL Server 2019 big data clusters - intro session
 
The hadoop ecosystem table
The hadoop ecosystem tableThe hadoop ecosystem table
The hadoop ecosystem table
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
 

Viewers also liked

Pemanfaatan Big Data dengan Hadoop
Pemanfaatan Big Data dengan HadoopPemanfaatan Big Data dengan Hadoop
Pemanfaatan Big Data dengan Hadoop
helda_drmsyptr
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChicago Hadoop Users Group
 
Nordic infrastructure Conference 2017 - SQL Server in DevOps
Nordic infrastructure Conference 2017 - SQL Server in DevOpsNordic infrastructure Conference 2017 - SQL Server in DevOps
Nordic infrastructure Conference 2017 - SQL Server in DevOps
Travis Wright
 
Big data&DaaS
Big data&DaaSBig data&DaaS
Big data&DaaS
Constantin Giurea
 
Denodo DataFest 2016: Enterprise View of Data with Semantic Data Layer
Denodo DataFest 2016: Enterprise View of Data with Semantic Data LayerDenodo DataFest 2016: Enterprise View of Data with Semantic Data Layer
Denodo DataFest 2016: Enterprise View of Data with Semantic Data Layer
Denodo
 
Using NoSQL and Enterprise Shared Services (ESS) to Achieve a More Efficient ...
Using NoSQL and Enterprise Shared Services (ESS) to Achieve a More Efficient ...Using NoSQL and Enterprise Shared Services (ESS) to Achieve a More Efficient ...
Using NoSQL and Enterprise Shared Services (ESS) to Achieve a More Efficient ...MongoDB
 
How Government Agencies are Using MongoDB to Build Data as a Service Solutions
How Government Agencies are Using MongoDB to Build Data as a Service SolutionsHow Government Agencies are Using MongoDB to Build Data as a Service Solutions
How Government Agencies are Using MongoDB to Build Data as a Service Solutions
MongoDB
 
TUW- 184.742 Data as a Service – Concepts, Design & Implementation, and Ecosy...
TUW- 184.742 Data as a Service – Concepts, Design & Implementation, and Ecosy...TUW- 184.742 Data as a Service – Concepts, Design & Implementation, and Ecosy...
TUW- 184.742 Data as a Service – Concepts, Design & Implementation, and Ecosy...
Hong-Linh Truong
 
TUW-ASE Summer 2015: Data as a Service - Models and Data Concerns
TUW-ASE Summer 2015: Data as a Service - Models and Data ConcernsTUW-ASE Summer 2015: Data as a Service - Models and Data Concerns
TUW-ASE Summer 2015: Data as a Service - Models and Data ConcernsHong-Linh Truong
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Invalidez en el Siglo XXI
Invalidez en el Siglo XXIInvalidez en el Siglo XXI
Invalidez en el Siglo XXI
Rodrigo Silva
 
Lenguaje como posibilidad de intercambio
Lenguaje como posibilidad de intercambioLenguaje como posibilidad de intercambio
Lenguaje como posibilidad de intercambio
Jorge Serbal
 
Diapositivas grupo tecnologico
Diapositivas grupo tecnologicoDiapositivas grupo tecnologico
Diapositivas grupo tecnologico
efigeniaa
 
Telexfree spagna
Telexfree spagnaTelexfree spagna
Telexfree spagna
TelexFREE Italia
 
Direito civil
Direito civilDireito civil
Direito civil
CDIM Daniel
 
2011 Pew Social Media and Life Report
2011 Pew Social Media and Life Report2011 Pew Social Media and Life Report
2011 Pew Social Media and Life ReportMatthew Rathbun
 
Pensar acciones o palabras
Pensar acciones o palabrasPensar acciones o palabras
Pensar acciones o palabras
correoparadescargararchivos30
 
Pensamiento critico para niños
Pensamiento critico para niñosPensamiento critico para niños
Pensamiento critico para niños
Marta Montoro
 

Viewers also liked (20)

Pemanfaatan Big Data dengan Hadoop
Pemanfaatan Big Data dengan HadoopPemanfaatan Big Data dengan Hadoop
Pemanfaatan Big Data dengan Hadoop
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your Business
 
Nordic infrastructure Conference 2017 - SQL Server in DevOps
Nordic infrastructure Conference 2017 - SQL Server in DevOpsNordic infrastructure Conference 2017 - SQL Server in DevOps
Nordic infrastructure Conference 2017 - SQL Server in DevOps
 
Big data&DaaS
Big data&DaaSBig data&DaaS
Big data&DaaS
 
Denodo DataFest 2016: Enterprise View of Data with Semantic Data Layer
Denodo DataFest 2016: Enterprise View of Data with Semantic Data LayerDenodo DataFest 2016: Enterprise View of Data with Semantic Data Layer
Denodo DataFest 2016: Enterprise View of Data with Semantic Data Layer
 
Using NoSQL and Enterprise Shared Services (ESS) to Achieve a More Efficient ...
Using NoSQL and Enterprise Shared Services (ESS) to Achieve a More Efficient ...Using NoSQL and Enterprise Shared Services (ESS) to Achieve a More Efficient ...
Using NoSQL and Enterprise Shared Services (ESS) to Achieve a More Efficient ...
 
How Government Agencies are Using MongoDB to Build Data as a Service Solutions
How Government Agencies are Using MongoDB to Build Data as a Service SolutionsHow Government Agencies are Using MongoDB to Build Data as a Service Solutions
How Government Agencies are Using MongoDB to Build Data as a Service Solutions
 
TUW- 184.742 Data as a Service – Concepts, Design & Implementation, and Ecosy...
TUW- 184.742 Data as a Service – Concepts, Design & Implementation, and Ecosy...TUW- 184.742 Data as a Service – Concepts, Design & Implementation, and Ecosy...
TUW- 184.742 Data as a Service – Concepts, Design & Implementation, and Ecosy...
 
TUW-ASE Summer 2015: Data as a Service - Models and Data Concerns
TUW-ASE Summer 2015: Data as a Service - Models and Data ConcernsTUW-ASE Summer 2015: Data as a Service - Models and Data Concerns
TUW-ASE Summer 2015: Data as a Service - Models and Data Concerns
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Invalidez en el Siglo XXI
Invalidez en el Siglo XXIInvalidez en el Siglo XXI
Invalidez en el Siglo XXI
 
Netiqueta
NetiquetaNetiqueta
Netiqueta
 
Lenguaje como posibilidad de intercambio
Lenguaje como posibilidad de intercambioLenguaje como posibilidad de intercambio
Lenguaje como posibilidad de intercambio
 
Diapositivas grupo tecnologico
Diapositivas grupo tecnologicoDiapositivas grupo tecnologico
Diapositivas grupo tecnologico
 
Telexfree spagna
Telexfree spagnaTelexfree spagna
Telexfree spagna
 
Direito civil
Direito civilDireito civil
Direito civil
 
2011 Pew Social Media and Life Report
2011 Pew Social Media and Life Report2011 Pew Social Media and Life Report
2011 Pew Social Media and Life Report
 
Pensar acciones o palabras
Pensar acciones o palabrasPensar acciones o palabras
Pensar acciones o palabras
 
Pensamiento critico para niños
Pensamiento critico para niñosPensamiento critico para niños
Pensamiento critico para niños
 
La empresa
La empresaLa empresa
La empresa
 

Similar to Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise

Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker Containers
BlueData, Inc.
 
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Spark Summit
 
New Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesNew Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference Architectures
Kamesh Pemmaraju
 
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Red_Hat_Storage
 
Lessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark WorkloadsLessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark Workloads
BlueData, Inc.
 
OpenStack Online Meetup
OpenStack Online MeetupOpenStack Online Meetup
OpenStack Online Meetup
Tesora
 
What is Trove, the Database as a Service on OpenStack?
What is Trove, the Database as a Service on OpenStack?What is Trove, the Database as a Service on OpenStack?
What is Trove, the Database as a Service on OpenStack?
OpenStack_Online
 
1. beyond mission critical virtualizing big data and hadoop
1. beyond mission critical   virtualizing big data and hadoop1. beyond mission critical   virtualizing big data and hadoop
1. beyond mission critical virtualizing big data and hadoop
Chiou-Nan Chen
 
Nanda Vijaydev, BlueData - Deploying H2O in Large Scale Distributed Environme...
Nanda Vijaydev, BlueData - Deploying H2O in Large Scale Distributed Environme...Nanda Vijaydev, BlueData - Deploying H2O in Large Scale Distributed Environme...
Nanda Vijaydev, BlueData - Deploying H2O in Large Scale Distributed Environme...
Sri Ambati
 
IBM - Introduction to Cloudant
IBM - Introduction to CloudantIBM - Introduction to Cloudant
IBM - Introduction to Cloudant
Francisco González Jiménez
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
Raul Chong
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
Jim Kaskade
 
Lessons learned from running Spark on Docker
Lessons learned from running Spark on DockerLessons learned from running Spark on Docker
Lessons learned from running Spark on Docker
DataWorks Summit
 
Should I move my database to the cloud?
Should I move my database to the cloud?Should I move my database to the cloud?
Should I move my database to the cloud?
James Serra
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Community
 
SQL Saturday San Diego
SQL Saturday San DiegoSQL Saturday San Diego
SQL Saturday San Diego
Kellyn Pot'Vin-Gorman
 
Big Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the ExpertsBig Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the Experts
DataWorks Summit/Hadoop Summit
 
The Last Frontier- Virtualization, Hybrid Management and the Cloud
The Last Frontier-  Virtualization, Hybrid Management and the CloudThe Last Frontier-  Virtualization, Hybrid Management and the Cloud
The Last Frontier- Virtualization, Hybrid Management and the Cloud
Kellyn Pot'Vin-Gorman
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWS Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWS Tom Laszewski
 
Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...
Ontico
 

Similar to Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise (20)

Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker Containers
 
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
 
New Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesNew Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference Architectures
 
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
 
Lessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark WorkloadsLessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark Workloads
 
OpenStack Online Meetup
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise

  • 5. BDaaS Meetup October 6, 2016 Tom Phelan, Co-founder and Chief Architect tap@bluedata.com DEPLOYING BIG-DATA-AS-A-SERVICE IN THE ENTERPRISE
  • 6. Outline • Big Data: Conflicting Enterprise Needs • Big-Data-as-a-Service (BDaaS) • BDaaS Enterprise Requirements • Design Decisions • Implementation • Demo
  • 7. Conflicting Enterprise Needs Data scientists want flexibility: • Different versions (and new releases) of Hadoop, Spark, et al. • Different sets of BI / analytics tools IT wants control: • Multi-tenancy • QoS, Data Access • Security • Network, Authorization/Authentication
  • 8. Big Data New Realities • Big Data Traditional Assumptions: Bare-metal; Data locality; HDFS on local disks • Big Data New Realities: Containers and VMs; Compute and storage separation; In-place access on remote data stores • New Benefits and Value: Big-Data-as-a-Service; Agility and cost savings; Faster time-to-insights
  • 9. Big-Data-as-a-Service Defined “BDaaS basically provides a cloud based framework that offers end-to-end big data solutions to business organizations.” On-Demand, Self-Service, Elastic Big Data Infrastructure, Applications, Analytics Source: http://www.marketsandmarkets.com/Market-Reports/big-data-as-a-service-market-4129107.html
  • 10. Four Types of BDaaS • Core BDaaS • Performance BDaaS • Feature BDaaS • Integrated BDaaS Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification
  • 11. Four Types of BDaaS Core BDaaS • Minimal platform, such as Hadoop with YARN Performance BDaaS • “Downwards” vertical integration • Includes optimized infrastructure • Tight integration with Core BDaaS Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification
  • 12. Four Types of BDaaS Feature BDaaS • “Upwards” vertical integration • Includes features beyond Hadoop • Support for multiple Core BDaaS providers & BI tools Integrated BDaaS • Full vertical integration and optimization • Includes both Performance BDaaS & Feature BDaaS Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification
  • 13. BDaaS – Public Cloud or On-Prem?
  • 14. BDaaS – Public Cloud or On-Prem
  • 15. BDaaS On-Prem – Architectures • Deployment Mechanisms • Bare Metal • Virtualization • Virtual Machines • Containers Virtual Machines Containers Source: www.docker.com/what-docker
  • 16. Virtualization Tradeoffs • Tradeoffs depend on virtualization technology • Hypervisor (Virtual Machines) • Performance: CPU tax • Security: Strong isolation and fault containment • Examples: VMware BDE, OpenStack Sahara • Linux Containers • Performance: No CPU tax • Security: Isolation and fault containment still developing • Example: BlueData EPIC, Mesos + Myriad
  • 17. Containers = the Future of Big Data
  • 19. BDaaS – Enterprise Requirements • Multi-tenancy • Resource Allocation/Isolation • No Noisy Neighbor • Security • Network • Storage • User authorization/authentication
  • 20. BDaaS – Enterprise Requirements • Support for your application • Quickly add support for new apps, frameworks, & versions • Cluster expansion and contraction • Support for HA configurations
  • 21. BDaaS – Enterprise Requirements • Infrastructure & operational requirements • Support for capacity expansion • Support for software upgrade • Integration with existing container orchestration • Kubernetes, Mesos, Docker Swarm • Integration with existing network configuration and policies • IP allocation and use, routing, security, SDN (e.g. Cisco ACI, VMware NSX) • Integration with user authentication systems • LDAP/AD
  • 22. BDaaS – Enterprise Requirements • Infrastructure & operational requirements (cont’d) • Integration with existing policies • Supported versions of OS, containers, KVM, VMware, etc. • Monitoring • Limitations on root access • High Availability • Geographic replication
  • 24. BDaaS: Design Decisions I • Run Hadoop/Spark distros and applications unmodified – Deploy all services that run on a single BM host in a single container • Multi-tenancy support is key – Network and storage security • Clusters of containers span physical hosts
  • 25. BDaaS: Design Decisions II • Images built to “auto-configure” themselves at time of instantiation – Not all instances of a single image run the same set of services when instantiated • Master vs. worker cluster nodes – Support “reboot” of cluster
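The "auto-configure at instantiation" idea on slide 25 can be sketched as a small role lookup that an image's startup script might run. This is only an illustration: the `NODE_ROLE` environment variable, the service names, and the `services_for` helper are hypothetical and do not come from the talk or from any BlueData API.

```python
# Hypothetical mapping from a node's role to the services it should start.
# One image serves both roles; the role is only known at instantiation time.
SERVICES_BY_ROLE = {
    "master": ["spark-master", "zeppelin"],
    "worker": ["spark-worker"],
}


def services_for(env):
    """Return the services to start for the role found in `env`.

    `env` is a mapping such as os.environ. NODE_ROLE is an assumed
    variable name, injected by whatever launches the container.
    """
    role = env.get("NODE_ROLE", "worker")
    if role not in SERVICES_BY_ROLE:
        raise ValueError(f"unknown node role: {role!r}")
    # Return a copy so callers cannot mutate the shared table.
    return list(SERVICES_BY_ROLE[role])
```

Because the decision depends only on the injected environment, running the same lookup again after a container restart yields the same set of services, which is one simple way to support a cluster "reboot".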
  • 26. BDaaS: Design Decisions III • Maintain the promise of containers – Keep them as stateless as possible – Container storage is always ephemeral – Persistent storage is external to the container
  • 28. Multi-Tenant Deployment • Multiple teams or business groups (Team 1: ETL using Hadoop; Team 2: ETL using Spark; Team 3: Machine Learning) • Evaluate different Big Data analytics use cases (e.g. ETL, M/L) • Use different services & tools (e.g. Hive, Notebooks, SparkR) • Use different distributions of Hadoop and/or Spark • BlueData EPIC software platform • Shared server infrastructure • Shared data sets • Multiple distributions, services, tools on shared, cost-effective infrastructure [Diagram labels: version numbers 5.5, 5.4, 1.5, 2.4, 1.6; Compute Isolation; Shared Data (HDFS); Shared, Centrally Managed Server Infrastructure]
  • 29. How We Did It: Implementation I Resource Utilization • CPU cores vs. CPU shares • Over-provisioning of CPU recommended • No over-provisioning of memory – Swap Network • Connect containers across hosts • Persistence of IP address across container restart • DHCP/DNS service required for IP allocation and hostname resolution • Deploy VLANs and VXLAN tunnels for tenant-level traffic isolation Noisy neighbors
  • 30. Network Architecture [Diagram labels: Controller Host; Worker Hosts; BlueData EPIC; Cluster Provisioning and Automation (Embedded containers for Hadoop/Spark/BI tool nodes); Policy Engine (Resource / placement); Internal Networking (BlueData-assigned IPs from floating IP range), BD IP1 to BD IP11; Tenant1, Tenant2, Tenant3; Internal Gateway; External Switch/Gateway; External Network, IP1 to IP4]
  • 31. How We Did It: Implementation II Storage • Expandable, unified / and /data storage – By default, Docker provides 10 GB (fixed) plus optional /data • DataTap (version-independent, HDFS-compliant) – Connectivity to external storage Image Management • Utilize Docker’s image repository • Author new Docker images using Dockerfiles – Inject parameters at runtime TIP: Mounting block devices into a container does not support symbolic links (IOW: /dev/sdb will not work, /dm/… PCI device can change across host reboot). TIP: Docker images can get large. Use “docker squash” to save on size.
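The resource rules on slide 29 (CPU may be over-provisioned, memory may not, and CPU shares are relative weights rather than dedicated cores) can be illustrated with a minimal admission-check sketch. The `ContainerRequest` type, the `admit` helper, and the 2.0 over-commit factor are assumptions made for illustration only; the sole Docker fact relied on is that `--cpu-shares` is a relative weight that matters when containers compete for CPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerRequest:
    name: str
    cpu_shares: int  # relative weight, as passed to Docker's --cpu-shares
    vcpus: int       # cores advertised to the tenant (hypothetical field)
    memory_gb: int   # hard memory limit for the container


def contended_cpu_fraction(requests):
    """Fraction of CPU time each container gets when all are busy.

    Shares are only weights: a container with twice the shares gets
    twice the CPU time under contention, and any idle CPU otherwise.
    """
    total = sum(r.cpu_shares for r in requests)
    return {r.name: r.cpu_shares / total for r in requests}


def admit(requests, host_cores, host_memory_gb, max_cpu_overcommit=2.0):
    """Allow CPU over-provisioning up to a factor, but never memory."""
    cpu_ok = sum(r.vcpus for r in requests) <= host_cores * max_cpu_overcommit
    mem_ok = sum(r.memory_gb for r in requests) <= host_memory_gb
    return cpu_ok and mem_ok
```

For example, two containers with 1024 and 512 shares split a saturated host roughly two-thirds to one-third, while a placement whose memory limits add up to more than the host's physical memory is rejected so that the host never has to swap.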
  • 32. How We Did It: Security Considerations • Security is essential since containers and host share one kernel – Non-privileged containers • Achieved through layered set of capabilities • Different capabilities provide different levels of isolation and protection • Add “capabilities” to a container based on what operations are permitted
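The capability layering on slide 32 can be sketched as building a `docker run` command that drops every Linux capability and adds back only those needed for the permitted operations. `--cap-drop` and `--cap-add` are real Docker flags; the operation names, the `OPERATION_CAPS` table, and the image tag are hypothetical and stand in for whatever policy a real platform would apply.

```python
# Hypothetical policy table: permitted operation -> Linux capabilities.
OPERATION_CAPS = {
    "bind_low_ports": ["NET_BIND_SERVICE"],
    "change_file_owner": ["CHOWN"],
    "switch_user": ["SETUID", "SETGID"],
}


def docker_run_args(image, operations):
    """Build docker run arguments for a non-privileged container.

    Starts from no capabilities at all and adds back only those
    required by the permitted operations. Unknown operations raise
    KeyError instead of being silently granted.
    """
    caps = sorted({cap for op in operations for cap in OPERATION_CAPS[op]})
    args = ["docker", "run", "--detach", "--cap-drop=ALL"]
    args += [f"--cap-add={cap}" for cap in caps]
    args.append(image)
    return args
```

Starting from `--cap-drop=ALL` keeps the default deny-by-default: a new operation has to be added to the table explicitly before any container can use the corresponding capability.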
  • 33. How We Did It: Sample Dockerfile
        # Spark-1.5.2 docker image for RHEL/CentOS 6.x
        FROM centos:centos6
        # Download and extract spark
        RUN mkdir /usr/lib/spark; curl -s http://d3kbcqa49mib13.cloudfront.net/spark-1.5.2-bin-hadoop2.4.tgz | tar -xz -C /usr/lib/spark/
        # Download and extract scala
        RUN mkdir /usr/lib/scala; curl -s http://www.scala-lang.org/files/archive/scala-2.10.3.tgz | tar xz -C /usr/lib/scala/
        # Install zeppelin
        RUN mkdir /usr/lib/zeppelin; curl -s http://10.10.10.10:8080/build/thirdparty/zeppelin/zeppelin-0.6.0-incubating-SNAPSHOT-v2.tar.gz | tar xz -C /usr/lib/zeppelin
        # Remove package caches and temporary files to keep the image small
        RUN yum clean all && rm -rf /tmp/* /var/tmp/* /var/cache/yum/*
        # Copy in the service configuration script, make it executable, and run it
        ADD configure_spark_services.sh /root/configure_spark_services.sh
        RUN chmod +x /root/configure_spark_services.sh && /root/configure_spark_services.sh
  • 34. A Word About Performance … Performance Testing: Spark • Spark 1.x on YARN • HiBench - Terasort – Data sizes: 100 GB, 500 GB, 1 TB • 10 node physical/virtual cluster • 36 cores and 112 GB memory per node • 2 TB HDFS storage per node (SSDs) • 800 GB ephemeral storage
  • 35. Spark on Docker: Performance MB/s
  • 36. DEMO
  • 37. NEW – BDaaS On-Prem and Cloud BlueData on AWS public cloud • Extending the user experience and value of BlueData to public cloud • Single pane of glass for on-prem and off-prem Big Data workloads • Initial AWS support; then MS Azure, Google Cloud Platform, others • Ask us about our directed availability program for AWS