The document provides details of compatibility testing between BlueData EPIC software and EMC Isilon storage. It describes:
1) The testing environment including the BlueData, Cloudera, Hortonworks and EMC Isilon technologies and configurations used.
2) A series of validation tests conducted to demonstrate connectivity and functionality between the technologies using NFS and HDFS protocols.
3) Preliminary performance benchmarks conducted on standard hardware in the BlueData labs.
4) The process of installing and configuring BlueData EPIC software on controller and worker nodes, and EMC Isilon storage.
Compute intensive performance efficiency comparison: HP Moonshot with AMD APU... - Principled Technologies
AMD’s accelerated processing units can be an enormous boon to those who run compute-intensive workloads, such as the 3D rendering workload we tested. In the Principled Technologies labs, an AMD-based HP Moonshot 1500 chassis with the ProLiant M700 server cartridge outperformed an Intel Xeon processor E5-2660 v2-based server, delivering 12.6 times the rendering performance of a single Intel server. It achieved this performance advantage while using 10 percent less power than the more traditional server solution, and it occupied just 4.3U of rack space instead of the 12U that 12 Intel servers would have used.
Improve Aerospike Database performance and predictability by leveraging Intel... - Principled Technologies
In Dell EMC PowerEdge R740xd servers, enabling ADQ boosted Aerospike performance and led to near-linear scaling in both DRAM and Intel® Optane™ persistent memory configurations
In our tests, we found that the HP Z8 tower with Intel Xeon Gold 6226R processors completed three sample media and entertainment tasks in up to 44 percent less time than the Apple Mac Pro with Intel Xeon W-3275M processor, while adding only 11 percent to the purchase price.
Keep remote desktop power users productive with Dell EMC PowerEdge R840 serve... - Principled Technologies
When the Dell EMC™ PowerEdge™ R840 launched, we found that companies could get more power for their CPU-intensive workloads with this 2U four-socket rack server.1 Now, it presents an opportunity for you to support more power users, speed desktop responsiveness, and grow your employee base.
SQL Server 2016 database performance on the Dell EMC PowerEdge FC630 QLogic 1... - Principled Technologies
Upgrading the hardware running your SQL Server deployment to a space-efficient, modular, modern Dell EMC environment can help your company accomplish a great deal of database work in a small amount of space. With Dell Express Flash technology, adding a caching solution such as Samsung AutoCache can make the environment even more efficient.
In the PT labs, we ran a mixed database workload on six Dell EMC PowerEdge FC630 servers, powered by Intel Xeon E5-2667 processors, in three PowerEdge FX2 enclosures. The solution included the QLogic QLE2692 16Gb FC adapter with StorFusion Technology, Dell EMC Storage SC9000 all-flash storage, and Dell EMC PowerEdge Express Flash NVMe Performance PCIe SSDs.
With no caching solution, the 36 SQL Server 2016 VMs on the six servers achieved a total of 431,839 orders per minute while an Oracle workload ran on 12 VMs. When we added a caching solution to accelerate the SQL database volumes, performance across the 36 SQL Server 2016 VMs more than doubled, to 871,580 orders per minute. These numbers show the power of server-side caching to alleviate pressure on the storage array, allowing you to get even more out of the Dell EMC modern environment.
Your datacenter is capable of doing great things—if you let it. Upgrades from Intel for compute, storage, and networking components can help your business support new services and expand your customer base. In our hands-on testing, we found that new Intel processors, high-bandwidth network components, and SATA or PCIe SSDs working together can boost your datacenter’s capabilities, which could translate to better business operations for your organization.
Consolidating Web servers with the Dell PowerEdge FX2 enclosure and PowerEdge... - Principled Technologies
Consolidating Web servers to a new environment can save you a great deal on operating costs such as power and cooling, and the shared nature of converged infrastructure solutions can maximize these savings. In our tests, we found that the Dell PowerEdge FX2 enclosure with Intel Atom processor C2750-powered FM120 nodes provided better consolidation ratios and power efficiency than both the HP Moonshot 1500 shared infrastructure solution and the current-generation HP ProLiant DL320e Gen8 v2 rack server. The Dell PowerEdge FX2 could consolidate 12 legacy Web servers while delivering up to 6.7 times the power efficiency of the legacy servers. It also delivered up to 110.1 percent more performance per watt than the current-generation Web server solutions we tested from HP.
As these results show, the Dell PowerEdge FX2 with FM120x4 microserver blocks could provide your organization with dramatic power savings through consolidation, all while providing the Web server performance you require.
This Enterprise Management Associates (EMA) research study illuminates the major challenges of deploying and managing software defined data center (SDDC) technologies and processes.
These slides cover:
* The key components and business drivers of the SDDC
* The SDDC technologies and services your peers will invest in and which ones promise the highest ROI
* The key considerations when optimally placing new applications and the core risks
* The key challenges when creating new application environments
* How Software Defined Storage (SDS) can enhance your data center
* The ‘net’ impact of Software Defined Networking (SDN) and network virtualization
* How the SDDC concept increases the ROI of private and public clouds
* The role of OpenStack within the SDDC and the key reasons for adopting OpenStack
* The role security plays within the SDDC
Run compute-intensive Apache Hadoop big data workloads faster with Dell EMC P... - Principled Technologies
Moving compute-intensive, Hadoop big data workloads to current-generation Dell EMC PowerEdge R640 servers powered by 2nd Generation Intel Xeon Scalable processors could allow your organization to better meet the data analysis challenges of today. Faster analysis of large data sets means getting insight into your organization, products, and services sooner, which could help your organization grow and beat its competition.
A single-socket Dell EMC PowerEdge R7515 solution delivered better value on a... - Principled Technologies
If your company is running important business applications in VMware vSAN clusters of servers that are several years old, chances are good that you’re considering upgrading to newer hardware. Our testing demonstrated that our clusters of single-socket Dell EMC PowerEdge R7515 servers and clusters of dual-socket HPE ProLiant DL380 Gen10 servers could both improve upon the database performance of a legacy cluster with five-year-old servers by more than 50 percent, with the Dell EMC cluster achieving 93.4 percent of the performance of the HPE cluster.
Watch your transactional database performance climb with Intel Optane DC pers... - Principled Technologies
Dell EMC PowerEdge R740xd servers with Intel Optane DC persistent memory handled more transactions per minute than configurations with NAND flash NVMe drives or SATA SSDs
By automating high-touch, routine tasks, Dell EMC OME integrations and plugins empower IT admins to deliver effective and efficient systems management from a single console.
Programmable I/O Controllers as Data Center Sensor Networks - Emulex Corporation
This presentation on 'Programmable I/O Controllers as Data Center Sensor Networks' was delivered by Shaun Walsh and Sanjeev Datla at the Storage Developer Conference in October 2011.
Get higher transaction throughput and better price/performance with an Amazon... - Principled Technologies
In addition, the EBS gp3-backed EC2 r5b.16xlarge instance delivered a lower average transaction latency to offer more consistent transactional database performance than two Microsoft Azure E64ds_v4 VM configurations
VDI performance comparison: Dell PowerEdge FX2 and FC430 servers with VMware ... - Principled Technologies
Replacing your legacy VDI servers with a new Intel Xeon processor E5-2650 v3-powered Dell PowerEdge FX2 solution using VMware Virtual SAN can be a great boon for your enterprise.
In the Principled Technologies (PT) labs, this space-efficient, affordable solution outperformed a legacy server with traditional SAN storage by supporting 72 percent more VDI users. It achieved that greater performance while using 91 percent less space and at a cost of only $176.52 per user.
By supporting more users, saving space, and keeping costs down, an upgrade to the Intel-powered Dell PowerEdge FX2 solution using VMware Virtual SAN can be a wise move when replacing your aging infrastructure.
Reduce complexity and save money with Dell Engineered Solutions for VMware EV... - Principled Technologies
Companies like ManuCorp have seemingly contradictory goals for their virtualized infrastructure: They want a solution that eases the deployment and management burden for generalized IT staff while simultaneously saving money over the long term. According to our analysis, ManuCorp would do well to choose Dell Engineered Solutions for VMware EVO:RAIL, saving up to 63.9 percent in costs over three years compared to a solution with Cisco UCS blades and NetApp storage.
Less experienced administrators, like those ManuCorp already has in house, would be able to plug in the Dell Engineered Solutions for VMware EVO:RAIL appliance and use its single, easy-to-use interface to deploy end-to-end virtual infrastructure and complete updates without any additional training or instruction. The Cisco UCS and NetApp solution required extra tools and a wider skillset, which can mean adding a more experienced person and inviting the chance of human error. In addition, the hyper-converged Dell Engineered Solutions for VMware EVO:RAIL appliance reduced power consumption compared to the do-it-yourself environment with Cisco UCS blades and NetApp storage, which can contribute to big operating cost savings.
Prepare images for machine learning faster with servers powered by AMD EPYC 7... - Principled Technologies
A server cluster with 3rd Gen AMD EPYC processors achieved higher throughput and took less time to prepare images for classification than a server cluster with 3rd Gen Intel Xeon Platinum 8380 processors
Software Defined Data Center: The Intersection of Networking and Storage - EMC
There has been quite a bit of marketing rhetoric around the Software Defined Data Center (SDDC) since VMware’s acquisition of Nicira. In this session we explore the components of an SDDC. Our specific focus is on the composition of an SDDC’s resource model: Compute, Networking, and Storage. The emphasis is on disaggregated I/O for Network and Storage resources.
After this session you will be able to:
Objective 1: Describe the disaggregated I/O resource model employed to facilitate the use of virtualized Ethernet and Block devices in a Software Defined Data Center.
Objective 2: Explain how end-user-driven provisioning of virtual Ethernet devices and Block devices serves to decouple resource use from infrastructure management.
Objective 3: Describe some of the opportunities and challenges associated with employing disaggregated I/O.
Category: Applications & Databases, Storage Automation & Management, Virtualization & Cloud Computing
This presentation has been well received by the SANS community and the many information security teams I engage with.
It describes how integrating a full content repository into your existing security architecture can decrease incident response time and lead to faster identification of root cause.
I also describe a new way of implementing NetFlow without sampling to provide greater visibility of your network.
Enjoy!
Boni Bruno, CISSP, CISM, CGEIT
www.bonibruno.com
This presentation discusses the benefits of merging NPM & APM together to better assist problem response teams in troubleshooting network and application problems.
The presentation highlights a new product offering called NetPod which is a joint solution developed between Emulex and Dynatrace.
Why Virtualization is important by Tom Phelan of BlueData - Data Con LA
This presentation will investigate the advantages and disadvantages of running an n-node Hadoop cluster in each of the following environments:
• Bare metal
• Private cloud using a traditional virtual machine environment
• Private cloud using a containerized virtual environment
Tom will follow this with a discussion of real Hadoop Use Cases and how each would, or would not, be suitable for running in each environment.
Dell/EMC Technical Validation of BlueData EPIC with Isilon - Greg Kirchoff
The BlueData EPIC™ (Elastic Private Instant Clusters) software platform solves the infrastructure challenges and limitations that can slow down and stall Big Data deployments. With EPIC software, you can spin up Hadoop or Spark clusters – with the data and analytical tools that your data scientists need – in minutes rather than months. Leveraging the power of containers and the performance of bare metal, EPIC delivers speed, agility, and cost-efficiency for Big Data infrastructure. It works with all of the major Apache Hadoop distributions as well as Apache Spark. It integrates with each of the leading analytical applications, so your data scientists can use the tools they prefer. You can run it with any shared storage environment, so you don’t have to move your data.
EMC Isilon Scale-out Storage Solutions for Hadoop combine a powerful yet simple and highly efficient storage platform with native Hadoop integration that allows you to accelerate analytics, gain new flexibility, and avoid the costs of a separate Hadoop infrastructure. BlueData EPIC Software combined with EMC Isilon shared storage provides a comprehensive solution for compute + storage.
BlueData and Isilon share several joint customers and opportunities at leading financial services, advanced research laboratories, healthcare and media/communication organizations.
This paper describes the process of validating Hadoop applications running in virtual clusters on the EPIC platform with data stored on the EMC Isilon storage device using either NFS or HDFS data access protocols.
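The NFS and HDFS access paths validated above can be made concrete with a small sketch of how a cluster-side data URI differs between the two protocols. This is illustrative only: the SmartConnect zone name, the /mnt mount-point convention, and the dataset paths are hypothetical, and port 8020 is simply a common default for the HDFS NameNode RPC service.

```python
def data_uri(protocol: str, zone: str, path: str) -> str:
    """Build the cluster-side URI for a dataset stored on Isilon.

    protocol: "hdfs" for native HDFS access via the Isilon HDFS service,
              or "nfs" for a locally mounted NFS export.
    zone:     SmartConnect zone name (hypothetical example value).
    path:     dataset path under the access zone root.
    """
    path = path.lstrip("/")
    if protocol == "hdfs":
        # Port 8020 is a common default for the HDFS NameNode RPC endpoint.
        return f"hdfs://{zone}:8020/{path}"
    if protocol == "nfs":
        # Assumes the NFS export is mounted at /mnt/<zone> on each worker.
        return f"file:///mnt/{zone}/{path}"
    raise ValueError(f"unknown protocol: {protocol}")
```

The same dataset can then be addressed either way from a Hadoop job, which is what makes a protocol-by-protocol validation pass practical.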
BlueData Hunk Integration: Splunk Analytics for Hadoop - BlueData, Inc.
BlueData is working in partnership with Splunk to streamline and accelerate the deployment and adoption of Hunk: Splunk Analytics for Hadoop. The BlueData EPIC software platform now integrates Hunk with Hadoop clusters running on virtualized on-premises infrastructure.
Using Hunk with the BlueData EPIC platform, our joint customers can quickly provision virtual Hadoop clusters together with Hunk in a matter of minutes – providing their data scientists and analysts with the ability to rapidly detect patterns and find anomalies across petabytes of raw data in Hadoop.
Learn more at http://www.bluedata.com
This presentation provides an overview of what’s new in the 2.0 release of the BlueData EPIC software platform.
BlueData’s EPIC software platform solves the infrastructure challenges and limitations that can slow down and stall on-premises Big Data deployments. With BlueData, you can spin up Hadoop or Spark clusters in minutes rather than months – with the data and analytical tools that your data scientists need.
The BlueData EPIC 2.0 release leverages Docker containers to simplify the deployment of Big Data clusters, supports Apache Zeppelin notebooks and other new functionality for Apache Spark, and includes an enhanced App Store that provides one-click access to Big Data distributions and analytics tools.
Learn more about BlueData at http://www.bluedata.com
How to deploy Apache Spark in a multi-tenant, on-premises environment - BlueData, Inc.
Adoption of Apache Spark in the enterprise is increasing rapidly - it's become one of the fastest growing and most popular technologies in the Big Data ecosystem.
However, implementing an enterprise-ready, on-premises Spark deployment can be very complex and it requires expertise that is generally not available to all.
BlueData makes it easier to deploy Apache Spark on-premises. With BlueData, you can spin up virtual Spark clusters within minutes – providing secure, self-service, on-demand access to Big Data analytics and infrastructure. You can deploy Spark in standalone mode or with Hadoop / YARN. You can also build analytical pipelines and create Spark clusters using our RESTful APIs, and use web-based Zeppelin notebooks for interactive data analytics.
BlueData’s software platform leverages virtualization and Docker containers – combined with our own patent-pending innovations – to make it faster, and more cost-effective for enterprises to get up and running with a multi-tenant Spark deployment on-premises.
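As a rough sketch of what provisioning a Spark cluster through a REST API can look like, the helper below builds a "create cluster" request body. The endpoint shape and field names here are hypothetical, not BlueData EPIC's actual API schema; the standalone vs. Spark-on-YARN toggle mirrors the two deployment modes mentioned above.

```python
import json

def create_cluster_payload(name: str, workers: int, on_yarn: bool = False) -> str:
    """Build a JSON body for a hypothetical 'create Spark cluster' REST call.

    Field names are illustrative; a real platform's API reference would
    define the actual schema.
    """
    if workers < 1:
        raise ValueError("a cluster needs at least one worker")
    body = {
        "name": name,
        "distro": "spark-on-yarn" if on_yarn else "spark-standalone",
        "worker_count": workers,
        # Web-based notebooks for interactive data analytics.
        "notebooks": ["zeppelin"],
    }
    return json.dumps(body, sort_keys=True)
```

A client would POST this body to the platform's cluster-creation endpoint and poll until the virtual cluster reports ready.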
Learn more at www.bluedata.com
This session will examine the many options the data scientist has for running Spark clusters in public and private clouds. We will discuss various environments employing AWS, Mesos, Docker containers, and BlueData EPIC technologies, and the benefits and challenges of each.
Speakers:
Tom Phelan, Co-founder and Chief Architect - BlueData Inc. Tom has spent the last 25 years as a senior architect, developer, and team lead in the computer software industry in Silicon Valley. Prior to co-founding BlueData, Tom spent 10 years at VMware as a senior architect and team lead in the core R&D Storage and Availability group. Most recently, Tom led one of the key projects – vFlash, focusing on integration of server-based Flash into the vSphere core hypervisor. Prior to VMware, Tom was part of the early team at Silicon Graphics that developed XFS, one of the most successful open source file systems. Earlier in his career, he was a key member of the Stratus team that ported the Unix operating system to their highly available computing platform. Tom received his Computer Science degree from the University of California, Berkeley.
This Big Data case study outlines the Hadoop infrastructure deployment for a Fortune 100 media and telecommunications company.
Hadoop adoption in this company had grown organically across multiple different teams, starting with “science projects” and lab initiatives that quickly grew and expanded. Going forward, some of the options they considered for their Big Data deployment included expanding their on-premises infrastructure and using a Hadoop-as-a-Service cloud offering.
Fortunately, they realized that there is a third option: providing the benefits of Hadoop-as-a-Service with on-premises infrastructure. They selected the BlueData EPIC software platform to virtualize their Hadoop infrastructure and provide on-demand access to virtual Hadoop clusters in a secure, multi-tenant model.
Learn more about this case study in the blog post at: http://www.bluedata.com/blog/2015/05/big-data-case-study-hadoop-infrastructure
Tracxn Research — Big Data Infrastructure Landscape, September 2016 - Tracxn
Following a rather muted 2015, the year 2016 witnessed an impressive bounce back by the big data sector, with a total funding of $1.1B secured in 37 rounds.
White Paper: EMC Greenplum Data Computing Appliance Enhances EMC IT's Global ... - EMC
This White Paper illustrates a synergistic model for deploying EMC's Greenplum Data Computing Appliance (DCA) with EMC IT's incumbent Global Data Warehouse infrastructure.
EMC IT's Journey to Cloud
PHASE 3: IT-AS-A-SERVICE
APPLICATIONS & CLOUD EXPERIENCE
See how EMC’s applications and cloud integration framework enable IT as a service.
Dell EMC Ready Solutions for Big Data are powered by the BlueData EPIC software platform - for on-demand provisioning and automation. These integrated solutions enable a cloud-like experience for Big-Data-as-a-Service (BDaaS) while ensuring the enterprise-grade security and performance of on-premises infrastructure.
With Dell EMC Ready Solutions for Big Data, customers can rapidly deploy their analytics and machine learning workloads in a secure multi-tenant architecture, with multiple user groups running on shared infrastructure. Their users can quickly and easily provision distributed environments for Cloudera, Hortonworks, Kafka, MapR, Spark, and TensorFlow, as well as other tools.
The new Ready Solutions include everything that customers need to enable BDaaS on-premises – including BlueData EPIC software as well as Dell EMC hardware, consulting, deployment, and support services.
To learn more, visit www.dellemc.com/bdaas
White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...EMC
This white paper describes a software plug-in that can be used with the Nagios Core infrastructure monitoring framework to display the state of a wide range of components in the EMC Greenplum Data Computing Appliance (DCA).
Value Journal, a monthly news journal from Redington Value Distribution, intends to update the channel on the latest vendor news and Redington Value’s Channel Initiatives.
Key stories from the May Edition:
• Dell EMC takes open networking to the edge for next-generation access.
• Microsoft announces new intelligent security innovations
• Powering partners’ transformation - Pradeep Kumar, Principal Consultant, Security, Redington Value.
• Redington Value signs new vendor partnerships.
• Fortinet delivers integrated NOC-SOC solution.
• HPE to acquire Cape Networks.
• Malwarebytes introduces endpoint protection and response solution.
• Mimecast offers cyber resilience for email with new capabilities.
• SonicWall announces Capture Cloud Platform.
• Veritas revamps channel program to drive partner growth.
• Red Hat announces availability of Red Hat Storage One.
• AWS announces Amazon S3 One Zone-Infrequent Access(Z-IA).
Software Defined Everything infrastructure that virtualizes compute, network, and storage resources and delivers it as a service. Rather than by the hardware components of the infrastructure, the management and control of the compute, network, and storage infrastructure are automated by intelligent software that is running on the Lenovo x86 platform.
White Paper: Hadoop on EMC Isilon Scale-out NAS EMC
This White Paper details how EMC Isilon can be used to support an enterprise Hadoop data analytics workflow. Core architectural components are covered as well as how an enterprise can gain reliable business insight quickly and efficiently while maintaining simplicity to meet the storage requirements of an evolving Big Data analytics workflow.
This white paper provides an overview of EMC's data protection solutions for the data lake - an active repository to manage varied and complex Big Data workloads
If you're like most of the world, you're on an aggressive race to implement machine learning applications and on a path to get to deep learning. If you can give better service at a lower cost, you will be the winners in 2030. But infrastructure is a key challenge to getting there. What does the technology infrastructure look like over the next decade as you move from Petabytes to Exabytes? How are you budgeting for more colossal data growth over the next decade? How do your data scientists share data today and will it scale for 5-10 years? Do you have the appropriate security, governance, back-up and archiving processes in place? This session will address these issues and discuss strategies for customers as they ramp up their AI journey with a long term view.
This white paper describes how BlueData enables virtualization of Hadoop and Spark workloads running on Intel architecture.
Even as virtualization has spread throughout the data center, Apache Hadoop continues to be deployed almost exclusively on bare-metal physical servers. Processing overhead and I/O latency typically associated with virtualization have prevented big data architects from virtualizing Hadoop implementations.
As a result, most Hadoop initiatives have been limited in terms of agility, with infrastructure changes such as provisioning a new server for Hadoop often taking weeks or even months. This infrastructure complexity continues to slow down adoption in enterprise deployments. Apache Spark is a relatively new big data technology, but interest is growing rapidly; many of these same deployment challenges apply to on-premises Spark implementations.
The BlueData EPIC software platform addresses these limitations, enabling data center operators to accelerate Hadoop and Spark implementations on Intel architecture-based servers.
For more information, visit intel.com/bigdata and bluedata.com
Mis on andmekeskuse uus standard - hüperkonvergents?
Kui kõiki kesksüsteeme ei ole võimalik pilve viia ja serverikeskuse kasv suurendab halduse keerukust, on väljapääs serverikeskuse konvergents. Simplivity Omnicube on konvergentsi uus tase. Millised on serverikeskuse kasvuga seotud põhiprobleemid ja kuidas neid lahendada? Kuidas korraldada Disaster Recovery ja Backup?
EMA: Ten Priorities for Hybrid Cloud, Containers and DevOps in 2017 DevOps.com
EMA recently named Electric Cloud ElectricFlow as a “Top 3” for DevOps Automation in their research report entitled, “Ten Priorities for Hybrid Cloud, Containers and DevOps in 2017.”
Join Torsten Volk, Managing Research Director at Enterprise Management Associates (EMA), and Anand Ahire, GM of DevOps Release Automation at Electric Cloud, as they review the report findings and share tips for overcoming the challenges of DevOps and Continuous Delivery at scale.
During the webinar, they will discuss:
Automation, Abstraction and Visibility as key enablers to adopt DevOps
Concrete EMA recommendations for effectively implementing these priorities
How ElectricFlow helps increase cross-cloud business agility
A short solution demo will be provided. Attendees will receive a complimentary copy of EMA’s “Decision Guide: Ten Priorities for Hybrid Cloud, Containers and DevOps in 2017.”
1. Technical Guide – ISV Partner Validation
TECHNICAL VALIDATION OF
BLUEDATA EPIC WITH EMC ISILON
ETD Solution Architecture Compatibility and
Performance Testing Validation Brief
Boni Bruno, CISSP, CISM, CGEIT
Chief Solutions Architect
ABSTRACT
This document captures details on the technologies, results, and environment used to
perform various functionality tests to demonstrate compatibility between BlueData
EPIC Software and EMC technologies described herein.
November 3, 2015
Revision 1.3
5. EMC Corporation 5
INTRODUCTION
PURPOSE
The intent of this document is to capture details pertaining to the technologies and environment used to confirm compatibility
between BlueData EPIC Software and the EMC Isilon storage system.
BLUEDATA AND ISILON JOINT VALUE PROPOSITION
The BlueData EPIC™ (Elastic Private Instant Clusters) software platform solves the infrastructure challenges and limitations that can
slow down and stall Big Data deployments. With EPIC software, you can spin up Hadoop or Spark clusters – with the data and
analytical tools that your data scientists need – in minutes rather than months. Leveraging the power of containers and the
performance of bare-metal, EPIC delivers speed, agility, and cost-efficiency for Big Data infrastructure. It works with all of the major
Apache Hadoop distributions as well as Apache Spark. It integrates with each of the leading analytical applications, so your data
scientists can use the tools they prefer. You can run it with any shared storage environment, so you don’t have to move your data.
And it delivers the enterprise-grade security and governance that your IT teams require. With the BlueData EPIC software platform,
you can provide Hadoop-as-a-Service or Spark-as-a-Service in an on-premises deployment model.
EMC Isilon Scale-out Storage Solutions for Hadoop combine a powerful yet simple and highly efficient storage platform with native
Hadoop integration that allows you to accelerate analytics, gain new flexibility, and avoid the costs of a separate Hadoop
infrastructure. BlueData EPIC Software combined with EMC Isilon shared storage provides a comprehensive solution for compute +
storage.
BlueData and Isilon share several joint customers and opportunities at leading financial services, advanced research laboratory,
healthcare, and media/communications organizations. The following diagram describes the key value propositions:
This paper describes the process of validating Hadoop applications running in virtual clusters on the EPIC platform with data stored
on the EMC Isilon storage device using either NFS or HDFS data access protocols.
TECHNOLOGY OVERVIEW
The BlueData EPIC platform is infrastructure software, purpose-built to virtualize/containerize Big Data/Hadoop applications. It
increases agility and significantly reduces infrastructure cost by increasing utilization, while maintaining security and control.
As such, BlueData software enables Big-Data-as-a-Service on-premises.
Figure: BlueData + Isilon → lowest TCO + maximum agility. The diagram contrasts physical Hadoop compute infrastructure
(e.g. MR, Hive, Impala) with HDFS on DAS (single distribution and version; no elasticity) against BlueData containerized
infrastructure (multiple distributions and versions; elasticity) reaching either DAS or Isilon through the BlueData DataTap
(dtap://), which federates across DAS and Isilon for a frictionless transition to Isilon, with both compute and storage savings.
✓ Bring elastic Hadoop and Spark compute to Isilon with NFS/HDFS DataTap
✓ Frictionless transition from DAS to Isilon environments with single namespace
✓ Future proof against rapidly evolving ecosystem – e.g. Cloudera Navigator, Ranger
The point of interconnection with EMC Isilon storage is the EPIC DataTap feature. DataTap surfaces an HDFS abstraction layer so that
any Big Data application can run unmodified, while at the same time implementing a high-performance connection to remote data
storage systems via NFS, HDFS, and RESTful object protocols. As a result, unmodified Hadoop applications can run against data
stored in remote NFS, HDFS, and object storage systems without loss of performance.
RELEVANT PARTNER FEATURES
Table 1 List of Partner technology features relevant to the validation consideration
DESCRIPTION DETAIL
BlueData EPIC 2.0 Software: Infrastructure software that supports virtual 'compute' clusters of CDH, HDP, and stand-alone Spark.
CDH clusters can be configured with Cloudera Manager (and Cloudera Navigator), while HDP
clusters can be configured with Ambari (and other services enabled by Ambari).
Another key feature is DataTap, which provides an HDFS protocol URI (dtap://) that allows virtual
'compute' clusters running on BlueData to leverage external storage systems such as Isilon
via HDFS or NFS.
Note: CDH and HDP clusters are **NOT** configured with Isilon directly. These clusters
connect to Isilon via the BlueData DataTap functionality.
RELEVANT EMC FEATURES
Table 2 List of EMC technology features relevant to the validation consideration
DESCRIPTION DETAIL
EMC Isilon HDFS Module, NFS
VALIDATION SCOPE
PARTNER TECHNOLOGIES
Table 3 List of Partner technologies used in the validation process
TECHNOLOGY - MODEL TECHNOLOGY VERSION TECHNOLOGY INSTANCE (PHYSICAL/VIRTUAL)
BlueData EPIC EPIC 2.0 Controller Node: Physical
Worker Node: Physical
Cloudera virtual Hadoop cluster CDH 5.4.3 with Cloudera
Manager Enterprise
1 Master Node: Virtual (Docker Container)
3 Worker Node: Virtual (Docker Container)
Hortonworks virtual Hadoop cluster HDP 2.3 with Ambari 2.1 1 Master Node: Virtual (Docker Container)
3 Worker Node: Virtual (Docker Container)
EMC TECHNOLOGIES
Table 4 List of EMC technologies used in the validation process
TECHNOLOGY - MODEL TECHNOLOGY VERSION TECHNOLOGY INSTANCE (PHYSICAL/VIRTUAL)
Isilon X210 Isilon OneFS v7.2.1.0
B_7_2_1_014(RELEASE)
Physical (8 node system)
Data Access Methods
Table 5 List of data access methods and technologies used in the validation process
DATA ACCESS METHOD DATA ACCESS STRUCTURE DETAILS
Isilon HDFS API Over the wire HDFS protocol implemented in Isilon
TESTING ENVIRONMENT
ARCHITECTURE ILLUSTRATION
Figure 1. Illustration of logical layout of Partner and EMC technologies used for validation
RESOURCES
Table 6 List of resources and technologies used to support the validation process
DESCRIPTION VERSION DETAILS & SPECIFICATIONS
HARDWARE
(6) BlueData Workers: SuperMicro model X9DRFR; 10G NIC; 64GB RAM; 4 cores, 8 CPUs; five 1TB drives
(1) BlueData Controller: Same as a worker, but with 128GB RAM
CPU: Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz
All boxes have the same 5 drives: 1TB Seagate SSHD hybrid drives (SSD + disk), sn: Z1DG4XE
(8) Isilon X210 8 node system
CDH Virtual Cluster Functional testing and validations
(1) Master Node Master: XXL - 8 VCPU, 24576 MB RAM
(6) Slave Nodes Worker: Extra Large - 4 VCPU, 12288 MB RAM
BLUEDATA EPIC DOCUMENTATION
1. About BlueData EPIC
2. Installation Guide
3. User-Administrator Guide
BLUEDATA INSTALLATION AND CONFIGURATION
Note: There are various prerequisites for BlueData EPIC Software, related to hosts, operating system, networking, and
High Availability. Details can be found in the BlueData documentation listed above.
Phase One: Installing the BlueData EPIC Controller
To install EPIC from the command line:
1. Install Red Hat Enterprise Linux 6.5 or CentOS 6.5 on the hosts that you will use for the Controller and Worker nodes.
2. Log into the host that you will be using as the Controller node using the root account and password.
3. Download the EPIC Enterprise binary (.bin) from BlueData Software, Inc. to the host that you will use as the Controller
node. The size of the download will depend on the distribution(s) included and the flavor of the .bin file.
4. Make the .bin file executable by executing the command:
chmod a+x bluedata-epic-<type>-<os>-<package>-release-<version>-<build>.bin
Where:
- <type> can be either entdoc (for EPIC) or onedoc (for EPIC Lite)
- <os> is the operating system supported by this .bin file. This can be either centos (for CentOS) or rhel
(for Red Hat Enterprise Linux).
- <package> is the EPIC package type. This will be either eval (EPIC Lite), full (CentOS with all App Store
images and OS packages included), or minimalplus (RHEL with all App Store images included).
- <version> is the EPIC version being downloaded.
- <build> is the specific EPIC build number being downloaded.
5. Run the executable binary from the Linux console as the root user by typing ./<epic>.bin --floating-ip <address/range>, where:
- <epic> is the full name of the .bin file (see Step 4, above).
- <address/range> is the CIDR to use, such as 192.168.25.10/24. This range of addresses allows
network access from outside the EPIC platform to the virtual nodes that EPIC will create as part of future clusters.
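Steps 4 and 5 can be sketched as follows. The placeholder values for <type>, <os>, <package>, <version>, and <build> below are hypothetical; substitute those from your actual download:

```shell
# Hypothetical example values illustrating the .bin naming convention.
type=entdoc; os=rhel; package=minimalplus; version=2.0; build=2123
bin="bluedata-epic-${type}-${os}-${package}-release-${version}-${build}.bin"
chmod a+x "$bin" 2>/dev/null || true   # step 4 (no-op here unless the file exists)
echo "$bin"
# step 5, as root: ./"$bin" --floating-ip 192.168.25.10/24
```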
The installer checks the integrity of the EPIC bundle and then extracts the bundle contents and performs some initial checks.
The installer checks to see which network interface card(s) (NIC) have Internet connectivity.
- If multiple NICs can connect to the Internet, then a prompt appears asking you to select the NIC to use.
- If only one NIC can connect to the Internet, then the installer bypasses this step.
- If no NICs can connect to the Internet, the installer asks for the IP address of the proxy server being used for your network. If
prompted, enter either the number corresponding to the NIC to use to connect to the Internet or the IP address of the proxy server,
and then press [ENTER].
The End User License Agreement (EULA) appears. Read through the EULA, pressing [SPACE] to page through the content. Once you
have viewed the entire EULA, press y to accept it and continue installing EPIC.
EPIC installs on the Controller node. A series of messages appear during the installation. The following message appears once the
installation is complete:
Figure 2.1: Installation complete
This concludes the first phase of the EPIC installation. Note the URL provided, as you will use this to continue configuring EPIC.
Phase Two: Completing the Installation
The next step of the EPIC installation process is to configure EPIC using a Web browser to access the application. To do this:
1. Open a Web browser and navigate to the URL provided at the end of the command line installation process.
The EPIC Enterprise - Setup screen appears.
3. The read-only Public network interface field displays the NIC that you selected for Internet access during the command line
installation. Each host in the EPIC platform must use the same NIC to access the Internet. For example, if you selected the eth0 NIC
on the Controller node, then the eth0 NIC on each Worker node must also be able to reach the Internet.
4. Use the System Storage pull-down menu to select the type of system storage to use for the EPIC platform. The available options
are:
- If the nodes each have a second or third hard drive as described in the About EPIC Guide and you want to create local
system storage using HDFS with Kerberos protections, then select Create HDFS from local disks for system storage.
- To use an existing external HDFS file system as system storage, select Use existing HDFS for System Storage and then
enter the parameters described for "HDFS".
- To use an existing external NFS file system as system storage, select Use existing NFS for System Storage and then enter
the parameters described for "NFS".
Note: Configuring system storage does not preclude the ability for clusters running on BlueData to reach out to other storage
systems using DataTap. The system storage is just a default scratch space for each tenant and is referred to as Tenant Storage.
If you are creating local HDFS system storage, then select one or more hard drive(s) to use for this storage in the Select one or
more available disk(s) for HDFS field.
Click Submit to finish installing EPIC on the Controller node. EPIC displays a popup indicating that the installation process has
started successfully.
Figure 2.3: Installation Started popup
This popup is soon replaced by a status summary as the installation completes. If you like, you may click the green Details button to
open a popup that displays additional information about the installation. Please allow about 20 minutes for this process to complete
(actual time will vary depending on various factors).
Figure 2.4: Completing the EPIC installation
The "BlueData software setup completed successfully" popup appears when the installation process is completed. Click the Close
button to exit to the EPIC Login screen.
Figure 2.5: Installation completed
Proceed to the next section to begin adding Worker nodes.
Phase Three: Adding Worker Nodes
Once you have finished installing EPIC on the Controller node, the final step is to add the Worker nodes to the EPIC platform.
Worker nodes provide additional CPU and memory so that you can create more virtual nodes. Each Worker node or host must
conform to the system requirements listed in the About EPIC Guide and any applicable Deployment Guide.
Note: This section provides a high-level overview intended to help you get up and running with EPIC as quickly as possible. Please
see the User/Admin Guide for information about this screen and any applicable Deployment Guide for further directions.
To add Worker nodes:
1. Access the EPIC Login screen by opening a Web browser and navigating to the Controller IP address.
2. Enter your username and password in the appropriate fields and then click the red Login button.
3. In the main menu, select Installation. The Cluster Installation screen appears.
4. Enter the IP address(es) of the Worker node(s) that you wish to add to the EPIC platform in the Worker IP field. You may enter IP
addresses as follows, being sure not to add any spaces:
- Single IP address: Enter a properly formatted IP address, such as 10.10.1.1.
- Multiple IP addresses: Enter the first three octets of the IP addresses, and then separate each value of the fourth octet
with a comma, such as 10.10.1.1,2,5,8. In this example, EPIC will install four Worker nodes with IP addresses of 10.10.1.1,
10.10.1.2, 10.10.1.5, and 10.10.1.8.
- Multiple IP addresses: Enter multiple IP addresses separated by commas, such as
10.10.1.1,10.10.1.2,10.10.1.5,10.10.1.8. In this example, EPIC will install four Worker nodes with the same IP addresses as the
previous example.
- IP address range: Enter an IP address range, such as 10.10.1.1-8. In this example, EPIC will install eight Worker nodes
with IP addresses from 10.10.1.1 to 10.10.1.8.
- Combination: Use a combination of the above methods, such as 10.10.1.1,10.10.1.2,5,8,10.10.1.9-12.
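As an illustration of the range shorthand above, a small bash helper (a sketch, not part of EPIC) can expand a range like 10.10.1.1-8 into the individual addresses EPIC would install:

```shell
# Sketch: expand the final-octet range form (e.g. 10.10.1.1-8) into full IPs.
expand_range() {
  local base=${1%.*}            # leading octets, e.g. 10.10.1
  local span=${1##*.}           # final-octet range, e.g. 1-8
  local start=${span%-*} end=${span#*-} i
  for ((i = start; i <= end; i++)); do
    printf '%s.%s\n' "$base" "$i"
  done
}
expand_range 10.10.1.1-8
```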
5. Select how to access the Worker node(s). Your available options are:
Root access: Check the Root Access radio button and then enter the root password for the Worker node(s) you are installing in the
Root Password field.
Figure 2.8: Root Access information
SSH Key: If the Worker node(s) have a public key installed to allow password-free access, then you may check the SSH Key based
Access radio button. Upload the private key by clicking the Private Key field to open a standard File Upload dialog that allows you
to browse for and select the key file. If the key requires a pass phrase, enter that phrase in the Passphrase field. The uploaded
private key will only be used for initial access to the worker nodes and will not be permanently stored.
Figure 2.9: SSH Key information
6. Click the blue Add Workers button to install the selected Worker node(s).
EPIC will prepare the selected Worker node(s). A series of colored indicator bars in the Worker(s) Status table will display the
installation progress for each Worker node, as follows:
Figure 2.10: Worker node installation progress
Once you have finished adding Worker node(s), you may view the status of each node by clicking Dashboard in the main menu to
open the Dashboard screen, and then selecting the Services tab, which presents detailed status information for each Worker node.
Note: You may only perform one set of Worker node installations to one or more node(s) at once. To save time, consider adding
all of the Worker nodes at once by entering multiple IP addresses as described above.
Note: Verify that the Worker node(s) have finished rebooting before attempting to create a virtual cluster. Check the Service Status
tab of the Dashboard screen to ensure that all services are green before proceeding.
Figure 2.15: Site Admin Dashboard screen - Service Status tab
Once the installation of the workers is complete, you can log in to the BlueData UI, switch to the Demo Tenant, and create Hadoop
or Spark clusters using a simple one-page form. BlueData automatically provisions the virtual nodes (Docker containers), handles
the networking and isolation, deploys the software for the relevant distribution, and configures the applications (YARN, M/R, Hive, etc.).
EMC ISILON INSTALLATION AND CONFIGURATION
Install
The physical assembly and configuration of the Isilon cluster was performed without incident using some basic information provided
by email from EMC.
Screenshots of EMC Isilon Set Up
HDFS with Kerberos Authorization
The creation of the necessary Kerberos Authentication Provider, SPNs, and proxyusers on the Isilon was performed without incident
after following the instructions in the OneFS-7.2.0-CLI-Administration-Guide and the Configuring Kerberos with HWX and Isilon white
paper.
Isilon HDFS Access Zone configuration
Isilon HDFS Proxyusers Configuration
NFS with Simple Authorization
The creation of an NFS export and associating it with the System Access Zone was performed without incident. The only issue
encountered during configuration was that the default setting of the NFS export in Isilon, which maps the "root" user to "nobody",
had to be changed in order to permit the "root" user specified by the DataTap to write to the NFS share.
Screenshot of EPIC DataTap Screen (NFS DataTap)
HDFS with Simple Authorization
The creation of the Local Authentication Provider with “bluedata”, “hdfs”, and “root” members and configuring it with the System
Access Zone was performed without incident.
During basic connectivity testing between DataTap and Isilon using the HDFS protocol, an issue with DataTap's issuing of
commands to the DataNode was uncovered. Commands issued by DataTap to the DataNode were returned with a "Lease expired"
error. The root cause was that the Isilon HDFS implementation requires the contents of the "clientname" field to be kept consistent
across NameNode and DataNode request packets. This restriction was not observed when running DataTap against a native Apache
HDFS instance. The EPIC DataTap code was changed to maintain this clientname consistency.
Screenshot of EPIC DataTap Screen (HDFS Non-Kerberized DataTap)
Installation of BlueData EPIC was performed using the standard installation procedure. There were no Isilon specific dependencies.
SUPPORTING INFORMATION AND CONSIDERATIONS
The Table below contains pertinent information relating to supporting the testing environment.
Table 7 List of supporting information and considerations relevant to the validation process
DESCRIPTION DETAIL
Isilon Kerberos Setup
BlueData Installation Guide
Benchmark automation: an EMC-internal application for automating the execution of benchmarks and
collecting performance metrics.
VALIDATION TESTS
COMPATIBILITY TESTING
TEST # 1: Connect to Kerberized Isilon HDFS via BlueData DataTap object.
a. Test Goal: Ability to connect to remote Kerberized Isilon HDFS
b. Pass Criteria:
i. Browse the directories and files in the Kerberized HDFS Access Zone with a trusted user
ii. Permission denied message if unauthorized users attempt access to Kerberized Isilon HDFS
c. Test Preparation:
1. Set up standalone KDC
2. Configure HDFS Access Zone with KDC
3. Generate keytab file for a specific principal (hdfs)
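Preparation step 3 can be sketched with MIT Kerberos tooling. The realm EXAMPLE.COM below is a hypothetical placeholder; substitute the realm of the standalone KDC configured in steps 1-2:

```shell
# Sketch: write a ktutil command script that generates a keytab for the hdfs principal.
# Realm EXAMPLE.COM and keytab path are assumptions for illustration only.
cat > /tmp/make-keytab.ktutil <<'EOF'
addent -password -p hdfs@EXAMPLE.COM -k 1 -e aes256-cts-hmac-sha1-96
wkt /tmp/hdfs.keytab
EOF
# Run interactively (ktutil prompts for the principal's password):
#   ktutil < /tmp/make-keytab.ktutil
cat /tmp/make-keytab.ktutil
```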
d. Test Plan:
1. Login to BlueData tenant
2. Create new DataTap of type = HDFS
3. Specify Kerberos credentials including keytab file for the relevant user (see screenshot above in
installation and configuration section)
4. Click submit
5. Click on the DataTap name to browse the data in Isilon – should match all the directories/folders
as seen in Isilon file system browser
e. Result: PASS (see screenshots below)
Isilon log showing non-proxyuser being denied DataTap Access
EPIC running a CDH 5.4.3 virtual cluster returns an error when unauthorized user "llma" attempts to access an Isilon HDFS file
system protected by Kerberos authorization via a DataTap. Access to the same HDFS file system via DataTap succeeds when
performed by authorized user "bluedata".
TEST # 2: Connect to Isilon via BlueData NFS DataTap
a. Test Goal: Ability to connect to remote Isilon via NFS protocol
b. Pass Criteria: Browse the directories and files in the NFS Access Zone
c. Test Preparation:
1. Configure NFS access zone in Isilon
d. Test Plan:
1. Login to BlueData tenant
2. Create new DataTap of type = NFS
3. Click submit
4. Click on the DataTap name to browse the data in Isilon – should match all the directories/folders
as seen in Isilon file system browser
e. Result: PASS (see screenshots below);
TEST # 3: Browse Kerberized Isilon HDFS DataTap from CDH (Cloudera) Compute
Cluster
a. Test Goal: Ability to connect to Kerberized Isilon HDFS DataTap from any Hadoop compute cluster without any
specialized configuration (i.e. without any distribution specific integration with Isilon HDFS)
b. Pass Criteria: Login to master node of Hadoop cluster and issue hadoop fs commands against Isilon HDFS DataTap
c. Test Preparation:
1. Create a Cloudera (CDH) virtual cluster using BlueData
2. Utilize the Isilon DataTap (isi-hdfs-kerb)
d. Test Plan:
1. ssh into master node (10.39.250.2) of virtual Hadoop cluster (see above)
2. Execute command: hadoop fs -ls dtap://isi-hdfs-kerb
3. The directories/folders listed must match Isilon file system browser
e. Result: PASS (see screenshots below);
TEST # 4: Browse Kerberized Isilon HDFS DataTap from HDP (Hortonworks)
Compute Cluster
f. Test Goal: Ability to connect to Kerberized Isilon HDFS DataTap from any Hadoop compute cluster without any
specialized configuration (i.e. without any distribution specific integration with Isilon HDFS such as using Cloudera
Manager or Ambari)
g. Pass Criteria: Issue hadoop fs commands against Isilon HDFS via BlueData DataTap URI (dtap://)
h. Test Preparation:
1. Create a Hortonworks (HDP) virtual cluster using BlueData
Note: The IP address of the virtual master node of the Hadoop cluster is 10.39.250.6
i. Test Plan:
1. ssh into master node of virtual cluster
2. Execute command: hadoop fs -ls dtap://isi-hdfs-kerb
3. The directories/folders listed must match Isilon file system browser
j. Result: PASS (see screenshots below);
TEST # 5: Run a Hive query against data stored in Isilon Kerberized HDFS via Hue
console
k. Test Goal: Run Hadoop jobs directly against data in-place on Isilon HDFS
l. Pass Criteria: Create external table definitions against data stored in HDFS and obtain results from query
m. Test Preparation:
1. Create a CDH (Cloudera) virtual cluster using BlueData
2. Login to Hue
n. Test Plan:
1. Create external table ‘ratings’ against ratings.csv stored on Isilon HDFS under folder = ratings
2. Create external table ‘products’ against geoplaces2.csv stored on Isilon HDFS under folder =
products
3. Create a join query between ratings and products tables
4. Execute the query
o. Result: PASS
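The external-table approach in this test can be sketched in HiveQL. The column names below are assumptions (the brief does not list the CSV schemas), and the DataTap name isi-hdfs-kerb is taken from the earlier tests; the snippet writes a hypothetical script of the kind submitted through Hue:

```shell
# Sketch: HiveQL of the kind used in the test plan (assumed schemas; adjust to
# the actual ratings.csv and geoplaces2.csv columns).
cat > /tmp/ratings_join.hql <<'EOF'
CREATE EXTERNAL TABLE ratings (userID STRING, placeID STRING, rating INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'dtap://isi-hdfs-kerb/ratings';

CREATE EXTERNAL TABLE products (placeID STRING, name STRING, city STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'dtap://isi-hdfs-kerb/products';

SELECT p.name, AVG(r.rating) AS avg_rating
FROM ratings r JOIN products p ON r.placeID = p.placeID
GROUP BY p.name;
EOF
cat /tmp/ratings_join.hql
```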
TEST # 5: Integrate with Cloudera Navigator for metadata search and auditing
a. Test Goal: Confirm that Cloudera Navigator can audit and track the metadata (incl. lineage) of compute services.
b. Pass Criteria: Cloudera Navigator search shows all the jobs from the compute cluster against Isilon
c. Test Preparation:
1. Create CDH 5.4.3 cluster
2. Use the Cloudera Manager (instantiated by BlueData) to configure Cloudera Navigator for the
above cluster (note that Cloudera Manager and the Hue console are running on 10.35.231.2;
Cloudera Navigator services are configured to run on 10.35.231.3)
3. Use the IsilonHDFS DataTap - note the configuration below:
d. Test Plan:
1. Login to BlueData tenant
2. Use the BlueData console to run a Hadoop M/R job utilizing the aforementioned cluster against data
stored in Isilon (i.e. via dtap://)
3. Use the Hue Console to create external tables & execute queries against data stored in Isilon and
run a join query. Note the execution time stamps
4. Login to Cloudera Navigator – confirm that the metadata search and audit logs show each of the
actions as well as sub operations with the appropriate time stamps. Testing was specifically done
with HIVE, IMPALA & M/R
e. Result: PASS (see screenshots below);
M/R Job
The time stamp of the job from the test preparation screen is: Oct 31, 2015 at 8:15:43
Note that Navigator has the same time stamp and the job input and job output to IsilonHDFS datatap is also shown here.
HIVE QUERY: Note the time stamp from the previous Hue screens of 10/30/2015 from 9.51am
Note that the external table is using the ratings folder in Isilon
Note the corresponding audit events in Cloudera Navigator for HIVE
IMPALA: Note the time stamp in Hue of 10/31/2015, 8.04am
TEST # 6: Preliminary performance benchmarks using standard development/test
hardware in BlueData Labs.
a. Test Goal: Run the Intel HiBench-Terasort test suite using BlueData compute clusters against various data sizes in
Isilon
b. Pass Criteria: Successful completion of the terasort runs for 100GB and 1TB and measurement of execution times in
seconds.
i. Note: Performance tuning was a non-goal, since this environment used a 1 Gigabit network vs. the
recommended 10 Gigabit network
ii. The size of the VMs used was limited to 51GB RAM due to the specs of the underlying hardware (dev/test
machines)
c. Test Preparation:
1. Specs of Hardware for running BlueData compute cluster:
Hosts:
SuperMicro model: X9DRFR
6 workers: SuperMicro, 10G NIC, 64GB RAM, 4 cores (8 logical CPUs), five 1TB drives
1 controller: same as a worker, but with 128GB RAM
CPU: Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz
All boxes have the same 5 drives:
1TB Seagate SSHD hybrid drives (SSD + disk), s/n: Z1DG4XE
Switch:
Arista 10G switch, model:
DCS-7050T-36 32x 10GBASE-T + 4x SFP+
2. Create special VM flavors for master/worker nodes
Master: benchmark - 6 VCPU, 51200 MB RAM, 350 GB root disk
Worker: benchmark - 6 VCPU, 51200 MB RAM, 350 GB root disk
3. Create CDH 5.4.3 cluster
d. Test Plan:
1. Login to BlueData tenant
2. Use the BlueData console to run a Hadoop M/R job utilizing the aforementioned cluster against data
stored in Isilon (i.e., via dtap://)
3. Use the Hue Console to create external tables against data stored in Isilon and run a join query
4. Login to Cloudera Navigator – confirm that the metadata search and audit logs show each of the
actions as well as sub operations.
5. SEE APPENDIX SECTION FOR THE ACTUAL COMMANDS EXECUTED
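The exact commands are in the appendix; for orientation, the general shape of a TeraSort run on a stock Hadoop distribution is sketched below. The DataTap name and paths are illustrative, not the ones actually used (teragen rows are 100 bytes each, so 1,000,000,000 rows yields the 100GB input):

```
# Illustrative shape of a TeraSort run (see appendix for actual commands):
# generate 100GB of input, sort it, then validate the sorted output.
hadoop jar hadoop-mapreduce-examples.jar teragen 1000000000 \
    dtap://IsilonHDFS/bench/tera-in
hadoop jar hadoop-mapreduce-examples.jar terasort \
    dtap://IsilonHDFS/bench/tera-in dtap://IsilonHDFS/bench/tera-out
hadoop jar hadoop-mapreduce-examples.jar teravalidate \
    dtap://IsilonHDFS/bench/tera-out dtap://IsilonHDFS/bench/tera-report
```

HiBench wraps this same TeraSort M/R job and records the wall-clock execution times reported below.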
e. Result: PASS
TERASORT 100GB 2909.479 seconds
TERASORT 1TB 33887.223 seconds
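For context, these wall-clock times imply end-to-end sort throughput of roughly 35 MB/s for the 100GB run and 31 MB/s for the 1TB run, well within the limits of the 1 Gb network noted above. A quick check of that arithmetic:

```shell
# End-to-end TeraSort throughput implied by the reported times
# (100GB = 102400 MB, 1TB = 1048576 MB).
awk 'BEGIN { printf "100GB: %.1f MB/s\n", 102400 / 2909.479 }'
awk 'BEGIN { printf "1TB:   %.1f MB/s\n", 1048576 / 33887.223 }'
```
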
TESTING ENGINEERS
NAME                 TITLE                          COMPANY   PHONE         EMAIL
Boni Bruno           Principal Solutions Architect  EMC       818-297-4571  Boni.Bruno@emc.com
Anant Chintamaneni   VP, Products                   BlueData  650-906-7312  anant@bluedata.com
Tom Phelan           Co-Founder                     BlueData                tap@bluedata.com
TESTING ACCEPTANCE
NAME                 TITLE                          COMPANY   PHONE         EMAIL
Boni Bruno           Principal Solutions Architect  EMC       818-297-4571  Boni.Bruno@emc.com
Anant Chintamaneni   VP, Products                   BlueData  650-906-7312  anant@bluedata.com
Tom Phelan           Co-Founder                     BlueData                tap@bluedata.com
TEST # 7: Hadoop Cluster (CDH) with Isilon as HDFS replacement
Cloudera on standalone infrastructure was configured directly with Isilon to understand the value proposition of BlueData as the
infrastructure layer with DataTap.
Key Observations:
1. Manual deployment of Cloudera using Cloudera Manager with the Isilon service option required a custom workflow;
the assumption is that each subsequent CDH cluster will require the same workflow. BlueData offers a streamlined
workflow: Cloudera compute clusters (e.g., different versions or multiple clusters) can be created unmodified, without
any dependency on Isilon.
2. Cloudera Navigator lineage (metadata search) for compute services does not function in the Cloudera+Isilon deployment,
whereas BlueData supports lineage for all compute services. However, BlueData does not support HDFS-level auditing
(which Cloudera Navigator supports with DAS clusters).
Cloudera Standalone (without BlueData) + Isilon Screenshot
Cloudera Manager running on 10.35.248.5
Isilon configuration in Cloudera Manager
Default settings were used. A specific access zone was created in Isilon (see next screenshot)
Isilon Networking and Access Zone
Cloudera Standalone (without BlueData) Cluster Details
Comparison of Cloudera on BlueData + Isilon vs. Cloudera Standalone + Isilon (Navigator)
Since BlueData abstracts the HDFS storage, it does not currently support HDFS auditing. However, BlueData supports auditing and
metadata lineage of all compute services. Cloudera Standalone deployed with Isilon does not support metadata lineage in Navigator.
The table below summarizes these key observations for the key use cases:
No.  Use case                  Cloudera on BlueData + Isilon   Cloudera Standalone + Isilon
1    Metadata – Hive Query     YES                             NO
2    Metadata – Impala Query   YES                             NO
3    Metadata – Pig            YES                             NO
4    Metadata – M/R            YES                             NO
5    Metadata – Oozie          YES                             NO
Based on the testing pattern, the key compute lineage/metadata operations are NOT
captured by Navigator in the case of Cloudera standalone + Isilon.
1. Metadata – Hive Query
Test Preparation:
On both environments, create a simple table ‘ratings’ and run a group by query as shown below
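The shape of that preparation, sketched as a beeline session (the column names and delimiter are assumptions for illustration; only the table name 'ratings' and the group-by pattern come from the test):

```
# Illustrative only: create a simple 'ratings' table and run a GROUP BY
# query, repeated identically in both environments.
beeline -u jdbc:hive2://localhost:10000 -e "
  CREATE TABLE ratings (user_id INT, movie_id INT, rating INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
  SELECT rating, COUNT(*) FROM ratings GROUP BY rating;"
```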
Cloudera Standalone + Isilon (10.35.248.8); Note the time stamps of 12/07/2015, 11:17pm
Cloudera Standalone + Isilon
(note the time stamp on Mac); No results are logged by Navigator
Cloudera on BlueData + Isilon (10.35.231.2); Note the time stamp of 12/07/2015, 11:21pm
Cloudera on BlueData + Isilon
(note the time stamp on Mac); Both Hive and YARN operation executions
are logged by Navigator. Note that the time stamp matches the time stamps
in the Hue console for the corresponding query.
2. Metadata – Impala Query
Cloudera Standalone + Isilon
(note the time stamp on Mac); No results are logged by Navigator
Cloudera on BlueData + Isilon
(note the time stamp on Mac); Impala operation execution is logged with the same time
stamp as in Hue
3. Metadata – Pig Query
Cloudera Standalone + Isilon
(Note the time stamp for the Pig Job. No results captured in Navigator)
Cloudera on BlueData + Isilon
(Note the time stamp for the Pig Job. Results are captured in Navigator)
4. Metadata – Oozie
Cloudera Standalone + Isilon
(Note the time stamp for the Oozie Job. No results captured in Navigator)
Cloudera on BlueData + Isilon
(Note the time stamp and ID for the Oozie Job. Results are captured in Navigator)