This white paper describes how BlueData enables virtualization of Hadoop and Spark workloads running on Intel architecture.
Even as virtualization has spread throughout the data center, Apache Hadoop continues to be deployed almost exclusively on bare-metal physical servers. Processing overhead and I/O latency typically associated with virtualization have prevented big data architects from virtualizing Hadoop implementations.
As a result, most Hadoop initiatives have been limited in terms of agility, with infrastructure changes such as provisioning a new server for Hadoop often taking weeks or even months. This infrastructure complexity continues to slow down adoption in enterprise deployments. Apache Spark is a relatively new big data technology, but interest is growing rapidly; many of these same deployment challenges apply to on-premises Spark implementations.
The BlueData EPIC software platform addresses these limitations, enabling data center operators to accelerate Hadoop and Spark implementations on Intel architecture-based servers.
For more information, visit intel.com/bigdata and bluedata.com
This presentation provides an overview of the BlueData integration with Cloudera Manager. With this integration, customers of our BlueData EPIC software platform can leverage the power of Cloudera Manager for end-to-end Hadoop systems management and administration.
When the BlueData EPIC platform provisions a virtual CDH cluster, Cloudera Manager can be provisioned as well – so you can easily deploy, manage, monitor and perform diagnostics on your Hadoop cluster. Our customers can take advantage of the Cloudera Manager GUI to monitor their cluster, troubleshoot issues, and administer their Hadoop deployment.
Learn more about BlueData at http://www.bluedata.com
This presentation provides an overview of what’s new in the 2.0 release of the BlueData EPIC software platform.
BlueData’s EPIC software platform solves the infrastructure challenges and limitations that can slow down and stall on-premises Big Data deployments. With BlueData, you can spin up Hadoop or Spark clusters in minutes rather than months – with the data and analytical tools that your data scientists need.
The BlueData EPIC 2.0 release leverages Docker containers to simplify Big Data clusters, supports Apache Zeppelin notebooks and other new functionality for Apache Spark, and includes an enhanced App Store that provides one-click access to Big Data distributions and analytics tools.
Learn more about BlueData at http://www.bluedata.com
This Big Data case study outlines the Hadoop infrastructure deployment for a Fortune 100 media and telecommunications company.
Hadoop adoption in this company had grown organically across multiple teams, starting with “science projects” and lab initiatives that quickly grew and expanded. Going forward, some of the options they considered for their Big Data deployment included expanding their on-premises infrastructure and using a Hadoop-as-a-Service cloud offering.
Fortunately, they realized that there is a third option: providing the benefits of Hadoop-as-a-Service with on-premises infrastructure. They selected the BlueData EPIC software platform to virtualize their Hadoop infrastructure and provide on-demand access to virtual Hadoop clusters in a secure, multi-tenant model.
Learn more about this case study in the blog post at: http://www.bluedata.com/blog/2015/05/big-data-case-study-hadoop-infrastructure
Bare-metal performance for Big Data workloads on Docker containers
BlueData, Inc.
In a benchmark study, Intel® compared the performance of Big Data workloads running on a bare-metal deployment versus running in Docker* containers with the BlueData® EPIC™ software platform.
This in-depth study shows that performance ratios for container-based Hadoop workloads on BlueData EPIC are equal to — and in some cases, better than — bare-metal Hadoop. For example, benchmark tests showed that the BlueData EPIC platform demonstrated an average 2.33% performance gain over bare metal, for a configuration with 50 Hadoop compute nodes and 10 terabytes (TB) of data. These performance results were achieved without any modifications to the Hadoop software.
This is a revolutionary milestone, and the result of an ongoing collaboration between Intel and BlueData software engineering teams.
This white paper describes the software and hardware configurations for the benchmark tests, as well as details of the performance benchmark process and results.
How to deploy Apache Spark in a multi-tenant, on-premises environment
BlueData, Inc.
Adoption of Apache Spark in the enterprise is increasing rapidly; it has become one of the fastest-growing and most popular technologies in the Big Data ecosystem.
However, implementing an enterprise-ready, on-premises Spark deployment can be very complex, and it requires expertise that is not widely available.
BlueData makes it easier to deploy Apache Spark on-premises. With BlueData, you can spin up virtual Spark clusters within minutes – providing secure, self-service, on-demand access to Big Data analytics and infrastructure. You can deploy Spark in standalone mode or with Hadoop / YARN. You can also build analytical pipelines and create Spark clusters using our RESTful APIs, and use web-based Zeppelin notebooks for interactive data analytics.
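Cluster creation through such a REST API might look like the following sketch. The payload fields, function name, and the endpoint shown in the comment are illustrative assumptions, not BlueData's documented schema:

```python
# Build an illustrative cluster-creation payload. All field names here
# are hypothetical; consult the platform's actual API reference.
def build_spark_cluster_request(name, workers, flavor="spark-standalone"):
    return {
        "name": name,
        "flavor": flavor,
        "worker_count": workers,
    }

payload = build_spark_cluster_request("analytics-dev", workers=3)

# A client would POST this as JSON, e.g. with the `requests` library:
#   requests.post("https://epic-controller/api/v1/cluster", json=payload)
print(payload)
```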
BlueData’s software platform leverages virtualization and Docker containers – combined with our own patent-pending innovations – to make it faster and more cost-effective for enterprises to get up and running with a multi-tenant Spark deployment on-premises.
Learn more at www.bluedata.com
Dell/EMC Technical Validation of BlueData EPIC with Isilon
Greg Kirchoff
The BlueData EPIC™ (Elastic Private Instant Clusters) software platform solves the infrastructure challenges and limitations that can slow down and stall Big Data deployments. With EPIC software, you can spin up Hadoop or Spark clusters – with the data and analytical tools that your data scientists need – in minutes rather than months. Leveraging the power of containers and the performance of bare metal, EPIC delivers speed, agility, and cost-efficiency for Big Data infrastructure. It works with all of the major Apache Hadoop distributions as well as Apache Spark. It integrates with each of the leading analytical applications, so your data scientists can use the tools they prefer. You can run it with any shared storage environment, so you don’t have to move your data.
EMC Isilon Scale-out Storage Solutions for Hadoop combine a powerful yet simple and highly efficient storage platform with native Hadoop integration that allows you to accelerate analytics, gain new flexibility, and avoid the costs of a separate Hadoop infrastructure. BlueData EPIC Software combined with EMC Isilon shared storage provides a comprehensive solution for compute + storage.
BlueData and Isilon share several joint customers and opportunities at leading financial services, advanced research laboratories, healthcare and media/communication organizations.
This paper describes the process of validating Hadoop applications running in virtual clusters on the EPIC platform, with data stored on the EMC Isilon storage device using either NFS or HDFS data access protocols.
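As a rough illustration of the two access paths, the helper below maps a dataset path to the URI a job would use for each protocol. The host name and NFS mount point are hypothetical, and 8020 is assumed as the HDFS service port:

```python
def dataset_uri(path, protocol,
                isilon_host="isilon.example.com", nfs_mount="/mnt/isilon"):
    """Return the URI for a dataset on Isilon, for an illustrative setup.

    HDFS access goes through Isilon's HDFS service endpoint; NFS access
    goes through a local mount of the same OneFS directory.
    """
    if protocol == "hdfs":
        return f"hdfs://{isilon_host}:8020{path}"
    elif protocol == "nfs":
        return f"file://{nfs_mount}{path}"
    raise ValueError(f"unknown protocol: {protocol}")

print(dataset_uri("/data/logs", "hdfs"))
print(dataset_uri("/data/logs", "nfs"))
```

Either way, the same OneFS directory backs both URIs, which is the point of the validation: no data movement between the protocols.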
Big-Data-as-a-Service (BDaaS) in an enterprise environment requires meeting the often contradictory goals of (1) providing your data scientists, analysts, and data engineers with a self-service consumption model; (2) delivering agile and scalable on-demand infrastructure for the rapidly evolving ecosystem of big data frameworks and application software; while (3) ensuring enterprise-grade capabilities for isolation, security, monitoring, etc.
In this presentation at our BDaaS meetup in Santa Clara, Tom Phelan (chief architect and co-founder of BlueData) reviewed these goals and how to resolve the potential contradictions. He also discussed the infrastructure, application, user experience, security, and maintainability considerations required before selecting (or designing and building) a Big-Data-as-a-Service platform for an enterprise big data deployment.
More info on this BDaaS meetup can be found at: http://www.meetup.com/Big-Data-as-a-Service/events/233999817
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Alluxio, Inc.
Alluxio Product School Webinar
January 27, 2022
For more Alluxio events: https://www.alluxio.io/events/
Speaker:
Adit Madan
Data platform teams are increasingly challenged with accessing multiple data stores that are separated from compute engines such as Spark, Presto, TensorFlow, or PyTorch. Whether your data is distributed across multiple datacenters, multiple clouds, or both, a successful heterogeneous data platform requires efficient data access. Alluxio enables you to embrace the separation of storage from compute and use Alluxio data orchestration to simplify adoption of the data lake and data mesh paradigms for analytics and AI/ML workloads.
Join Alluxio’s Sr. Product Mgr., Adit Madan, to learn:
- Key challenges with architecting a successful heterogeneous data platform
- How data orchestration can overcome data access challenges in a distributed, heterogeneous environment
- How to identify ways to use Alluxio to meet the needs of your own data environment and workload requirements
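The "unified namespace" idea above can be sketched as a longest-prefix mount table: logical paths resolve to whichever backing store is mounted at the deepest matching prefix. The entries below are made up for illustration, not an actual Alluxio configuration:

```python
# Toy model of a unified namespace: a logical path resolves to the
# backing store mounted at the longest matching prefix.
MOUNT_TABLE = {
    "/datasets/sales": "s3://example-bucket/sales",
    "/datasets/logs": "hdfs://onprem-nn:8020/logs",
    "/": "hdfs://onprem-nn:8020/root",
}

def resolve(logical_path):
    # Longest-prefix match against the mount table.
    best = max(
        (m for m in MOUNT_TABLE
         if logical_path == m or logical_path.startswith(m.rstrip("/") + "/")),
        key=len,
    )
    suffix = logical_path[len(best):].lstrip("/")
    return MOUNT_TABLE[best].rstrip("/") + ("/" + suffix if suffix else "")

print(resolve("/datasets/sales/2022/01.parquet"))
```

Compute engines see one namespace; the resolver (and, in the real system, caching and data movement) hides which cluster, region, or cloud actually holds the bytes.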
Manage Microservices & Fast Data Systems on One Platform w/ DC/OS
Mesosphere Inc.
The application landscape inside our data centers is changing: along with the trend toward microservices and containers, new distributed data processing frameworks such as Kafka and Cassandra are being released on a weekly basis. These changes have implications for the ways we think about infrastructure. With the growing need for computing power and the rise of distributed applications comes the need for a reliable, easy-to-use cluster manager and programming abstraction.
In this presentation, Mesosphere explains how to use DC/OS to manage microservices and fast data systems on a single platform. We will look at how container orchestration, including resource management and service management, can be streamlined to process fast data in a matter of seconds, allowing for predictive user interfaces, product recommendations, and billing charge back, among other modern app components.
There is increased interest in using Kubernetes, the open-source container orchestration system for modern, stateful Big Data analytics workloads. The promised land is a unified platform that can handle cloud native stateless and stateful Big Data applications. However, stateful, multi-service Big Data cluster orchestration brings unique challenges. This session will delve into the technical gaps and considerations for Big Data on Kubernetes.
Containers offer significant value to businesses, including increased developer agility and the ability to move applications between on-premises servers, cloud instances, and data centers. Organizations have embarked on this journey to containerization with an emphasis on stateless workloads. Stateless applications are usually microservices or containerized applications that don’t “store” data. Web services (such as front-end UIs and simple, content-centric experiences) are often great candidates for stateless applications, since HTTP is stateless by nature. There is no dependency on local container storage for a stateless workload.
Stateful applications, on the other hand, are services that require backing storage; keeping state is critical to running the service. Hadoop and Spark are great examples, as are (to a lesser extent) NoSQL platforms such as Cassandra and MongoDB and relational databases such as Postgres and MySQL. They require some form of persistent storage that will survive service restarts...
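A toy sketch of the distinction: the stateful service below persists its count to backing storage (a local file standing in for a persistent volume), so a "restarted" instance picks up where the last one left off, while a purely in-memory service would reset to zero:

```python
import os
import tempfile

class StatelessCounter:
    """Keeps its count in memory only; a restart resets it to zero."""
    def __init__(self):
        self.count = 0
    def increment(self):
        self.count += 1
        return self.count

class StatefulCounter:
    """Persists its count; surviving restarts is what makes it stateful."""
    def __init__(self, path):
        self.path = path
    def increment(self):
        count = 0
        if os.path.exists(self.path):
            with open(self.path) as f:
                count = int(f.read())
        count += 1
        with open(self.path, "w") as f:
            f.write(str(count))
        return count

state_file = os.path.join(tempfile.mkdtemp(), "count")
a = StatefulCounter(state_file)
a.increment()
a.increment()
b = StatefulCounter(state_file)   # "restart": a fresh instance
print(b.increment())              # continues from the persisted state
```

In Kubernetes terms, the file plays the role of a persistent volume claimed by the pod; the orchestration challenge is making sure that volume follows the container wherever it is rescheduled.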
Speakers
Anant Chintamaneni, VP Products, BlueData
Nanda Vijaydev, Director Solutions, BlueData
It’s becoming clear that enterprises need more than one cloud. Hybrid enables enterprises to optimize how their business works – public cloud for elasticity and scale, multi-cloud for redundancy and choice, and on-premises for performance and privacy. Cloudera delivers a hybrid cloud solution that works where enterprises work, with the agility, security and governance enterprise IT needs, and the self-service analytics business people and enterprise data professionals demand. In this session, we will talk about how Cloudera helps deliver hybrid solutions for enterprises and will run a hands-on Cloudera PaaS demo to exhibit:
- Altus Environment Setup
- Configure Altus SDX
- Spin-up transient clusters with Altus
- Execute workload on Altus Data Engineering clusters
- Run interactive queries on object store with Altus Data Warehouse
- Job Analytics with Workload Experience Manager (WXM)
Speaker: Junaid Rao, Senior Cloud Sales Engineer, Cloudera
Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...
DataWorks Summit
DataWorks Summit 2017 - Sydney
Alejandro Tesch, Cloud Evangelist, Asia Pacific and Japan, HPE
Big Data is a hot topic for most organisations today as they race to convert vast amounts of data into useful information that can be leveraged to make critical decisions and recommendations within very limited time windows. There is a widely acknowledged talent gap when it comes to creating and managing Hadoop clusters; even for experts, it can take hours (or days) to get a fully functional Hadoop farm up and running. The HDP Ambari plugin for Sahara aims to address most of these challenges by facilitating the deployment of Hortonworks Hadoop clusters and providing a set of open APIs to facilitate data analytics tasks in your own cloud. In this presentation we will cover why it makes sense to run your data analytics cluster in your cloud, and we will demonstrate basic Sahara/Ambari functionality.
Enterprises have been using both Big Data and Cloud Computing technologies for years. Until recently, the two have not been combined. Now the agility and efficiency benefits of self-service elastic infrastructure are being extended to Big Data initiatives – whether on-premises or in the public cloud.
This session at Hadoop Summit in San Jose, California (June 2016) discusses the emerging category of Big-Data-as-a-Service (BDaaS) - representing the intersection of Big Data and Cloud Computing.
In this session, Kris Applegate (Cloud and Big Data Solution Architect at Dell) and Thomas Phelan (Co-Founder and Chief Architect at BlueData) outlined the following:
- Innovations that paved the way for Big-Data-as-a-Service
- Definition and categories of Big-Data-as-a-Service
- Key considerations for Big-Data-as-a-Service in the enterprise, including public cloud or on-premises deployment options
A video replay can also be found here: https://youtu.be/_ucPoTKuj8Q
Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
EDB
Moving to the cloud is hard, and moving Postgres databases to the cloud is even harder. Public cloud or private cloud? Infrastructure as a Service (IaaS), or Platform as a Service (PaaS)? Kubernetes for the application, or for the database and the application? This talk will juxtapose self-managed Kubernetes and container-based database solutions, Postgres deployments on IaaS, and Postgres DBaaS solutions of which EDB’s DBaaS BigAnimal is the latest example.
VMworld 2013
Chris Greer, FedEx
Richard McDougall, VMware
Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare
Accelerating Business Intelligence Solutions with Microsoft Azure
PASS, Jason Strate
Business Intelligence (BI) solutions need to move at the speed of business. Unfortunately, roadblocks related to availability of resources and deployment often present an issue. What if you could accelerate the deployment of an entire BI infrastructure to just a couple of hours and start loading data into it by the end of the day? In this session, we'll demonstrate how to leverage Microsoft tools and the Azure cloud environment to build out a BI solution and begin providing analytics to your team with tools such as Power BI. By the end of the session, you'll gain an understanding of the capabilities of Azure and how you can start building an end-to-end BI proof-of-concept today.
Legacy ERP architectures offer an incredibly efficient means of operational resource management, but extracting business insights from them is a real challenge. ERP systems built up over the past 30 years, such as SAP, can be hard to interact with, especially at the source database level: the initial translation of business logic and hierarchies creates significant customizations, and those changes then have to be merged into analytical applications. The entire process of designing self-service reporting with business-level context can be quite cumbersome; a platform like SAP contains pre-packaged modules (MM, SD, PP, etc.) that must each be integrated into a series of pre-built analytics.
This session covers the orchestration and integration of a wide range of open source technologies, along with commercial CDC and reporting solutions, into a reference solution that mimics several real customer scenarios running on relational platforms today. Key considerations of extracting from the operational system of record will be addressed, especially the merging of multiple systems in different time zones, along with the integration concerns of a Hadoop analytics platform using Hive ACID and MERGE, as well as flattening techniques for dimensional models. Customers are often limited in the range of data their ERP can retain, with older data offloaded to secondary systems or cold archives. That limitation goes away, and the opportunities expand, with real-time reporting across all of history and new use cases enabled by advanced machine learning methods.
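For the Hive ACID piece, a CDC upsert is typically expressed as a MERGE statement like the illustrative one below. The table and column names are hypothetical, and the target table must be a transactional Hive table for ACID MERGE to apply:

```python
# Illustrative Hive ACID MERGE applying CDC changes from a staging table
# into a target table. All table/column names are hypothetical; `op` is
# assumed to carry the CDC operation code ('U' update, 'D' delete).
merge_sql = """
MERGE INTO sales_orders AS t
USING sales_orders_cdc AS s
ON t.order_id = s.order_id
WHEN MATCHED AND s.op = 'U' THEN UPDATE SET amount = s.amount, status = s.status
WHEN MATCHED AND s.op = 'D' THEN DELETE
WHEN NOT MATCHED THEN INSERT VALUES (s.order_id, s.amount, s.status)
""".strip()

print(merge_sql)
```

A scheduler would submit this against the staging table after each CDC batch lands, keeping the flattened analytical copy in step with the operational system of record.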
Speakers
Jordan Martz, Director of Tech Solutions, Attunity
David Freriks, Technology Evangelist, Qlik
Better performance and cost effectiveness empower better results in the cognitive era. For more information, visit: http://www.ibm.com/systems/power/hardware/linux-lc.html
The EDB Postgres Platform is an enterprise-class data management platform based on the open source database PostgreSQL, complemented by tool kits for management, integration, and migration; flexible deployment options, and services and support to enable enterprises to deploy Postgres at scale.
Big-Data-as-a-Service (BDaaS) in an enterprise environment requires meeting the often contradictory goals of (1) providing your data scientists, analysts, and data engineers with a self-service consumption model; (2) delivering agile and scalable on-demand infrastructure for the rapidly evolving ecosystem of big data frameworks and application software; while (3) ensuring enterprise-grade capabilities for isolation, security, monitoring, etc.
In this presentation at our BDaaS meetup in Santa Clara, Tom Phelan (chief architect and co-founder of BlueData) reviewed these goals and how to resolve the potential contradictions. He also discussed the infrastructure, application, user experience, security, and maintainability considerations required before selecting (or designing and building) a Big-Data-as-a-Service platform for an enterprise big data deployment.
More info on this BDaaS meetup can be found at: http://www.meetup.com/Big-Data-as-a-Service/events/233999817
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and CloudsAlluxio, Inc.
Alluxio Product School Webinar
January 27, 2022
For more Alluxio events: https://www.alluxio.io/events/
Speaker:
Adit Madan
Data platform teams are increasingly challenged with accessing multiple data stores that are separated from compute engines, such as Spark, Presto, TensorFlow or PyTorch. Whether your data is distributed across multiple datacenters and/or clouds, a successful heterogeneous data platform requires efficient data access. Alluxio enables you to embrace the separation of storage from compute and use Alluxio data orchestration to simplify adoption of the data lake and data mesh paradigms for analytics and AI/ML workloads.
Join Alluxio’s Sr. Product Mgr., Adit Madan, to learn:
- Key challenges with architecting a successful heterogeneous data platform
- How data orchestration can overcome data access challenges in a distributed, heterogeneous environment
- How to identify ways to use Alluxio to meet the needs of your own data environment and workload requirements
Manage Microservices & Fast Data Systems on One Platform w/ DC/OSMesosphere Inc.
The application landscape inside our data center is changing: Along with the trend of moving toward microservices and containers, there are a number of new distributed data processing frameworks such as Kafka or Cassandra being released on a weekly basis. These changes have implications for the ways we think about infrastructure. With the growing need for computing power and the rise of distributed applications comes the need for a reliable and simple-use cluster manager and programming abstraction.
In this presentation, Mesosphere explains how to use DC/OS to manage microservices and fast data systems on a single platform. We will look at how container orchestration, including resource management and service management, can be streamlined to process fast data in a matter of seconds, allowing for predictive user interfaces, product recommendations, and billing charge back, among other modern app components.
There is increased interest in using Kubernetes, the open-source container orchestration system for modern, stateful Big Data analytics workloads. The promised land is a unified platform that can handle cloud native stateless and stateful Big Data applications. However, stateful, multi-service Big Data cluster orchestration brings unique challenges. This session will delve into the technical gaps and considerations for Big Data on Kubernetes.
Containers offer significant value to businesses; including increased developer agility, and the ability to move applications between on-premises servers, cloud instances, and across data centers. Organizations have embarked on this journey to containerization with an emphasis on stateless workloads. Stateless applications are usually microservices or containerized applications that don’t “store” data. Web services (such as front end UIs and simple, content-centric experiences) are often great candidates as stateless applications since HTTP is stateless by nature. There is no dependency on the local container storage for the stateless workload.
Stateful applications, on the other hand, are services that require backing storage and keeping state is critical to running the service. Hadoop, Spark and to lesser extent, noSQL platforms such as Cassandra, MongoDB, Postgres, and mySQL are great examples. They require some form of persistent storage that will survive service restarts...
Speakers
Anant Chintamaneni, VP Products, BlueData
Nanda Vijaydev, Director Solutions, BlueData
It’s becoming clear that enterprises need more than one cloud. Hybrid enables enterprises to optimize how their business works – public cloud for elasticity and scale, multi-cloud for redundancy and choice, and on-premises for performance and privacy. Cloudera delivers a hybrid cloud solution that works where enterprises work, with the agility, security and governance enterprise IT needs, and the self-service analytics business people and enterprise data professionals demand. In this session, we will talk about how Cloudera helps deliver hybrid solutions for enterprises and will run a hands-on Cloudera PaaS demo to exhibit:
- Altus Environment Setup
- Configure Altus SDX
- Spin-up transient clusters with Altus
- Execute workload on Altus Data Engineering clusters
- Run interactive queries on object store with Altus Data Warehouse
- Job Analytics with Workload Experience Manager (WXM)
Speaker: Junaid Rao, Senior Cloud Sales Engineer, Cloudera
Driving in the Desert - Running Your HDP Cluster with Helion, Openstack, and ...DataWorks Summit
DataWorks Summit 2017 - Sydney
Alejandro Tesch, Cloud Evangelist, Asia Pacific and Japan, HPE
Big Data is a hot topic today for most organisations today as they race to convert vast amounts of data into useful information that can be leveraged to make critical decisions and recommendations in a very limited time windows. Today, there is a widely accepted talent gap when it comes to creating and managing Hadoop cluster, even for the experts – it can take hours (or days) to get a fully functional hadoop farm up and running. The HDP Ambari plugin for Sahara is looking to address most of this challenges by facilitating the deployment of Hortonworks Hadoop clusters and provide a set of open API to facilitate data analytics tasks in your own cloud. In this presentation we will cover why it makes sense to run your data analytics cluster in your cloud and we will demonstrate basic Sahara / Ambari functionality.
Enterprises have been using both Big Data and Cloud Computing technologies for years. Until recently, the two have not been combined. Now the agility and efficiency benefits of self-service elastic infrastructure are being extended to Big Data initiatives – whether on-premises or in the public cloud.
This session at Hadoop Summit in San Jose, California (June 2016) discusses the emerging category of Big-Data-as-a-Service (BDaaS) - representing the intersection of Big Data and Cloud Computing.
In this session, Kris Applegate (Cloud and Big Data Solution Architect at Dell) and Thomas Phelan (Co-Founder and Chief Architect at BlueData) outlined the following:
- Innovations that paved the way for Big-Data-as-a-Service
- Definition and categories of Big-Data-as-a-Service
- Key considerations for Big-Data-as-a-Service in the enterprise, including public cloud or on-premises deployment options
A video replay can also be found here: https://youtu.be/_ucPoTKuj8Q
Cloud Migration Paths: Kubernetes, IaaS, or DBaaS – EDB
Moving to the cloud is hard, and moving Postgres databases to the cloud is even harder. Public cloud or private cloud? Infrastructure as a Service (IaaS), or Platform as a Service (PaaS)? Kubernetes for the application, or for the database and the application? This talk will juxtapose self-managed Kubernetes and container-based database solutions, Postgres deployments on IaaS, and Postgres DBaaS solutions of which EDB’s DBaaS BigAnimal is the latest example.
VMworld 2013
Chris Greer, FedEx
Richard McDougall, VMware
Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare
Accelerating Business Intelligence Solutions with Microsoft Azure (PASS) – Jason Strate
Business Intelligence (BI) solutions need to move at the speed of business. Unfortunately, roadblocks related to availability of resources and deployment often get in the way. What if you could accelerate the deployment of an entire BI infrastructure to just a couple of hours and start loading data into it by the end of the day? In this session, we'll demonstrate how to leverage Microsoft tools and the Azure cloud environment to build out a BI solution and begin providing analytics to your team with tools such as Power BI. By the end of the session, you'll gain an understanding of the capabilities of Azure and how you can start building an end-to-end BI proof-of-concept today.
Legacy ERP architecture offers an incredibly efficient means of operational resource management, but extracting business insights from it is a real challenge. ERP data systems built over the past 30 years, such as SAP, can be hard to interact with, especially at the source database level: the initial translation of business logic and hierarchies requires significant customizations, as does merging those changes into analytical applications. Overall, designing self-service reporting with business-level context can be quite cumbersome; a platform like SAP contains pre-packaged modules (MM, SD, PP, etc.), and integrating these systems into a series of pre-built analytics is no small task.
This session covers the orchestration and integration of a wide range of open source technologies with commercial CDC and reporting solutions into a reference solution that mimics several real customer scenarios living on relational platforms today. Key considerations of extracting from the operational system of record will be addressed, especially the merging of multiple systems in different time zones, along with the integration concerns of an analytics Hadoop platform using Hive ACID and MERGE, as well as flattening techniques for dimensional models. Customers are often limited in the range of data their ERP can contain, with older data offloaded to secondary systems or cold archiving entities. That limitation goes away, and the opportunities expand with real-time reporting across all of history and with expanded use cases using advanced machine learning methods.
Speakers
Jordan Martz, Director of Tech Solutions, Attunity
David Freriks, Technology Evangelist, Qlik
Better performance and cost effectiveness empower better results in the cognitive era. For more information, visit: http://www.ibm.com/systems/power/hardware/linux-lc.html
The EDB Postgres Platform is an enterprise-class data management platform based on the open source database PostgreSQL, complemented by tool kits for management, integration, and migration; flexible deployment options, and services and support to enable enterprises to deploy Postgres at scale.
How Big Data and Hadoop Integrated into BMC ControlM at CARFAX – BMC Software
Learn how CARFAX utilized the power of Control-M to help drive big data processing via Cloudera. See why it was a no-brainer to choose Control-M to help manage workflows through Hadoop, some of the challenges faced, and the benefits the business received by using an existing, enterprise-wide workload management system instead of choosing “yet another tool.”
Talk for SCaLE13x. Video: https://www.youtube.com/watch?v=_Ik8oiQvWgo . Profiling can show what your Linux kernel and applications are doing in detail, across all software stack layers. This talk shows how we are using Linux perf_events (aka "perf") and flame graphs at Netflix to understand CPU usage in detail, to optimize our cloud usage, solve performance issues, and identify regressions. This will be more than just an intro: profiling difficult targets, including Java and Node.js, will be covered, which includes ways to resolve JITed symbols and broken stacks. Included are the easy examples, the hard, and the cutting edge.
The BlueData EPIC™ software platform solves the challenges that can slow down and stall Big Data initiatives. It makes deployment of Big Data infrastructure easier, faster, and more cost-effective – eliminating complexity as a barrier to adoption.
Achieving Separation of Compute and Storage in a Cloud World – Alluxio, Inc.
Alluxio Tech Talk
Feb 12, 2019
Speaker:
Dipti Borkar, Alluxio
The rise of compute intensive workloads and the adoption of the cloud has driven organizations to adopt a decoupled architecture for modern workloads – one in which compute scales independently from storage. While this enables scaling elasticity, it introduces new problems – how do you co-locate data with compute, how do you unify data across multiple remote clouds, how do you keep storage and I/O service costs down and many more.
Enter Alluxio, a virtual unified file system, which sits between compute and storage that allows you to realize the benefits of a hybrid cloud architecture with the same performance and lower costs.
In this webinar, we will discuss:
- Why leading enterprises are adopting hybrid cloud architectures with compute and storage disaggregated
- The new challenges that this new paradigm introduces
- An introduction to Alluxio and the unified data solution it provides for hybrid environments
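The co-location problem described above is commonly solved with a read-through cache tier between compute and storage. The sketch below is illustrative only (it is not Alluxio's actual API; the class and method names are invented): a cache miss pulls the block from remote storage once, and every subsequent read is served locally, avoiding repeated remote I/O.

```python
# Illustrative read-through cache: compute reads hit a local cache first,
# falling back to (slow, billable) remote storage only on a miss.
class RemoteStore:
    """Stands in for a remote object store such as S3."""
    def __init__(self, data):
        self.data = data
        self.reads = 0

    def get(self, key):
        self.reads += 1  # each call models a slow remote I/O operation
        return self.data[key]

class CachingLayer:
    """Stands in for a cache tier sitting between compute and storage."""
    def __init__(self, store):
        self.store = store
        self.cache = {}

    def get(self, key):
        if key not in self.cache:          # miss: fetch once and retain
            self.cache[key] = self.store.get(key)
        return self.cache[key]             # hit: no remote I/O

store = RemoteStore({"block-1": b"climate-data"})
fs = CachingLayer(store)
fs.get("block-1")
fs.get("block-1")
print(store.reads)  # 1 -- only one remote read despite two accesses
```

The same idea underlies any compute-side data tier: storage scales and is billed independently, while hot data stays near the compute that uses it.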
An Overview of All The Different Databases in Google Cloud – Fibonalabs
Google cloud platform (GCP) is a high-performance infrastructure for cloud computing, data analytics, and machine learning. Google Cloud runs on the same infrastructure that Google uses for its end-user products like Google Search, Gmail, Google Drive, Google Photos, etc.
With BlueData, you can spin up instant containerized environments for the Hortonworks Data Platform (HDP) and other Big Data analytics and machine learning workloads — providing your data science teams with on-demand environments for greater agility. You can decouple compute from storage resources, to improve efficiency and reduce costs. And you can ensure the enterprise-grade security and governance that your IT teams require.
BlueData has completed certification through the rigorous Hortonworks QATS (Quality Assured Testing Suite) program for deploying HDP in a containerized environment. This certification enables Hortonworks and BlueData to provide best-in-class support and high performance for their customers’ existing and future investments in HDP.
“We’ve seen rapidly growing interest in running HDP on containers, therefore it was key that we work closely with BlueData to benefit those users,” said Scott Andress, vice president of global channels & alliances at Hortonworks. “They passed our most rigorous QATS certification tests, validating that BlueData provides complete interoperability and high performance for customers running HDP in containerized environments.”
At our March Data Analytics Meetup, Dan Rodriguez and Cherian Mathew demonstrated the variations in Microsoft Azure programs and how they are impacting digital transformation.
Analytics and Lakehouse Integration Options for Oracle Applications – Ray Février
This Red Hot session is designed for customers who are currently using Oracle Cloud applications such as Fusion and EPM, and are interested in gaining a better understanding of the integration options that are available to them.
Here is a high level agenda:
- We will start by discussing the modern data platform on OCI, the Lakehouse architecture, and the OCI services that support it.
- We will then discuss the data extraction methods available on OCI for Fusion and EPM.
- Last but not least, we will end with a few best practices and possible use cases.
In the interest of time, we will mainly focus on integration patterns that are recommended for Fusion and EPM, but don’t hesitate to reach out if you would like to talk to us about other Oracle applications.
Enjoy!
Cisco Big Data Warehouse Expansion Featuring MapR Distribution – Appfluent Technology
Learn more about the Cisco Big Data Warehouse Expansion Solution featuring MapR Distribution including Apache Hadoop.
The BDWE solution begins with the collection of data usage statistics by Appfluent. It then combines Cisco UCS hardware optimized for running the MapR Distribution including Hadoop, software for federating multiple data sources, and a comprehensive services methodology for assessing, migrating, virtualizing, and operating a logically expanded warehouse.
Overview of the architecture, and benefits of Dell HPC Storage with Intel EE Lustre in High Performance Computing and Big Science workloads.
Presented by Andrew Underwood at the Melbourne Big Data User Group - January 2016.
Lustre is a trademark of Seagate Technology.
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform – Hortonworks
Find out how Hortonworks and IBM help you address these challenges and optimize your existing EDW environment.
https://hortonworks.com/webinar/modernize-existing-edw-ibm-big-sql-hortonworks-data-platform/
Introduction to KubeDirector - SF Kubernetes Meetup – BlueData, Inc.
Presentation from San Francisco Kubernetes Meetup on October 30, 2018
https://www.meetup.com/San-Francisco-Kubernetes-Meetup/events/255431002
What is KubeDirector? - Tom Phelan & Joel Baxter, BlueData
Kubernetes is clearly the container orchestrator of choice for cloud-native stateless applications. And with the introduction of StatefulSets and Persistent Volumes it is becoming possible to run stateful applications on Kubernetes.
Now the new KubeDirector project allows users to manage complex stateful clusters for AI, machine learning, and big data analytics on Kubernetes without writing a single line of Go code.
KubeDirector is an open source, Apache-licensed project that uses the standard Kubernetes custom resource functionality and API extensions to deploy and manage complex stateful scale-out application clusters.
This session will provide an overview of the KubeDirector architecture, show how to author the metadata and artifacts required for an example stateful application (e.g. with Spark, Jupyter, and Cassandra), and demonstrate the deployment and management of the cluster on Kubernetes using KubeDirector.
https://github.com/bluek8s/kubedirector
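Since KubeDirector works through Kubernetes custom resources, an application cluster is requested by submitting a `KubeDirectorCluster` manifest. The sketch below expresses such a manifest as the JSON a client would POST to the Kubernetes API; the `apiVersion` group, app id, and field names are assumptions for illustration and should be checked against the KubeDirector documentation.

```python
import json

# Simplified sketch of a KubeDirectorCluster custom resource.
# Field names and the apiVersion group are illustrative assumptions.
cluster = {
    "apiVersion": "kubedirector.bluedata.io/v1alpha1",  # assumed group/version
    "kind": "KubeDirectorCluster",
    "metadata": {"name": "spark-dev"},
    "spec": {
        "app": "spark221e2",             # references a KubeDirectorApp definition
        "roles": [
            {"id": "controller", "members": 1},
            {"id": "worker", "members": 3,
             "resources": {"requests": {"cpu": "2", "memory": "4Gi"}}},
        ],
    },
}

manifest = json.dumps(cluster, indent=2)
print(json.loads(manifest)["spec"]["roles"][1]["members"])  # 3
```

The point of the pattern is that scaling the cluster is just editing `members` and re-applying the resource; the KubeDirector operator reconciles the running pods to match.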
Dell EMC Ready Solutions for Big Data are powered by the BlueData EPIC software platform - for on-demand provisioning and automation. These integrated solutions enable a cloud-like experience for Big-Data-as-a-Service (BDaaS) while ensuring the enterprise-grade security and performance of on-premises infrastructure.
With Dell EMC Ready Solutions for Big Data, customers can rapidly deploy their analytics and machine learning workloads in a secure multi-tenant architecture, for multiple different user groups running on shared infrastructure. Their users can quickly and easily provision distributed environments for Cloudera, Hortonworks, Kafka, MapR, Spark, TensorFlow, as well as other tools.
The new Ready Solutions include everything that customers need to enable BDaaS on-premises – including BlueData EPIC software as well as Dell EMC hardware, consulting, deployment, and support services.
To learn more, visit www.dellemc.com/bdaas
How to Protect Big Data in a Containerized Environment – BlueData, Inc.
Every enterprise spends significant resources to protect its data. This is especially true in the case of big data, since some of this data may include sensitive or confidential customer and financial information. Common methods for protecting data include permissions and access controls as well as the encryption of data at rest and in flight.
The Hadoop community has recently rolled out Transparent Data Encryption (TDE) support in HDFS. Transparent Data Encryption refers to the process whereby data is transparently encrypted by the big data application writing the data; it is not decrypted again until it is accessed by another application. The data is encrypted during its entire lifespan—in transit and at rest—except when it is being specifically accessed by a processing application.
TDE is an excellent approach for protecting data stored in data lakes built on the latest versions of HDFS. However, it does have its challenges and limitations. Systems that want to use TDE require tight integration with enterprise-wide Kerberos Key Distribution Center (KDC) services and Key Management Systems (KMS). This integration isn’t easy to set up or maintain. These issues can be even more challenging in a virtualized or containerized environment where one Kerberos realm may be used to secure the big data compute cluster and a different Kerberos realm may be used to secure the HDFS filesystem accessed by this cluster.
BlueData has developed significant expertise in configuring, managing, and optimizing access to TDE-protected HDFS. This session at the Strata Data Conference in March 2018 (by Thomas Phelan, co-founder and chief architect at BlueData) offers a detailed overview of how transparent data encryption works with HDFS, with a particular focus on containerized environments.
You’ll learn how HDFS TDE is configured and maintained in an environment where many big data frameworks run simultaneously (e.g., in a hybrid cloud architecture using Docker containers). Moreover, you’ll learn how KDC credentials can be managed in a Kerberos cross-realm environment to provide data scientists and analysts with the greatest flexibility in accessing data while maintaining complete enterprise-grade data security.
https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63763
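HDFS TDE rests on envelope encryption: each file is encrypted with a data encryption key (DEK), and the DEK itself is stored only in wrapped form (an EDEK) that only the KMS can unwrap. The flow can be sketched as below; this is a toy XOR "cipher" for illustration only, not a real cryptographic scheme and not the HDFS or KMS API.

```python
import os

def xor(data: bytes, key: bytes) -> bytes:
    # Toy stream "cipher" for illustration only -- NOT secure.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# KMS side: holds the encryption-zone (EZ) key and wraps/unwraps DEKs.
ez_key = os.urandom(16)

def kms_generate_edek():
    dek = os.urandom(16)
    return dek, xor(dek, ez_key)      # DEK plus its wrapped form (EDEK)

def kms_decrypt_edek(edek):
    return xor(edek, ez_key)          # only the KMS can unwrap an EDEK

# Writer: encrypts with the DEK; persists ciphertext + EDEK, never the DEK.
dek, edek = kms_generate_edek()
ciphertext = xor(b"sensitive record", dek)

# Reader: asks the KMS to unwrap the EDEK, then decrypts locally.
plaintext = xor(ciphertext, kms_decrypt_edek(edek))
print(plaintext)  # b'sensitive record'
```

The cross-realm difficulty described above enters exactly at the `kms_decrypt_edek` step: the reader must hold Kerberos credentials the KMS trusts, which may live in a different realm than the compute cluster's.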
The BlueData EPIC™ software platform makes it simpler, faster, and more cost-effective to deploy Big Data infrastructure and applications such as Hadoop, Spark, Kafka, Cassandra, and more, whether on-premises or in the public cloud.
Best Practices for Running Kafka on Docker Containers – BlueData, Inc.
Docker containers provide an ideal foundation for running Kafka-as-a-Service on-premises or in the public cloud. However, using Docker containers in production environments for Big Data workloads using Kafka poses some challenges – including container management, scheduling, network configuration and security, and performance.
In this session at Kafka Summit in August 2017, Nanda Vijaydev of BlueData shared lessons learned from implementing Kafka-as-a-Service with Docker containers.
https://kafka-summit.org/sessions/kafka-service-docker-containers
Lessons Learned from Dockerizing Spark Workloads – BlueData, Inc.
Many initiatives for running applications inside containers have been scoped to run on a single host. Using Docker containers for large-scale production environments poses interesting challenges, especially when deploying distributed Big Data applications like Apache Spark.
Some of these challenges include container lifecycle management, smart scheduling for optimal resource utilization, network configuration and security, and performance. BlueData is “all in” on Docker containers – with a specific focus on Spark applications. They’ve learned first-hand how to address these challenges for Fortune 500 enterprises and government organizations that want to deploy Big Data workloads using Docker.
This session at Spark Summit in February 2017 (by Thomas Phelan, co-founder and chief architect at BlueData) described lessons learned as well as some tips and tricks on how to Dockerize your Big Data applications in a reliable, scalable, and high-performance environment.
In this session, Tom described how to network Docker containers across multiple hosts securely. He discussed ways to achieve high availability across distributed Big Data applications and hosts in your data center. And since we’re talking about very large volumes of data, performance is a key factor. So Tom discussed some of the storage options that BlueData explored and implemented to achieve near bare-metal I/O performance for Spark using Docker.
https://spark-summit.org/east-2017/events/lessons-learned-from-dockerizing-spark-workloads
The BlueData EPIC software platform makes deployment of Big Data infrastructure and applications easier, faster, and more cost-effective – whether on-premises or on the public cloud.
With BlueData EPIC on AWS, you can quickly and easily deploy your preferred Big Data applications, distributions and tools; leverage enterprise-class security and cost controls for multi-tenant deployments on the Amazon cloud; and tap into both Amazon S3 and on-premises storage for your Big Data analytics.
Sign up for a free two-week trial at www.bluedata.com/aws
Lessons Learned Running Hadoop and Spark in Docker Containers – BlueData, Inc.
Many initiatives for running applications inside containers have been scoped to run on a single host. Using Docker containers for large-scale production environments poses interesting challenges, especially when deploying distributed big data applications like Apache Hadoop and Apache Spark. This session at Strata + Hadoop World in New York City (September 2016) explores various solutions and tips to address the challenges encountered while deploying multi-node Hadoop and Spark production workloads using Docker containers.
Some of these challenges include container life-cycle management, smart scheduling for optimal resource utilization, network configuration and security, and performance. BlueData is "all in” on Docker containers—with a specific focus on big data applications. BlueData has learned firsthand how to address these challenges for Fortune 500 enterprises and government organizations that want to deploy big data workloads using Docker.
This session by Thomas Phelan, co-founder and chief architect at BlueData, discusses how to securely network Docker containers across multiple hosts and discusses ways to achieve high availability across distributed big data applications and hosts in your data center. Since we’re talking about very large volumes of data, performance is a key factor, so Thomas shares some of the storage options implemented at BlueData to achieve near bare-metal I/O performance for Hadoop and Spark using Docker as well as lessons learned and some tips and tricks on how to Dockerize your big data applications in a reliable, scalable, and high-performance environment.
http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/52042
Solution Brief: Real-Time Pipeline Accelerator – BlueData, Inc.
Get started with Spark Streaming, Kafka, and Cassandra for real-time data analytics.
BlueData makes it easy to deploy Spark infrastructure and applications on-premises. The BlueData EPIC software platform is purpose-built to simplify and accelerate the deployment of Spark, Hadoop, and other tools for Big Data analytics – leveraging Docker containers and virtualized infrastructure.
Our new Real-Time Pipeline Accelerator solution provides the software and professional services you need for building data pipelines in a multi-tenant environment for Spark Streaming, Kafka, and Cassandra. With help from the BlueData team, you’ll also have two end-to-end real-time data pipelines as a starting point.
Learn more about BlueData at www.bluedata.com
Accelerate Hadoop and Spark deployment in a multi-tenant lab environment for dev/test/QA, evaluation of multiple tools for Big Data analytics, and other use cases. BlueData provides a turnkey on-premises solution with software and services to get up and running in two weeks.
The new Big Data Lab Accelerator solution provides a full enterprise license of BlueData EPIC software along with the professional services needed to deploy an on-premises multi-tenant Big Data lab. Within two weeks, customers will have a lab environment to evaluate Big Data tools and spin up multiple Hadoop or Spark clusters for development, testing and quality assurance. As part of this deployment, BlueData will also work with customers to implement initial use cases for Big Data analytics.
Learn more about BlueData at www.bluedata.com
BlueData Hunk Integration: Splunk Analytics for Hadoop – BlueData, Inc.
BlueData is working in partnership with Splunk to streamline and accelerate the deployment and adoption of Hunk: Splunk Analytics for Hadoop. The BlueData EPIC software platform now integrates Hunk with Hadoop clusters running on virtualized on-premises infrastructure.
Using Hunk with the BlueData EPIC platform, our joint customers can quickly provision virtual Hadoop clusters together with Hunk in a matter of minutes – providing their data scientists and analysts with the ability to rapidly detect patterns and find anomalies across petabytes of raw data in Hadoop.
Learn more at http://www.bluedata.com
BlueData makes on-premises Spark infrastructure easy.
With BlueData, you can spin up virtual Spark clusters within minutes – providing secure, on-demand access to Big Data analytics and infrastructure. You can use Spark with or without the Hadoop ecosystem of tools – using HDFS, Tachyon, or any shared storage system.
You can also build analytical pipelines and create Spark clusters using our RESTful APIs. BlueData’s software platform leverages virtualization and patent-pending innovations to make it simpler, faster, and more cost-effective to deploy Hadoop or Spark infrastructure on-premises.
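A cluster-creation call to such a RESTful API might look like the sketch below. The endpoint path and payload field names are hypothetical, invented for illustration; they are not BlueData's published API.

```python
import json
from urllib import request

# Hypothetical payload for creating a virtual Spark cluster via REST.
# Endpoint URL and field names are illustrative assumptions only.
payload = {
    "name": "spark-analytics",
    "distro": "spark",
    "workers": 4,
    "flavor": {"vcpus": 4, "memory_gb": 16},
}

req = request.Request(
    "https://epic.example.com/api/v1/clusters",   # hypothetical endpoint
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# request.urlopen(req) would submit it; here we only inspect the request.
print(req.get_method(), req.full_url)
```

Driving cluster lifecycle through HTTP calls like this is what makes the platform scriptable from CI jobs or analytical pipeline orchestrators.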
Learn more at http://www.bluedata.com
First Steps with Globus Compute Multi-User Endpoints – Globus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researchers' workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we encountered were that each researcher had to set up and manage their own single-user Globus Compute endpoint, and that the workloads had varying resource requirements (CPUs, memory, and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges, and we share an update on our progress here.
Developing Distributed High-performance Computing Capabilities of an Open Sci... – Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
Globus Connect Server Deep Dive - GlobusWorld 2024 – Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Navigating the Metaverse: A Journey into Virtual Evolution – Donna Lenk
Join us for an exploration of the Metaverse's evolution, where innovation meets imagination. Discover new dimensions of virtual events, engage with thought-provoking discussions, and witness the transformative power of digital realms.
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient... – Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. It’s here, custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv... – Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf – Jay Das
With the advent of artificial intelligence (AI) tools, project management processes are undergoing a transformative shift. By using tools like ChatGPT and Bard, organizations can empower their leaders and managers to plan, execute, and monitor projects more effectively.
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite – Google
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
https://sumonreview.com/ai-pilot-review/
AI Pilot Review: Key Features
✅Deploy AI expert bots in Any Niche With Just A Click
✅With one keyword, generate complete funnels, websites, landing pages, and more.
✅More than 85 AI features are included in the AI pilot.
✅No setup or configuration; use your voice (like Siri) to do whatever you want.
✅You Can Use AI Pilot To Create your version of AI Pilot And Charge People For It…
✅ZERO Manual Work With AI Pilot. Never write, Design, Or Code Again.
✅ZERO Limits On Features Or Usages
✅Use Our AI-powered Traffic To Get Hundreds Of Customers
✅No Complicated Setup: Get Up And Running In 2 Minutes
✅99.99% Up-Time Guaranteed
✅30 Days Money-Back Guarantee
✅ZERO Upfront Cost
See My Other Reviews Article:
(1) TubeTrivia AI Review: https://sumonreview.com/tubetrivia-ai-review
(2) SocioWave Review: https://sumonreview.com/sociowave-review
(3) AI Partner & Profit Review: https://sumonreview.com/ai-partner-profit-review
(4) AI Ebook Suite Review: https://sumonreview.com/ai-ebook-suite-review
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam – takuyayamamoto1800
In these slides, we show a simulation example and how to compile the solver.
The Helmholtz equation can be solved with helmholtzFoam. The Helmholtz equation with uniformly dispersed bubbles can be simulated with helmholtzBubbleFoam.
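For reference, the scalar Helmholtz equation that such a solver targets can be written for a field $\phi$ with wavenumber $k$ and source term $f$ (the variable names used inside helmholtzFoam may differ):

```latex
\nabla^2 \phi + k^2 \phi = f
```

With $f = 0$ this is the homogeneous form describing time-harmonic wave fields; the bubble variant modifies the effective wavenumber in the dispersed region.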
An Enterprise Resource Planning system includes various modules that reduce any business's workload. Additionally, it organizes workflows, which drives enhanced productivity. Here is a detailed explanation of the ERP modules. Going through the points will help you understand how the software is changing work dynamics.
To know more details here: https://blogs.nyggs.com/nyggs/enterprise-resource-planning-erp-system-modules/
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart... – Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis operations to the large ESGF data archives and transferring only the resultant analysis products (e.g., visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
Quarkus Hidden and Forbidden Extensions – Max Andersen
Quarkus has a vast extension ecosystem and is known for its supersonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
Paketo Buildpacks: the best way to build OCI images? DevopsDa... – Anthony Dahanne
Buildpacks have existed for more than 10 years! At first, they were used to detect and build an application before deploying it to certain PaaS platforms. Then, with their latest generation, the Cloud Native Buildpacks (a CNCF incubating project), we gained the ability to build Docker (OCI) images. Are they a good alternative to the Dockerfile? What are the Paketo buildpacks? Which communities support them, and how?
Come find out in this ignite session
Large Language Models and the End of Programming – Matt Welsh
Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll have the knowledge on how to organize and improve your code review process.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ... – Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc. I didn't get rich from it, but my extensions did reach 63K downloads (powering possibly tens of thousands of websites).
Hadoop Virtualization - Intel White Paper
Even as virtualization has spread throughout the data center, Apache Hadoop continues to be deployed almost exclusively on bare-metal physical servers. Processing overhead and I/O latency typically associated with virtualization have prevented big data architects from virtualizing Hadoop implementations.
As a result, most Hadoop initiatives have been limited in terms of agility, with infrastructure changes such as provisioning a new server for Hadoop often taking weeks or even months. This infrastructure complexity continues to slow down adoption in enterprise deployments. Apache Spark is a relatively new big data technology, but interest is growing rapidly; many of these same deployment challenges apply to on-premises Spark implementations.
The BlueData EPIC (Elastic Private Instant Clusters) software platform addresses these limitations, enabling data center operators to accelerate Hadoop and Spark implementations on Intel® architecture-based servers.
The BlueData EPIC* software platform offers data center operators the agility and cost performance of virtualized infrastructure for big data, with high manageability and flexibility when integrating into existing data center environments.
Introduction to BlueData EPIC
The BlueData EPIC software platform reduces the complexity of big data infrastructure deployments, providing the ability for end users to quickly and easily deploy Hadoop or Spark clusters in a virtualized environment running on Docker containers. These clusters can deliver faster time-to-value for big data, providing the cloud-like experience of Hadoop-as-a-Service or Spark-as-a-Service in their own data centers.

The BlueData EPIC platform helps improve hardware utilization, reduces cluster sprawl, and minimizes the need to move data for big data analytics. BlueData EPIC also provides for simplified deployment and administration, while making virtual clusters look and feel like physical clusters for big data analytics.

Taking advantage of the power of containers and virtualization, BlueData’s software helps deliver greater agility and cost-efficiency for on-premises big data infrastructure. The benefits of these capabilities include the following:
• Business agility. Virtual clusters can be spun up or down in minutes, providing elasticity for capacity spikes, as well as rapid response to emerging business needs.
• Data protection. Multiple virtual workloads can co-exist on the same multi-tenant physical cluster, while isolating data on each virtual cluster from the others.
• Resource efficiency. Multiple business units and user groups can share physical cluster resources, avoiding the cost and complexity of each having its own big data infrastructure.
To meet varying customer needs, the EPIC software platform is available in two editions. EPIC Lite is a community edition of the platform that is available for a single instance, free of charge; it is intended for evaluation purposes and for personal use. EPIC Enterprise is a fully supported, highly scalable commercial edition that is available on a subscription basis for up to hundreds of physical nodes. For a full comparison of the two product editions, see www.bluedata.com/product/comparison.
White Paper
BlueData Enables Virtualization of Enterprise Hadoop* and Spark* Workloads
Virtualization for Big Data
Intel® Xeon® Processor E5-2600 v3 Product Family
BlueData EPIC Software Architecture
The core components of the EPIC platform—ElasticPlane*, IOBoost*, and DataTap*—are illustrated in Figure 1 and described below.
ElasticPlane: Virtual Clusters on Demand
ElasticPlane enables spinning up virtual clusters on demand via self-service in a secure multi-tenant environment, with a policy engine for automated QoS and SLA management. End users can easily create virtual Hadoop or Spark clusters with BlueData EPIC’s ElasticPlane functionality and self-service interface. BlueData also provides multi-tenancy and data isolation to help ensure logical separation between each group within the organization.
The solution enables different project teams or departments across the enterprise to share the same physical infrastructure—and access the same data sources—for their big data analytics. The platform integrates with enterprise security and authentication mechanisms such as LDAP, Active Directory, and Kerberos*.
IOBoost: Enhanced Performance
IOBoost enhances the I/O performance and scalability of virtual clusters with hierarchical data caching and tiering, plus single-copy data transfer from physical storage to the virtual cluster. The IOBoost functionality of the BlueData EPIC platform provides application-aware caching and elastic resource management that adapts dynamically to changing application requirements, helping drive up performance.
Write-dominant workloads in particular benefit from IOBoost, which takes advantage of knowing how the application will access data. BlueData’s IOBoost technology provides a non-persistent memory cache, the behavior of which changes to improve the efficiency of access to physical storage devices. IOBoost accesses the external file system by means of BlueData’s DataTap file system connector.
Hadoop-as-a-Service or Spark-as-a-Service in an On-Premises Deployment Model
The BlueData EPIC* software platform gives business users the ability to set up self-service virtual Hadoop* or Spark* clusters without having to submit requests for scarce IT resources and then wait for an environment to be set up for them. Within minutes, data scientists and analysts can deploy big data services and applications to meet their needs on demand.

The ability to explore, analyze, and draw insights from data allows users to seize business opportunities while they are still relevant.
• Ad hoc analytics. Identify emerging trends and relationships to enhance decision support.
• “Fail-fast” experimentation. Try out new approaches to big data challenges with minimal investment.
• Rapid response. Spin virtual clusters up and down fast, as changing needs and opportunities dictate.
Figure 1. BlueData EPIC* software architecture.
[Figure 1: tenant groups (Marketing, R&D, Support, Sales, Manufacturing) and BI/analytics tools run on the BlueData EPIC™ platform, whose layers are ElasticPlane™ (self-service, multi-tenant clusters), IOBoost™ (extreme performance and scalability), and DataTap™ (in-place access to enterprise data stores), over storage back ends including local HDFS, remote HDFS, NFS, Gluster, Ceph, and an object store.]
Specifically, as the application writes data to the Hadoop Distributed File System (HDFS*), IOBoost functions as a write-behind cache, optimizing the performance of sequential writes.
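The write-behind pattern described here can be pictured with a minimal sketch. This is a generic, self-contained illustration of write-behind caching under stated assumptions, not BlueData’s IOBoost implementation; the class name and flush threshold are hypothetical:

```python
from collections import deque

class WriteBehindCache:
    """Minimal sketch of a write-behind (write-back) cache: writes are
    acknowledged immediately into a memory buffer and drained to the
    (slow) backing store later, so sequential writers are not stalled
    by storage latency. Illustrative only, not BlueData's IOBoost."""

    def __init__(self, backing_store, flush_threshold=4):
        self.backing_store = backing_store   # a dict standing in for HDFS
        self.buffer = deque()                # pending (key, value) writes
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        # Acknowledge the write after buffering only -- no storage I/O yet.
        self.buffer.append((key, value))
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Drain buffered writes in arrival order, preserving the
        # sequential access pattern the storage device favors.
        while self.buffer:
            key, value = self.buffer.popleft()
            self.backing_store[key] = value

store = {}
cache = WriteBehindCache(store, flush_threshold=2)
cache.write("block-0", b"aaaa")
print(len(store))   # 0: the write was absorbed by the cache
cache.write("block-1", b"bbbb")
print(len(store))   # 2: threshold reached, the buffer was flushed
```

The key property is that the writer’s latency is decoupled from the storage device’s latency; the cache trades durability of the most recent writes (it is non-persistent) for throughput on sequential write bursts.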
DataTap: In-Place Processing of Data
DataTap allows in-place processing of data, eliminating the need to duplicate data across Hadoop systems. DataTap provides HDFS protocol abstraction that allows big data applications to run unmodified with fast access to data sources other than HDFS. With BlueData EPIC’s DataTap capability, organizations can access data from any shared storage system (including HDFS as well as NFS*, GlusterFS*, CEPH*, and Swift*) for big data analytics.
That means organizations don’t need to make multiple copies of data or move data into HDFS before running their analysis. Sensitive data can stay in its secure storage system with enterprise-grade data governance, without the cost and risks of creating and maintaining multiple copies.
DataTap effectively decouples compute from storage, providing the ability to independently scale compute and storage on an as-needed basis. This approach helps enable more effective utilization of infrastructure resources and lower data center operating costs.
Deployment Considerations and Guidance
BlueData’s software applies patent-pending innovations to enable virtualization that is specifically tailored to the needs of big data. The use of Docker containers is completely transparent to end users, and BlueData customers benefit from greater performance and deployment flexibility due to the containers’ lightweight nature. This enables enterprises to quickly and easily deploy Hadoop or Spark in a lightweight container environment, running on either bare-metal physical servers or on virtual machines.
BlueData EPIC supports Hadoop and Spark applications without requiring those applications to be modified in any way. Likewise, the platform utilizes the underlying features of the physical storage devices for data backup, replication, and high availability, so it is not necessary for organizations to modify their existing processes to facilitate security and durability of their data.
Synergies with Intel® Architecture
The ability to run on large clusters of mainstream two-socket servers extends the cost-performance advantages of virtualizing big data workloads. BlueData software running on the Intel® Xeon® processor E5-2600 v3 product family is a powerful combination to overcome key virtualization challenges such as network latency, infrastructure security, and power inefficiencies. Deploying BlueData EPIC environments on two-socket servers powered by these processors takes advantage of the following benefits:
• Improved performance and virtualization density. With increased core counts, larger cache, and higher memory bandwidth, the processor delivers dramatic improvements over its predecessors.
• Hardware-based security. Intel® Platform Protection Technology, including Intel® Trusted Execution Technology, Intel® OS Guard, and BIOS Guard, enhances protection against malicious attacks.
• Increased power efficiency. Per-core P states dynamically respond to changing workloads and adapt power levels on each individual core, to deliver better performance per watt than predecessor platforms.
Beyond the processor platform used with BlueData deployments, using Intel® Solid-State Drives (Intel® SSDs) helps optimize the execution environment at the system level. For example, the Intel® SSD Data Center (Intel SSD DC) P3608 Series delivers high performance and low latency that help accelerate virtualized Hadoop workloads, using connectivity based on the Non-Volatile Memory Express (NVMe) standard and eight lanes of PCI Express* (PCIe*) 3.0.
Created by an industry coalition including Intel, NVMe replaces the older SATA standard with a new technology developed specifically to deliver latency and throughput advantages for high-speed SSDs and other non-volatile memory-based storage. The Intel SSD DC P3608 Series builds on those capabilities with a unique dual-controller architecture that improves scaling across the execution cores of Intel® Xeon® processors. It is available in a low-profile PCIe form factor, in capacities up to 4 TB.
Intel® Ethernet Controllers help accelerate workloads, including those based on virtualized Hadoop, with purpose-built capabilities for virtualization, such as intelligent offload of traffic management to network hardware. By handling traffic functions in network silicon, Intel Ethernet removes the associated burden from the processor, freeing execution resources for other work.
Configuration Best Practices
Ongoing experimentation by Intel and BlueData indicates that the following suggested guidelines may help data center operators achieve optimal throughput on virtualized Hadoop and Spark workloads using the BlueData EPIC software platform. While detailed examination is beyond the scope of this paper, the following guidance is particularly relevant to I/O-bound workloads.
• Configure systems to enhance disk performance. The performance of the storage where the files are stored must be sufficient to avoid a bottleneck.
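One generic way to check whether the underlying disks can sustain the sequential throughput a Hadoop workload needs is a synthetic benchmark such as fio. The job file below is an illustrative sketch only; the directory path and sizes are placeholders, not BlueData- or Intel-recommended settings:

```ini
; measure sustained 1 MiB sequential writes on one data disk
[seq-write]
rw=write
bs=1M
size=4G
ioengine=libaio
direct=1
directory=/data/disk1
```

Running each data disk through a test like this before deployment helps confirm that storage, rather than the virtualization layer, will not become the limiting factor for I/O-bound workloads.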