Many initiatives for running applications inside containers have been scoped to run on a single host. Using Docker containers for large-scale production environments poses interesting challenges, especially when deploying distributed Big Data applications like Apache Spark.
Some of these challenges include container lifecycle management, smart scheduling for optimal resource utilization, network configuration and security, and performance. At BlueData, we’re “all in” on Docker containers – with a specific focus on Spark applications. We’ve learned first-hand how to address these challenges for Fortune 500 enterprises and government organizations that want to deploy Big Data workloads using Docker.
In this session, you’ll learn about networking Docker containers across multiple hosts securely. We’ll discuss ways to achieve high availability across distributed Big Data applications and hosts in your data center. And since we’re talking about very large volumes of data, performance is a key factor. So we’ll discuss some of the storage options we explored and implemented at BlueData to achieve near bare-metal I/O performance for Spark using Docker. We’ll share our lessons learned as well as some tips and tricks on how to Dockerize your Big Data applications in a reliable, scalable, and high-performance environment.