Big Data in Containers: Hadoop and Spark in Docker and Mesos - Heiko Loewe
Three examples of containerized big data analytics:
1. Installation with Docker and Weave, for small and medium deployments
2. Hadoop on Mesos with Apache Myriad
3. Spark on Mesos
Lessons Learned Running Hadoop and Spark in Docker Containers - BlueData, Inc.
Many initiatives for running applications inside containers have been scoped to run on a single host. Using Docker containers for large-scale production environments poses interesting challenges, especially when deploying distributed big data applications like Apache Hadoop and Apache Spark. This session at Strata + Hadoop World in New York City (September 2016) explores various solutions and tips to address the challenges encountered while deploying multi-node Hadoop and Spark production workloads using Docker containers.
Some of these challenges include container life-cycle management, smart scheduling for optimal resource utilization, network configuration and security, and performance. BlueData is "all in" on Docker containers—with a specific focus on big data applications. BlueData has learned firsthand how to address these challenges for Fortune 500 enterprises and government organizations that want to deploy big data workloads using Docker.
This session by Thomas Phelan, co-founder and chief architect at BlueData, discusses how to securely network Docker containers across multiple hosts and ways to achieve high availability across distributed big data applications and hosts in your data center. Since we’re talking about very large volumes of data, performance is a key factor. Thomas shares some of the storage options implemented at BlueData to achieve near bare-metal I/O performance for Hadoop and Spark on Docker, as well as lessons learned and tips and tricks on how to Dockerize your big data applications in a reliable, scalable, and high-performance environment.
http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/52042
Introduction to Apache CloudStack by David Nalley - buildacloud
Apache CloudStack is a mature, easy to deploy IaaS platform. That doesn't mean that it can be done without thought or preparation. Learn how CloudStack can be most efficiently deployed, and the problems to avoid in the process.
About David Nalley
David is a recovering sysadmin with a decade of experience. He’s a committer on the Apache CloudStack (incubating) project, a contributor to the Fedora Project and the Vice President of Infrastructure at the Apache Software Foundation.
Building Clouds with Apache CloudStack - Apache Roadshow 2018 - ShapeBlue
Talk given at Apache Roadshow, FOSS Backstage, Berlin, June 2018
Apache CloudStack is open source software designed to deploy and manage large networks of virtual machines as a highly available, highly scalable Infrastructure as a Service (IaaS) cloud computing platform. This talk will give an introduction to the technology, its history, and its architecture. It will look at common use cases (and some real production deployments) seen across both public and private cloud infrastructures, and at where CloudStack can be complemented by other open source technologies.
The talk will also compare and contrast Apache CloudStack with other IaaS platforms and explain why the speaker thinks that the technology, combined with the Apache governance model, will see CloudStack become the de facto open source cloud platform. He will run a live demo of the software and talk about ways that people can get involved in the Apache CloudStack project.
This session will examine the many options the data scientist has for running Spark clusters in public and private clouds. We will discuss various environments employing AWS, Mesos, containers, Docker, and BlueData EPIC technologies, and the benefits and challenges of each.
Speakers:
Tom Phelan, Co-founder and Chief Architect - BlueData Inc. Tom has spent the last 25 years as a senior architect, developer, and team lead in the computer software industry in Silicon Valley. Prior to co-founding BlueData, Tom spent 10 years at VMware as a senior architect and team lead in the core R&D Storage and Availability group. Most recently, Tom led one of the key projects – vFlash, focusing on integration of server-based Flash into the vSphere core hypervisor. Prior to VMware, Tom was part of the early team at Silicon Graphics that developed XFS, one of the most successful open source file systems. Earlier in his career, he was a key member of the Stratus team that ported the Unix operating system to their highly available computing platform. Tom received his Computer Science degree from the University of California, Berkeley.
Guaranteeing Storage Performance by Mike Tutkowski - buildacloud
This session will introduce the basics of primary storage in CloudStack. Additionally, I discuss the challenges of guaranteeing storage performance in a cloud and how, by leveraging the latest enhancements to CloudStack, storage administrators can deliver consistent, repeatable performance to tens, hundreds, or thousands of application workloads in parallel. I'll review the CloudStack enhancements in detail, outline the management benefits they provide, and discuss common go-to-market approaches.
About Mike Tutkowski
Mike Tutkowski, a member of the CloudStack PMC, develops software for the Apache Software Foundation's CloudStack project to help drive improvements in its storage component and to integrate SolidFire more deeply into the product.
Using Ansible to deploy a 6-node Hortonworks Data Platform (Hadoop) cluster on AWS with the ObjectRocket ansible-hadoop playbook.
Presented at the Ansible NOVA MeetUp on February 23, 2017: https://www.meetup.com/Ansible-NOVA/events/236853616/
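As a sketch of what drives such a playbook run, a hypothetical Ansible inventory for a 6-node cluster might look like the following; the group names, hostnames, and addresses here are invented for illustration and are not taken from the ObjectRocket playbook:

```ini
; Hypothetical inventory for a 6-node HDP cluster (two masters, four workers).
; Group names, hostnames, and IPs are illustrative only.
[master-nodes]
hdp-master-01 ansible_host=10.0.0.10
hdp-master-02 ansible_host=10.0.0.11

[slave-nodes]
hdp-worker-01 ansible_host=10.0.0.20
hdp-worker-02 ansible_host=10.0.0.21
hdp-worker-03 ansible_host=10.0.0.22
hdp-worker-04 ansible_host=10.0.0.23
```

With an inventory in this shape, a single `ansible-playbook` invocation can converge every node to its role in parallel, which is what makes a 6-node deploy practical in one sitting.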
OpenStack Summit Vancouver: Lessons Learned on Upgrades - Frédéric Lepied
When deploying OpenStack in production at any scale, upgrade support is one of the requirements for a successful deployment. Without upgrade management, a deployment will have bugs and security issues from day one. In the longer term, it will also miss the latest features that OpenStack offers.
Hypervisor Selection in Apache CloudStack 4.4 - Tim Mackey
Building an infrastructure as a service cloud involves a number of technology decisions, many of which can have unforeseen impact. Hypervisors form the core of an IaaS cloud, and whether you are a fan of Microsoft Hyper-V, VMware vSphere, KVM in any Linux variant, or XenServer from Citrix, each of these hypervisors provides unique capabilities within an Apache CloudStack 4.4-based cloud.
In this talk, Ben will walk you through running Cassandra in a Docker environment to give you a flexible development environment that uses only a very small set of resources, both locally and with your favorite cloud provider. Lessons learned running Cassandra with a very small set of resources are applicable to both your local development environment and larger, less constrained production deployments.
PPTV is using CloudStack 3.0.2 in its production environment. Currently there are more than 150 hosts, and PPTV migrates its apps to the cloud every day (10 hosts per day). By the end of 2013, there will be more than 1,000 hosts in the CloudStack environment.
CloudStack is an open source Infrastructure-as-a-Service (IaaS) software platform, available under the GPLv3 license, which enables users to build, manage, and deploy compute cloud environments. The community edition is based on the latest, leading-edge features and bits that the Cloud.com team of engineers is working on, and is supported by our open source community.
Using CloudStack, free and open source cloud computing software, to build a private cloud. During the training, attendees will be instructed on how to install CloudStack to manage virtual infrastructure in a private cloud configuration. At the conclusion of the Build a Private Cloud section, attendees will have the knowledge needed to create a simple private cloud environment.
Managing Docker Containers in a Cluster - Introducing Kubernetes - Marc Sluiter
Containerizing your applications with Docker is attracting more and more attention. While managing your Docker containers on your developer machine or on a single server is not a big hassle, it can get uncomfortable very quickly when you want to deploy your containers in a cluster, whether in the cloud or on premises. How do you provide high availability, scaling, and monitoring? Fortunately, there is a rapidly growing ecosystem around Docker, and there are tools available which support you with this. In this session I want to introduce you to Kubernetes, the Docker orchestration tool started and open sourced by Google. Based on the experience with its data centers, Google uses some interesting declarative concepts in Kubernetes, such as pods, replication controllers, and services, which I will explain to you. While Kubernetes is still quite a young project, it reached its first stable version this summer, thanks to many contributions by Red Hat, Microsoft, IBM, and many more.
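To make those declarative concepts concrete, here is a minimal illustrative manifest pairing a replication controller with a service; the object names and container image are placeholders, not from the talk:

```yaml
# Illustrative Kubernetes v1 objects (names and image are placeholders).
# The replication controller keeps three identical pods running;
# the service gives them one stable virtual IP and DNS name.
apiVersion: v1
kind: ReplicationController
metadata:
  name: web-rc
spec:
  replicas: 3
  selector:
    app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.9
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web-svc
spec:
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 80
```

The point of the declarative style is that you state the desired end state (three replicas behind one service) and the cluster continuously reconciles toward it, rather than you scripting the individual container starts.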
This presentation describes how Hortonworks is delivering Hadoop on Docker for a cloud-agnostic deployment approach, as presented at Cisco Live 2015.
Hortonworks Technical Workshop: What's New in HDP 2.3 - Hortonworks
The recently launched HDP 2.3 is a major advancement of Open Enterprise Hadoop. It represents the best of community-led development, with innovations spanning Apache Hadoop, Apache Ambari, Ranger, HBase, Spark, and Storm. In this session we will provide an in-depth overview of new functionality and discuss its impact on new and ongoing big data initiatives.
Big Data Step-by-Step: Infrastructure 3/3: Taking It to the Cloud... Easily - Jeffrey Breen
Part 3 of a 3-part series focusing on the infrastructure aspects of getting started with big data. This presentation demonstrates how to use Apache Whirr to launch a Hadoop cluster on Amazon EC2, easily.
Presented at the Boston Predictive Analytics Big Data Workshop, March 10, 2012. Sample code and configuration files are available on github.
Higher Order Infrastructure: From Docker Basics to Cluster Management - Nicol... - Codemotion
The container abstraction hit the collective developer mind with great force and created a space of innovation for the distribution, configuration, and deployment of cloud-based applications. Now that this new model has established itself, work is moving towards orchestration and coordination of loosely coupled network services. There is an explosion of tools in this arena at varying degrees of stability, but the momentum is huge. On that premise, this session will give an overview of the orchestration landscape and a (semi-)live demo of cluster management using a sample application.
From Monolith to Docker Distributed Applications - Carlos Sanchez
Docker is revolutionizing the way people think about applications and deployments. It provides a simple way to run and distribute Linux containers for a variety of use cases, from lightweight virtual machines to complex distributed micro-services architectures.
Containers make it possible to run services in isolation with a minimal performance penalty, increased speed, easier configuration, and less complexity, which makes them ideal for continuous integration and continuous delivery workloads. But migrating an existing application to a distributed microservices architecture is no easy task, requiring a shift in software development, networking, and storage to accommodate the new architecture.
We will provide insight from our experience creating a Jenkins platform based on distributed Docker containers running on Apache Mesos and Marathon, applicable to all types of applications, but especially Java and JVM-based ones.
Azure: Docker Container Orchestration, PaaS (Service Fabric) and High Avail... - Alexey Bokov
Deep dive into Azure cloud technologies, including common considerations about technology choices, then going deeper into some of them. First we start with the Azure Container Service and Docker container orchestration using Mesos or Swarm. The next part is about PaaS v2, which is called Azure Service Fabric: a crash course and deep dive into some parts of Service Fabric. After that, we go through high availability and disaster recovery in Azure:
- Azure DNS – cloud API for hosting DNS records
- Traffic Manager – load balancing and fault tolerance at the DNS level
- Azure Load Balancer – load balancing at the transport level
- Application Gateway – load balancing at the application level
The last part of the deck covers IaaS-based services and some updates to the storage service:
* Azure Batch for computational tasks
* VM Scale sets
* Storage - managed disks and cool storage
From Monolith to Docker Distributed Applications - Carlos Sanchez
Docker is revolutionizing the way people think about applications and deployments. It provides a simple way to run and distribute Linux containers for a variety of use cases, from lightweight virtual machines to complex distributed microservice architectures. But migrating an existing Java application to a distributed microservice architecture is no easy task, requiring a shift in the software development, networking, and storage to accommodate the new architecture. This presentation provides insights into the experience of the speaker and his colleagues in creating a Jenkins platform based on distributed Docker containers running on Apache Mesos and Marathon and applicable to all types of applications, especially Java- and JVM-based ones.
Docker is a key player in the microservices movement and is arguably the leader in containerization technology.
That said, there are many ways to “do Docker”.
Between the leading cloud providers (AWS, Azure, and Google) and other platform stacks like Docker Swarm, Apache Mesos/DC/OS, and Kubernetes, it can get confusing. In this session, Michele brings her customer experiences building solutions across most of these platforms to provide you with the highlights, the architecture topologies, and some perspective on how she helps her customers choose the right platform for their cloud, on-premises, or hybrid solutions.
Neutron Done the SDN Way
Dragonflow is an open source distributed control plane implementation of Neutron, which is an integral part of OpenStack. Dragonflow introduces innovative solutions and features to implement networking and distributed network services in a manner that is both lightweight and simple to extend, yet targeted at performance-intensive and latency-sensitive applications. Dragonflow aims at solving the performance
Using the Azure Container Service in Your Company - Jan de Vries
We know containers can solve some problems for us, but how should they be deployed within Azure? The Azure Container Service can be used to host your monolithic solution, microservices, and everything in between.
In this session we will create multiple containers, deploy them using the Azure Container Service and see how this service can provide us with enough management information to use in a professional environment. We will also cover some best practices on setting up such a solution and how you can migrate your existing software solutions.
Best Practices for Running Kafka on Docker Containers - BlueData, Inc.
Docker containers provide an ideal foundation for running Kafka-as-a-Service on-premises or in the public cloud. However, using Docker containers in production environments for Big Data workloads using Kafka poses some challenges – including container management, scheduling, network configuration and security, and performance.
In this session at Kafka Summit in August 2017, Nanda Vijaydev of BlueData shared lessons learned from implementing Kafka-as-a-Service with Docker containers.
https://kafka-summit.org/sessions/kafka-service-docker-containers
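For a quick single-host experiment (not BlueData's production architecture), a minimal Docker Compose sketch for one broker plus ZooKeeper could look like this; the image names and environment variables are assumptions to be adapted to whichever Kafka images you use:

```yaml
# Minimal single-host sketch, not a production layout.
# Image names and variables are illustrative; adjust to your Kafka distribution.
version: "2"
services:
  zookeeper:
    image: zookeeper:3.4
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka:latest
    ports:
      - "9092:9092"
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # Clients outside the container need a reachable broker address,
      # which is exactly the kind of network-configuration wrinkle the talk covers.
      KAFKA_ADVERTISED_HOST_NAME: localhost
    depends_on:
      - zookeeper
```

The advertised-address setting illustrates why network configuration is called out as a challenge above: a broker that registers an unreachable hostname will accept the initial connection and then fail on the metadata exchange.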
A new movement is taking cloud by storm; Docker is evolving the way services are deployed by organizations so that they can operate more efficiently at scale — both in the cloud and on bare metal. In the same way shipping containers revolutionized the cargo industry, cheap, zero-penalty Linux Containers (LXC) are like shrink-wrapped VMs but without the fat. What’s not obvious, however, is how to roll your own Docker deployments and all the tools you’ll need to leverage along the way.
This discussion will cover:
• Principles of Immutable Infrastructure
• Docker Basics
• Docker for Dev & QA
• Docker in Production
• Business Drivers
• Answering the Question: Is Docker Ready for Prime Time?
Docker Online Meetup #28: Production-Ready Docker Swarm - Docker, Inc.
presented by Alexandre Beslic (@abronan)
Swarm v1.0 is now ready for running your apps in production!
Swarm is the easiest way to run Docker applications at large scale on a cluster. It turns a pool of Docker Engines into a single, virtual Engine. You don’t have to worry about where to put containers, or how they’re going to talk to each other - it just handles all that for you.
We’ve spent the last few months tirelessly hardening and tuning it, and in combination with multi-host networking and the new volume system in Docker Engine 1.9, we can confidently say that it’s ready for running your apps in production. In our tests, we’ve been running Swarm on EC2 with 1,000 nodes and 30,000 containers, and it keeps on scheduling containers in less than half a second. Not even breaking a sweat! Keep an eye out for a blog post soon with the full details.
Read more: http://blog.docker.com/2015/11/swarm-1-0/
Docker, Cornerstone of Cloud Hybridization? [Cloud Expo Europe 2016] - Adrien Blind
The following talk discusses the opportunity to leverage Docker to create a hybrid logical cloud built simultaneously on top of traditional datacenters and public cloud vendors, making it possible to manage new kinds of containers (Windows, Linux on ARM). It also discusses the value of such a capability for applications in a context of topology orchestration and microservice-oriented applications.
Docker, Cornerstone of a Hybrid Cloud? - Adrien Blind
In this presentation, I propose to explore the orchestration and hybridization potential raised by Docker 1.12 Swarm Mode, and the benefits that follow.
I'll first recall why Docker fits the microservices paradigm well, and how this architecture gives rise to new challenges: service discovery, app-centric security, scalability and resilience, and of course orchestration.
I'll then discuss the opportunity to create your own Docker CaaS platform spanning various cloud vendors and traditional datacenters simultaneously, rather than just relying on vendors' integrated offerings.
Finally, I'll discuss the rise of new technologies (Windows containers, ARM architectures) in the Docker landscape, and the opportunity of integrating them into a global composite Docker orchestration, making it possible to describe globally complex apps.
Field Employee Tracking System | MiTrack App | Best Employee Tracking Solution... - informapgpstrackings
Keep tabs on your field staff effortlessly with Informap Technology Centre LLC. Real-time tracking, task assignment, and smart features for efficient management. Request a live demo today!
For more details, visit us : https://informapuae.com/field-staff-tracking/
Top Features to Include in Your Winzo Clone App for Business Growth - rickgrimesss22
Discover the essential features to incorporate in your Winzo clone app to boost business growth, enhance user engagement, and drive revenue. Learn how to create a compelling gaming experience that stands out in the competitive market.
SOCRadar Research Team: Latest Activities of IntelBroker - SOCRadar
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntelBroker. We have compiled a summary of what has happened over the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient... - Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. This is where custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart... - Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data and applying computations on a different system. As part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined, on-demand data workflows capable of applying many data reduction and data analysis operations to the large ESGF data archives, transferring only the resulting analysis products (e.g., visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR - Tier1 app
Even though at surface level ‘java.lang.OutOfMemoryError’ appears to be one single error, there are underlyingly 9 types of OutOfMemoryError. Each type has different causes, diagnosis approaches, and solutions. This session equips you with the knowledge, tools, and techniques needed to troubleshoot and conquer OutOfMemoryError in all its forms, ensuring smoother, more efficient Java applications.
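As one concrete illustration (a toy example, not taken from the session), one flavor of the error can be triggered and caught without first exhausting the heap: asking for an array longer than the VM permits fails immediately, and on HotSpot this typically reports "Requested array size exceeds VM limit".

```java
// Toy example: OutOfMemoryError is an Error, not an Exception, but it can
// still be caught for diagnostics. Requesting an array longer than the VM's
// internal limit fails immediately, before any heap is actually consumed.
public class OomDemo {
    static String triggerArraySizeOom() {
        try {
            long[] huge = new long[Integer.MAX_VALUE];
            return "allocated " + huge.length; // not reached on typical JVMs
        } catch (OutOfMemoryError e) {
            return "OutOfMemoryError: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(triggerArraySizeOom());
    }
}
```

Each of the other flavors (heap space, metaspace, GC overhead limit, native threads, and so on) has a different trigger, which is why the message text matters when diagnosing.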
Experience our free, in-depth three-part Tendenci Platform Corporate Membership Management workshop series! In Session 1 on May 14th, 2024, we began with an Introduction and Setup, mastering the configuration of your Corporate Membership Module settings to establish membership types, applications, and more. Then, on May 16th, 2024, in Session 2, we focused on binding individual members to a Corporate Membership and Corporate Reps, teaching you how to add individual members and assign Corporate Representatives to manage dues, renewals, and associated members. Finally, on May 28th, 2024, in Session 3, we covered questions and concerns, addressing any queries or issues you may have.
For more Tendenci AMS events, check out www.tendenci.com/events
Developing Distributed High-Performance Computing Capabilities of an Open Sci... - Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Anthony Dahanne
Les Buildpacks existent depuis plus de 10 ans ! D’abord, ils étaient utilisés pour détecter et construire une application avant de la déployer sur certains PaaS. Ensuite, nous avons pu créer des images Docker (OCI) avec leur dernière génération, les Cloud Native Buildpacks (CNCF en incubation). Sont-ils une bonne alternative au Dockerfile ? Que sont les buildpacks Paketo ? Quelles communautés les soutiennent et comment ?
Venez le découvrir lors de cette session ignite
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
How to Position Your Globus Data Portal for Success Ten Good PracticesGlobus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
How Recreation Management Software Can Streamline Your Operations.pptxwottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll have the knowledge on how to organize and improve your code review proces
top nidhi software solution freedownloadvrstrong314
This presentation emphasizes the importance of data security and legal compliance for Nidhi companies in India. It highlights how online Nidhi software solutions, like Vector Nidhi Software, offer advanced features tailored to these needs. Key aspects include encryption, access controls, and audit trails to ensure data security. The software complies with regulatory guidelines from the MCA and RBI and adheres to Nidhi Rules, 2014. With customizable, user-friendly interfaces and real-time features, these Nidhi software solutions enhance efficiency, support growth, and provide exceptional member services. The presentation concludes with contact information for further inquiries.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfJay Das
With the advent of artificial intelligence or AI tools, project management processes are undergoing a transformative shift. By using tools like ChatGPT, and Bard organizations can empower their leaders and managers to plan, execute, and monitor projects more effectively.
Navigating the Metaverse: A Journey into Virtual Evolution"Donna Lenk
Join us for an exploration of the Metaverse's evolution, where innovation meets imagination. Discover new dimensions of virtual events, engage with thought-provoking discussions, and witness the transformative power of digital realms."
9. HADOOP PROVISIONING ISSUES
Each cloud provider has a proprietary API
Create images for each provider
Network configuration
Service discovery
Resize, failover, member join support
10. OUR APPROACH – DETAILS
Build your Docker image
Install or pre-install Hadoop services with Ambari
Install Serf and dnsmasq
Build your cloud image
Use Ansible to create an image
Provision the cluster
11. BUILD DOCKER IMAGES
Create the Dockerfile
Have Docker.io build the image
Optionally pre-install services
Use Ambari
Push image to Docker.io
Licensing questions
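As a rough illustration of such an image, the Dockerfile below is only a sketch under assumptions: the base image, the ambari.repo file, the Serf download URL, and the start script name are hypothetical, not the exact SequenceIQ build.

```dockerfile
# Sketch only – base image, repo file, URLs and start script are assumptions
FROM centos:6

# Register an Ambari yum repository, then install server + agent;
# Hadoop services can optionally be pre-installed through Ambari at build time
ADD ambari.repo /etc/yum.repos.d/ambari.repo
RUN yum install -y ambari-server ambari-agent

# dnsmasq for per-container DNS; Serf (a single Go binary) for membership
RUN yum install -y dnsmasq unzip \
 && curl -sL -o /tmp/serf.zip https://example.com/serf_linux_amd64.zip \
 && unzip /tmp/serf.zip -d /usr/local/bin

CMD ["/usr/local/bin/start-serf-agent.sh"]
```

A trusted build on Docker.io then rebuilds the image from this Dockerfile on every push; the licensing questions apply to bundled components such as Ganglia (BSD) or Nagios (GPL).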
12. BUILD CLOUD IMAGES
Use a Docker ready base image
Use Ansible to provision the image template
Pull the Docker images
Apply custom infrastructure
Use cloud provider specific playbooks
AWS EC2
Azure
13. ANSIBLE
Configuration as data
Simplest way to automate IT
Secure and agentless
Goal oriented
One playbook – multiple modules
We use it to “burn” cloud images/templates
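A playbook for "burning" such an image might look like the following sketch; the host group, image name, and file names are hypothetical, and the real playbooks are cloud-provider specific.

```yaml
# Hypothetical image-burning playbook – all names are illustrative only
- hosts: image-builder
  become: yes
  tasks:
    - name: Install Docker on the base image
      yum:
        name: docker
        state: present

    - name: Pull the pre-built Hadoop image from Docker.io
      command: docker pull sequenceiq/ambari   # illustrative image name

    - name: Apply custom infrastructure (bridge config, Serf handlers, ...)
      copy:
        src: files/docker-bridge.sh
        dest: /usr/local/bin/docker-bridge.sh
        mode: "0755"
```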
14. PROVISIONING – ISSUES
FQDN
/etc/hosts is read-only in Docker
Everybody needs to know everybody
DNS
Single point of failure
Dynamic cluster – nodes joining, leaving, failing
Routing
Cloud – ability to route containers between hosts
Collision free private IP range for Docker bridge
15. PROVISIONING – SOLUTION
FQDN
Use the -h and --dns Docker params
DNS
dnsmasq is running on each Docker container
Serf member-xxx events trigger dnsmasq reconfiguration
Routing
Docker bridge configuration – follows a convention
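The dnsmasq reconfiguration step can be sketched as a Serf event handler. Serf sets SERF_EVENT and feeds one "name address role" line per affected member on stdin; the sketch below only rewrites an extra-hosts file for the members it is handed (a real handler would regenerate the full list from `serf members`). All file paths and names here are illustrative assumptions, not the SequenceIQ implementation.

```shell
#!/bin/sh
# Illustrative Serf event handler – field layout follows Serf's handler
# convention ("name address role" per line on stdin); sketch only.

update_hosts() {   # $1 = Serf event name; member lines arrive on stdin
  case "$1" in
    member-join|member-update)
      while read -r name addr _role; do
        printf '%s %s\n' "$addr" "$name" >> "$HOSTS_FILE"
      done
      ;;
    member-leave|member-failed|member-reap)
      while read -r name _addr _role; do
        # drop the departed member from the extra-hosts file
        grep -v " $name\$" "$HOSTS_FILE" > "$HOSTS_FILE.tmp" \
          && mv "$HOSTS_FILE.tmp" "$HOSTS_FILE"
      done
      ;;
  esac
  # dnsmasq rereads its extra-hosts file on SIGHUP; ignore if not running
  pkill -HUP dnsmasq 2>/dev/null || true
}

# Demo: two nodes join, then one fails
HOSTS_FILE=$(mktemp)
printf 'node1 10.0.0.1 hadoop\nnode2 10.0.0.2 hadoop\n' | update_hosts member-join
printf 'node2 10.0.0.2 hadoop\n' | update_hosts member-failed
cat "$HOSTS_FILE"   # only "10.0.0.1 node1" remains
```

In practice the script would be registered with something like `serf agent -event-handler member-join=/usr/local/bin/handler.sh` for each event type.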
16. SERF
Gossip based membership
Service discovery
Decentralized
Lightweight, fault tolerant
Highly available
DevOps friendly
Keep an eye on Consul, Open vSwitch, pipework
17. SERF – DECENTRALIZED SERVICE DISCOVERY
Gossip instead of heartbeat
LAN, WAN profiles
Provides membership information
Event handlers: member-join, member-leave, member-failed, member-update, member-reap, user
Query
21. AWS EC2 – HADOOP CLUSTER
Use EC2 REST API to provision instances (from Dockerized image)
Start Docker containers
One Ambari server
N-1 Ambari agents connecting to server
Connect ambari-shell to the Ambari server
Define blueprint
Provision the cluster
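A blueprint is a JSON document pairing a stack definition with a component layout. A minimal sketch in Ambari's blueprint format – the blueprint name, stack version, and host group layout are illustrative:

```json
{
  "Blueprints": {
    "blueprint_name": "multi-node-hdfs-yarn",
    "stack_name": "HDP",
    "stack_version": "2.1"
  },
  "host_groups": [
    {
      "name": "master",
      "cardinality": "1",
      "components": [ { "name": "NAMENODE" }, { "name": "RESOURCEMANAGER" } ]
    },
    {
      "name": "slaves",
      "cardinality": "1+",
      "components": [ { "name": "DATANODE" }, { "name": "NODEMANAGER" } ]
    }
  ]
}
```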
23. AWS EC2 - CLOUDFORMATION
Manually setting up a VPC is too complicated
Use CloudFormation
Manage the stack together
Template-based
Environments under version control
Customizable at runtime
No extra charge
"VpcId" : {
  "Type" : "String",
  "Description" : "VpcId of your existing Virtual Private Cloud (VPC)"
},
"SubnetId" : {
  "Type" : "String",
  "Description" : "SubnetId of an existing subnet (for the primary network) in your Virtual Private Cloud (VPC)"
},
"SecondaryIPAddressCount" : {
  "Type" : "Number",
  "Default" : "1",
  "MinValue" : "1",
  "MaxValue" : "5",
  "Description" : "Number of secondary IP addresses to assign to the network interface (1-5)",
  "ConstraintDescription": "must be a number from 1 to 5."
},
"SSHLocation" : {
  "Description" : "The IP address range that can be used to SSH to the EC2 instances",
  "Type": "String",
  "MinLength": "9",
  "MaxLength": "18",
  "Default": "0.0.0.0/0",
  "AllowedPattern": "(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})/(\\d{1,2})",
  "ConstraintDescription": "must be a valid IP CIDR range of the form x.x.x.x/x."
}
},
24. CLOUDBREAK
Cloudbreak is a powerful left that breaks over a coral reef, a mile off the southwest side of the island of Tavarua, Fiji.
Cloudbreak is a cloud-agnostic Hadoop as a Service API. It abstracts provisioning and eases the management and monitoring of on-demand clusters.
Provisioning Hadoop has never been easier
25. CLOUDBREAK
Benefits
Elastic
Scalable
Blueprints
Flexible
Main REST resources
/template – specify a cluster infrastructure
/stack – creates a cloud infrastructure built from a template
/blueprint – describes a Hadoop cluster
/cluster – creates a Hadoop cluster
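Put together, a provisioning session against these resources might look like the following pseudocode sketch. Only the resource paths come from the slide; the host variable, field names, and payload values are hypothetical.

```shell
# Hypothetical walk through the Cloudbreak REST resources (sketch only)
curl -X POST $CB/template  -d '{"name":"ec2-small","cloudPlatform":"AWS","instanceType":"m3.medium"}'
curl -X POST $CB/stack     -d '{"name":"demo-stack","templateId":"1","nodeCount":4}'
curl -X POST $CB/blueprint -d '{"name":"hdfs-yarn","ambariBlueprint":"..."}'
curl -X POST $CB/cluster   -d '{"name":"demo-cluster","blueprintId":"1","stackId":"1"}'
```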
26. RESULTS AND ACHIEVEMENTS
Hadoop as a Service API
Available for EC2 and Azure cloud
OpenStack and bare metal support are coming soon
Open source under the Apache 2 license
Same goals as Apache Ambari Launchpad project
What's next?
27. HADOOP SERVICES - AS A SERVICE
Leverage YARN
Slider (Hoya) providers
HBase, Accumulo
SequenceIQ providers - Flume, Tomcat
YARN-1964
QoS for YARN – heuristic scheduler
Platform as a Service API
28. BANZAI PIPELINE
Banzai Pipeline is a surf reef break located in Hawaii, off Ehukai Beach Park in Pupukea on O'ahu's North Shore.
Banzai Pipeline is a RESTful application development platform for building on-demand data and job pipelines running on Hadoop YARN.
Banzai Pipeline is a big data API for the REST
29. THANK YOU
Get the code: https://github.com/sequenceiq
Read about: http://blog.sequenceiq.com
Facebook: http://facebook.com/sequenceiq
Twitter: http://twitter.com/sequenceiq
LinkedIn: http://linkedin.com/sequenceiq
Contact: janos.matyas@sequenceiq.com
FEEL FREE TO CONTRIBUTE
Editor's Notes
Thanks for coming – today we will talk about Docker-based Hadoop provisioning.
Quick introduction of who we are - Young startup, from Budapest, Hungary. Janos Matyas – CTO, open source contributor, Hadoop YARN evangelist.
Why we have started this at all – there are so many options.
We repeated the same steps over and over – and scripted them. Still, we felt that something was missing.
See bullet points
We've been through many different approaches – bare metal, cloud VMs, and so on – and ended up using Docker.
Tested many provisioning frameworks – Ambari is the one.
Quick question – how many of you have used Docker before?
Docker is a container based virtualization framework. Unlike traditional virtualization Docker is fast, lightweight and easy to use. Docker allows you to create containers holding all the dependencies for an application. Each container is kept isolated from any other, and nothing gets shared.
I can run 5-6 containers with less overhead than 1 VirtualBox VM. No SOCKS proxy, etc.
The ‘provisioning’ framework. No need to enter details, there were pretty good sessions about Ambari.
Blueprints 1.5.1 tech preview, 1.6 fully supported. Blueprint = stack definition + component layout.
REST API – we have created, open sourced Ambari client + shell (come and join the Ambari Meetup today at 3:30)
Now, the issues.
Do it again and again – for each cloud provider.
Create the image – but how do you know what’s the requirement, building an image each and every time?
Network – this is a big issue. EC2 has an API, Azure has its own. OpenStack has a network-as-a-service component – Neutron. SDN – software-defined networking!
Everything is dynamic – how do you do service discovery?
Extra features – fully dynamic Hadoop cluster.
Will expand on these shortly.
Sounds too easy – lets get into details.
A Docker image is described by a Dockerfile – like a Vagrant file for virtualbox for example.
You want trusted build – use Docker.io
Faster provisioning – a 100+ node Hadoop cluster in less than 5 minutes? Come and join the Ambari meetup.
Licensing –Ganglia or Nagios (BSD and GPL). Hortonworks Hadoop – Apache 2
Bigtop is coming…
Amazon Linux – Red Hat based – has recently become Docker ready. OpenStack's Nova hypervisor supports Docker.
Apply the network and other infrastructure-related configuration.
Remember the licensing – use our Ansible script to build your cloud image. Or modify.
IT automation war – Ansible vs. Chef, Puppet.
Ansible configurations are simple data descriptions of your infrastructure (both human-readable and machine-parsable).
Needs only SSH.
Dev env – use the default Docker bridge (easy)
Everything talks to each other
DNS – heavy management overhead
-h for hostname, --dns to specify the DNS service to use
Convention: AMI launch index
Serf is a decentralized solution for cluster membership, failure detection and orchestration.
Serf, Zookeeper, etcd, doozerd. The latter three have server nodes that require a quorum of nodes to operate – strong consistency.
Serf - eventual consistency
Most important thing is that gossip based – will expand shortly.
Decentralized – all nodes are equal.
Fire and forget
Waits for answer – limited response collection.
Custom event handlers
Tags – e.g. Ambari server, hostgroups, etc
Load increases – how does the cluster know that there is a new member?
Running on each Docker container – updated by SERF events.
Amazon supports Docker natively.
Start N number of nodes. Pass our userdata script .at startup.
Start the containers – they will know about each other using Serf.
Shell or REST API or Ambari UI.
You need security – it is strongly recommended to use your own VPC instead of the default VPC.
Use different availability zones for maximum uptime.
Whoever has set up a VPC knows – it can be scripted. Decommissioning / changing / deleting components is harder than adding them.
Use CloudFormation.
This is a very easy but still error-prone process – though it helps a lot.
We build an API on top, and automated the whole process.
We are not a Service Provider – this is an API.
Elastic – arbitrary number of nodes.
Scalable – follow your workload change.
Blueprints – supports different cluster blueprints
Flexible – Use your favorite cloud, bring your own Hadoop – one common API
One API – any size, anywhere.
Why we needed Cloudbreak – this is not the end of the story.
We wanted to have a Platform as a Service API.
We are YARN evangelists – wanted to run everything on YARN.
Community driven.
Heuristic scheduler.
A fully dynamic big data pipeline.
Build your pipeline, run dynamically / on demand. All pre-coded, zero coding, only configuration.
Data pipeline – run services on demand, short or long term. Start them when needed, stop them when idle. Apply ETL on demand.
Job pipeline – all major ML libraries are supported (Mahout, MLlib), plus 44 other MR jobs (correlations, joins, summarizations, filtering, sort, sharding, shuffle)
Streaming pipeline – Spark based
Custom SDK – abstracts the complexity behind MR and Spark.
Subscribe to the Beta test.
Contribute.
We did contributions on several Apache and other open source projects.
A Babylon of languages at SequenceIQ: Java and Scala are the default, and Groovy is used very often. Then Go – Docker + Serf – we had to learn Go to fix things. Ansible for IT.
We strongly suggest using Docker – we use it everywhere: CI/CD, cloud.
For a demo come and join the Ambari meetup.
Thanks for coming. Q&A. Join me after or follow us through one of the social medias listed.