This document provides information on building a high performance computing cluster, including definitions of supercomputers, why they are needed, types of supercomputers, and steps for building a cluster. It outlines identifying the application, selecting hardware and software components, installation, configuration, testing, and maintenance. Homemade and commercial clusters are compared, and opportunities for generating revenue from clusters are discussed. Additional online resources for learning more are provided at the end.
2. Agenda
Domain: High Performance Computing
Presentation Level: Beginner
Prerequisite: Familiarity with Linux
- What is a Supercomputer?
- Why do we need it?
- Types of Supercomputers
- The Recipe for building a cluster
  - Basic Concepts
  - Identifying the Application
  - Selection of Raw Materials
  - Preparation
  - Configuration
  - Deployment
  - Testing
  - Maintenance
- Home-made vs. commercial clusters
- Making money from clusters
- Other resources and links
3. What is a Supercomputer?
“An extremely fast computer that can perform hundreds of millions of instructions per second.”
- A powerful system built from a collection of special-purpose hardware
- It is designed for a specific application
- Its processing power is very high
- There is no standard specification for a supercomputer
- It works on a parallel-processing scheme
5. Indian Supercomputers
PARAM Padma is C-DAC's next-generation high-performance scalable computing cluster, currently with a peak computing power of one teraflop.
KABRU is a 144-node (Xeon DP) Linux cluster. Though it is a very fast supercomputer, it is not the fastest in the world. With a sustained performance of 1002.3 GFlops of double-precision arithmetic (reached on October 13th 2004), it is the second-fastest supercomputer in India and the fastest in India belonging to an academic institution.
--IMSc
6. Why do we need it?
“Obviously, we need it for more processing power!!!”
- We use it where computation can be parallelized.
- We use it where algorithms follow a “divide and conquer” approach.
- We use it for high-performance/high-availability computing.
- We use it for distributed computing.
In India, C-DAC uses supercomputers for research in: Bioinformatics, Computational Structural Mechanics, Computational Atmospheric Science, Evolutionary Computing, Computational Chemistry, etc.
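The “divide and conquer” pattern can be tried on a single Linux box with fork() and pipe(): the array is split into chunks, each child process sums one chunk, and the parent combines the partial sums. This is a minimal sketch under our own assumptions; parallel_sum, range_sum and MAX_WORKERS are hypothetical names, not part of any cluster toolkit. On a single-system-image cluster the same unmodified processes could, in principle, be spread across nodes.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 64   /* hypothetical upper bound on child processes */

/* Conquer: sum one sub-range sequentially. */
static long range_sum(const int *a, size_t lo, size_t hi)
{
    long s = 0;
    for (size_t i = lo; i < hi; i++)
        s += a[i];
    return s;
}

/* Divide a[0..n) into up to `workers` chunks, sum each chunk in its own
 * child process, and combine the partial sums in the parent.
 * Returns 0 and stores the total in *out on success, -1 on any failure. */
int parallel_sum(const int *a, size_t n, int workers, long *out)
{
    int fds[MAX_WORKERS];
    pid_t pids[MAX_WORKERS];
    int started = 0, ok = 1;

    if (workers < 1 || workers > MAX_WORKERS)
        return -1;
    size_t chunk = (n + (size_t)workers - 1) / (size_t)workers;

    /* Divide: start one child per non-empty chunk, all running at once. */
    for (int w = 0; w < workers; w++) {
        size_t lo = (size_t)w * chunk;
        size_t hi = (lo + chunk < n) ? lo + chunk : n;
        if (lo >= hi)
            break;
        int fd[2];
        if (pipe(fd) != 0) { ok = 0; break; }
        pid_t pid = fork();
        if (pid < 0) { close(fd[0]); close(fd[1]); ok = 0; break; }
        if (pid == 0) {                        /* child process */
            close(fd[0]);
            long part = range_sum(a, lo, hi);
            ssize_t nw = write(fd[1], &part, sizeof part);
            _exit(nw == (ssize_t)sizeof part ? 0 : 1);
        }
        close(fd[1]);                          /* parent keeps the read end */
        fds[started] = fd[0];
        pids[started] = pid;
        started++;
    }

    /* Combine: read each partial sum, then reap that child. */
    long total = 0;
    for (int i = 0; i < started; i++) {
        long part;
        if (read(fds[i], &part, sizeof part) == (ssize_t)sizeof part)
            total += part;
        else
            ok = 0;
        close(fds[i]);
        waitpid(pids[i], NULL, 0);
    }
    *out = total;
    return ok ? 0 : -1;
}
```

For example, filling an array with 1..1000 and calling parallel_sum(a, 1000, 4, &total) should store 500500 in total. All children are started before any result is read, so the chunks really are summed concurrently.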
7. Types of Supercomputers
“Two broad categories: tightly coupled parallel systems & loosely coupled clusters”
Modern supercomputing clusters:
- High performance (HP) clusters
- Load-leveling clusters
- Web-service clusters
- Storage clusters
- Database clusters
A special type is the Single System Image (SSI) cluster.
8. The Recipe for building a cluster
“…before you make soup, you need hunger to enjoy it…”
Let’s brush up on the basic concepts:
- Linux Installation Basics
- DHCP
- Network Boot (via PXE Boot or Etherboot)
- Interconnect
9. The Recipe for building a cluster
[Identifying the Application]
“Why would you need a supercomputer? Hey, I need it just for fun!”
Building a cluster for:
- High Performance (HP) needs
- High Availability (HA) needs
or
- “just need it for fun”
“There’s a lot of fun in writing & testing your algorithms on a cluster…”
10. The Recipe for building a cluster
[Selection of raw material]
“Innovators build great things from non-great elements !”
Selection of Hardware:
- A few old motherboards
- Enough processors to populate the boards
- At least 32 MB of RAM per board
- Network support via on-board or add-in NICs
- At least one hard disk & CD-ROM drive
- Either BIOS support for network boot or a floppy drive for each board
“In other words, just borrow a few boxes from your friends if you feel too lazy to build your own hardware…”
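Before a borrowed board joins the cluster, the 32 MB RAM minimum can be checked on the board itself once Linux is running. This is our own sketch using the Linux sysinfo(2) call; meets_min_ram is a hypothetical helper, not part of OpenSSI or OSCAR.

```c
#include <sys/sysinfo.h>

/* Return 1 if this machine has at least `min_mib` MiB of RAM, 0 if not,
 * or -1 if sysinfo() fails. Linux-specific. */
int meets_min_ram(unsigned long min_mib)
{
    struct sysinfo si;
    if (sysinfo(&si) != 0)
        return -1;
    /* totalram is counted in units of mem_unit bytes. */
    unsigned long long total = (unsigned long long)si.totalram * si.mem_unit;
    return total >= (unsigned long long)min_mib * 1024ULL * 1024ULL;
}
```

Calling meets_min_ram(32) on each candidate board gives a quick pass/fail for the memory requirement above.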
11. The Recipe for building a cluster
[Selection of raw material]
“Innovators build great things from non-great elements !”
Selection of software:
- OpenSSI (http://www.openssi.org)
- OSCAR (http://oscar.sourceforge.net/)
- TFTP
- Etherboot (http://rom-o-matic.net/5.2.4/)
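During network boot (PXE or Etherboot), a diskless node fetches its boot image from the TFTP server by sending a read request (RRQ) to UDP port 69. A minimal sketch of that request packet, following the RFC 1350 layout; tftp_build_rrq is a hypothetical helper, and the file name pxelinux.0 used below is only an illustration, since the boot image names OpenSSI actually serves may differ.

```c
#include <stddef.h>
#include <string.h>

/* Build a TFTP read request (RRQ, opcode 1) as laid out in RFC 1350:
 *   | 2-byte opcode | filename | 0 | mode | 0 |
 * Returns the packet length, or 0 if `buf` is too small. */
size_t tftp_build_rrq(unsigned char *buf, size_t cap,
                      const char *filename, const char *mode)
{
    size_t flen = strlen(filename), mlen = strlen(mode);
    size_t len = 2 + flen + 1 + mlen + 1;
    if (len > cap)
        return 0;
    buf[0] = 0;                                   /* opcode, big-endian:   */
    buf[1] = 1;                                   /* 0x0001 = read request */
    memcpy(buf + 2, filename, flen + 1);          /* includes the NUL      */
    memcpy(buf + 2 + flen + 1, mode, mlen + 1);   /* includes the NUL      */
    return len;
}
```

For example, tftp_build_rrq(pkt, sizeof pkt, "pxelinux.0", "octet") produces a 19-byte packet; "octet" is the binary transfer mode appropriate for boot images.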
12. OpenSSI
“the most fantastic product I have ever seen!”
It provides internode communication, clusterwide process management, clusterwide devices, a cluster filesystem, clusterwide IPC (pipes, FIFOs, message queues, semaphores, etc.) and clusterwide TCP/IP networking.
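Clusterwide IPC means that ordinary single-machine IPC code is meant to keep working when the cooperating processes end up on different nodes. The sketch below is plain System V message-queue code for Linux; according to the feature list above, under OpenSSI the child could be migrated to another node without code changes, but we have not verified that on an OpenSSI cluster. msgqueue_roundtrip and struct value_msg are hypothetical names.

```c
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Message layout for msgsnd/msgrcv: a positive type, then the payload. */
struct value_msg {
    long mtype;
    long value;
};

/* A child process sends `value` through a System V message queue and the
 * parent receives it. Returns 0 and stores the value in *out on success. */
int msgqueue_roundtrip(long value, long *out)
{
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        msgctl(qid, IPC_RMID, NULL);
        return -1;
    }
    if (pid == 0) {                           /* child: the sender */
        struct value_msg m = { 1, value };    /* mtype must be > 0 */
        _exit(msgsnd(qid, &m, sizeof m.value, 0) == 0 ? 0 : 1);
    }

    /* Parent: wait for the sender to finish, then take its message. */
    int status = 0;
    int sent = waitpid(pid, &status, 0) == pid && WIFEXITED(status)
               && WEXITSTATUS(status) == 0;
    struct value_msg m;
    ssize_t got = sent ? msgrcv(qid, &m, sizeof m.value, 1, IPC_NOWAIT) : -1;
    msgctl(qid, IPC_RMID, NULL);              /* always remove the queue */
    if (got != (ssize_t)sizeof m.value)
        return -1;
    *out = m.value;
    return 0;
}
```

The parent waits for the child before receiving (with IPC_NOWAIT), so a failed send returns an error instead of blocking forever.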
13. The Recipe for building a cluster
[Preparation]
“Question: dedicated cluster or temporary cluster?”
- A clean install of the base OS (Fedora Core 3) on PCs that have a bootable device
- A clean network configuration
14. The Recipe for building a cluster
[Configuration]
- Download and unpack OpenSSI (http://www.openssi.org)
- Go through the docs
- ./install does everything for you; it prompts you to:
    Enter a cluster name.
    Enter a node number between 1 and 125.
    Select a network interface card (NIC) for the cluster interconnect.
    Select (P)XE or (E)therboot as the network boot protocol for this node.
    Select whether you want to enable root filesystem failover.
- To add nodes, run openssi-config-node and select “Add a new node”.
“Remember that node 1 is called the init node…”
15. The Recipe for building a cluster
[Configuration]
- Essentials
# cluster -v
(Shows which nodes are members of the cluster, with their status)
# bash-ll
(A shell that performs load leveling. You can also add /etc/sysconfig/loadlevellist to specify particular processes to be load-leveled, then run service loadlevel restart.)
# ssi-ksync
(Rebuilds the ramdisk to include the driver and updates the network boot images)
# onnode <node_number> <command>
(Runs a specific command on a specific node)
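Conceptually, load leveling means placing or moving processes so that no node is much busier than the others. The sketch below shows only that placement decision, under our own assumptions: it is not OpenSSI's actual algorithm, the load figures are whatever per-node metric you choose (for example, run-queue length), and pick_target_node is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical load-leveling rule (not OpenSSI's actual algorithm):
 * keep a process on node `current` unless some other node's load is lower
 * by more than `threshold`; then return that least-loaded node's index.
 * The threshold stops processes from bouncing between nearly equal nodes.
 * Requires current < nodes. Indices are 0-based here, while OpenSSI
 * numbers its nodes from 1. */
size_t pick_target_node(const double *load, size_t nodes, size_t current,
                        double threshold)
{
    size_t best = current;
    for (size_t i = 0; i < nodes; i++)
        if (load[i] < load[best])
            best = i;
    if (load[current] - load[best] > threshold)
        return best;
    return current;
}
```

With loads {3.0, 0.5, 2.9} and a threshold of 1.0, a process on node index 0 would move to index 1, while a process already on index 1 stays put.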
16. The Recipe for building a cluster
[Testing]
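#include <sys/wait.h>
#include <unistd.h>

static void run_relevant_algorithm(void) { /* placeholder: your workload */ }
int main(void) {
    int no_of_processes = 0;
    while (no_of_processes < 1000) {
        pid_t pid = fork();
        if (pid < 0) break;              /* fork failed: stop spawning */
        if (pid == 0) {                  /* child: run the workload, then exit */
            run_relevant_algorithm();
            _exit(0);
        }
        no_of_processes++;
    }
    while (wait(NULL) > 0) ;             /* parent: reap every child */
    return 0;
}
“…This piece of code can do wonders… It’s fun to keep adding zeros in the loop...”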
17. The Recipe for building a cluster
[Maintenance]
- Make sure there is no IP conflict if the network is shared
- Audit network efficiency regularly
- Put a proper firewall in place for security
“You don’t actually need to bother much about maintenance…”
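The IP-conflict check can be partly automated before nodes are brought up: scanning the cluster's planned addresses catches duplicates within the plan itself. This is our own sketch with a hypothetical has_ip_conflict helper; it does not detect clashes with other hosts already live on the shared network, which needs a live probe such as ARP-based duplicate address detection.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Check an address plan for duplicates. Returns 1 if two entries name the
 * same IPv4 address, 0 if all are distinct, or -1 if any entry is not a
 * valid dotted-quad address. */
int has_ip_conflict(const char *const ips[], size_t n)
{
    struct in_addr a, b;

    /* Pass 1: every entry must parse as an IPv4 address. */
    for (size_t i = 0; i < n; i++)
        if (inet_pton(AF_INET, ips[i], &a) != 1)
            return -1;

    /* Pass 2: compare every pair of parsed addresses. */
    for (size_t i = 0; i < n; i++) {
        inet_pton(AF_INET, ips[i], &a);
        for (size_t j = i + 1; j < n; j++) {
            inet_pton(AF_INET, ips[j], &b);
            if (a.s_addr == b.s_addr)
                return 1;
        }
    }
    return 0;
}
```

Comparing parsed addresses rather than raw strings means differently written forms of the same address are still treated as one address.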
18. Home-made vs. commercial clusters
“The obvious difference is in the looks…”
19. Making money from clusters
“Aren’t you interested in this…?”
- Host web servers, file servers, etc.
- Build a supercomputer for fun and give your friends access
- Provide a low-cost high-performance computing facility to research institutes
- Convert offices and academic institutions into night-time research facilities
20. Other resources and links
“Go ahead, find out more…”
Download this presentation and various other interesting things at:
http://www.parolkar.com/download.aspx
Other links:
http://www.openssi.org
http://www.beowulf.org
http://sourceforge.net/projects/ci-linux
http://linux-ha.org
http://www.openmosix.org