An introduction to the Design of Warehouse-Scale Computers, by Alessio Villardita
A brief overview of the main factors involved in the design of Warehouse-Scale Computers (WSCs), from the hardware to the cooling system to overall plant energy efficiency, always keeping in mind the cost of such a large architecture.
Co-Author: Pietro Piscione (https://www.linkedin.com/pub/pietro-piscione/84/b37/926)
A work based on:
"The Datacenter as a Computer, An Introduction to the Design of Warehouse-Scale Machines, Second Edition"
by
Luiz André Barroso
Jimmy Clidaras
Urs Hölzle
This document provides an overview of the architecture of warehouse-scale computers (WSCs). It describes how WSCs consist of large numbers of standardized servers organized in racks and arrays. The servers communicate over an Ethernet network hierarchy with switches at the rack and array level. This network architecture provides high aggregate bandwidth and storage capacity but also increases latency for remote memory access compared to local server memory. The document outlines the key components and networking design of WSCs.
The document discusses intelligent placement of datacenters for internet services. It aims to minimize costs and environmental impacts by developing a framework to model datacenter characteristics, costs, incentives and select optimal locations. The approach uses simulated annealing combined with linear programming to evaluate solutions and optimize total costs, subject to constraints like response time and availability. Evaluating various locations shows smart placement can save millions. Future work includes testing with real service data and incentives from other regions.
From Rack scale computers to Warehouse scale computers, by Ryousei Takano
This document discusses the transition from rack-scale computers to warehouse-scale computers through the disaggregation of technologies. It provides examples of rack-scale architectures like Open Compute Project and Intel Rack Scale Architecture. For warehouse-scale computers, it examines HP's The Machine project using application-specific cores, universal memory, and photonics fabric. It also outlines UC Berkeley's FireBox project utilizing 1 terabit/sec optical fibers, many-core systems-on-chip, and non-volatile memory modules connected via high-radix photonic switches.
This document discusses how data is increasingly dominating high performance computing workloads. It notes that while computing power doubles every two years, data storage and movement capabilities are not keeping pace. This is leading to a "data tsunami" as experiments and simulations generate terabytes of data per day. The document then summarizes Sun Microsystems' end-to-end infrastructure for data-centric HPC workflows, including their Lustre parallel storage system, unified storage, tape archives, high performance computing blades, and InfiniBand switches. It positions Sun as uniquely able to deliver an integrated solution from computation to long-term data retention to help users cope with the challenges posed by rapidly growing datasets.
2012 benjamin klenk-future-memory_technologies-presentation, by Saket Vihari
1) Memory technologies like DRAM are hitting physical limits as transistors cannot scale further and require more power for refresh. 2) Emerging technologies like Phase Change Memory (PCM), Hybrid Memory Cube (HMC), Racetrack Memory, and Spin-Torque Transfer RAM (STTRAM) aim to provide higher capacity, bandwidth and lower power. 3) HMC in particular stacks DRAM vertically using through-silicon vias to provide more banks and higher bandwidth, but fabrication is costly. PCM provides much higher density but slower access times. Racetrack and STTRAM are promising but still require significant research to improve characteristics like access time and density.
Hybrid Memory Cubes offer Smart Designers and Buyers a Competitive Advantage, by Bill Kohnen
By placing intelligent memory on the same substrate as the processing unit, Hybrid Memory Cubes deliver optimum performance and a lower total cost of ownership.
This document discusses optimizations for TCP/IP networking performance on multicore systems. It describes several inefficiencies in the Linux kernel TCP/IP stack related to shared resources between cores, broken data locality, and per-packet processing overhead. It then introduces mTCP, a user-level TCP/IP stack that addresses these issues through a thread model with pairwise threading, batch packet processing from I/O to applications, and a BSD-like socket API. mTCP achieves a 2.35x performance improvement over the kernel TCP/IP stack on a web server workload.
The IBM POWER10 processor represents the 10th generation of the POWER family of enterprise computing engines. Its performance is a result of both powerful processing cores and high-bandwidth intra- and inter-chip interconnect. POWER10 systems can be configured with up to 16 processor chips and 1920 simultaneous threads of execution. Cross-system memory sharing, through the new Memory Inception technology, and 2 Petabytes of addressing space support an expansive memory system. The POWER10 processing core has been significantly enhanced over its POWER9 predecessor, including a doubling of vector units and the addition of an all-new matrix math engine. Throughput gains from POWER9 to POWER10 average 30% at the core level and three-fold at the socket level. Those gains can reach ten- or twenty-fold at the socket level for matrix-intensive computations.
The document summarizes the author's participation report at the IEEE CloudCom 2014 conference. Some key points include:
- The author attended sessions on virtualization and HPC on cloud.
- Presentations had a strong academic focus and many presenters were Asian.
- Eight papers on HPC on cloud covered topics like reliability, energy efficiency, performance metrics, and applications like Monte Carlo simulations.
The document summarizes several AI accelerators for cloud datacenters including Google TPU, HabanaLabs Gaudi, Graphcore IPU, and Baidu Kunlun. It discusses their architectures, performance, and how they address challenges in datacenters like workload diversity and energy efficiency. The accelerators use specialized hardware like systolic arrays and FPGA/ASIC designs to achieve much higher performance and efficiency than CPUs and GPUs for AI tasks like training deep learning models.
This document provides an overview and summary of key concepts around virtualization that will be covered in more depth at a technical deep dive session, including:
- Virtualization capabilities for desktops/laptops and servers including workstation virtualization and server consolidation.
- How virtual machines work and the overhead associated with virtualization.
- Properties of virtualization like partitioning, isolation, and encapsulation.
- Benefits of server virtualization like consolidation, simpler management, and automated resource pooling.
- Comparison of "hosted" and vSphere virtualization architectures.
- Technologies used in virtualization like binary translation, hardware assistance from Intel VT/AMD-V.
- Ability to virtualize CPU intensive applications with
- The document discusses trends in AI chips, including the rise of deep learning models enabled by increased computing power and data availability.
- It outlines the AI stack from algorithms and neural network models down to chips, memory, and hardware. Popular deep learning model types and applications are also summarized.
- The trends are towards more specialized hardware like Google's TPUs for cloud servers and dedicated chips for mobile/edge devices from companies like Qualcomm and Nvidia. Processing-in-memory and new memory technologies may help address bandwidth bottlenecks.
- Overall hardware is still catching up to the needs of large neural networks, and there is a lack of unified software tools and frameworks to program diverse AI accelerators.
This document provides a summary of the IBM POWER9 AC922 system with 6 GPUs. It includes details on the POWER9 processor which features 24 cores per die, an enhanced cache hierarchy up to 120MB, and on-chip accelerators. The AC922 system utilizes two POWER9 processors, supports up to 512GB memory via 16 DDR4 DIMMs, and has three Nvidia Volta GPUs per socket connected via NVLink 2.0. It also discusses the POWER ISA v3.0 instruction set and how POWER9 serves as a premier acceleration platform with technologies like CAPI, OpenCAPI, and NVLink.
Exploring emerging technologies in the HPC co-design space, by jsvetter
This document discusses emerging technologies for high performance computing (HPC), focusing on heterogeneous computing and non-volatile memory. It provides an overview of HPC architectures past and present, highlighting the trend toward more heterogeneous systems using GPUs and other accelerators. The document discusses challenges for applications to adapt to these changing architectures. It also explores potential future technologies like 3D memory and discusses the Department of Energy's efforts in codesign centers to facilitate collaboration between application developers and emerging hardware.
Design Considerations, Installation, and Commissioning of the RedRaider Cluster at the Texas Tech University
High Performance Computing Center
Outline of this talk
HPCC Staff and Students
Previous clusters
• History, performance, usage patterns, and experience
Motivation for Upgrades
• Compute Capacity Goals
• Related Considerations
Installation and Benchmarks
Conclusions and Q&A
This document presents a software-based technique for partitioning shared last-level caches (L2 caches) on multicore systems to improve performance. It implements page coloring to allocate physical pages for each process to distinct cache line colors. Experimental results on a Power5 system show this approach can control cache usage and improve performance for multiprogrammed workloads by up to 17% compared to an uncontrolled shared cache. The document also finds that cache stall rates provide a better performance analysis metric than miss rates for some workloads.
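For reference, the page-coloring mechanism the paper builds on fits in a few lines: a physical page's "color" is the group of cache sets its lines map to, so an allocator that gives two processes disjoint colors partitions the cache between them. A minimal sketch, assuming an illustrative cache geometry rather than the Power5's actual one:

```python
def page_color(pfn, cache_bytes, assoc, page_bytes=4096):
    """Color of a physical page frame: which slice of cache sets it maps
    to. Pages of different colors can never evict each other's lines."""
    num_colors = cache_bytes // (assoc * page_bytes)
    return pfn % num_colors

# Assumed geometry for illustration: a 2 MB, 8-way cache with 4 KB pages
# gives 2 MB / (8 * 4 KB) = 64 colors; an allocator could give one
# process pages of colors 0-31 and another process colors 32-63.
for pfn in (0, 1, 63, 64, 130):
    print(pfn, "->", page_color(pfn, cache_bytes=2 * 1024 * 1024, assoc=8))
```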
Design and implementation of a reliable and cost-effective cloud computing in..., by Francesco Taurino
This document summarizes the INFN Napoli experience in designing and implementing a reliable and cost-effective cloud computing infrastructure. Key aspects included using existing hardware, virtualization and clustering technologies to consolidate services and reduce costs. A network with redundant switches and storage servers using GlusterFS provided high availability. Custom tools were developed to simplify administration tasks like provisioning, migration, and load balancing of virtual machines. The solution provided an efficient and reliable private cloud with over one year of uninterrupted uptime.
This document discusses scaling text mining to one million documents. It describes the resource requirements of various text mining analytics and different scaling framework options to distribute the workload across multiple machines. The key challenges in scaling up include managing the corpus, integrating and testing analytics, tracking errors and progress, and storing and accessing the output. Distributed frameworks like UIMA Asynchronous Scaleout and Hadoop can help put computationally heavy analytics on separate machines to improve performance.
A Prototype Storage Subsystem based on Phase Change Memory, by IBM Research
IBM scientists demonstrated for the first time a hybrid storage and caching subsystem, code-named Project Theseus, at the 2014 Non-Volatile Memories Workshop in San Francisco, California. Remarkably, they did so using two-year-old PCM chip prototypes.
This document provides an introduction to high-performance computing (HPC) including definitions, applications, hardware, and software. It defines HPC as utilizing parallel processing through computer clusters and supercomputers to solve complex modeling problems. The document then describes typical HPC cluster hardware such as computing nodes, a head node, switches, storage, and a KVM. It also outlines cluster management software, job scheduling, and parallel programming tools like MPI that allow programs to run simultaneously on multiple processors. An example HPC cluster at SIU called Maxwell is presented with its technical specifications and a tutorial on logging into and running simple MPI programs on the system.
Windows Server 2012 includes several improvements to networking and Hyper-V that help address challenges around availability, reliability, security, and costs. New features like Receive Side Scaling, Receive Segment Coalescing, and Dynamic Virtual Machine Queuing improve network performance and scalability. Single Root I/O Virtualization and NIC Teaming provide network virtualization capabilities and redundancy. These features help optimize hardware utilization, reduce latency and complexity, and improve throughput and workload scaling.
Superfluid Orchestration of heterogeneous Reusable Functional Blocks for 5G n..., by Stefano Salsano
The demo is composed of three scenes presenting tools and results from the Superfluidity project.
1) RDCL 3D is an extensible web framework which can be used to edit, validate, and visualize service and component descriptors expressed in different modelling languages (RDCLs), and to deploy the components/services over execution platforms.
2) Software defined wireless network (RAN as a Service). An end-to-end wireless network is described as a chain of RFBs (Reusable Functional Blocks) with RDCL 3D. This chain is dynamically instantiated in a cloud environment using containers. The demonstration shows a full software solution orchestrating different RFBs (RAN and CORE) over Central/EDGE/Front-End clouds. The fronthaul network is also made reprogrammable through SDN, which is also deployed as RFBs.
3) Orchestration of micro-VNFs (Unikernels). We have added support for Unikernels (ClickOS) in the XEN hypervisor and in OpenVIM Virtual Infrastructure Manager. Regular VMs (XEN HVM) and Unikernels can run together in the same infrastructure. In the demo we dynamically instantiate an end-to-end service on the infrastructure by chaining regular VMs and Unikernel-based VNFs.
This document discusses hardware trends and challenges for building exascale computers. It describes the evolution of processor/node architectures including multi-core and many-core designs. Reaching exascale performance will require addressing power consumption, concurrency, scalability, and fault tolerance issues. Evolutionary paths using commodity processors are unlikely to succeed, while aggressive approaches using clean-sheet designs for low-power customized chips may be needed to achieve exascale performance by 2018. International efforts are underway to develop exascale systems, but overcoming technical challenges to efficiently utilize extreme parallelism remains difficult.
Gluster Webinar: Introduction to GlusterFS, by GlusterFS
GlusterFS is an open source, scale-out network filesystem. It runs on commodity hardware and allows indefinite growth in capacity and performance by simply adding server nodes. Key benefits include flexibility to deploy on any hardware, linearly scalable performance, and superior storage economics compared to traditional storage solutions. GlusterFS uses a distributed hashing technique instead of a metadata server to provide high availability and reliability.
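The "no metadata server" point follows from deterministic, hash-based placement: any client can compute where a file lives from its name alone. A toy sketch of the core idea; GlusterFS's elastic hashing actually assigns hash ranges to bricks per directory, and the brick names here are made up:

```python
import hashlib

BRICKS = ["server1:/brick", "server2:/brick", "server3:/brick"]  # example pool

def locate(filename):
    """Deterministic placement: every client hashes the name the same
    way, so no central metadata lookup is needed."""
    digest = hashlib.md5(filename.encode()).hexdigest()
    return BRICKS[int(digest, 16) % len(BRICKS)]

print(locate("report.pdf"))  # every client computes the same brick
```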
Datacenters have unique characteristics including massive scale, limited geographic scope, and regular topologies. Their goals include providing extreme bisection bandwidth, low latency, predictable performance, and differentiation between tenants. Traditional network designs do not meet these goals, requiring new approaches that leverage single administration, control over endpoints and traffic placement, and commodity hardware.
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D..., by Linaro
Session ID: HKG18-500K1
Session Name: HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the Datacenter
Speaker: Dileep Bhandarkar
Track: Keynote
★ Session Summary ★
For decades we have been able to take advantage of Moore's Law to improve single-thread performance and reduce power and cost with each generation of semiconductor technology. Technology has continued to advance since the end of Dennard scaling more than 10 years ago, but the pace has slowed, and server performance increases have relied on growing core counts and power budgets.
At the same time, workloads have changed in the era of cloud computing: scale-out is becoming more important than scale-up, and domain-specific architectures have started to emerge to improve the energy efficiency of emerging workloads like deep learning.
This talk will provide a historical perspective and discuss the emerging trends driving the development of modern server processors.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/hkg18/hkg18-500k1/
Presentation: http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-500k1.pdf
Video: http://connect.linaro.org.s3.amazonaws.com/hkg18/videos/hkg18-500k1.mp4
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2018 (HKG18)
19-23 March 2018
Regal Airport Hotel Hong Kong
---------------------------------------------------
Keyword: Keynote
The document summarizes emerging computing trends in data centers, including:
1) The shift to multi-core CPU designs after Dennard scaling broke down, driven by the need for energy efficient designs for cloud computing.
2) The rise of heterogeneous computing using application-specific accelerators like GPUs and FPGAs to improve efficiency for targeted workloads like machine learning.
3) How technologies developed for mobile and edge computing like ARM cores can improve data center server efficiency through typical-use optimization rather than just peak performance.
The document discusses network design concepts for building a resilient network. It emphasizes the importance of considering redundancy at multiple levels, from the physical infrastructure to network protocols. Well-designed networks are modular, have clearly defined functional layers, and incorporate redundancy through techniques like load balancing and diverse circuit paths. Hierarchical network designs with logical areas can also improve convergence times during failures.
This document discusses network-on-chip (NoC) architectures for multiprocessor systems-on-chip. It describes how NoCs use routers and wires to connect hundreds or thousands of processor cores. The document outlines the different layers of a typical NoC architecture, including the application, transport, network, data link and physical layers. It also discusses common NoC router architectures and design methodologies, and introduces a bidirectional NoC architecture that aims to improve bandwidth utilization.
The von Neumann Memory Barrier and Computer Architectures for the 21st Century, by Perry Lea
Computer architecture and the von Neumann memory barrier. New computer architectures for the 21st century: neuromorphic computing, processing in memory, and dataflow computing. Applications to machine learning, AI, image processing, and other use cases. Future Technology Conference 2018, Vancouver BC.
GPUs are specialized processors designed for graphics processing. CUDA (Compute Unified Device Architecture) allows general purpose programming on NVIDIA GPUs. CUDA programs launch kernels across a grid of blocks, with each block containing multiple threads that can cooperate. Threads have unique IDs and can access different memory types including shared, global, and constant memory. Applications that map well to this architecture include physics simulations, image processing, and other data-parallel workloads. The future of CUDA includes more general purpose uses through GPGPU and improvements in virtual memory, size, and cooling.
A System on Chip (SoC) is an IC that integrates all the components of an electronic system. This presentation covers current trends and challenges in IP-based SoC design.
Solace Systems The Evolution of Messaging The Rise of the Appliance, by Iosif Itkin
Solace Systems The Evolution of Messaging The Rise of the Appliance
Clive Andrews
Mat Hobbis
Obninsk, 2 March, 2013
LSE The focus beyond Low Latency
EXTENT Trading Technology Trends & Quality Assurance
This document discusses disruptive technologies, specifically how Moore's Law has impacted the technology industry and networking. It provides three key points:
1. Moore's Law, which predicted the doubling of transistors on integrated circuits every two years, has been the guiding principle for new product development. However, for networking, transistor count has doubled but speed has increased slowly.
2. Networking performance has not kept up with Moore's Law like CPU performance has. Network ASICs have increased 10x over 12 years while CPUs increased 64x.
3. Merchant silicon using full custom chip designs has allowed networking to scale at Moore's Law growth rates, providing higher port density, lower price per port, and lower power consumption.
The document outlines an agenda for a presentation on the VEDLIoT project. The agenda includes an introduction to VEDLIoT by Pedro Trancoso, a presentation on VEDLIoT Hardware Platforms by Kevin Mika, and a discussion of Performance Evaluation and Benchmarking in VEDLIoT by Mario Pormann. The VEDLIoT project aims to develop very efficient deep learning techniques for IoT applications through the use of heterogeneous hardware platforms and accelerators.
In-memory processing has started to become the norm in large-scale data handling. This is a close-to-the-metal analysis of highly important but often neglected aspects of memory access times and how they impact big data and NoSQL technologies. We cover aspects such as the TLB, Transparent Huge Pages, the QPI link, hyperthreading, and the impact of virtualization on high-memory-footprint applications. We present benchmarks of various technologies ranging from Cloudera's Impala to Couchbase and how they are impacted by the underlying hardware. The key takeaway is a better understanding of how to size a cluster, how to choose a cloud provider and an instance type for big data and NoSQL workloads, and why not every core or GB of RAM is created equal.
The CMS online cluster consists of more than 2700 computers, mostly running under Scientific Linux CERN. They run the 15000 application instances responsible for the data acquisition and experiment control in a private network. The high availability of the network and services and the independence from external networks allows their operation around the clock. After testing virtualization, it is being deployed to further enhance high availability while allowing even easier servicing. Due to the ever increasing luminosity provided to CMS by the LHC, the cluster size and software running in it has been evolving to meet the increased demand of performance. Only in the last year, the processing power of the High Level Trigger farm was increased by 50% without disruption to ongoing operations and it is foreseen to continue growing. At the same time, large updates of the running software happen once every two weeks with smaller updates occurring all the time due to the many developers of the different subsystems. The configuration management infrastructure based on quattor has been instrumented accordingly to be flexible and easy to use by the software librarians while still performant and robust. Big parts of the cluster can be reconfigured and failing computers reinstalled in only a few minutes. The monitoring infrastructure is being revamped to increase performance and allow a fine grained and user configurable notification that will allow the final experts to receive the notifications of the problems directly and on demand. Details will be given on the adopted solutions which include the following topics: implementation of the redundant and load balanced network and core IT services; deployment and configuration management infrastructure and its customization; the new monitoring infrastructure; virtualization techniques for redundant services… Special emphasis will be put on the scalable approach allowing to increase the size of the cluster with no administration overhead. Finally, the lessons learnt from the two years of running will be presented together with the prospects for the short and long term upgrades and the new technologies now in the pipeline.
Multicloud as the Next Generation of Cloud Infrastructure, by Brad Eckert
So, what are data center networks really built for? Short answer "applications".
Whether it is a public cloud provider, private enterprise, FSI, or telco cloud, the nature of applications across each data center type imposes a different set of demands on the underlying network infrastructure. A next-generation architecture is one that is versatile yet modular enough to address these different application needs, whether they are HPC and Big Data, legacy, or real-time content. A common architecture goal is a unified and consolidated network design that can leverage standardized technology attributes and integrate a versatile workload environment, from high-performance bare-metal servers to a microservices-enabled container environment. This tutorial is aimed at an in-depth, structured understanding of data center business and technical requirements and how EVPN-VXLAN constructs serve as a Swiss-army-knife approach to meeting them. Practical case studies translate theoretical concepts into building blocks for designing and automating multi-tenant data center deployments. We explore how a unified technology solution can help build a network that grows with increasing east-west traffic and seamlessly connects with the backbone for north-south communication, while leveraging familiar protocol concepts to achieve security insertion. We will also go over operator issues with traffic optimization, multicast and BUM traffic handling, and other common pitfalls. A final step is to define requirements for a cohesive solution using a centralized controller that lets a data center network operator apply the same degree of agility and visibility to both the physical network and the application infrastructure, to truly build a software-defined data center.
PLNOG 8: Ivan Pepelnjak - Data Center Fabrics - What Really Matters, by PROIDEA
The document discusses data center fabric architectures. It describes how traditional data center designs focus on north-south traffic but modern applications generate more east-west traffic between servers. New fabric architectures are needed to provide flexible workload placement and mobility. Common fabric approaches use leaf-spine Clos network designs with non-blocking switching fabrics to provide any-to-any connectivity between endpoints. Large-scale fabrics can be built today using existing switching equipment and protocols like ECMP routing rather than new technologies. The key is to keep layer 2 domains small and use overlay encapsulation for virtual networks.
The document discusses navigating data center architectures, including:
- Juniper offers three data center options (EX Series, QFabric System, and Contrail) which can present confusing alternatives.
- The document outlines four key data center architectures: Virtual Chassis Fabric, IP Fabric, QFabric, and open architectures. It provides details on capabilities and use cases for each.
- Juniper's MetaFabric architecture is presented as a flexible portfolio that spans switching, routing, management, network virtualization, security, and professional services to address customer data center needs.
PacketCloud: an Open Platform for Elastic In-network Services, by yeung2000
This document proposes PacketCloud, an open platform for hosting elastic in-network services. PacketCloud uses cloudlets located at ISP network edges to provide virtual instances for third-party services. These services can be user-requested or transparently intercept traffic. A prototype demonstrates services like encryption achieving over 500Mbps on one node and over 10Gbps across 20 nodes in a cloudlet with minimal delay. The platform aims to efficiently share network resources while providing economic rewards for ISPs and third parties.
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility, by inside-BigData.com
In this deck from the Swiss HPC Conference, Mark Wilkinson presents: 40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility.
"DiRAC is the integrated supercomputing facility for theoretical modeling and HPC-based research in particle physics, and astrophysics, cosmology, and nuclear physics, all areas in which the UK is world-leading. DiRAC provides a variety of compute resources, matching machine architecture to the algorithm design and requirements of the research problems to be solved. As a single federated Facility, DiRAC allows more effective and efficient use of computing resources, supporting the delivery of the science programs across the STFC research communities. It provides a common training and consultation framework and, crucially, provides critical mass and a coordinating structure for both small- and large-scale cross-discipline science projects, the technical support needed to run and develop a distributed HPC service, and a pool of expertise to support knowledge transfer and industrial partnership projects. The on-going development and sharing of best-practice for the delivery of productive, national HPC services with DiRAC enables STFC researchers to produce world-leading science across the entire STFC science theory program."
Watch the video: https://wp.me/p3RLHQ-k94
Learn more: https://dirac.ac.uk/
and
http://hpcadvisorycouncil.com/events/2019/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor..., by Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Best 20 SEO Techniques To Improve Website Visibility In SERP, by Pixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
How to Get CNIC Information System with Paksim Ga.pptx, by danishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
UiPath Test Automation using UiPath Test Suite series, part 5, by DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series, part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Threats to mobile devices are increasingly prevalent and growing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features trade security for convenience and capability. This best-practices guide outlines steps users can take to better protect personal devices and information.
HCL Notes and Domino License Cost Reduction in the World of DLAU, by panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and the licenses under the CCB and CCX model have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefit it brings you. Above all, you certainly want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove redundant or unused accounts to save money. There are also some practices that can lead to unnecessary spending, for example using a person document instead of a mail-in database for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It will give you the tools and know-how to stay on top of things. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
Topics covered:
- Reducing license costs by finding and fixing misconfigurations and redundant accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to make the best use of it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Real-world examples and best practices you can apply immediately
Driving Business Innovation: Latest Generative AI Advancements & Success Story, by Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024, by Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
TrustArc Webinar - 2024 Global Privacy Survey, by TrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack, by shyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Programming Foundation Models with DSPy - Meetup Slides, by Zilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
5. Background: personal experience
• Bandwidth is a scarce resource

Year   Network    Memory   Disk     CPU
1994   10Mb/s     2MB      10MB     386/20MHz
1998   100Mb/s    128MB    2GB      Pentium II/233
2002   100Mb/s    256MB    40GB     Pentium III/800
2007   1Gb/s      2GB      160GB    Core 2/2GHz
2011   1Gb/s      4GB      500GB    Core 2 Quad/3GHz

Over these 17 years: network x100; memory x2000, but slow access; disk x50000; CPU x150 in clock and x4 in cores, with multi-core and instruction-level progress.
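The growth multiples quoted above follow directly from the table's endpoints. A quick back-of-the-envelope check in Python (values copied from the 1994 and 2011 rows, units normalized by hand):

```python
# Endpoint specs from the table above, normalized to common units.
specs_1994 = {"network_mbps": 10, "memory_mb": 2, "disk_mb": 10, "cpu_mhz": 20}
specs_2011 = {"network_mbps": 1000, "memory_mb": 4096,
              "disk_mb": 500_000, "cpu_mhz": 3000}

years = 2011 - 1994
for key, old in specs_1994.items():
    growth = specs_2011[key] / old
    cagr = growth ** (1 / years) - 1  # compound annual growth rate
    print(f"{key:>13}: x{growth:>8,.0f} over {years} years (~{cagr:.0%}/year)")
# Network grew ~x100 while disk grew ~x50,000: bandwidth is the laggard.
```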
6. Background: technology trends
– Disk is cheap (TB and PB are common)
• 500 RMB for 1TB
– Memory is cheap (32GB in a PC is not uncommon)
• 150 RMB for 2GB DRAM
– CPU is powerful yet inexpensive (multi-core)
• 2000 RMB for an Intel Core i7 with 4 cores
– But network bandwidth is a scarce resource
• Intra-DC: replication everywhere for fault tolerance
• Inter-DC: input and output need bandwidth
• $50 per 1G port, $500 per 10G port
– $0.1 buys roughly 1GB of bandwidth = 1 CPU hour = 1GB of storage per month
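To see why replication makes the network the pressure point even at price parity, here is a toy cost model built only on the $0.1 parity above; the dataset size, replication factor, and CPU hours are made-up inputs, not figures from the talk:

```python
# Toy cost model: $0.1 buys ~1 GB of transfer, ~1 CPU hour,
# or ~1 GB-month of storage (the parity quoted above).
UNIT_COST = 0.1

dataset_gb = 1000   # 1 TB working set (assumption)
replicas = 3        # typical intra-DC replication factor (assumption)
cpu_hours = 200     # processing time for one job (assumption)

storage_cost = dataset_gb * replicas * UNIT_COST        # one month, all copies
compute_cost = cpu_hours * UNIT_COST
network_cost = dataset_gb * (replicas - 1) * UNIT_COST  # traffic to fan out copies

print(f"storage ${storage_cost:.0f}/month, compute ${compute_cost:.0f}, "
      f"network ${network_cost:.0f} per rewrite of the dataset")
```

Every rewrite of the dataset pays the fan-out traffic again, which is why intra-DC bandwidth, rather than storage or CPU, tends to be the binding constraint.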
8. DCN reference design
• Does not scale
• Low bandwidth
• Single point of failure
• High cost
9. Outline
• DCN background
• Opportunities
• Research challenges
• A modular DCN design
10. Right time for DCN research
• It is a real problem
• It is an important problem
– DCN as the infrastructure for cloud computing
• The assumptions are different
– Data centers are owned by a single organization
– We can innovate at both end-hosts and network devices
– Security is easier (closed environment and trusted people)
11. DCN research: opportunities
• Full of research problems
– Scalability: from tens of thousands to millions of servers
– Performance
– Fault tolerance
– Cost saving
– Feel free to suggest new "TCP" protocols
• You can invent your own DCN!
12. Outline
• DCN background
• Opportunities
• Research challenges
• A modular DCN design
13. Research challenges
Applications
• Search
• Distributed execution engine
• Distributed file systems
• Online social networking
• HPC applications
Architectures
• Topology design
• Network virtualization
• Electrical/optical switching
• Commodity vs. special system
Technologies
• DCN management
• DCN platform
• Energy efficiency
Protocols
• DCN routing
• TCP incast congestion control
• Multicast
14. Architecture design
• Scaling: from thousands to millions of servers
• High capacity: support various traffic patterns
• Fault tolerance
• Cost efficient
• Easy to deploy and manage
17. DCell/BCube (MSRA, SIGCOMM'08/'09)
• Put intelligence at the servers
• Use Ethernet switches as the crossbar
• Innovations in topology design and routing
[Figures: the DCell and BCube topologies]
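To make the topologies concrete, here is a minimal sketch of BCube's server addressing and digit-correcting routing; the function names and digit order are illustrative assumptions, and the actual BCubeRouting algorithm additionally permutes the correction order to derive multiple parallel paths:

```python
from itertools import product

def bcube_servers(n, k):
    """All server addresses in BCube(n, k): (k+1)-digit base-n tuples."""
    return list(product(range(n), repeat=k + 1))

def bcube_route(src, dst):
    """One shortest server-to-server path: correct one address digit per
    hop, since servers differing only in digit l share a level-l switch."""
    path, cur = [src], list(src)
    for level, digit in enumerate(dst):
        if cur[level] != digit:
            cur[level] = digit
            path.append(tuple(cur))
    return path

print(len(bcube_servers(4, 1)))     # BCube(4,1) has 16 servers
print(bcube_route((0, 0), (3, 2)))  # [(0, 0), (3, 0), (3, 2)]
```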
20. Technologies: research platform
• A DCN research platform
– High performance: comparable to an ASIC
– Easy to program: comparable to a commodity server
– Rich functions
• Programmable packet forwarding
• Experiment with various control/management functions
• Can implement various routing/congestion-control designs
• ServerSwitch (MSRA, NSDI'11)
21. Applications
• A unified network for both data center and HPC applications?
– Topology: data centers are tree-based; HPC uses torus/mesh and fat-tree.
– Routing: data centers use single-path routing (L2 spanning tree, L3 shortest-path routing); HPC uses deterministic routing and per-packet adaptive routing to exploit path diversity.
– Flow control: data centers allow packet drops and control congestion end-to-end; HPC avoids packet drops with hop-by-hop flow control.
– Application support: data centers serve search, e-commerce, and cloud computing; HPC serves scientific applications.
– Programming API: data centers use the TCP/IP socket; HPC uses MPI/RDMA.
22. Outline
• DCN background
• Opportunities
• Research challenges
• A modular DCN design
23. Team
• Chuanxiong Guo, Guohan Lu, Haitao Wu, Yongqiang Xiong
• Interns: Zhiqiang Zhou, Jiaxin Cao, Jiabo Ju, Qin Jia, Jun Li
• Alumni/Alumna
– members: Songwu Lu, Dan Li
– interns: Lei Shi, Yunfeng Shi, Danfeng Zhang, Xuan Zhang, Byunchul Park, Nan Hua, Chen Tian, Min-Chen Zhao, Chao Kong, Kai Chen, Wenfei Wu, Shuang Yang, Peng Su, Bruce Chen, Zhenqian Feng, Min-Jeong Shi, Yibo Zhu…
29. Solution: ServerSwitch
• Software: full programmability at the server CPU
– Kernel module for low-latency processing
– User space for easy-to-use programmability
• PCI-E interconnection: low latency and high throughput
• Hardware: packet forwarding in a commodity switching ASIC
– High performance but limited programmability
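The division of labor can be sketched as a fast-path/slow-path dispatcher: flows matched in the ASIC's table are forwarded in hardware at line rate, and everything else is punted over PCI-E to software on the server CPU, which can install new rules. All names below are illustrative, not the ServerSwitch API:

```python
# Illustrative sketch of the ServerSwitch split (not the real API).
asic_table = {}  # flow key -> output port, programmed from software

def asic_forward(pkt):
    """Fast path: exact-match lookup in the switching ASIC's table."""
    return asic_table.get(pkt["dst"])  # None means "no rule: punt to CPU"

def cpu_process(pkt):
    """Slow path: arbitrary software logic on the server CPU; here it
    picks a port and installs a rule so the flow stays in hardware."""
    port = hash(pkt["dst"]) % 4   # toy routing decision
    asic_table[pkt["dst"]] = port  # program the ASIC over PCI-E
    return port

def handle(pkt):
    port = asic_forward(pkt)
    return port if port is not None else cpu_process(pkt)

print(handle({"dst": "10.0.0.2"}))  # first packet takes the slow path
print(handle({"dst": "10.0.0.2"}))  # later packets stay on the fast path
```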
31. Summary
• DCN is an area full of opportunities and challenges
• The best is yet to come!
• Further information:
http://research.microsoft.com/en-us/projects/msradcn/default.aspx