The IBM POWER10 processor represents the 10th generation of the POWER family of enterprise computing engines. Its performance is a result of both powerful processing cores and high-bandwidth intra- and inter-chip interconnect. POWER10 systems can be configured with up to 16 processor chips and 1920 simultaneous threads of execution. Cross-system memory sharing, through the new Memory Inception technology, and 2 Petabytes of addressing space support an expansive memory system. The POWER10 processing core has been significantly enhanced over its POWER9 predecessor, including a doubling of vector units and the addition of an all-new matrix math engine. Throughput gains from POWER9 to POWER10 average 30% at the core level and three-fold at the socket level. Those gains can reach ten- or twenty-fold at the socket level for matrix-intensive computations.
We live in an era where the atomic building elements of silicon computers, e.g., transistors and wires, are no longer visible under traditional optical microscopes and their sizes are measured in just tens of angstroms. In addition, power dissipation per unit volume is bounded by the laws of physics, which, among other effects, has resulted in stagnating processor clock frequencies. The current industry trend is to add more and more processor cores that perform simpler and simpler tasks, in an attempt to efficiently fill the available on-chip area.
RISC-V and OpenPOWER open-ISA and open-HW - a Swiss army knife for HPC | Ganesan Narayanasamy
To cope with the slowing of Moore’s law and the end of Dennard scaling, the world of High-Performance Computing is rapidly evolving toward high-throughput architectures with specialized hardware for vector and tensor operations, in conjunction with sophisticated power-management subsystems. The RISC-V ISA and open hardware can prove as effective in fostering innovation in the HPC market as they have in the embedded one. In this talk, I will introduce a set of building blocks for future HPC systems that we have been designing at ETH Zurich and the University of Bologna.
Microsoft Project Olympus AI Accelerator Chassis (HGX-1) | inside-BigData.com
In this video from the Open Compute Summit, Siamak Tavallaei from Microsoft presents an overview of the Microsoft Project Olympus AI Accelerator Chassis, also known as the HGX-1.
Watch the presentation video: http://wp.me/p3RLHQ-guX
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Early Benchmarking Results for Neuromorphic Computing | DESMOND YUEN
An update on the Intel Neuromorphic Research Community’s growth and benchmark results, including the addition of new corporate members and numerous new benchmarking updates computed on Intel’s neuromorphic test chip, Loihi.
IBM and ASTRON 64-Bit Microserver Prototype Prepares for Big Bang's Big Data,... | IBM Research
IBM and the Netherlands Institute for Radio Astronomy (ASTRON) have unveiled the world’s first water-cooled 64-bit microserver. The prototype, which is roughly the size of a smartphone, is part of the proposed IT roadmap for the Square Kilometre Array (SKA), an international consortium to build the world’s largest and most sensitive radio telescope. Scientists estimate that the processing power required to operate the telescope will be equal to several million of today’s fastest computers.
The team behind the microserver has designed and demonstrated a prototype 64-bit microserver using a PowerPC-based chip from Freescale Semiconductor running Fedora Linux and IBM DB2. At 133 × 55 mm², the microserver contains all of the essential functions of today’s servers, which are 4 to 10 times larger.
Not only is the microserver compact, it is also very energy-efficient. One of its innovations is hot-water cooling, which, in addition to keeping the chip operating temperature below 85°C, also transports electrical power by means of a copper plate. The concept is based on the same technology IBM developed for the SuperMUC supercomputer located outside Munich, Germany. IBM scientists hope to keep each microserver operating at 35–40 watts including the system on a chip (SoC); the current design draws 60 watts.
The next step for the scientists is to combine 128 of the microserver boards, using the newest T4240 chips, into a 2U rack unit with 1536 cores, 3072 threads, and up to 6 terabytes of DRAM. In addition, they will add an Ethernet switch and a power module to the integrated water cooling.
In this deck from the HPC Advisory Council Spain Conference, Dan Olds from OrionX discusses the High Performance Interconnect (HPI) market landscape, plus provides ratings and rankings of HPI choices today.
"The HPI market is the very high-end of the networking equipment market where high bandwidth and low latency are non-negotiable. It started out as a specialist proprietary segment but has blossomed into an indispensable, large, and growing area. Products in this category are used to build extreme-scale computing systems. They are typically not used for traditional telco, enterprise, or service provider networking needs. In this talk, we’ll take a look at the technologies and performance of their high-end technology and the coming battle between onloading vs. offloading interconnect architectures."
Watch the video presentation: http://wp.me/p3RLHQ-fON
Learn more: http://orionx.net/wp-content/uploads/2016/06/HPI-Environment-OrionX-Constellation-DataCenter-20160626.pdf
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this video from the HPC User Forum in Santa Fe, Yoonho Park from IBM presents: IBM Datacentric Servers & OpenPOWER.
"Big data analytics, machine learning and deep learning are among the most rapidly growing workloads in the data center. These workloads have the compute performance requirements of traditional technical computing or high performance computing, coupled with a much larger volume and velocity of data."
Watch the video: http://wp.me/p3RLHQ-gJv
Learn more: https://openpowerfoundation.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the Datacenter | Linaro
Session ID: HKG18-500K1
Session Name: HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the Datacenter
Speaker: Not Available
Track: Keynote
★ Session Summary ★
For decades we were able to take advantage of Moore’s Law to improve single-thread performance and reduce power and cost with each generation of semiconductor technology. While technology has continued to advance since the end of Dennard scaling more than 10 years ago, the advances have slowed down. Server performance increases have instead relied on increasing core counts and power budgets.
At the same time, workloads have changed in the era of cloud computing. Scale-out is becoming more important than scale-up. Domain-specific architectures have started to emerge to improve the energy efficiency of emerging workloads like deep learning.
This talk will provide a historical perspective and discuss emerging trends driving the development of modern server processors.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/hkg18/hkg18-500k1/
Presentation: http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-500k1.pdf
Video: http://connect.linaro.org.s3.amazonaws.com/hkg18/videos/hkg18-500k1.mp4
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2018 (HKG18)
19-23 March 2018
Regal Airport Hotel Hong Kong
---------------------------------------------------
Keyword: Keynote
http://www.linaro.org
http://connect.linaro.org
---------------------------------------------------
Design Considerations, Installation, and Commissioning of the RedRaider Cluster at the Texas Tech University High Performance Computing Center
Outline of this talk
HPCC Staff and Students
Previous Clusters
• History, performance, usage patterns, and experience
Motivation for Upgrades
• Compute capacity goals
• Related considerations
Installation and Benchmarks
Conclusions and Q&A
Delivering Carrier Grade OCP for Virtualized Data Centers | Radisys Corporation
This webinar explores the requirements for carrier grade Open Compute Project (OCP) infrastructure for virtualized telecom data centers delivering SDN and NFV for digital services.
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility | inside-BigData.com
In this deck from the Swiss HPC Conference, Mark Wilkinson presents: 40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility.
"DiRAC is the integrated supercomputing facility for theoretical modeling and HPC-based research in particle physics, and astrophysics, cosmology, and nuclear physics, all areas in which the UK is world-leading. DiRAC provides a variety of compute resources, matching machine architecture to the algorithm design and requirements of the research problems to be solved. As a single federated Facility, DiRAC allows more effective and efficient use of computing resources, supporting the delivery of the science programs across the STFC research communities. It provides a common training and consultation framework and, crucially, provides critical mass and a coordinating structure for both small- and large-scale cross-discipline science projects, the technical support needed to run and develop a distributed HPC service, and a pool of expertise to support knowledge transfer and industrial partnership projects. The on-going development and sharing of best-practice for the delivery of productive, national HPC services with DiRAC enables STFC researchers to produce world-leading science across the entire STFC science theory program."
Watch the video: https://wp.me/p3RLHQ-k94
Learn more: https://dirac.ac.uk/
and
http://hpcadvisorycouncil.com/events/2019/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Securing your Kubernetes cluster: a step-by-step guide to success! | KatiaHIMEUR1
Today, after several years of existence, an extremely active community, and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been easier to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
DevOps and Testing slides at DASA Connect | Kari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We closed with a lovely workshop in which participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
JMeter webinar - integration with InfluxDB and Grafana | RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana (a small illustrative snippet follows this list):
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
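As a rough illustration of the plumbing involved (not taken from the webinar), the sketch below pushes one JMeter-style latency sample to InfluxDB 1.x over its HTTP write endpoint. The database name, measurement, and value are hypothetical, and in practice JMeter's Backend Listener does this automatically:

# Minimal sketch: push one latency sample to InfluxDB 1.x over HTTP.
# Assumes a local InfluxDB with a database named "jmeter" (hypothetical names).
import time
import requests

line = "response_time,label=HomePage value=187 {}".format(time.time_ns())
resp = requests.post(
    "http://localhost:8086/write",
    params={"db": "jmeter", "precision": "ns"},
    data=line,
)
resp.raise_for_status()  # InfluxDB answers 204 No Content on success

Grafana would then query the "jmeter" database and plot response_time over time.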
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality | Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... | Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas struggle to keep up with the competition. However, fostering a culture of innovation takes real work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
The Art of the Pitch: WordPress Relationships and Sales | Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
UiPath Test Automation using UiPath Test Suite series, part 3 | DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
• UI automation introduction
• UI automation sample
• Desktop automation flow
Speakers:
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... | DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well; a minimal example appears below.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
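A minimal example of that Python binding, assuming the pypowsybl package is installed (pip install pypowsybl). This is a generic illustration, not the webinar's notebook:

# Minimal sketch: load a bundled test network and run an AC power flow.
import pypowsybl as pp

network = pp.network.create_ieee14()    # bundled IEEE 14-bus test case
results = pp.loadflow.run_ac(network)   # run the AC power flow
print(results[0].status)                # convergence status of the main component
print(network.get_buses().head())       # bus data as a pandas DataFrame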
Generating a custom Ruby SDK for your web service or Rails API using Smithy | g2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... | James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Expectations for optical network from the viewpoint of system software research
1. Expectations for optical network from the viewpoint of system software research
Ryousei Takano
National Institute of Advanced Industrial Science and Technology (AIST)
Special session on challenges and opportunities of integrated photonics in future datacenters
ACSI 2015@Tsukuba, 27 Jan. 2015
2. Outline
• Trends in datacenter research and development
• AIST IMPULSE Project
• Workload analysis
• Proposed architecture: dataflow-centric computing
3. Introduction
• Big data is a killer app in datacenters; it requires a clean-slate architecture like "disaggregation" or "datacenter in a box".
• Optical network is key to making them.
• Optical path network (all-optical path end-to-end) in a datacenter
– Pros: huge bandwidth, energy efficiency
– Cons: path switching latency, utilization
• To take advantage of an optical path network, a new datacenter OS is essential.
– Key idea: control/data plane separation
4. Optical Network in DCs
• Similar-concept ("disaggregation" or "datacenter in a box") projects have been launched recently:
– Open Compute Project (Facebook)
– Rack Scale Computing (Intel)
– Extremely Shrinking Computing (IBM)
– The Machine (HP)
– FireBox (UCB)
– CTR Consortium (MIT)
• Optical network, including photonic-electronic convergence and short-reach (<1 km) interconnection, is key to drive innovation in future datacenters.
5. [Figure: Facebook datacenter architecture - Front-End Cluster (Web: 250 racks; Ads: 30 racks; Cache: ~144 TB; Multifeed: 9 racks), Service Cluster (Search, Photos, Msg, Others), Back-End Cluster (UDB, ADS-DB, TaoLeader, other small services). Source: "Flash at Facebook", Flash Summit 2013]
Five Standard Servers:
Type I (Web) - CPU: High (2 x E5-2670); Memory: Low; Disk: Low; Services: Web, Chat
Type III (Database) - CPU: High (2 x E5-2660); Memory: High (144 GB); Disk: High IOPS (3.2 TB Flash); Services: Database
Type IV (Hadoop) - CPU: High (2 x E5-2660); Memory: Medium (64 GB); Disk: High (15 x 4 TB SATA); Services: Hadoop (big data)
Type V (Photos) - CPU: Low; Memory: Low; Disk: High (15 x 4 TB SATA); Services: Photos, Video
Type VI (Feed) - CPU: High (2 x E5-2660); Memory: High (144 GB); Disk: Medium; Services: Multifeed, Search, Ads
6. Open Compute Project
• OCP was founded by Facebook in April 2011 to openly share designs of datacenter products.
• Shift from commodity products to user-driven design to improve the energy efficiency of large-scale datacenters (PUE = total facility energy / IT equipment energy):
– Industry standard: 1.9 PUE
– Open Compute Project: 1.07 PUE
• Specifications: server, storage, rack, network switch, etc.
• Products: Quanta Rackgo X, GIGABYTE DataCenter Solution
[Photo: Open Compute Rack v2]
8. HP "The Machine"
"The Machine could be six times more powerful than an equivalent conventional design, while using just 1.25 percent of the energy and being around 1/100 the size."
http://www.hpl.hp.com/research/systems-research/themachine/
9. Datacenter in a Box
• The Machine is six times faster with 1.25 percent of the energy compared with the K computer.
[Chart: HPC Challenge's RandomAccess benchmark]
10. The Machine: Architecture
[Figure: compute, memory, NV memory, and storage elements connected by a photonic interconnect]
Architecture evolution/revolution: a "Computing Ensemble" - bigger than a server, smaller than a datacenter, with built-in system software
– Disaggregated pools of uncommitted compute, memory, and storage elements
– Optical interconnects enable dynamic, on-demand composition
– Ensemble OS software using virtualization for composition and management
– Management and programming virtual appliances add value for IT and application developers
11. Machine OS
• Linux++: a Linux-based OS for The Machine
– A new concept of memory management
– An emulator to make a conventional computer behave like The Machine
– A developer's preview released in June 2015?
• Carbon
– HP will replace Linux++ with Carbon.
12. UC Berkeley FireBox Overview
[Figure: up to 1000 SoCs plus high-bandwidth memory (100,000 cores total) and up to 1000 non-volatile memory modules (100 PB total), connected by 1 Terabit/sec optical fibers through high-radix switches; the inter-box network provides many short paths through high-radix switches]
A similar concept to The Machine.
14. IMPULSE: Initiative for Most Power-efficient Ultra-Large-Scale data Exploration
[Figure: packaging roadmap from separated packages (2014) to 2.5D stacked packages (2020) to 3D stacked packages (2030), with logic, NVRAM, and I/O connected by an optical network in the future datacenter]
• High-Performance Logic Architecture: 3D build-up integration of the front-end circuits, including high-mobility Ge-on-insulator FinFETs / AIST-original TCAD
• Non-Volatile Memory: voltage-controlled magnetic RAM, mainly for cache and work memories
• Optical Network: silicon photonics cluster switches / optical interconnect technologies
• Future datacenter architecture design / dataflow-centric warehouse-scale computing
15. AIST's IMPULSE Program
IMPULSE (Initiative for Most Power-efficient Ultra-Large-Scale data Exploration) is a STrategic AIST integrated R&D (STAR) program. (*A STAR program is AIST research that is expected to produce a large outcome in the future.)
[Figure: architecture for concentrated data processing for HPC and big data - high-performance server modules combining 3D-installed non-volatile memory and energy-saving logic with optical paths between chips; storage-class memory (non-volatile memory) and HDD storage form an energy-saving large-capacity storage tier, linked by an energy-saving high-speed network; goal: creating a rich and eco-friendly society]
16. Voltage-controlled Nonvolatile Magnetic RAM
[Figure: applications - nonvolatile CPU, nonvolatile cache, nonvolatile display, power-saved storage, NAND Flash; device structure - thin-film ferromagnetics with an insulation layer, keeping memory without power]
• Voltage Controlled Spin RAM: voltage-induced magnetic anisotropy change; less than 1/100 rewriting power
• Voltage Controlled Topological RAM: resistance change by the Ge displacement; loss by entropy < 1/100
17. Low Power High-performance Logic
Front-end 3D integration (wiring layer stacked over front-end layers):
• Dense integration without miniaturization
• Reduction of the wiring length for power saving
• Introduction of Ge and III-V channels by a simple stacking process
• Innovative circuits using the Z direction
Ge Fin CMOS technology (nMOS/pMOS Ge fins on an insulation layer; figure labels, translated from Japanese: source, drain, insulating film):
• Low power and high speed thanks to Ge
• Toward 0.4 V Ge Fin CMOS
19. Optical Network Technology for Future Datacenters
[Figure: a wavelength bank (optical comb) feeds DWDM, multi-level modulation optical interconnects (comb source, modulators, MUX/DEMUX, DSP, Tx/Rx) connecting 2.5D CPU cards (CPU/GPU plus memory cube) to datacenter server racks through silicon photonics cluster switches]
• Large-scale silicon-photonics-based cluster switches
• DWDM, multi-level modulation, highly integrated "elastic" optical interconnects
• Ultra-low energy consumption network by making use of optical switches
– Ultra-compact switches based on silicon photonics
– 3D integration by amorphous silicon
– A new server architecture
Link capacity scaling (No. of λs / order of modulation / bit rate):
1 / 1 / 20 Gbps
4 / 8 / 640 Gbps
32 / 8 / 5.12 Tbps
Current state-of-the-art Tx: 100 Gbps → ~5.12 Tbps
Current electrical switches: ~130 Tbps → ~500 Pbps
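The table's bit rates are consistent with a simple product, rate = 20 Gbps × modulation order × number of wavelengths, reading "order" as a per-lane multiplier (my interpretation; the slide does not spell this out):

# Sanity check of the slide's bit-rate table (interpretation assumed, not stated).
BASE_GBPS = 20
for n_lambda, order in [(1, 1), (4, 8), (32, 8)]:
    print(n_lambda, "wavelengths, order", order, "->", BASE_GBPS * order * n_lambda, "Gbps")
# -> 20 Gbps, 640 Gbps, 5120 Gbps (= 5.12 Tbps), matching the table.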
20. Architecture for Big Data and Extreme-scale Computing
Dataflow-centric warehouse-scale computing:
1 - A single OS controls the entire data center.
2 - Split the datacenter OS into a data plane and a control plane to guarantee real-time data processing.
[Figure: real-time big data flows from input through conversion and analysis to output; the datacenter OS handles optimal arrangement of the data flow, resource management, and monitoring, connecting universal processors/hardware and storage by using the optical network]
21. Performance Estimation
• Estimate the performance of typical HPC and big data workloads on a future datacenter system
• SimGrid simulator (http://simgrid.gforge.inria.fr)
– Simulator of large-scale distributed systems, such as grids, clouds, HPC, and P2P
[Figure: SimGrid overview - user code runs on grid user APIs (MSG: simple application-level simulator; SimDag: framework for DAGs of parallel tasks; SMPI: library to run MPI applications on top of a virtual environment) over the SURF virtual platform simulator and the XBT base toolbox, with tracing support; SimGrid is a generic simulation framework that takes platform topology, application deployment, and applicative workload as input and produces logs, statistics, and visualization]
22. Workload 1: Simple Message Passing
• Iteration of neighbor communication
• Big impact of increasing link bandwidth if an application is network-intensive: the relative execution time drops by up to 1/100.
Simulation parameters: #nodes: 10000; link bandwidth: 0.1, 1, 10 Tbps; link latency: 100 ns; CPU power: 10 TFLOPS; data size: 10^12 to 10^24 B
[Chart: relative execution time vs. compute power (FLO) and data size (bytes)]
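The shape of this result can be reproduced with a back-of-the-envelope step-time model. This is my illustration of the trend, not the SimGrid configuration behind the slide's chart:

# Back-of-the-envelope model of the neighbor-communication workload
# (illustrative only; the slide's numbers come from SimGrid, not this model).
def step_time(flop, data_bytes, flops=10e12, latency=100e-9, bw_bps=1e12):
    # compute + one neighbor exchange per iteration
    return flop / flops + latency + 8 * data_bytes / bw_bps

data = 1e12  # 1 TB exchanged per step (within the slide's 10^12..10^24 B range)
for bw in (0.1e12, 1e12, 10e12):  # 0.1, 1, 10 Tbps links, as on the slide
    t = step_time(flop=1e12, data_bytes=data, bw_bps=bw)
    print(f"{bw/1e12:>4} Tbps -> {t:.3f} s per step")
# When the transfer term dominates, a 100x faster link gives close to a 1/100 step time.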
23. Workload 2: HPC Application
• NAS Parallel Benchmark (256 procs, class C)
– Low latency is more important than huge bandwidth.
– The problem size is too small to utilize huge bandwidth.
[Charts: relative execution time - effect of reducing the link latency (CPU power 1 TFLOPS, link bandwidth 1 Tbps) and effect of increasing the link bandwidth (CPU power 1 TFLOPS, link latency 0.1 µs)]
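This observation follows from a simple transfer-time model, t = L + S/B: for the short messages typical of a class C run on 256 processes, the latency term L dominates, so shrinking L helps far more than growing B. A two-line illustration with assumed numbers (10 µs latency, 64 KB message, 1 Tbps link), not taken from the slide:

# Fraction of transfer time spent in latency for a small message (assumed numbers).
L, S, B = 10e-6, 64 * 1024 * 8, 1e12  # latency [s], message size [bits], bandwidth [bit/s]
print(L / (L + S / B))  # ~0.95: latency dominates, so extra bandwidth barely helps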
24. Workload 3: MapReduce
• KDD Cup 2012, Track 2: predict the click-through rate of ads (using Hadoop and Hivemall)
• Machine learning is CPU-intensive.
• The effect of huge bandwidth is limited, because:
– The concurrency of the model used is not enough.
– Hadoop is optimized to make jobs run faster on current I/O devices.
[Charts: execution time (seconds) vs. disk I/O bandwidth (Mbps) and vs. relative CPU power (base 172 GFLOPS); parameters: CPU power 17.2 TFLOPS, disk bandwidth 200 Mbps, network bandwidth 10 Gbps]
28. In-storage, Network Processing
• Hierarchical and partial reduce processing in each network node avoids network congestion and a serialized reduce.
• Compute modules are attached in storage to maximize the read throughput from storage.
[Figure: mappers feed a hierarchical shuffle-and-reduce tree]
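A minimal sketch of the idea in Python (my illustration, not AIST's implementation): each "network node" combines a bounded number of inputs before forwarding one partial result upward, so no single reducer or link handles all mapper outputs:

# Hierarchical partial reduce: switches combine children's values level by level.
from functools import reduce

def tree_reduce(values, fan_in=4, combine=lambda a, b: a + b):
    level = list(values)
    while len(level) > 1:
        # Each network node reduces up to fan_in inputs to one output.
        level = [reduce(combine, level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return level[0]

print(tree_reduce(range(1, 101)))  # 5050, same result as a flat serialized reduce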
30. In-storage, Network Processing: Hardware Design
[Figure: processing units (PU) with local memory (MEM) distributed on a chip, with direct optical I/O connections to the distributed non-volatile memory modules and communication over DWDM]
31. Direct Memory Copy over DWDM
• Assume a processor-memory embedded package with a WDM interconnect.
• Goal: fully utilize the huge I/O bandwidth realized by DWDM.
• Multiple memory blocks can be sent/received simultaneously using multiple wavelengths.
• A memory-centric network is a similar idea [PACT13].
[Figure: a single-package compute node with processor cores, cache/MMU, and memory banks; memory blocks travel over the WDM interconnect fed from the wavelength bank]
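A toy throughput model of this scheme (my sketch; the per-wavelength rate and bank bandwidth below are assumptions, not figures from the slide):

# Toy model: blocks move in parallel, one per wavelength, so copy time shrinks
# with the wavelength count until per-bank bandwidth becomes the bottleneck.
def copy_time(total_bytes, n_lambda, per_lambda_bps=160e9, bank_bps=2e12):
    effective = min(n_lambda * per_lambda_bps, bank_bps)  # assumed limits
    return 8 * total_bytes / effective

for n in (1, 4, 8, 16, 32):
    print(n, "wavelengths:", round(copy_time(1e9, n) * 1e3, 2), "ms for 1 GB")
# Scaling stops once ~13 wavelengths exceed the assumed 2 Tbps bank limit.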
32. Our Vision of Future Datacenter
[Figure: the same 2014 → 2020 → 2030 packaging roadmap as slide 14, from separated packages through 2.5D to 3D stacked packages of logic, NVRAM, and I/O over an optical network]
Goal: 100x energy efficiency of data processing
33. Dataflow Processing System
• DPF: Data Processing Function
• DPC: Data Processing Component
The datacenter OS plans the data flow of an application as a slice of DPCs, then co-allocates the DPCs and the network paths between them, with resource monitoring across server modules, storage, and the optical network.
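As a purely hypothetical sketch of what co-allocating DPCs and the paths between them could look like as an API (every name below is invented for illustration, none comes from the project):

# Hypothetical sketch: place each data processing component (DPC) on a server
# module and reserve an optical path between consecutive pipeline stages.
from dataclasses import dataclass

@dataclass
class DPC:
    name: str   # e.g. "input", "convert", "analyze", "output"
    node: str   # server module chosen by the scheduler

def plan_dataflow(stages, free_nodes):
    placed = [DPC(s, free_nodes.pop(0)) for s in stages]
    paths = [(a.node, b.node) for a, b in zip(placed, placed[1:])]
    return placed, paths  # the control plane would now reserve these paths

placed, paths = plan_dataflow(
    ["input", "convert", "analyze", "output"],
    ["node01", "node02", "node03", "node04"],
)
print(paths)  # [('node01', 'node02'), ('node02', 'node03'), ('node03', 'node04')]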
34. IMPULSE Datacenter OS
• A single OS for datacenter-wide optimization of energy efficiency and performance
• Separation of data plane and control plane:
– The data plane is an application-specific library OS.
– The control plane manages resources (servers, network, etc.) and can deploy, launch, destroy, and monitor data planes.
35. IMPULSE Datacenter OS (continued)
Data plane:
• Application-specific library OS (e.g., machine learning, data store)
• Mitigates the OS overhead to fully utilize high-performance devices
Control plane:
• Resource management
• Logical and secure resource partitioning for data planes
• Runs on the firmware
[Figure: multiple applications, each with its own data plane, run on CPU/GPU, memory, and I/O partitions managed by the control plane]
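To make the division of labor concrete, here is a hypothetical Python sketch (invented names, not IMPULSE code) of a control plane that partitions resources and launches application-specific data planes:

# Hypothetical sketch: the control plane owns resource accounting; each data
# plane gets a fixed partition and runs its own library OS, never a shared kernel.
class ControlPlane:
    def __init__(self, cpus, mem_gb):
        self.free_cpus, self.free_mem = cpus, mem_gb
        self.data_planes = {}

    def launch(self, app, cpus, mem_gb):
        assert cpus <= self.free_cpus and mem_gb <= self.free_mem
        self.free_cpus -= cpus
        self.free_mem -= mem_gb
        self.data_planes[app] = {"cpus": cpus, "mem_gb": mem_gb}

    def destroy(self, app):
        part = self.data_planes.pop(app)
        self.free_cpus += part["cpus"]
        self.free_mem += part["mem_gb"]

cp = ControlPlane(cpus=64, mem_gb=512)
cp.launch("machine-learning", cpus=32, mem_gb=256)  # app-specific library OS
cp.launch("data-store", cpus=16, mem_gb=128)
print(cp.free_cpus, cp.free_mem)  # 16 128 left for further data planes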
36. Related Work
• Datacenter-wide resource management
– OpenStack, Apache CloudStack, Kubernetes
– Hadoop YARN, Apache Mesos
• Dataflow processing engines
– Google Cloud Dataflow
– Lambda architecture
• Control/data plane separation in OS design
– Arrakis (U. Washington)
– IX (Stanford)
37. Summary
• New visions of future datacenters: "disaggregation" and "datacenter in a box"
• Optical network is key to making them.
• Hardware and software co-design is critical.
• An optical path network encourages control/data separation in a datacenter OS.
– The control plane manages resources and establishes a path between data processing components.
– The data plane fully utilizes the huge bandwidth.
39. References
• Rack scale architecture for Cloud, IDC 2013
– https://intel.activeevents.com/sf13/connect/fileDownload/session/6DE5FDFBF0D0854E73D2A3908D58E1E2/SF13_CLDS001_100.pdf
• Intel rack scale architecture overview, Interop 2013
– http://presentations.interop.com/events/las-vegas/2013/free-sessions---keynote-presentations/download/463
• New technologies that disrupt our complete ecosystem and their limits in the race to Zettascale, HPC 2014
– http://www.hpcc.unical.it/hpc2014/pdfs/demichel.pdf
• "Future server technology" that HP showed at its "Tech Power Club" (in Japanese), ASCII.jp
– http://ascii.jp/elem/000/000/915/915508/
40. ISSCC 2014 Trends
Optical Interconnect:
As the bandwidth demand for traditionally electrical wireline interconnects has accelerated, optics has become an increasingly attractive alternative for interconnects within computing systems. Optical communication offers clear benefits for high-speed and high-density interconnects. Relative to electrical interconnects, optics provides lower channel loss. Circuit design and packaging techniques that have traditionally been used for electrical wireline are being adapted to enable integrated optics with extremely low power, which has resulted in rapid progress in optical ICs for Ethernet, backplane, and chip-to-chip optical communication. ISSCC 2014 features a two-dimensional (12×5) optical array achieving an aggregate data rate of 600 Gb/s [8.2]. Pre-emphasis using group-delay equalization extends the useful data rate of a 25 Gb/s VCSEL to 40 Gb/s [8.9]. Additional examples include low-power linear and non-linear equalizers for electronic dispersion compensation in multi-mode and long-haul cables [8.1, 8.3].
Concluding Remarks:
Continuing to aggressively scale I/O bandwidth is both essential for the industry and extremely challenging. Innovations enabling higher performance and lower power will continue to be made in order to sustain this trend. Advances in circuit architectures, link topologies, and transistor scaling are together changing how I/O will be done over the next decade. The most exciting and significant of these emerging technologies for wireline I/O will be highlighted at ISSCC 2014.
[Figure: per-pin data rate vs. year (2000-2016, 1-50 Gbps) for common I/O standards: HyperTransport, QPI, PCIe, S-ATA, SAS, OIF/CEI, PON, Fibre Channel, DDR, GDDR; DRAM data bandwidth trends]
Non-Volatile Memories (NVMs):
Over the past decade, significant investment has been put into emerging memories to find an alternative to floating-gate-based non-volatile memory. The emerging NVMs, such as phase-change memory (PRAM), ferroelectric RAM (FeRAM), magnetic spin-torque-transfer RAM (STT-RAM), and resistive memory (ReRAM), are showing potential to achieve high cycling capability and lower power per bit in read/write operations. Some commercial applications, such as cellular phones, have recently started to use PRAM, demonstrating that reliability and cost competitiveness in emerging memories is becoming a reality. Fast write speed and low read-access time are the potential benefits of these emerging memories. At ISSCC 2014, a high-density ReRAM with a buried WL access device is introduced to improve the write performance and area. The next figure highlights how MLC NAND Flash write throughput continues to improve. However, while the following figure shows no increase in NAND Flash density over the past year, recent devices are built with finer dimensions and more sophisticated 3-dimensional vertical bit cells.
[Figure captions: per-pin data rate of common I/O standards; High Bandwidth Memory; processor scaling trends, 2x/1.5 yrs (data source: http://cpudb.stanford.edu/)]