Repurposing Supercomputers: What Happens on "The Other Side"?
By Andree Jacobson, CIO, New Mexico Consortium & Project Manager, PRObE
Government and industry alike invest heavily in massive computer systems to satisfy today's insatiable demand for compute power. Compared to other types of equipment, computers have an unusually short lifespan: after only a few years of operation, most are replaced with faster, better systems. With all this hardware being continuously decommissioned, where does it all go?
Government, industry, and research facilities keep building larger and faster supercomputers, a natural consequence of trying to keep up with the ever-growing demand for compute cycles to perform the critical scientific calculations that ensure the safety of a nation or the profitability of a company. The technology competition is essentially a modern version of the Cold War space race: the country with the fastest computer will perform the most advanced science.
As these massive computer systems are built and put into production, the "Top 500" list, announced at a supercomputing conference every six months, reveals the current state of the race. For the last three years, China's "Tianhe-2" has been in the lead with a peak of 54.9 PetaFLOPS (quadrillion floating-point operations per second), followed by the U.S. Department of Energy's Oak Ridge National Laboratory system, "Titan," at roughly half the performance of its Chinese counterpart.
Running these systems requires several megawatts of power and costs millions of dollars a year to operate. In industry, massive corporations like Google, Amazon, and Facebook build their own data centers around the world to supply enough compute power to meet the needs of their hundreds of millions of users. Each of these data centers can host tens of thousands of individual computers, often referred to as nodes. A node is at least as powerful as your average office or home computer; many have co-processors (such as GPUs or Xeon Phis) to speed up calculations, some have local disks, and most have fast networking. The end result is a massive amount of hardware with one thing in common: at some point, inevitably, each and every node must be discarded.
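To put those rankings in perspective, here is a minimal back-of-the-envelope comparison; the 27 PFLOPS figure for Titan is an approximation inferred from "roughly half," not a quoted specification:

```python
# Rough comparison of peak throughput for the two Top 500 leaders.
PFLOPS = 1e15  # one PetaFLOPS = one quadrillion floating-point ops per second

tianhe2_peak = 54.9 * PFLOPS  # Tianhe-2's peak, as cited above
titan_peak = 27.0 * PFLOPS    # assumed: "roughly half" of Tianhe-2

print(f"Tianhe-2 peak: {tianhe2_peak:.2e} FLOP/s")
print(f"Titan peak:    {titan_peak:.2e} FLOP/s")
print(f"Ratio: {tianhe2_peak / titan_peak:.1f}x")  # prints ~2.0x
```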
Live or Let Die?
Andree Jacobson, CIO of the New Mexico Consortium (NMC), focuses on the fate of these decommissioned supercomputers. He is the project manager for PRObE (the Parallel Reconfigurable Observational Environment), an NSF-funded compute facility hosted by the NMC in Los Alamos, NM. The NMC is a non-profit organization whose purpose is to improve the research environment in New Mexico by facilitating collaborations between Los Alamos National Laboratory (LANL) and the three research universities in the state. PRObE is a pilot project designed to determine the feasibility of using repurposed supercomputer hardware for research. Gary Grider, Division Leader for High Performance Computing at LANL, came up with the idea for PRObE in 2006 after concluding that many of the laboratory's computer systems were being decommissioned and destroyed despite still having quite a bit of useful life left in them. Many facilities deal with their decommissioned
systems by putting them on trucks
and driving them to a secure facility where the components are fed into an industrial metal shredder that chops them into tiny pieces, which are then melted down to recover precious metals. But does something that might have cost $30M just three or four years ago really possess only scrap value today? Neither Grider nor Jacobson thought so, and they co-wrote the NSF proposal together with collaborators from Carnegie Mellon University and the University of Utah. In October 2010 the NMC was awarded $10M by the NSF to build PRObE.
From a pure profitability standpoint, the answer to the scrap-value question is probably yes: an old machine is worth little more than scrap. Based on historical trends, upgrading a system that is four years into production typically yields about double the performance in a tenth of the floor footprint and one half to two thirds of the power consumption. As we will see, the operational expense (OPEX) of running an outdated computer system quickly exceeds the combined capital expense (CAPEX) and reduced OPEX of a new, more efficient system.
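A minimal sketch of that OPEX/CAPEX trade-off; every figure below is an illustrative assumption rather than an actual PRObE or LANL number:

```python
# Hypothetical break-even: keep running an aging 1 MW cluster, or replace it
# with a new system doing the same work at half the power? All numbers are
# illustrative assumptions.

POWER_COST_PER_MW_YEAR = 1_000_000  # ~$1M per MW-year (the figure cited below)

old_power_mw = 1.0            # assumed draw of the aging system
new_power_mw = 0.5            # "one half to two thirds" of the old power
new_system_capex = 2_000_000  # assumed purchase price of the replacement

annual_savings = (old_power_mw - new_power_mw) * POWER_COST_PER_MW_YEAR
breakeven_years = new_system_capex / annual_savings
print(f"Annual OPEX savings: ${annual_savings:,.0f}")        # $500,000
print(f"CAPEX recovered after {breakeven_years:.1f} years")  # 4.0 years
```

Under these assumptions, the energy savings alone pay for the new hardware within its service life, before even counting the doubled performance.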
Many universities that begin deploying cluster-style research computing resort to using discarded desktop computers. These cobbled-together systems, however, are simply not adequate for researchers who require very large computer systems for their work. This means that, to a university researcher, the value of a decommissioned supercomputer can be significantly higher than its scrap value, because these older systems provide more plentiful and more powerful computational capability than would otherwise be available.
A Different Approach
PRObE is an answer to getting these decommissioned systems into the hands of people who can use them, but setting up and maintaining large clusters containing more than 1,000 nodes requires overcoming several obstacles:
1) Sheer volume: Decommissioning, moving, inspecting, troubleshooting, and bringing thousands of old computers back online takes significant time and effort. Also, unlike when a system is slated for destruction, care must be taken throughout the decommissioning process so that parts are not damaged.
2) Space: A computer system with 1,000 or more nodes and the appropriate interconnect networks will likely require 40-50 full racks of equipment. PRObE has capacity for 1 MW of compute power, about 280 tons of cooling, and 3,000 sq ft of server room space to house these large machines, which is sufficient for two large clusters and a few smaller ones.
3) Electricity cost: 1 MW costs around $1M per year in New Mexico (see the worked example after this list). It is a required OPEX and, in PRObE's case, is provided by NSF funding. This is not a typical setup, but since there is no procurement cost for the computers, the electricity is covered instead. This allows PRObE to provide compute services to the community at no cost to individual users.
4) Lack of spare parts: Vendors do not necessarily keep old spare parts around once a product has reached end-of-life, and sometimes the vendor of an old system has vanished entirely. In such cases, the only outlet is the gray market: eBay and other vendors specializing in used computer equipment. In PRObE's case, LANL's systems are usually larger than what PRObE can house, so a sufficient number of spares (typically about 20 percent) can accompany each system. Machines can also be cannibalized to keep the system running once the spares run out.
5) Staff to operate: PRObE is successful primarily because of the workforce we use to build and maintain the clusters. Our staff is creative: they can both assemble and maintain the hardware even with limited funds. Instead of hiring consultants or full-time staff members to perform this work, PRObE relies on local high school and early college talent, which is also a wonderful way to train young people. Over the past six years we have employed close to 40 high school students, who spend a couple of hours with us each week; during summers and winter break they often work full time. To PRObE this is an affordable solution, and the students get hands-on experience building large computer systems.
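As noted in item 3, electricity dominates the running cost. Here is a minimal sketch of where the "$1M per megawatt-year" figure can come from; the electricity rate is an assumed round number, not an actual New Mexico tariff:

```python
# Rough annual electricity cost for a facility drawing a constant 1 MW.
# The $/kWh rate is an assumption chosen to illustrate the "$1M per year"
# figure in item 3, not a quoted utility tariff.

power_kw = 1_000           # 1 MW expressed in kilowatts
hours_per_year = 24 * 365  # 8,760 hours
rate_per_kwh = 0.11        # assumed blended commercial rate, $/kWh

annual_kwh = power_kw * hours_per_year
annual_cost = annual_kwh * rate_per_kwh
print(f"Annual energy: {annual_kwh:,} kWh")   # 8,760,000 kWh
print(f"Annual cost:   ${annual_cost:,.0f}")  # ~$963,600, close to $1M
```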
The Future
PRObE is fortunate that the NSF sees the value in what we do, the training that we provide, and the scientific contribution these older systems can make to the academic and research communities. Without NSF support, PRObE would not be possible. While operating PRObE requires both skill and creativity, the work is rewarding, and the scientific benefits are real, as exemplified by the many citations PRObE regularly receives in the research literature.