The document discusses technologies beyond the K computer, including Fujitsu's second generation petascale supercomputer PRIMEHPC FX10. It provides an overview of Fujitsu as a company and their history of supercomputer development. Key details about the FX10 system are presented, including its SPARC64 IXfx CPU, memory and networking specifications, and performance comparisons with the K computer. The document aims to outline Fujitsu's HPC hardware and software technologies beyond their current K computer system.
Fujitsu - Technologies beyond-the-k-computer
1. Technologies beyond the K computer
September 5th, 2012
Takashi Aoki
Next Generation Technical Computing Unit
Fujitsu Limited
2. Agenda
Corporate profile
Fujitsu supercomputer past and present
Second generation Petascale supercomputer PRIMEHPC FX10
Hardware
Software
Challenge to the future
Sep 5th, 2012 TACC-2012 1/41 Copyright 2012 FUJITSU LIMITED
3. Who we are
Japan's largest IT services provider and No. 3 in the world. *
We do everything in ICT. We use our experience and the power of ICT to shape the future of society with our customers.
Over 170,000 Fujitsu people support customers in more than 100 countries.
* 2011 IT Services Vendor Revenue. Source: Gartner, "Market Share: IT Services, 2011", 9 April 2012
4. Our products and services
Technology Solutions
  Services: our datacenters around the world
  System platforms: PRIMERGY TX120, ETERNUS DX8000, supercomputer PRIMEHPC FX10
Ubiquitous Product Solutions: LIFEBOOK E751C, smartphone F07D, tablet PC ARROWS
Device Solutions: high-end multi-core processor SPARC64 VII+, FM3 family (32-bit RISC MCU), FRAM (Ferroelectric Random Access Memory)
5. Where we work
'shaping tomorrow with you' wherever you are. Headcount as of March 2012:
  Japan: 107,000
  EMEA: 31,000
  Asia-Pacific: 27,000
  Americas: 8,000
Over 170,000 Fujitsu colleagues working with customers in over 100 countries
6. Fujitsu HPC Servers - past and present -
(Timeline figure. Labeled milestones, oldest first:)
  F230-75APU: Japan's first vector (array) supercomputer (1977)
  VP Series, VPP500
  NWT (Numerical Wind Tunnel), developed with NAL: No.1 in Top500 (Nov. 1993), Gordon Bell Prize (1994, 95, 96)
  VPP300/700
  VPP5000: world's fastest vector processor (1999)
  AP1000, AP3000
  PRIMEPOWER HPC2500: world's most scalable supercomputer (2003)
  PRIMERGY RX200 cluster node: Japan's largest cluster in Top500 (July 2004)
  HX600 cluster node
  FX1: most efficient performance in Top500 (Nov. 2008)
  SPARC Enterprise, PRIMEQUEST
  PRIMERGY BX900 cluster node
  K computer: No.1 in Top500 (June and Nov. 2011)
  FX10
  PRIMERGY CX400 skinless server (coming soon)
7. HPC Platform Solutions - Hardware -
Full range coverage with a choice of HPC hardware platforms
Petascale supercomputer: PRIMEHPC FX10
  High-performance scaling over several PFlops
  Fujitsu proprietary CPU and interconnect technologies for high performance, high reliability and high operability
x86 HPC cluster: CX400 skinless server, BX Series (BX900, BX400), RX Series (PRIMERGY RX200 series)
  High-performance de facto standard HPC cluster
  Follows the Intel CPU and MIC roadmap and adopts Fujitsu's latest packaging technologies for high performance and high operability
Large-scale SMP system: RX900
Market segments covered, from largest to smallest: High-end, Divisional, Departmental, Work Group
8. Design targets and features of FX10
Customer's requirements and FX10 design targets
  High performance: high peak performance and high application performance
  High parallel application productivity: easy to achieve high performance when running highly parallel programs, without inordinate programming effort
  High operability: low power consumption, high reliability and ease of operation
  K computer compatibility: binary compatibility, same programming environment
9. Design targets and features of FX10
Customer's requirements and FX10 design targets
  High performance: high peak performance and high application performance
  High parallel application productivity: easy to achieve high performance when running highly parallel programs, without inordinate programming effort
  High operability: low power consumption, high reliability and ease of operation
  K computer compatibility: binary compatibility, same programming environment
Features that address these targets
  High-performance CPU "SPARC64 IXfx" with SPARC V9 + HPC-ACE architecture
  High-performance, highly reliable and fault-tolerant 6D mesh/torus interconnect "Tofu" *1
  "VISIMPACT" *2 supports efficient hybrid parallel execution
  Parallel language, programming tools and petascale HPC middleware for high reliability and operability
  Water cooling system
  High-reliability components and functions based on mainframe development experience
*1) Tofu: Torus Fusion
*2) VISIMPACT: Virtual Single Processor by Integrated Multicore Parallel Architecture
10. PRIMEHPC FX10 System Configuration
Compute node configuration: SPARC64™ IXfx CPU, DDR3 memory, ICC (Interconnect Control Chip)
(System diagram: PRIMEHPC FX10 compute nodes; I/O nodes with local disks (local file system), attached via the Tofu interconnect for I/O; I/O network (IB or GB) to file servers with a global disk (global file system); management servers, portal servers and login server on the network.)
IB: InfiniBand
GB: Gigabit Ethernet
11. FX10 System H/W Specifications
PRIMEHPC FX10 H/W Specifications
Item | Specification
CPU name | SPARC64™ IXfx
CPU performance | 236.5 GFlops @ 1.848 GHz
Node configuration | 1 CPU / node
Node memory capacity | 32 or 64 GB
Performance per rack | 22.7 TFlops
No. of compute nodes per system | 384 to 98,304
System performance | 90.8 TFlops to 23.2 PFlops (4 to 1,024 racks)
System memory | 12 TB to 6 PB
Packaging hierarchy
  SPARC64™ IXfx CPU: 16 cores/socket, 236.5 GFlops
  System board: 4 nodes (4 CPUs)
  System rack: 96 compute nodes, 6 I/O nodes, with optional water cooling exhaust unit
  System: max. 23.2 PFlops, max. 1,024 racks, max. 98,304 CPUs
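The rack and system figures above follow from the per-node numbers. A minimal Python sketch of that arithmetic, assuming binary units for memory (1 TB = 1,024 GB); the constant and function names are illustrative and do not come from the slides:

```python
# Per-node figures taken from the FX10 specification table.
NODE_GFLOPS = 236.5        # one SPARC64 IXfx CPU per node
NODES_PER_RACK = 96        # compute nodes per rack


def system_peak_tflops(racks):
    """Peak performance in TFlops for a given number of racks."""
    return racks * NODES_PER_RACK * NODE_GFLOPS / 1000.0


def system_memory_tb(racks, gb_per_node):
    """Total compute-node memory in TB, assuming 1 TB = 1,024 GB."""
    return racks * NODES_PER_RACK * gb_per_node / 1024.0


# Smallest rack unit, minimum system (4 racks) and maximum system (1,024 racks).
for racks, gb in [(1, 32), (4, 32), (1024, 64)]:
    nodes = racks * NODES_PER_RACK
    print(f"{racks:5d} racks: {nodes:6d} nodes, "
          f"{system_peak_tflops(racks):9.1f} TFlops, "
          f"{system_memory_tb(racks, gb):7.1f} TB")
```

The 1,024-rack case comes out at roughly 23.25 PFlops and 6,144 TB, consistent with the 23.2 PFlops and 6 PB quoted on the slide.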
12. The K computer and FX10
Comparison of System H/W Specifications
Item | K computer | FX10
CPU name | SPARC64™ VIIIfx | SPARC64™ IXfx
CPU performance | 128 GFlops @ 2 GHz | 236.5 GFlops @ 1.848 GHz
CPU architecture | SPARC V9 + HPC-ACE extension | same
CPU L1 cache | L1(I): 32 KB/core, L1(D): 32 KB/core | same
CPU L2 cache | 6 MB (shared) | 12 MB (shared)
No. of cores/socket | 8 | 16
Memory bandwidth | 64 GB/s | 85 GB/s
Node configuration | 1 CPU / node | same
Node memory capacity | 16 GB | 32 or 64 GB
Nodes per system board | 4 | same
System boards per rack | 24 | same
Performance per rack | 12.3 TFlops | 22.7 TFlops
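One way to read the comparison is to relate memory bandwidth to peak performance. The sketch below derives bytes per flop and peak per core from the table values only; the dictionary layout and function names are illustrative assumptions:

```python
# Per-CPU figures from the comparison table.
SPECS = {
    "K computer": {"gflops": 128.0, "mem_bw_gbs": 64.0, "cores": 8},
    "FX10": {"gflops": 236.5, "mem_bw_gbs": 85.0, "cores": 16},
}


def bytes_per_flop(spec):
    """Memory bandwidth available per peak floating-point operation."""
    return spec["mem_bw_gbs"] / spec["gflops"]


def gflops_per_core(spec):
    """Peak performance of a single core."""
    return spec["gflops"] / spec["cores"]


for name, spec in SPECS.items():
    print(f"{name}: {bytes_per_flop(spec):.2f} B/Flop, "
          f"{gflops_per_core(spec):.2f} GFlops/core")
```

With these numbers the FX10 CPU nearly doubles peak performance while memory bandwidth grows by about a third, so the bytes-per-flop ratio drops from 0.50 to about 0.36.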
13. The K computer and FX10
Comparison of System H/W Specifications (cont.)
Item | K computer | FX10
Interconnect topology | 6D Mesh/Torus | same
Interconnect performance | 5 GB/s x 2 (bi-directional) | same
No. of links per node | 10 | same
Additional interconnect features | H/W barrier, reduction; no external switch box | same
Cooling of CPU, ICC (interconnect chip), DDCON | Direct water cooling | same
Cooling of other parts | Air cooling | Air cooling + exhaust air water cooling unit (optional)
14. Node configuration
Single CPU as a node
  SPARC64™ IXfx based
  32/64 GB memory capacity
  Single CPU per node to maximize memory bandwidth
  High memory bandwidth of 85 GB/s
On-board InterConnect Controller (ICC)
  Direct RDMA and global synchronization operations
  No external switch
Node type
  Compute node
    Consists of CPU, ICC and memory
    No I/O capability except the interconnect
    Four nodes are mounted on a system board
  I/O node
    Same CPU as the compute node
    Includes four PCI Express Gen2 x8 slots
    8 GB/s I/O bandwidth per I/O node
    One node is mounted on an I/O system board
(Diagrams: node block diagram with cores, L2$, memory controller, memory and ICC; system board with four CPU + ICC pairs; I/O system board with CPU, ICC and I/O slots.)
15. SPARC64™ IXfx
High-performance and low-power multi-core CPU
  High-performance core by HPC-ACE: increased number of registers, SIMD operation, software-controllable cache, etc.
  VISIMPACT: supports a highly efficient hybrid execution model (thread + process) through a shared L2 cache, a hardware barrier among cores, and compiler support
SPARC64™ IXfx specifications
Item | Specification
Architecture | SPARC V9 + HPC-ACE
No. of FP operations/clock/core | 8 (= 4 multiply-and-add)
No. of cores | 16
Peak performance and clock | 236.5 GFlops @ 1.848 GHz
Memory bandwidth | 85 GB/s
Power consumption | 110 W (typical)
High performance-per-power ratio and high reliability
  The water cooling system lowers the CPU temperature and leakage current
  Wide-ranging error detection/self-recovery functions, instruction retry function
(Die layout figure: 16 cores, L2$ data and L2$ control, MAC, DDR3 interfaces, HSIO.)
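The peak figure in the specification table can be reproduced from the core count, the FP operations per clock and the clock rate. A minimal Python sketch; the function name is illustrative, and the per-watt value is a CPU-only figure derived here from the 110 W typical power, not a number stated on the slide:

```python
def peak_gflops(cores, flops_per_clock_per_core, clock_ghz):
    """Theoretical peak = cores x FP ops per clock per core x clock (GHz)."""
    return cores * flops_per_clock_per_core * clock_ghz


# 4 multiply-and-add operations per clock, each counted as 2 FP operations.
FMA_PER_CLOCK = 4
FLOPS_PER_CLOCK = FMA_PER_CLOCK * 2

fx10 = peak_gflops(cores=16, flops_per_clock_per_core=FLOPS_PER_CLOCK,
                   clock_ghz=1.848)
print(f"SPARC64 IXfx peak: {fx10:.1f} GFlops")
print(f"Peak per watt at 110 W typical: {fx10 / 110:.2f} GFlops/W")
```

The same formula with 8 cores at 2 GHz gives the 128 GFlops quoted for the K computer's SPARC64 VIIIfx, assuming it also executes 8 FP operations per clock per core.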
16. Overview of HPC-ACE
“High Performance Computing - Arithmetic Computational Extensions”
Extended number of integer registers and floating point registers
Software-controllable “Sector Cache”
Flexible Single Instruction Multiple Data (SIMD) operation
Hardware barrier synchronization for VISIMPACT
VISIMPACT: automatic thread-parallelization compiler technology
Other special features
XFILL instruction
Reciprocal approximation instruction
Reciprocal square root approximation instruction
Trigonometric function acceleration instructions
18. HPC-ACE: Number of FP registers extension (1)
NPB3.3-LU high cost loop
By using the extended number of registers, the compiler can generate more efficient scheduling and also eliminate unnecessary memory operations.
(Bar chart: execution time in seconds of the LU jacld loop on process 0, 32 registers vs. 256 registers; x 1.42 improvement with 256 registers.)
19. HPC-ACE: Number of FP registers extension (2)
Performance boost by 256 FP registers with 138 application program kernels
Performance improvement: average 120%, max. 252%
(Chart: improvement ratio by program number; performance improvement from extending the number of FP registers from 32 to 256.)
20. HPC-ACE: Sector Cache (1)
Increasing the cache hit rate by selectively keeping reused data in the cache
The cache is divided into two sectors (Sectors 0 and 1).
  Sector 1 is used for data that will be reused. Reusable data are loaded by a special load instruction.
  Sector 0 is used for other data. It works with the ordinary cache replacement policy.
Data in Sector 1, which will be used again soon, is no longer evicted from the cache by accesses to data that use Sector 0.
The user can specify the data to be retained in Sector 1 on a compiler directive line.
CACHE_SECTOR_SIZE(N1,N2) divides the N ways of the L2 cache: N1 ways for Sector 0 and N2 ways for Sector 1.

    !ocl CACHE_SECTOR_SIZE(N1,N2)
    !ocl CACHE_SUBSECTOR_ASSIGN(a)
    do j=1,m
      do i=1,n
        a(i) = a(i) + b(i,j) * c(i,j)
      enddo
    enddo

Array a is held in Sector 1; all other data are held in Sector 0.
Array a is no longer evicted from the cache by references to array b or c.
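The effect of keeping array a in its own sector can be illustrated with a toy cache model. The Python sketch below is a simplification under stated assumptions: a fully associative LRU cache, one array element per cache line, and hypothetical sizes. The real hardware partitions the ways of a set-associative L2 cache, which this model does not reproduce.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache of `capacity` lines with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, key):
        """Touch one line; return True on a hit, False on a miss."""
        if key in self.lines:
            self.lines.move_to_end(key)      # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[key] = None
        return False


def run_loop(n, m, sector0_lines, sector1_lines):
    """Count cache hits for a(i) = a(i) + b(i,j) * c(i,j).

    With sector1_lines == 0 the cache is one undivided LRU cache;
    otherwise array a goes to sector 1 and arrays b, c go to sector 0.
    """
    sector0 = LRUCache(sector0_lines)
    sector1 = LRUCache(sector1_lines) if sector1_lines else sector0
    hits = 0
    for j in range(m):
        for i in range(n):
            hits += sector1.access(("a", i))
            hits += sector0.access(("b", i, j))
            hits += sector0.access(("c", i, j))
    return hits


n, m = 4, 10   # hypothetical loop bounds
print("undivided, 8 lines:", run_loop(n, m, 8, 0))
print("sector 0 = 4 lines, sector 1 = 4 lines:", run_loop(n, m, 4, 4))
```

In the undivided case the streaming accesses to b and c push every line of a out before it is reused, so nothing hits. With a dedicated sector, a stays resident after the first pass over i and every later access to it hits.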
21. HPC-ACE: Sector Cache (2)
NPB3.3-CG case
By putting array P in Sector 1, the floating-point data cache access wait is reduced.
(Bar chart: execution time in seconds without the sector cache (before improvement) vs. with the sector cache (after improvement); x 1.23 improvement.)
22. HPC-ACE: SIMD (Single Instruction Multiple Data)
Eight floating-point ops can be executed simultaneously per core
  Two SIMD instructions can be executed simultaneously per core
  A SIMD instruction executes two floating-point ops (single or double precision)
  FMA is supported
Software can flexibly perform SIMD optimization
  It is possible to execute operations in SIMD by obtaining pieces of data one by one from noncontiguous memory spaces
  It is possible to selectively store floating-point registers into memory (mask operation)
(Figure: floating-point registers paired for SIMD. SIMD[k] combines the basic register f[2k] with the extended register f[2k+256], from SIMD[0] = f[0], f[256] to SIMD[127] = f[254], f[510], feeding the floating-point pipelines.)
23. HPC-ACE: SIMD extension (mask operation effect)
Example of a computational chemistry program
Due to the branch ("if") in the loop, the SIMD option alone shows no effect.
By using the mask operation, the compiler can SIMDize the loop and utilize software pipelining, resulting in a 2.5x performance improvement.
(Bar chart: execution time in seconds for nosimd, simd and simd=2; x 2.5 improvement.)
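The idea behind the mask operation can be sketched in plain Python: instead of branching per element, the loop body is computed for every lane and only the lanes whose condition holds are stored. The kernel, the threshold and the function names below are hypothetical; the width of 2 mirrors the two floating-point ops per SIMD instruction mentioned in the slides.

```python
def branchy(x, y, threshold):
    """Scalar loop with a data-dependent branch in the body."""
    out = list(y)
    for i in range(len(x)):
        if x[i] > threshold:
            out[i] = y[i] + 2.0 * x[i]
    return out


def masked(x, y, threshold, width=2):
    """Same loop processed `width` elements at a time without a branch."""
    out = list(y)
    for start in range(0, len(x), width):
        lanes = range(start, min(start + width, len(x)))
        mask = [x[i] > threshold for i in lanes]       # per-lane condition
        result = [y[i] + 2.0 * x[i] for i in lanes]    # computed for every lane
        for lane, i in enumerate(lanes):
            # Masked store: keep the old value where the mask is false.
            out[i] = result[lane] if mask[lane] else out[i]
    return out


x = [0.5, 1.5, -1.0, 3.0, 2.0]
y = [1.0, 1.0, 1.0, 1.0, 1.0]
assert branchy(x, y, 1.0) == masked(x, y, 1.0)
print(masked(x, y, 1.0))
```

Both versions produce the same result; the masked form simply has no control flow inside the per-lane computation, which is what allows a compiler to map it onto SIMD instructions.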
24. HPC-ACE:XFILL capability
XFILL is effective in an earthquake simulation program
XFILL fills an L2 cache line with undetermined data (it allocates the cache line without loading data from memory)
When XFILL is issued in advance, the following floating-point register store instructions hit in the cache and do not cause a data load from memory
XFILL can reduce memory read accesses and improve performance when memory throughput is the bottleneck
[Chart: execution time in seconds (0.0E+00 to 1.0E-01) for pdiffz3_m4 without XFILL and with XFILL; x 1.5 improvement]
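The saving in memory reads can be sketched as simple traffic accounting. The Python below is a model under stated assumptions, not a measurement: it assumes a write-allocate cache, that every stored line is fully overwritten, and a 128-byte line size; the function name `store_stream_traffic` is hypothetical.

```python
def store_stream_traffic(n_lines, line_bytes=128, use_xfill=False):
    """Memory traffic in bytes for overwriting n_lines full cache lines."""
    # Without XFILL, a store miss first loads the line from memory.
    read_bytes = 0 if use_xfill else n_lines * line_bytes
    write_bytes = n_lines * line_bytes   # every line is written back once
    return read_bytes + write_bytes

without_xfill = store_stream_traffic(1000)
with_xfill = store_stream_traffic(1000, use_xfill=True)
print(without_xfill, with_xfill, without_xfill / with_xfill)
```

For a pure store stream this model halves the traffic. A real kernel also reads its input arrays, so the overall gain depends on the mix of loads and stores.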
25. VISIMPACT technology
Fine-grain thread-parallelization
Low-overhead barrier synchronization with HPC-ACE ASI registers
Coalesced memory access exploits shared L2 cache
“Virtual Single Processor by Integrated Multi-core Parallel Architecture”
[Figure: the same loop nest under three execution models]
DO J=1,N
  DO I=1,M
    A(I,J)=...
  END
END
Vectorization: outer loop (J) serial, inner loop (I) vector
Conventional Threading: outer loop (J) parallel, inner loop (I) serial; requires separate or large L2 cache
VISIMPACT: outer loop (J) serial, inner loop (I) parallel
Fujitsu compilers support VISIMPACT automatic parallelization
26. VISIMPACT technology
The Fujitsu compiler automatically transforms MPI programs into hybrid parallel executions by parallelizing each process on a CPU into multiple threads that run on its cores
Reducing the number of ranks improves communication efficiency
The inter-core hardware barrier and shared L2 cache help efficient execution
[Figure: two nodes (Node0, Node1) connected by the interconnect, shown for each model; P: Process, T: Thread]
VISIMPACT model: one multi-thread parallel process per node, with one thread (T) per CPU core
pure-MPI model: one process (P) per CPU core, parallel processes with inter-process communication
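The reduction in ranks can be made concrete with a small calculation. The sketch below assumes the layout suggested by the figure (2 nodes with 4 cores each) and uses an all-to-all exchange as the example communication pattern; the function name `rank_layout` and both of those choices are illustrative assumptions.

```python
def rank_layout(nodes, cores_per_node, threads_per_process):
    """Return (MPI ranks, point-to-point messages in one all-to-all)."""
    if cores_per_node % threads_per_process:
        raise ValueError("threads_per_process must divide cores_per_node")
    ranks = nodes * (cores_per_node // threads_per_process)
    # In an all-to-all exchange every rank sends to every other rank.
    messages = ranks * (ranks - 1)
    return ranks, messages

pure_mpi = rank_layout(nodes=2, cores_per_node=4, threads_per_process=1)
hybrid = rank_layout(nodes=2, cores_per_node=4, threads_per_process=4)
print(pure_mpi, hybrid)
```

Because the message count grows with the square of the rank count, cutting the ranks per node has a large effect at scale.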
27. 6D-Mesh/Torus Network Topology
Higher bisection bandwidth and fewer hops than a 3D-Torus
Torus fusion
Every XYZ Cartesian grid point has another ABC 3D-Torus
X, Z and B are torus (ring) axes
A, C and Y are mesh (linear) axes
[Figure: Conceptual Model, showing the X, Y, Z axes and the A, B, C axes]
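The difference between torus and mesh axes shows up in the hop count between two nodes. The Python sketch below assumes dimension-wise shortest-path routing, so that the total is the sum of per-axis distances; the axis sizes and the function names are hypothetical, while the torus/mesh assignment follows the slide.

```python
TORUS_AXES = {"X", "Z", "B"}   # ring axes, per the slide
MESH_AXES = {"A", "C", "Y"}    # linear axes, per the slide

def axis_hops(axis, src, dst, size):
    d = abs(src - dst)
    if axis in TORUS_AXES:
        return min(d, size - d)   # a ring can be traversed either way
    return d                      # a mesh has no wraparound link

def hops(src, dst, sizes):
    """Minimal hop count between two 6D coordinates (dicts keyed by axis)."""
    return sum(axis_hops(ax, src[ax], dst[ax], sizes[ax]) for ax in sizes)

sizes = {"X": 8, "Y": 4, "Z": 8, "A": 2, "B": 3, "C": 2}  # hypothetical sizes
src = {"X": 0, "Y": 0, "Z": 0, "A": 0, "B": 0, "C": 0}
dst = {"X": 7, "Y": 3, "Z": 4, "A": 1, "B": 2, "C": 1}
print(hops(src, dst, sizes))
```

Here the X and B distances shrink to one hop each thanks to the wraparound links, while the Y distance stays at three hops because Y is a mesh axis.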
28. Virtual Topology
System software generates a virtual 1d-, 2d- or 3d-torus for an arbitrary size of 6d-cuboid
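[Figure: a virtual torus of nodes 0 to 7 mapped onto a 6d-cuboid (axes X, Z, A, C), and a virtual 1d-torus of nodes 0 to 11 laid out on the Y-B plane as follows]
   10   9   3   4
B  11   8   7   6
Y   0   1   2   5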
Virtual topology expands the range of applicable algorithms
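The 12-node numbering in the figure can be checked to be a valid ring. The Python sketch below assumes that the horizontal direction is the Y axis (mesh, no wraparound) and the vertical direction is the B axis (torus, with wraparound between the top and bottom rows); the function name `is_ring` is hypothetical.

```python
def is_ring(grid, wrap_rows):
    """Check that nodes 0..n-1 form a ring of physically adjacent cells.

    grid is a list of rows; None marks an unused cell. Columns form a
    mesh axis; rows form a torus axis when wrap_rows is True.
    """
    rows = len(grid)
    pos = {v: (r, c) for r, row in enumerate(grid)
           for c, v in enumerate(row) if v is not None}
    n = len(pos)
    if sorted(pos) != list(range(n)):
        return False
    for k in range(n):
        (r1, c1), (r2, c2) = pos[k], pos[(k + 1) % n]
        dr = abs(r1 - r2)
        if wrap_rows:
            dr = min(dr, rows - dr)
        if dr + abs(c1 - c2) != 1:   # neighbours must be exactly one hop apart
            return False
    return True

layout = [
    [10, 9, 3, 4],
    [11, 8, 7, 6],
    [0, 1, 2, 5],
]
print(is_ring(layout, wrap_rows=True), is_ring(layout, wrap_rows=False))
```

Under these assumptions every pair of consecutive ranks, including 11 and 0, sits on directly linked nodes, but only if the B axis wraps around.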
29. ICC : Tofu Interconnect Controller
Companion chip for SPARC64™ VIIIfx / IXfx processors
Tofu Interconnect
4 Tofu Network Interfaces
Tofu Network Router
Host Bus Interface
PCI Express Gen2
2 ports for I/O nodes
Water-cooled
[Figure: ICC block diagram with Host Bus Interface, Tofu Network Interfaces (one with Tofu Barrier), PCI Express, Crossbar, and the Tofu Network routing and link blocks]
Process technology: 65 nm
Die size: 18.2 mm x 18.1 mm
Frequency: 312.5 MHz
No. of Tofu link: 10 ports
Tofu link throughput: in 5 GB/s + out 5 GB/s
PCI Express Gen2: 8 lane × 2 ports
Host Bus Interface: in 20 GB/s + out 20 GB/s
Power consumption: 28 W (typical)
No. of transistors: 200 million
Signal Transfer Speed: 6.25 Gbps
Differential signals: 128 lanes
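The link figures above can be related to each other with a short calculation. The sketch below rests on two assumptions that the slide does not state: that each Tofu link uses 8 lanes per direction, and that the links use 8b/10b encoding (80% payload efficiency). PCI Express Gen2 is known to run at 5 GT/s per lane with 8b/10b encoding. The function name `payload_gb_per_s` is hypothetical.

```python
def payload_gb_per_s(lanes, gbps_per_lane, encoding_efficiency=0.8):
    """Payload bandwidth in GB/s for one direction of a serial link."""
    return lanes * gbps_per_lane * encoding_efficiency / 8  # bits to bytes

# Assumption: 8 lanes per direction at the 6.25 Gbps signal speed.
tofu_link = payload_gb_per_s(lanes=8, gbps_per_lane=6.25)
# PCI Express Gen2, one 8-lane port, one direction.
pcie_port = payload_gb_per_s(lanes=8, gbps_per_lane=5.0)
# Aggregate outbound bandwidth over the 10 Tofu ports.
all_links_out = 10 * tofu_link
print(tofu_link, pcie_port, all_links_out)
```

Under these assumptions the per-link result matches the 5 GB/s per direction quoted on the slide.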
30. Static and Dynamic Failure Avoidance
Static Failure Avoidance
Pre-calculated routing table
For intra-job communication
Dynamic Failure Avoidance
Time-out detection by the protocol
For I/O communication
Failure
31. Fault Isolation by Virtual Topology
Jobs using a virtual topology can use a rectangular region that includes a failed node
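[Figure: virtual 1d-torus on the Y-B plane without a failed node (nodes 0 to 11)]
   10   9   3   4
B  11   8   7   6
Y   0   1   2   5
[Figure: the same region with one failed node, marked x (nodes 0 to 10)]
    9   8   7   6
B  10   x   3   4
Y   0   1   2   5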
Decreases in executable job size and in system availability are minimized
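The second numbering on this slide has 11 nodes in a 12-cell region, and the extracted text does not show which cell is the failed one. The Python sketch below tries each possible gap in the middle row and keeps the positions for which ranks 0 to 10 still form a ring. It assumes that columns follow the Y axis (mesh) and rows follow the B axis (torus, so top and bottom rows are linked); the function name `ring_ok` is hypothetical.

```python
def ring_ok(grid):
    """True if consecutive ranks (and last to first) are one hop apart."""
    rows = len(grid)
    pos = {v: (r, c) for r, row in enumerate(grid)
           for c, v in enumerate(row) if v is not None}
    n = len(pos)
    for k in range(n):
        (r1, c1), (r2, c2) = pos[k], pos[(k + 1) % n]
        dr = abs(r1 - r2)
        dr = min(dr, rows - dr)      # B is a torus axis: rows wrap around
        if dr + abs(c1 - c2) != 1:   # Y is a mesh axis: no column wrap
            return False
    return True

middle = [10, 3, 4]                  # middle row as extracted, gap unknown
candidates = []
for gap in range(4):
    row = middle[:gap] + [None] + middle[gap:]
    grid = [[9, 8, 7, 6], row, [0, 1, 2, 5]]
    if ring_ok(grid):
        candidates.append(gap)
print(candidates)
```

Only one gap position yields a valid ring under these assumptions, which suggests that the failed node sits in the second column of the middle row and that the remaining 11 nodes still form a virtual 1d-torus around it.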
32. All-to-all communication performance
Link utilization is important for actual communications
New optimized algorithm
Uses all links uniformly to maximize All-to-All communication performance
Four RDMA engines execute 4 sends and 4 receives simultaneously
Using Tofu features
Virtual 3D-Torus
Flow-control features for congestion prevention
Many applications use All-to-All type of communication and enjoy this acceleration
[Chart: All-to-All throughput in GB/s (0 to 4) versus message size in bytes (1.E+00 to 1.E+06); legend entries: Tofu (8x4x8=256), InfiniBand QDR (256), New algorithm]
33. All-to-all communication trace on Tofu
Trace Result of the K computer
System configuration of Tofu
24×18×16×2×3×2 = 82,944 nodes
Each node transfers 32KB
Left: new algorithm
Right: standard OpenMPI (pair-wise exchange)
Colors show link utilization and wait time
Greener: higher utilization
Redder: longer wait time
[Figure: link utilization traces for both algorithms]
New Algorithm: Elapsed Time: 2.77 sec
Standard OpenMPI (pair-wise exchange): Elapsed Time: 24.08 sec
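The node count and the ratio between the two elapsed times can be verified directly. The sketch below assumes that the 2.77 sec figure belongs to the new algorithm and the 24.08 sec figure to the pair-wise exchange; the variable names are illustrative.

```python
from math import prod

dims = (24, 18, 16, 2, 3, 2)      # Tofu configuration from the slide
nodes = prod(dims)

new_algorithm_s = 2.77            # elapsed times from the slide
pairwise_s = 24.08
speedup = pairwise_s / new_algorithm_s

print(nodes, round(speedup, 2))
```

Under that assumption the new algorithm is roughly 8.7 times faster than the pair-wise exchange on the full system.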
34. FX10 Software Stack
Applications
HPC Portal / System Management Portal
Technical Computing Suite
System Management
  System management: System control, System monitoring, System operation support
  Job Management: Job manager, Job scheduler, Resource management, Parallel job execution
High Performance Parallel File System
  FEFS: Lustre based high performance distributed file system
  High scalability, high reliability and availability
Automatic parallelization compiler
  Fortran, C, C++
  Tools and math. libraries: Programming support tools, Mathematical libraries (SSL II/BLAS etc.)
  Parallel languages and libraries: OpenMP, MPI, XPFortran
Linux based OS enhanced for FX10
PRIMEHPC FX10
35. Lustre Extension of FEFS: Features
[Figure: FEFS Features grouped by category; a legend marks each feature as New, Extended or Reuse]
Large scale: Max file size, Max number of files, Max client number, Max stripe count, 512KB block
High performance: File striping, Parallel I/O, Client cache, Server cache, MDS response, I/O zoning, OS jitter reduction
Network: Tofu Interconnect, IB/Ether, IB Multi-rail, LNET Router
Operations Management: Lustre ACL, QoS, Disk Quota, Directory Quota, Dynamic configuration change
Connectivity: Lustre mount, NFS export
Reliability: Failover, RAS, Journal / fsck
37. Language System overview
C/C++/Fortran Compiler
Programming model (OpenMP, MPI, XPFortran)
Instruction level /Loop level optimization using HPC-ACE
Debugging and Tuning tools for highly parallel computer
Intra Node
  Programming Language: Fortran 2003, C, C++, OpenMP 3.0
  Insts. level opt.: Instruction scheduling, SIMDization
  Loop level opt.: Automatic Parallelization
  Programming tool: IDE, Debugger, Profiler
  Math. Lib.: BLAS, LAPACK, SSL II
Inter Node
  Programming Language, MPI: XPFortran *1, MPI 2.1
  Programming tool: RMATT *2
  Math. Lib.: ScaLAPACK
*1: eXtended Parallel Fortran (Distributed Parallel Fortran)
*2: Rank Map Automatic Tuning Tool
38. Programming Environment
[Figure: FX10 programming environment. User Client: IDE, Interactive Debugger GUI, Profiler. FX10 System, Login Node: IDE Interface, Command, Job Control, Debugger Interface, Data Converter. FX10 System, Compute Nodes: App, Data Sampler. Sampling Data is staged out and converted to Visualized Data for the Profiler.]
39. Application Tuning Cycle and Tools
[Figure: tuning cycle of Execution, Overall Tuning, MPI Tuning and CPU Tuning, with the tools used at each stage]
FX10 Specific Tools: Job Information, Profiler, Profiler snapshot, RMATT, Tofu-PA
Open Source Tools: Vampir-trace, PAPI
40. On Course to Exascale
World’s first 1 Exa-Flops computer is expected to appear by 2020
41. Towards exascale
Realization of an exascale system is a grand challenge
At least a two-step development is necessary
The biggest challenges are high density and low power consumption
Fujitsu is developing a Trans-Exa system as a midterm goal
The Trans-Exa system is expected to be scalable to 100 Petaflops
Employs
Wide SIMD and multicore CPU
High performance and lower power consumption interconnect
High performance and high density memory technologies
Continues to invest effort in research for the exascale system
Higher performance and lower power consumption technologies
Technologies for higher reliability
[Timeline, 2010 to 2020: K computer (No.1 in Top500, June and Nov. 2011), then Trans-Exa system, then Exascale system]
42. Key technology developments on Trans-Exa
Goal
Significant improvement of power efficiency, high density
Technology
Silicon tech. ⇒ Employs the latest tech.
Innovative memory tech. ⇒ High density & BW memory
System integration tech. ⇒ Higher integration & density
The latest optical tech. ⇒ High speed signal transfer
Gains
Performance / power consumption
Performance / rack
Accumulation of key technologies toward exascale systems